Archive for the 'Daniel Tunkelang' Category

Picture this! The news graphed.

Slate made a curious addition to its site last week. Its heart is in the right place, and this is a good experiment, not a silly one.

I believe there’s extraordinary value to be unlocked by mapping a world of articles onto the social graph that they describe textually. I’ve written about graphing the news before, awkwardly describing a scheme here and geeking out over a pretty picture here. Social networks are funny things: they often describe the real world even as they sometimes exist as worlds of their own. Done right, Slate’s graph could be an eye-opening mechanism for aggregating, sorting, discovering, following, sharing, and discussing the news.

But I don’t think Slate has quite done it right. In short, there’s too much information in the nodes, or the “dots,” as Slate calls them, and there’s too little information in the edges, or the links that connect up those dots. So permit me a little rambling.

We don’t really care all that much about the differences between a person, a group, and a company—not at this top level of navigation, anyhow. There are too many dots, and it’s too hard to keep all their colors straight. Of course, it’s not that hard, but if Slate’s project is fueled by bold ambition rather than fleeting plaudits, it’s just not easy enough. They’re actors. They’re newsmakers. They are entities that can be said to have a unified will or agency. And that’s enough. Make up a fancy blanket term, or just call them “people” and let smart, interested users figure out the details as they dive in.

Moreover, assistant editor Chris Wilson confuses his own term “topic.” At first, he writes, “News Dots visualizes the most recent topics in the news as a giant social network.” Also, “Like a human social network, the news tends to cluster around popular topics.” In this sense, a “topic” is an emergent property of Slate’s visualization. It’s the thing that becomes apparent to the pattern-seeking, sense-making eyes of users. So “one clump of dots might relate to a flavor-of-the-week tabloid story” or “might center on Afghanistan, Iraq, and the military.”

But then Wilson makes a subtle but ultimately very confusing shift. Explaining how to use the visualization, he writes, “click on a circle to see which stories mention that topic and which other topics it connects to in the network.” Problem is, these “topics” are what he has just called “subjects.” As emergent things, or “clumps,” his original “topics” can’t be clicked on. On the contrary, “subjects—represented by the circles below—are connected to one another,” and they’re what’s clickable.

To make matters worse, Wilson then, below the visualization, introduces more confusing terms as he describes the role played by Open Calais (which is awesome). It “automatically ‘tags’ content with all the important keywords: people, places, companies, topics, and so forth.” The folks at Thomson Reuters didn’t invent the term “tag,” of course; it’s a long-standing if slippery term that I’m not even going to try to explain (because it really, really is just one of those cases in which “the meaning of the word presupposes our ability to use it”). At any rate, Wilson seems to be using it because Open Calais uses it. That’s fine, but a bit more clarity would be nice, given the soup of terms already around. And there’s really little excuse for dropping in the term “keywords,” because, with his technical hat on, it’s just wrong.

I’m terribly sorry to drag you, dear reader, through that intensely boring mud puddle of terminology. But it’s for good reason, I think. Graphing the news is supposed to be intuitive. The human mind just gets it. A picture is worth a thousand words. Taken seriously, that notion is powerful. At a very optimistic level, it encourages us to let visualizations speak for themselves, stripped of language all too ready to mediate them. But at a basic level, it warns us writers not to trample all over information expressed graphically with a thousand textual words that add up to very little—or, worse, to confusion.

But, yes, okay, about those aforementioned edges, the links that connect up those dots! I wrote about this long ago, and my intuition tells me that it doesn’t make sense to leave edges without their own substance. They need to express more than similarity; they can do more than connect like things. If they were to express ideas or events or locations while the nodes expressed actors, it seems to me that the picture would be much more powerful. Those ideas, events, or locations wouldn’t sit in light blue “Other” nodes, as Slate has them; instead they would directly link up the people and organizations. The social network would be more richly expressed. And topics, in Wilson’s original sense, wouldn’t be emergent “clumps” but actually obvious connections.
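
Here’s a rough sketch of what I mean, in Python with the networkx graph library. Everything in it (the entities, the event, the article IDs) is hypothetical; it’s one way to model the idea, not Slate’s actual data.

```python
import networkx as nx

g = nx.MultiGraph()

# Nodes are actors only: people and organizations, with no "Other" bucket.
g.add_node("Barack Obama", kind="person")
g.add_node("Congress", kind="organization")

# The edge itself carries the substance: an event, its location, the stories.
g.add_edge(
    "Barack Obama", "Congress",
    event="health care address",
    location="Washington, D.C.",
    stories=["article-101", "article-102"],  # hypothetical article IDs
)

# A "topic," in Wilson's original sense, then falls out of the picture as a
# cluster of actors plus the events written on the edges between them.
for u, v, data in g.edges(data=True):
    print(f"{u} --[{data['event']}]-- {v}")
```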

All in all, the visualization is “depressingly static,” as a friend of mine remarked. There may be two levels of zoom, but there’s no diving. There’s no surfing, no seeing a list of stories that relate to both topic x AND topic y. There’s no navigation, no browsing. There’s no search—and especially none involving interaction between the human and computer. There’s no news judgment beyond what newspaper editors originally add. And the corpus is small—tiny, really, representing only 500 articles each day, which isn’t so far from being a human-scale challenge. Visualizations hold the most promise for helping us grapple with truly internet-scale data sets—not 500 news articles a day but 500,000 news articles and blog posts.

It seems unfair to hold Slate to such a high standard, though. It’s very clear that they were shooting for something much more modest. All the same, maybe modesty isn’t what’s called for.

Not by Links Alone

At this unthinkably late hour, many of even the most recalcitrant journalists and newsy curmudgeons have given themselves over, painfully, to the fundamentally important fact that the economics of abundance now govern their world.

For many, of course, stemming that tide is still paramount. Their goal, as David Carr writes, is to squelch the “new competition for ads and minds.” Thus Walter Isaacson’s “E-ZPass digital wallet” and Alan Mutter’s “Original Sin.” Thus Michael Moran’s obnoxious “NOPEC.” Thus Journalism Online. And, of course, thus we have David Simon’s recent call for Congress to “consider relaxing certain anti-trust prohibitions” or this call in the Washington Post to rework fair use. I wish them all good luck, but mostly good night.

There are others, though, who think it’s great that the Internet and Google are opening up the news to competition. In fact, “Google is good” strikes me as nearly orthodox among the basically Internet-savvy set of news talkers. Marissa Mayer crows about how Google delivers newspapers’ Web sites one billion clicks a month, and Arianna Huffington insists that the future of news is to be found in a “linked economy” and “search engines” like Google.

In this narrative, Google’s the great leveler, ushering the world of journalism out of the dark, dank ages of monopoly and into the light, bright days of competition, where all news articles and blog posts stand on their own pagerank before the multitude of users who judge with their links and their clicks. Its ablest defender is probably Jeff Jarvis, author of What Would Google Do? Jarvis was relatively early in pointing out that “Google commodifies the world’s content by making it all available on a level playing field in its search.” In that and other posts at Buzz Machine, his widely read blog, Jarvis allows that Google “can make life difficult” but insists, “that’s not Google’s fault.” The reverence for Google is thick: “The smart guys are hiring search-engine optimization experts and trying to figure out how to get more people to their stuff thanks to Google.”

But defenders of Google’s influence on the broader market for news and newspapers themselves make a striking error in believing that the market for content is competitive. That belief is wrong—not just a little bit or on the margin, but fundamentally, and importantly, wrong.

Which is not to say that news publishers aren’t competing for readers’ eyeballs and attention. Publishers compete with one another all day long, every day—with some local exceptions, the news has always been competitive like a race, and is now more competitive like a market than ever before. But the market for that news—the place where consumers decide what to read, paying with their attention—is not competitive. Google may well be the great leveler, but down to how low a field?

To be very clear, this is far from a neo-classical purist’s critique that picks nits by abusing uselessly theoretical definitions. I am not a purist, an economist, or a jerk. This is reality, as best as I know it. Nevertheless, to say that the market for content is competitive is just to misunderstand what a competitive market actually entails. The market for news content as it currently stands, with Google in the middle, is a profoundly blurry, deeply uncompetitive space.

*    *    *

“The difficulty of distinguishing good quality from bad is inherent in the business world,” Nobel laureate George Akerlof wrote in the kicker of his most famous paper, published in 1970. “This may indeed explain many economic institutions and may in fact be one of the more important aspects of uncertainty.”

Akerlof fired an early shot in a scholarly marathon to study the effects of asymmetric information in markets. What do parties to a potential transaction do when they know different sets of facts? Maybe that seems like an obvious question, but economists in the middle of the twentieth century had been pretty busy worrying about perfecting complicated models despite their grossly simplistic assumptions.

So Akerlof set out to write about how markets can fail when some of those assumptions turn out to be bunk. The assumption he tested first, in “The Market for ‘Lemons,’” was certainty, and he showed that when sellers know more about the goods being sold than the buyers do, sellers abuse their privileged position and buyers leave the market.

Writing in the same year, the economist Phillip Nelson studied the differences between what he called “search goods” and “experience goods.” Both express a certain kind of information asymmetry. For search goods, consumers can overcome the asymmetry before the point of purchase by doing their homework; for experience goods, consumers can overcome it only after purchase, by investing their time in consuming the good.

A pair of pants, for instance, is a search good—you can try before you buy, and shop around for the pants that fit you best. An apple, on the other hand, is an experience good—you don’t know whether you’ll like one until you consume it, and you can’t really try before you buy.

News articles are experience goods. Just as with an apple, you need to consume the story, reading the article or watching the video or so on, in order to judge its quality. “Stories can vary in length, accuracy, style of presentation, and focus,” writes economist James Hamilton in All the News That’s Fit to Sell. “For a given day’s events, widely divergent news products are offered to answer the questions of who, what, where, when, and why.” We can’t know which one’s best till we’ve read them all, and who’s got time for that?

Moreover, a multitude of subjective editorial decisions produce the news. Each reporter’s practices and habits influence what’s news and what’s not. Their learned methods, their assigned beats, and even their inverted pyramids shape what we read and how. Reporters’ and editors’ tastes, their histories, or their cultures matter, as do their professional ethics. Each article of news is a nuanced human document—situated aesthetically, historically, culturally, and ethically.

Ultimately, the news is afflicted with the problem of being an experience good more than even apples are. At least Granny Smiths don’t vary wildly from farmer to farmer or from produce bin to produce bin. Sure, some may be organic, while others are conventional. One may be tarter or crispier than another, but tremendous differences from the mean are very unlikely. With the news, though, it’s hard even to think of what the mean might be. It may seem obvious, but articles, essays, and reports are complex products of complex writerly psychologies.

For a long time, however, as readers, we were unaware of these nuances of production. That was, in some sense, the upshot: our experience of this journalism was relatively uncomplicated. This profound lack of context mattered much less.

Call it the myth of objectivity maybe, but what NYU professor Jay Rosen has labeled the “mask of professional distance” meant that we didn’t have much of a chance to bother with a whole world of complexities. Because everyone usually wore a mask, and because everyone’s mask looked about the same, we ignored—indeed, we were largely necessarily ignorant of—all the unique faces.

For a long time, therefore, the orthodox goal of American newspapers virtually everywhere was news that really wasn’t an experience good. When news existed only on paper, it hardly mattered what news was, because we had so few, seemingly monochrome, choices about what to read. We returned to the same newspapers and reporters behind the same masks over and over again, and through that repetition, we came subtly to understand the meaning and implications of their limited degrees of “length, accuracy, style of presentation, and focus.”

As a result, we often grew to love our newspaper—or to love to hate it. But even if we didn’t like our newspaper, it was ours, and we accepted it, surrendering our affection either way, even begrudgingly. The world of news was just much simpler, a more homogeneous, predictable place—there were fewer thorny questions, fewer observable choices. There was less risk by design. Our news was simpler, or it seemed to be, and we had little choice but to become familiar with it anyhow. One benefit of the View from Nowhere, after all, is that basically everyone adopted it—that it basically became a standard, reducing risk.

But a funny thing happened in this cloistered world. Because it seemed only natural, we didn’t realize the accidental nature of the understanding and affection between readers and their newspapers. If, as the economists would have it, the cost of a thing is what we’ve sacrificed in order to achieve it, then our understanding and affection were free. We gave nothing up for them—for there was scarcely another alternative. As a result, both readers and publishers took those things for granted. This point is important because publishers are still taking those things for granted, assuming that all people of good faith still appreciate and love all the good things that a newspaper puts on offer.

*    *    *

But when our informational options explode, we can plainly, and sometimes painfully, see that our newspapers aren’t everything. Different newspapers are better at answering different questions, and some answers—some as simple as what we should talk about at work tomorrow—don’t come from newspapers at all. So we go hunting on the Internet. So we gather. So we Google.

We have now spent about a decade Googling. We have spent years indulging in information, and they have been wonderful years. We are overawed by our ability to answer questions online. Wikipedia has helped immensely in our efforts to answer those questions, but pagerank elevated even it. Newspapers are just one kind of Web site to have plunged into the scrum of search engine optimization. Everyone’s hungry for links and clicks.

And Google represents the Internet at large for two reasons. For one, the engine largely structures our experience of the overall vehicle. More importantly, though, Google’s organization of the Internet changes the Internet itself. The Search Engine Marketing Professional Organization estimates, in this PDF report, that North American spending on organic SEO in 2008 was about $1.5 billion. But that number is surely just the tip of the iceberg. Google wields massive power over the shape and structure of the Internet’s general landscape of Web pages, Web applications, and the links among them. Virtually no one builds even a semi-serious Web site without considering whether it will be indexed optimally. For journalism, most of the time, the effects are either irrelevant or benign.

But think about Marissa Mayer’s Senate testimony about the “living story.” Newspaper Web sites, she said, “frequently publish several articles on the same topic, sometimes with identical or closely related content.” Because those similar pages share links from around the Web, neither one has the pagerank that a single one would have. Mayer would have news Web sites structure their content more like Wikipedia: “Consider how the authoritativeness of news articles might grow if an evolving story were published under a permanent, single URL as a living, changing, updating entity.”

Setting aside for the moment whatever merits Mayer’s idea might have, imagine the broader implications. She’s encouraging newspapers to change not just their marketing or distribution strategies but their journalism because Google doesn’t have an algorithm smart enough to determine that they should share the “authoritativeness.”

At Talking Points Memo, Josh Marshall’s style of following a story over a string of blog posts, poking and prodding an issue from multiple angles, publishing those posts in a stream, and letting the story grow incrementally and cumulatively, might be disadvantaged because those posts are, naturally, found at different URLs. His posts would compete with one another for pagerank.
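
A toy calculation makes the point vivid. The sketch below uses networkx’s off-the-shelf pagerank, a stand-in for whatever Google actually runs, to compare inbound links split across two near-duplicate story URLs against the same links pooled at one permanent URL. All the site and story names are made up.

```python
import networkx as nx

# Scenario A: six outside sites split their links across two near-duplicate
# story URLs, three links each.
split = nx.DiGraph()
split.add_edges_from([(f"site{i}", "story-v1") for i in range(3)] +
                     [(f"site{i}", "story-v2") for i in range(3, 6)])

# Scenario B: the same six sites all link to one permanent "living story" URL.
merged = nx.DiGraph()
merged.add_edges_from([(f"site{i}", "living-story") for i in range(6)])

pr_split = nx.pagerank(split)
pr_merged = nx.pagerank(merged)

print(pr_split["story-v1"], pr_split["story-v2"])  # two middling scores...
print(pr_merged["living-story"])                   # ...versus one high score
```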

And maybe it would be better for journalism if bloggers adopted the “living story” model of reporting. Maybe journalism schools should start teaching it. Or maybe not—maybe there is something important about what the structure of content means for context. The point here isn’t to offer a substantive answer to this question, but rather to point out that Mayer seems unaware of the question in the first place. It’s natural that Mayer would think that what’s good for Google is good for Internet users at large. For most domestic Internet users, after all, Google, which serves about two-thirds of all searches, essentially is their homepage for news.

But most news articles, of course, simply aren’t like entries in an encyclopedia. An article of news—in both senses of the term—is substantially deeper than the facts it contains. An article of news, a human document, means substantially more to us than its literal words—or the pageranked bag of words that Google more or less regards it as.

Google can shine no small amount of light on whether we want to read an article of news. And, importantly, Google’s great at telling you when others have found an article of news to be valuable. But the tastes of anonymous crowds—of everyone—are not terribly good at determining whether we want to read some particular article of news, particularly situated, among all the very many alternatives, each particularly situated unto itself.

Maybe it all comes down to a battle between whether Google encourages “hit-and-run” visits or “qualified leads.” I don’t doubt that searchers from Google often stick around after they alight on a page. But I doubt they stick around sufficiently often. In that sense, I think Daniel Tunkelang is precisely correct: “Google’s approach to content aggregation and search encourages people to see news…through a very narrow lens in which it’s hard to tell things apart. The result is ultimately self-fulfilling: it becomes more important to publications to invest in search engine optimization than to create more valuable content.”

*    *    *

The future-of-news doomsayers are so often wrong. A lot of what they said at Kerry’s hearing was wrong. It’s woefully wrongheaded to call Google parasitic, if only because the Internet without it would be a distinctly worse place. There would be, I suspect, seriously fewer net pageviews for news. And so it’s easy to think that they’re wrong about everything—because it seems that they fundamentally misunderstand the Internet.

But they don’t hold a monopoly on misunderstanding. “When Google News lists one of our stories in a prominent position,” writes Henry Blodget, “we don’t wail and moan about those sleazy thieves at Google. We shout, ‘Yeah, baby,’ and start high-fiving all around.” To Blodget, “Google is advertising our stories for free.”

But life is about alternatives. There’s what is, and there’s what could be. And sometimes what could be is better than what is—sometimes realistically so. So however misguided some news executives may have been or may still be about their paywalls and buyouts, they also sense that Google’s approach to the Web can’t reproduce the important connection the news once had with readers. Google just doesn’t fit layered, subtle, multi-dimensional products—experience goods—like articles of serious journalism. Because news is an experience good, we need really good recommendations about whether we’re going to enjoy it. And the Google-centered link economy just won’t do. It doesn’t add quite enough value. We need to know more about the news before we sink our time into reading it than pagerank can tell us. We need the news organized not by links alone.

What we need is a search experience that lets us discover the news in ways that fit why we actually care about it. We need a search experience built around concretely identifiable sources and writers. We need a search experience built around our friends and, lest we dwell too snugly in our own comfort zones, other expert readers we trust. These are all people—and their reputations or degrees of authority matter to us in much the same ways.

We need a search experience built around beats and topics that are concrete—not hierarchical, but miscellaneous and semantically well defined. We need a search experience built around dates, events, and locations. We need a search experience that’s multi-faceted and persistent, a stream of news. Ultimately, we need a powerful, flexible search experience that merges automatization and human judgment—that is sensitive to the very particular and personal reasons we care about news in the first place.
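
For concreteness, here’s a minimal sketch of what such a multi-faceted, persistent query might look like as a plain data structure. Every field name below is hypothetical, mine and not any real system’s API.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class NewsQuery:
    sources: list = field(default_factory=list)          # concrete outlets
    writers: list = field(default_factory=list)          # identifiable bylines
    trusted_readers: list = field(default_factory=list)  # friends and expert readers
    topics: list = field(default_factory=list)           # flat, semantically well defined
    events: list = field(default_factory=list)
    locations: list = field(default_factory=list)
    since: Optional[str] = None                          # a date facet
    persistent: bool = True                              # a standing stream, not a one-off

# A standing query: everything my trusted readers share on one concrete beat.
query = NewsQuery(
    trusted_readers=["@jayrosen_nyu"],
    topics=["network neutrality"],
    since="2009-05-01",
)
```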

The people at Senator Kerry’s hearing last week seemed either to want to dam the river and let nothing through or to whip its flow up into a tidal wave. But the real problem is that they’re both talking about the wrong river. News has changed its course, to be sure, so in most cases, dams are moot at best. At the same time, though, chasing links and clicks, with everyone pouring scarce resources into an arms race of pagerank while aggregators direct traffic and skim a few page views, isn’t sufficiently imaginative either.

UPDATE: This post originally slipped out the door before it was fully dressed. Embarrassing, yes. My apologies to those who read the original draft of this thing and were frustrated by the unfinished sentences and goofy notes to self, and my thanks to those who read it all the same.

Taking Twitter Seriously: What if it really were a really big deal?

Maybe @davewiner does wring his hands too violently about twitter’s recommended users. Maybe it is too early to worry about unintended consequences.

But maybe not. Either way, if we take a slightly different view of his worries, I think we can take them to heart much more easily. If we can shift tenses, it might help.

When @davewiner talks about twitter, he talks about it in the present tense. Let’s try another tense: a kind of conditional. Let’s try a counterfactual conditional: Would this thing work if it were the case that…?

After all, to detect a problem in any system, we’ve got to imagine that system working at full scale. Whether it’s a database, a message board, or a social network like twitter, we’ve got to imagine its ideal—when everyone’s using it for any purpose that’s difficult to police cheaply.

When @davewiner worries about twitter’s editorial adventures, as he does here and here in conversation with @jayrosen_nyu, he’s taking it extraordinarily seriously. It’s a great compliment, I think. He sees a twitter that’s currently critical to very many people. That’s the present tense.

OK, so some of us don’t yet share that view. But I bet we can offer our own great compliment and imagine very many people using it—or maybe even virtually everyone using it. At the end of every day, I think many of us have less and less trouble imagining that.

So, if virtually everyone were using twitter—if it really were the “Future News System of the World,” again, as difficult as that might be to imagine—we might really insist that it refrain from the editorial business. If twitter really were that big, then it really would be critical. And if it really were critical, its closed nature would probably violate all kinds of praise-Murphy rules about leaving our data, our businesses, and our lives in the hands of a for-profit company, its secret business plan, and its fallible servers.

We’re not casting aspersions at what most everyone regards as an essentially fair and just company. Of course, that goes for @me too; I love twitter.

This is simply why we have the notion of “common carriage.” For centuries, we’ve demanded ultra-reliable commodity transportation services. We’ve been so insistent on the reliability and the even-handedness of transportation that we’ve often saddled the carrier with the de facto burden of liability for losses, which raises its price to us. This is why we care about network neutrality.

If we really take twitter seriously, then we think it’s possible that twitter could be the next big deal. The trouble is that, at scale, big deals attract all manner of mischief, with potentially everyone using them for all things selfish and spammy.

If twitter could be the next big deal, we need to start thinking about safeguarding it now.

PS. That’s what tunkrank, which was conceived by @dtunkelang, is for.
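
For the curious: tunkrank scores a user’s influence as the expected number of people who end up reading one of her tweets, counting retweets. Here’s a minimal sketch of that recursion, with a toy follower graph and an assumed retweet probability p, both my own simplifications.

```python
def tunkrank(followers, following_count, p=0.05, iterations=50):
    """followers[x] = the set of users who follow x;
    following_count[y] = how many accounts y follows."""
    influence = {user: 1.0 for user in followers}
    for _ in range(iterations):
        influence = {
            x: sum((1 + p * influence[y]) / following_count[y]
                   for y in followers[x])
            for x in followers
        }
    return influence

# Toy graph: alice follows bob and carol; bob follows carol.
followers = {"alice": set(), "bob": {"alice"}, "carol": {"alice", "bob"}}
following_count = {"alice": 2, "bob": 1, "carol": 0}
print(tunkrank(followers, following_count))
```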

What the Structure of Content Means for Context

My hero was perched high up in journalism. The writing he left behind is deep and broad. In so many ways, to read his writing is just to think and see more clearly.

Journalists, it seems to me, fancy themselves explainers. They are great synthesizers of the world—at length. It is a wonderful calling for those who choose it.

Print was a great boon to that self-image. Print may even have allowed it. It was a fine world, mostly.

But then came the interwebs and google and adversarial search, which foisted on journalists the great tyranny of pageviews. Click.

Suddenly, it seems, the world moves faster. Its pieces are smaller. Its harried citizens’ attentions are diced or crushed or pointed only inward. We are distracted.

Chasing after readers as only they know how, cheered along by SEOs, journalists and publishers of news are looking for content that fits the new us, distracted. It’s an arms race to the bottom.

But we are not at the bottom. Nor are we at the top. For the news doesn’t so simply fit us, as we don’t so simply fit the news or so thoroughly morph our minds to information or its forms or media. Were it only that simple!

*     *     *

The Least Publishable Unit is a funny thing. The concept refers to a thing that’s in fact publishable—but only barely.

Here’s the contrasting picture, set up by Michael Scherer of TIME: “Once upon a time, the incentive of a print reporter at a major news organization was to create a comprehensive, incisive account of an event.” Again, that was their calling, enabled by print.

What matters now, however, is “the news nugget, the blurb, the linkable atom of information.” Why? Because “a click is a click, after all.” News “is increasingly no longer consumed in the context of a full article, or even a full accounting of an event, but rather as Twitter-sized feeds.”

Are the interwebs, ineluctably, making the news shallow and narrow? The answer is unequivocally yes and also no. We now have more choice, a vastly wider, and growing, array of options for publishing. Our once-private gossip, carried in spoken words from neighbor to neighbor, is now online, in text, inviting misinterpretation from strangers. This song is not about you.

As certain as humans are petty, narcissistic beings, so impressed with their own lives and confident in their supreme ability to take it all so seriously, the news will be shallow and narrow. Please don’t read it, unless its brevity is the soul of wit.

But so, too, as certain as humans are profound, altruistic beings, so inspired by the world around them and hopeful of their modest ability to take it all so seriously, the news will be deep and broad. Please do read it, unless its length is the appetite of self-infatuation.

Here’s the nut: The news will also be deep and narrow. And it will be shallow and broad.

The interwebs give us those options too. Let’s not forget about them, or forget that they are different from their purer counterparts of longing and loathing.

*     *     *

I’ve been thinking about this for a while, inspired by @mthomps and this and other posts at newsless.org and by this post of @jayrosen_nyu‘s. Of course, the critical piece of the backdrop is a spectacular story by This American Life, called The Giant Pool of Money.

I agree with Jay that “Explanation leads to information, not the other way around.” I certainly agree that news often misses the forest for the trees. If I were a student at j-school, I’d want my profs showing me how to create omnibus stories like this. All writers—no, many writers!—pine for that awesome control over structure and narrative.

Cutting somewhat against the grain, however, I don’t think “Giant Pool of Money” should be the aim of all our ambitions. Which is certainly not to belittle it. Quite to the contrary, its status as masterwork is what makes it really, really hard for us to emulate. That’s asking too much—being a “national explainer” is too tough. Even the brightest among us, in memoriam, perform such dazzling feats of synthesis only occasionally. That’s not good enough for those of us who like important, responsible, thoughtful news all the time. And even This American Life’s story came after the disaster. Warning of the dangers of wildly complex securities and derivatives before they come crashing down is an even taller order—on the level of the GAO, for instance.

Here’s my chart illustrating why “national explainer” is really hard.

[Chart: a two-by-two news matrix plotting subject, from narrow to broad, against container, from shallow to deep.]

“Deep” and “shallow”? “Broad” and “narrow”? Huh? We’re talking scope here, folks.

“The Giant Pool of Money,” in the lower-right quadrant, is “broad” in its subject and “deep” in its container.

When something is “broad” in subject, it engages a complex, multi-faceted, sweeping subject. It’s a work of synthesis, taking multiple angles on and bridging between and weaving different constituent subjects. It aims to be comprehensive—the stuff of the glory days, however real or imagined they may be, of print journalism.

When something is “deep” in its container, there’s something a bit more prosaic going on. Essentially, each discrete work is thorough unto itself. One document—whether it’s text, audio, or video—aims to say more or less all there is to be said about its subject—to connect all the dots in one place. If there’s very much to be said about a subject, as in “The Giant Pool of Money,” the document will be long.

Consider the alternative: shallow containers. They’re not an insult! When we say something is “shallow” in its container, we mean simply that one document doesn’t attempt to say all there is to be said about a subject. Josh Marshall’s reporting, especially on the US Attorney Scandal, is a high-profile example of the form bloggers invented.

“We have kind of broken free of the model of discrete articles that have a beginning and end,” Marshall said, talking to the New York Times about the Polk Award. “Instead, there are an ongoing series of dispatches.”

Each dispatch isn’t comprehensive. They catch the reader up on past reporting with a few links to previous posts. Or they start off with a link or two to others’ posts or articles, promising to pick up the issue where they left off. Then they take a deep look at a small set of questions, teasing out contradictions, and end up with a set of conclusions or a new, more pointed set of questions for the next post.

The point is that the containers are small—shallow in the sense that they’re often only exposing a few dots at a time and not necessarily always trying to connect them all up as they go along. These posts don’t feign omniscience the way some, though certainly not all, traditional journalistic pieces do; they admit doubt and highlight confusion. The goal is to isolate facts, issues, and relationships, not always synthesize them.

But a critical characteristic of the form is that Josh Marshall’s dispatches on fired USAs compose a series. Each post extends previous ones or adds more to the same canvas. They’re all part of some bigger picture; they’re cumulative. And that is why, taken together, they amount to journalism that’s broad in subject. The bits of content may be fractured over author, space, and perspective, but they’re one work—one “text” in the fancy sense. Josh Marshall’s infusion of himself and his joys and outrages into his blog do the human work of pulling together the moral logic that invites readers to be patient while he unfolds the political logic one small piece at a time.

The last of the three interesting quadrants contains Wikipedia. Here again, “narrow” is not an insult. More than anyone, Wikipedians know “What Wikipedia Is Not.” It’s not for original research or reporting. It’s not for opinion or analysis. It’s for documenting these things. Its domain is facts—but not nearly all facts. It’s not a directory or a guidebook or a textbook. Wikipedia works because it factors out, as much as possible, the kind of human reason that we colloquially call “wisdom” or “insight.”

As Farhad Manjoo explains in his Slate piece, “perspective and style don’t scale.” So you may “learn much more from David Foster Wallace’s appreciation of the star athlete than from the Wikipedia entry” on Roger Federer, but “writing is hard even for the world’s greatest wordsmiths.” Metaphorical reasoning, subtle thought, subjective analysis, and artful synthesis—these are happily banned from Wikipedia.

For Wikipedia, NPOV is hard enough to enforce. Disputes over NPOV erupt every day, probably many times a day. Multiple people collaborating, mostly strangers, often anonymous, are woefully inefficient writers of an encyclopedia. Revert wars abound. So does self-promotion. Vandalism is rampant. All manner of muddy, crummy, and scattered contributions insist their way into Wikipedia, every day, thousands upon thousands of times a day. There is a popular myth, too, that Wikipedia is a flat organization that reaches consensus among co-equal members. In fact, Wikipedia has a wildly complex hierarchy of admins, mediators, and an arbitration committee. It’s not hard to get lost extraordinarily quickly poking around the various administrative, advisory, and community groups, like the now-inactive Esperanza.

And yet, as Manjoo writes, the Wikipedia whose fluid articles we know so well “works amazingly well.” I hope that’s not controversial. Wikipedia is a profoundly inspiring testament to human knowledge, warts and all. Hierarchies haven’t vanished, squabbles have multiplied, and all the messiness may be incredibly salient to the average person who pays a bit of attention to Wikipedia. Aside from inventing a technology that makes cleaning up vandalism cheaper than creating it, Wikipedia’s central success is discovering both that its subjects must be wickedly narrow and that wickedly narrow articles are wildly informative. As we’ve found with twitter, sometimes constraints set us free.

*     *     *

It’s extraordinarily important to remember the virtues of the deep and narrow and the shallow and broad. The Politico’s snack-sized news may be cheaper than the New Yorker’s longer fare. But the Politico can’t compete on price with Wikipedia or on community with Josh Marshall. It turns out, as well, that there’s more than one way to put an explanation on offer to the world. The fact that we associate the role of the “great explainer” with the long-form narrative, contra the Least Publishable Unit, grows out of the fact that we overlook hybrid forms.

Josh Marshall’s dispatches won’t be the last shallow and broad news. Storymaps and the Las Vegas Sun’s topic page on water are experiments. And Wikipedia won’t be the last we hear of narrow and deep news and content.

A Modest News Aggregator for the Win

To the extent that sites or services that present professional and amateur content together emerge and become successful, they will do so only after they figure out a way to give users simple, intuitive, and powerful filters that are themselves the channels that carry our conversation and shape our communities.

We will tolerate only the writing we love. Discovering what we love is a job to distribute across very large groups of users with weak ties and small groups of users with strong ties, all empowered by tools far more subtle than those that characterize the current state of search. We will act mostly self-interestedly, choosing by facets, sifting, sorting, sharing, appropriating, connected to one another asymmetrically, mostly pulling not pushing, trusting when trustful. We will participate in a gift economy. Reputation will count. Attention is scarce. Something like tunkrank will help, I’m sure.

The nodes are people because people and other actors are central to what it means to be human regardless of whether we’re reading the news, writing the news, starring in it, or all of the above. The edges are the ideas that capture our common interest over time, location, and predilection. It is beautiful, Doc.

Age-Old Questions about BWBX

What’s BWBX? It’s Business Week’s new social network for users to discover and share business-related content. It resembles web services like socialmedian and twine.

As Paul Miller explains, “Members can access background material on stories, submit additional resources of their own, and comment on the content they find.” The central unit of organization is the “topic,” which both the BX staff and members of the community can create. Miller writes that he gets “the impression that topics tend to be approved” if they’re “in-scope” and “actively discussed out on the open Web.”

Given that these are the interwebs we’re talking about here, my mind immediately races to worries about spam. Does BWBX have controls to disincentivize and sideline spam? How do they work? Are they effective?

I’ve had these questions for a while now, but I’ve kept them to myself while observing BWBX’s initial growth. Today, I saw that Paul Miller, the widely respected Semantic Web evangelist, wrote a post praising the news platform. So I pinged him on twitter:

@PaulMiller Great write-up of #bxbw! Curious about how articles get assigned to topics. Users push articles to topics? Isn’t that spammy?

Then he forwarded the question:

@jny2cornell Thanks Joshua. :-) Yes, users assign articles to topics. COULD be spammy. Doesn’t seem to be. Comment, @bwbx @roncasalotti

The folks at BWBX tweeted that they answered the question in the comments on Miller’s post. I’ve excerpted the relevant parts of the comment:

We track several user actions on each item and use a weighted algorithm to score both users and the articles/blog posts. We monitor those scores to not only determine top users or most valuable items in a topic … but also to determine gaming within the system. We also crowd-source user activity via a full reporting system and back-office moderation team.

Now, I’m no expert on “back-office moderation,” but that answer left me scratching my head. So I pinged again:

@PaulMiller What do you make of @bwbx’s comment on your post? http://bit.ly/hTL1 I must admit, I’m having a difficult time parsing it.

Miller answered my question quite aptly, I think:

@jny2cornell seems clear… “back office magic keeps it clean”… ;-) You should try #BWBX, and see how the site performs to your own needs

Yes, it does seem clear—clear as mud. And that strikes me as a problem. If I’m thinking about joining BWBX, I’d like some assurance that all the effort I pour into it isn’t going to go to waste as usage scales up and inevitable abuse creeps, or floods, in. I’d be worried, for instance, if I knew that the “back office moderation” were mostly human. Of course, I’d also obviously be worried if I knew that the automated processes were quite simply unfit for the job.

Peer-to-peer moderation doesn’t work magically. Take the quintessential case of wikipedia. It’s got a small and hierarchical army of editors. Perhaps more importantly, though, it’s perhaps the first human community in which vandalism is cheaper to clean up than it is to create. That ain’t trivial. It’s arguably not just important but an utterly critical disincentive against spam.

I wouldn’t have this level of concern were it not apparent that “push” logic drives BWBX. Consider a contrasting example: twitter works by “pull” logic and is therefore mercifully free of spam. I don’t worry about spammy content wasting my attention because you can’t get content before me unless I invite it. And I can un-invite, or un-follow, very easily. This isn’t earth-shattering thinking here; it’s virtually as old as the internet—as old as spam itself.
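
To spell out the difference, here’s a minimal sketch of pull logic; the class and method names are hypothetical, not twitter’s actual API.

```python
class PullFeed:
    """Content reaches a reader only along subscriptions the reader created."""

    def __init__(self):
        self.following = {}  # reader -> set of authors the reader invited
        self.posts = {}      # author -> list of posts

    def follow(self, reader, author):
        self.following.setdefault(reader, set()).add(author)

    def unfollow(self, reader, author):
        self.following.get(reader, set()).discard(author)

    def post(self, author, text):
        self.posts.setdefault(author, []).append(text)

    def timeline(self, reader):
        # Only invited authors appear. A "push" system would instead let
        # the author decide whose timeline the post lands in.
        return [p for a in self.following.get(reader, set())
                for p in self.posts.get(a, [])]

feed = PullFeed()
feed.follow("me", "@PaulMiller")
feed.post("@PaulMiller", "Great write-up of BWBX")
feed.post("@spammer", "BUY NOW")  # reaches no one who didn't ask for it
print(feed.timeline("me"))        # only the invited author's post
```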

So if we’re still getting it wrong, why? And if we’re getting it right, why can’t we be more transparent about it? We know how pagerank is the beating heart of google’s effort to out-engineer spam, and some argue that’s not even enough.

In fact, I encourage the folks at BWBX to give a close read to Daniel Tunkelang’s post, which asks, “Is there a way we can give control to users and thus make the search engines objective referees rather than paternalistic gatekeepers?” What goes for search engines ought to go for back office magicians as well.

Obstreperous Minnesota

Every once in a while—and maybe more often than I’d like to admit—I re-read Clay Shirky. Today, I re-read “Ontology Is Overrated.”

And today, I’m ready to disagree with it around the margins.

On fortune telling. Yes, Shirky’s correct that we will sometimes mis-predict the future, as when we infer that some text about Dresden is also about East Germany and will be forever. But, no, that doesn’t have to be a very strong reason not to have some lightweight ontology that infers something about a city and its country. We can just change the ontology when the Berlin Wall falls. It’s much easier than re-shelving books, after all; it’s just rewriting a little OWL.
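
A tiny example of what “rewriting a little OWL” amounts to, sketched in Python with the rdflib library; the namespace and terms are toy assumptions, not a real published ontology.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/geo#")  # hypothetical namespace

g = Graph()
g.add((EX.Dresden, EX.locatedIn, EX.EastGermany))

# 1990: no books to re-shelve; just retract one assertion and add another.
g.remove((EX.Dresden, EX.locatedIn, EX.EastGermany))
g.add((EX.Dresden, EX.locatedIn, EX.Germany))

# Anything inferred "about a city and its country" now follows from the
# new triple instead of the old one.
for subject, predicate, obj in g:
    print(subject, predicate, obj)
```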

On mind reading. Yes, Shirky’s correct that we will lose some signal—or increase entropy—when we mistake the degree to which users agree and mistakenly collapse categories. And, yes, it might be generally true about the world that we tend to “underestimate the loss from erasing difference of expression” and “overestimate loss from the lack of a thesaurus.” But it doesn’t have to be that way, and for two reasons.

First, why can’t we just get our estimations tuned? I’d think that the presumption would be that we could at least give it a go and, otherwise, that the burden of demonstrating that we just cannot, for some really deep reason, falls on Shirky.

Second, we don’t actually need to collapse categories; we just need to build web services that recognize synonymy—and don’t shove it down our users’ throats. I take it to be a fact about the world that there are a non-trivial number of people for whom ‘film’ and ‘movies’ and ‘cinema’ are just about perfect synonyms. At the risk of revealing some pretty embarrassing philistinism, I offer that I’m one of them, and I want my web service to let me know that I might care about this thing called ‘cinema’ when I show an interest in ‘film’ or ‘movies.’ I agree with Shirky that we can do this based solely on the fact that “tag overlap is in the system” while “the tag semantics are in the users” only. But why not also put the semantics in the machine? Ultimately, both are amenable to probabilistic logic.
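
Here’s a minimal sketch of that machine-side synonymy, built on nothing but tag overlap; the tag data is invented, and the threshold is arbitrary.

```python
def jaccard(a, b):
    """Overlap between two sets of tagged items."""
    return len(a & b) / len(a | b)

# tag -> the set of item IDs users have applied it to (invented data)
tagged = {
    "film":     {1, 2, 3, 4, 5},
    "movies":   {2, 3, 4, 5, 6},
    "cinema":   {3, 4, 5, 6, 7},
    "knitting": {8, 9},
}

def likely_synonyms(tag, threshold=0.5):
    """Tags whose items overlap heavily with the given tag's items."""
    return [other for other, items in tagged.items()
            if other != tag and jaccard(tagged[tag], items) >= threshold]

print(likely_synonyms("film"))  # ['movies']; a lower threshold catches 'cinema'
```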

Google showed it is the very best at serving us information when we know we care about something fuzzy and obscure—like “obstreperous minnesota.” I don’t think Shirky would dispute this, but it’s important to bear in mind that we also want our web services to serve us really well when we don’t know we care about something (see especially Daniel Tunkelang on HCIR (@dtunkelang)). That something might be fuzzy or specific, obscure or popular, subject to disagreement or perfectly unambiguous.

People and organizations tend to be unambiguous. No one says this fine fellow Clay Shirky (@cshirky) is actually Jay Rosen (@jayrosen_nyu). That would be such a strange statement that many people wouldn’t even understand it well enough to declare it false. No one says the National Basketball Association means the National Football League. Or if someone were to say that J.P. Morgan is the same company as Morgan Stanley, we could correct him and explain how they’re similar but not identical.

Some facts about people and organizations can be unambiguous some of the time, too. Someone could argue that President Obama’s profession is sports, but we could correct her and explain how it’s actually politics, which maybe sometimes works metaphorically like sports. That doesn’t mean that Obama doesn’t like basketball or that no one will ever talk about him in the context of basketball. There may be more than a few contexts in which many people think it makes little sense to think of him as a politician, like when he’s playing a game of pick-up ball. But I think we can infer pretty well ex ante that it makes lots of sense to think of Obama as a politician when he’s giving a big televised speech, signing legislation, or meeting with foreign leaders. After all, what’s the likelihood that Silvio Berlusconi or Hu Jintao would let himself get schooled on the court? Context isn’t always that dependent.

