Picture this! The news graphed.

Slate added a curious feature to its site last week. Its heart is in the right place, and this is a good experiment, not a silly one.

I believe there’s extraordinary value to be unlocked by mapping a world of articles onto the social graph that they describe textually. I’ve written about graphing the news when I awkwardly described a scheme here and geeked out over a pretty picture here. Yes, there is a funny thing about social networks: they often describe the real world, and sometimes they exist as worlds of their own. Done right, Slate’s graph could be an eye-opening mechanism for aggregating, sorting, discovering, following, sharing, and discussing the news.

But I don’t think Slate has quite done it right. In short, there’s too much information in the nodes, or the “dots,” as Slate calls them, and there’s too little information in the edges, or the links that connect up those dots. So permit me a little rambling.

We don’t really care all that much about the differences between a person, a group, and a company—not at this top level of navigation, anyhow. There are too many kinds of dots, and it’s too hard to keep all the colors straight. Of course, it’s not that hard, but if Slate’s project is fueled by bold ambition rather than fleeting plaudits, it’s just not easy enough. They’re actors. They’re newsmakers. They are entities that can be said to have a unified will or agency. And that’s enough. Make up a fancy blanket term, or just call them “people,” and let smart, interested users figure out the details as they dive in.

Moreover, assistant editor Chris Wilson confuses his own term “topic.” At first, he writes, “News Dots visualizes the most recent topics in the news as a giant social network.” Also, “Like a human social network, the news tends to cluster around popular topics.” In this sense, a “topic” is an emergent property of Slate’s visualization. It’s the thing that becomes apparent to the pattern-seeking, sense-making eyes of users. So “one clump of dots might relate to a flavor-of-the-week tabloid story” or “might center on Afghanistan, Iraq, and the military.”

But then Wilson makes a subtle but ultimately very confusing shift. Explaining how to use the visualization, he writes, “click on a circle to see which stories mention that topic and which other topics it connects to in the network.” Problem is, these “topics” are what he has just called “subjects.” As emergent things, or “clumps,” his original “topics” can’t be clicked on. On the contrary, “subjects—represented by the circles below—are connected to one another,” and they’re what’s clickable.

To make matters worse, Wilson then, below the visualization, introduces more confusing terms as he describes the role played by Open Calais (which is awesome). It “automatically ‘tags’ content with all the important keywords: people, places, companies, topics, and so forth.” The folks at Thomson Reuters didn’t invent the term “tag,” of course; it’s a long-standing if slippery term that I’m not even going to try to explain (because it really, really is just one of those cases in which “the meaning of the word presupposes our ability to use it”). At any rate, Wilson seems to be using it because Open Calais uses it. That’s fine, but a bit more clarity would be nice, given the soup of terms already around. And there’s really little excuse for dropping in the term “keywords” because, technically speaking, it’s just wrong.

I’m terribly sorry to drag you, dear reader, through that intensely boring mud puddle of terminology. But it’s for good reason, I think. Graphing the news is supposed to be intuitive. The human mind just gets it. A picture is worth a thousand words. Taken seriously, that notion is powerful. At a very optimistic level, it encourages us to let visualizations speak for themselves, stripped of language all too ready to mediate them. But at a basic level, it warns us writers not to trample all over information expressed graphically with a thousand textual words that add up to very little—or, worse, confusion.

But, yes, okay, about those aforementioned edges, the links that connect up those dots! I wrote about this long ago, and my intuition tells me that it doesn’t make sense to leave edges without their own substance. They need to express more than similarity; they can do more than connect like things. If they were to express ideas or events or locations while the nodes expressed topics, it seems to me that the picture would be much more powerful. Those ideas, events, or locations wouldn’t sit in light blue “Other” nodes, as Slate has them; instead they would directly link up the people and organizations. The social network would be more richly expressed. And topics, in Wilson’s original sense, wouldn’t be emergent “clumps” but actually obvious connections.
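To make the idea of edges with substance a little more concrete, here is a minimal Python sketch. It assumes nodes are actors (people and organizations) and that each edge names the event, idea, or location linking two actors, along with the articles reporting that link. The class and method names, the toy actors and labels, and the article IDs are all hypothetical; none of this is how Slate or Open Calais actually structures its data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """An edge with substance: it names *what* connects two actors."""
    a: str               # an actor (person, group, or company)
    b: str               # another actor
    label: str           # the idea, event, or location linking them
    articles: frozenset  # hypothetical IDs of articles reporting the link


class NewsGraph:
    def __init__(self):
        # actor -> list of edges touching that actor (undirected graph)
        self._edges = defaultdict(list)

    def connect(self, a, b, label, articles):
        edge = Edge(a, b, label, frozenset(articles))
        self._edges[a].append(edge)
        self._edges[b].append(edge)

    def neighbors(self, actor):
        """Sorted (other actor, label) pairs, so a user sees *why* two dots connect."""
        return sorted((e.b if e.a == actor else e.a, e.label)
                      for e in self._edges[actor])

    def articles_about(self, label):
        """All article IDs attached to edges carrying the given label."""
        found = set()
        for edges in self._edges.values():
            for e in edges:
                if e.label == label:
                    found |= e.articles
        return found


# Toy data for illustration only.
g = NewsGraph()
g.connect("Marissa Mayer", "Senator Kerry", "Senate hearing", ["a1", "a2"])
g.connect("Marissa Mayer", "Josh Marshall", "living story", ["a3"])
print(g.neighbors("Marissa Mayer"))
```

Here the event sits on the line between two people rather than in a separate “Other” dot, and asking for every article about the “Senate hearing” is a single lookup over labeled edges.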

All in all, the visualization is “depressingly static,” as a friend of mine remarked. There may be two levels of zoom, but there’s no diving. There’s no surfing, no seeing a list of stories that relate to both topic x AND topic y. There’s no navigation, no browsing. There’s no search—and especially none involving interaction between the human and computer. There’s no news judgment beyond what newspaper editors originally add. And the corpus is small—tiny, really, representing only 500 articles each day, which isn’t so far from being a human-scale challenge. Visualizations hold the most promise for helping us grapple with truly internet-scale data sets—not 500 news articles a day but 500,000 news articles and blog posts.

It seems unfair to hold Slate to such a high standard, though. It’s very clear that they were shooting for something much more modest. All the same, maybe modesty isn’t what’s called for.

Not by Links Alone

At this unthinkably late hour, even many of the most recalcitrant journalists and newsy curmudgeons have given themselves over, painfully, to the fundamentally important fact that the economics of abundance now govern their world.

For many, of course, stemming that tide is still paramount. Their goal, as David Carr writes, is to squelch the “new competition for ads and minds.” Thus Walter Isaacson’s “E-ZPass digital wallet” and Alan Mutter’s “Original Sin.” Thus Michael Moran’s obnoxious “NOPEC.” Thus Journalism Online. And, of course, thus we have David Simon’s recent call for Congress to “consider relaxing certain anti-trust prohibitions” or this call in the Washington Post to rework fair use. I wish them all good luck, but mostly good night.

There are others, though, who think it’s great that the Internet and Google are opening up the news to competition. In fact, “Google is good” strikes me as nearly orthodox among the basically Internet-savvy set of news talkers. Marissa Mayer crows about how Google delivers newspapers’ Web sites one billion clicks a month, and Arianna Huffington insists that the future of news is to be found in a “linked economy” and “search engines” like Google.

In this narrative, Google’s the great leveler, ushering the world of journalism out of the dark, dank ages of monopoly and into the light, bright days of competition, where all news articles and blog posts stand on their own pagerank before the multitude of users who judge with their links and their clicks. Its ablest defender is probably Jeff Jarvis, author of What Would Google Do? Jarvis was relatively early in pointing out that “Google commodifies the world’s content by making it all available on a level playing field in its search.” In that and other posts at Buzz Machine, his widely read blog, Jarvis allows that Google “can make life difficult” but insists, “that’s not Google’s fault.” The reverence for Google is thick: “The smart guys are hiring search-engine optimization experts and trying to figure out how to get more people to their stuff thanks to Google.”

But defenders of Google’s influence on the broader market for news and newspapers themselves make a striking error in believing that the market for content is competitive. That belief is wrong—not just a little bit or on the margin, but fundamentally, and importantly, wrong.

Which is not to say that news publishers aren’t competing for readers’ eyeballs and attention. Publishers compete with one another all day long, every day—with some local exceptions, the news has always been competitive like a race, and is now more competitive like a market than ever before. But the market for that news—the place where consumers decide what to read, paying with their attention—is not competitive. Google may well be the great leveler, but down to how low a field?

To be very clear, this is far from a neo-classical purist’s critique that picks nits by abusing uselessly theoretical definitions. I am not a purist, an economist, or a jerk. This is reality, as best as I know it. Nevertheless, to say that the market for content is competitive is just to misunderstand what a competitive market actually entails. The market for news content as it currently stands, with Google in the middle, is a profoundly blurry, deeply uncompetitive space.

*    *    *

“The difficulty of distinguishing good quality from bad is inherent in the business world,” Nobel laureate George Akerlof wrote in the kicker of his most famous paper, published in 1970. “This may indeed explain many economic institutions and may in fact be one of the more important aspects of uncertainty.”

Akerlof fired an early shot in a scholarly marathon to study the effects of asymmetric information in markets. What do parties to a potential transaction do when they know different sets of facts? Maybe that seems like an obvious question, but economists in the middle of the twentieth century had been pretty busy worrying about perfecting complicated models despite their grossly simplistic assumptions.

So Akerlof set out to write about how markets can fail when some of those assumptions turn out to be bunk. The assumption he tested first, in “The Market for ‘Lemons,’” was certainty, and he showed that when sellers know more about the goods being sold than the buyers do, sellers abuse their privileged position and buyers leave the market.

Writing in the same year, the economist Phillip Nelson studied the differences between what he called “search goods” and “experience goods.” Both kinds of goods involve a certain kind of asymmetry. For search goods, consumers can overcome the asymmetry before the point of purchase by doing their homework, while for experience goods, consumers must invest their time in actually consuming the good before they can judge its quality.

A pair of pants, for instance, is a search good—you can try before you buy, and shop around for the pants that fit you best. An apple, on the other hand, is an experience good—you don’t know whether you’ll like one until you consume it, and you can’t really try before you buy.

News articles are experience goods. Just as with an apple, you need to consume the story, reading the article or watching the video or so on, in order to judge its quality. “Stories can vary in length, accuracy, style of presentation, and focus,” writes economist James Hamilton in All the News That’s Fit to Sell. “For a given day’s events, widely divergent news products are offered to answer the questions of who, what, where, when, and why.” We can’t know which one’s best till we’ve read them all, and who’s got time for that?

Moreover, a multitude of subjective editorial decisions produce the news. Each reporter’s practices and habits influence what’s news and what’s not. Their learned methods, their assigned beats, and even their inverted pyramids shape what we read and how. Reporters’ and editors’ tastes, their histories, or their cultures matter, as do their professional ethics. Each article of news is a nuanced human document—situated aesthetically, historically, culturally, and ethically.

Ultimately, the news is afflicted with the problem of being an experience good more than even apples are. At least Granny Smiths don’t vary wildly from farmer to farmer or from produce bin to produce bin. Sure, some may be organic, while others are conventional. One may be tarter or crispier than another, but tremendous differences from the mean are very unlikely. With the news, though, it’s hard even to think of what the mean might be. It may seem obvious, but articles, essays, and reports are complex products of complex writerly psychologies.

For a long time, however, as readers, we were unaware of these nuances of production. That was, in some sense, the upshot: our experience of this journalism was relatively uncomplicated. This profound lack of context mattered much less.

Call it the myth of objectivity maybe, but what NYU professor Jay Rosen has labeled the “mask of professional distance” meant that we didn’t have much of a chance to bother with a whole world of complexities. Because everyone usually wore a mask, and because everyone’s mask looked about the same, we ignored—indeed, we were largely necessarily ignorant of—all the unique faces.

For a long time, therefore, the orthodox goal of American newspapers virtually everywhere was news that really wasn’t an experience good. When news existed only on paper, it hardly mattered what news was, because we had so few seemingly monochrome choices about what to read. We returned to the same newspapers and reporters behind the same masks over and over again, and through that repetition, we came subtly to understand the meaning and implications of their limited degrees of “length, accuracy, style of presentation, and focus.”

As a result, we often grew to love our newspaper—or to love to hate it. But even if we didn’t like our newspaper, it was ours, and we accepted it, surrendering our affection either way, even begrudgingly. The world of news was just much simpler, a more homogeneous, predictable place—there were fewer thorny questions, fewer observable choices. There was less risk by design. Our news was simpler, or it seemed to be, and we had little choice but to become familiar with it anyhow. One benefit of the View from Nowhere, after all, is that basically everyone adopted it—that it basically became a standard, reducing risk.

But a funny thing happened in this cloistered world. Because it seemed only natural, we didn’t realize the accidental nature of the understanding and affection between readers and their newspapers. If, as the economists would have it, the cost of a thing is what we’ve sacrificed in order to achieve it, then our understanding and affection were free. We gave nothing up for them—for there was scarcely another alternative. As a result, both readers and publishers took those things for granted. This point is important because publishers are still taking those things for granted, assuming that all people of good faith still appreciate and love all the good things that a newspaper puts on offer.

*    *    *

But when our informational options explode, we can plainly, and sometimes painfully, see that our newspapers aren’t everything. Different newspapers are better at answering different questions, and some answers—some as simple as what we should talk about at work tomorrow—don’t come from newspapers at all. So we go hunting on the Internet. So we gather. So we Google.

We have now spent about a decade Googling. We have spent years indulging in information, and they have been wonderful years. We are overawed by our ability to answer questions online. Wikipedia has helped immensely in our efforts to answer those questions, but pagerank elevated even it. Newspapers compose just one kind of Web site to have plunged into the scrum of search engine optimization. Everyone’s hungry for links and clicks.

And Google represents the Internet at large for two reasons. For one, the engine largely structures our experience of the overall vehicle. More importantly, though, Google’s organization of the Internet changes the Internet itself. The Search Engine Marketing Professional Organization estimates, in this PDF report, that North American spending on organic SEO in 2008 was about $1.5 billion. But that number is surely just the tip of the iceberg. Google wields massive power over the shape and structure of the Internet’s general landscape of Web pages, Web applications, and the links among them. Virtually no one builds even a semi-serious Web site without considering whether it will be indexed optimally. For journalism, most of the time, the effects are either irrelevant or benign.

But think about Marissa Mayer’s Senate testimony about the “living story.” Newspaper Web sites, she said, “frequently publish several articles on the same topic, sometimes with identical or closely related content.” Because those similar pages share links from around the Web, neither one has the pagerank that a single one would have. Mayer would have news Web sites structure their content more like Wikipedia: “Consider how the authoritativeness of news articles might grow if an evolving story were published under a permanent, single URL as a living, changing, updating entity.”
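The mechanism Mayer describes can be illustrated with the textbook PageRank formulation: a damping factor of 0.85, with rank from pages that have no out-links spread evenly over all pages. This is a sketch under those simplifying assumptions, not Google’s actual ranking. Four hypothetical pages, A through D, link either to one story URL or, split two and two, to two near-duplicate URLs.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to. Pages with no
    out-links ("dangling" pages) spread their rank evenly over all pages.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            for t in targets:
                # each page passes its rank in equal shares to the pages it links to
                new[t] += damping * rank[v] / len(targets)
        rank = new
    return rank


# Hypothetical link graphs: the same four referring pages, A-D.
single = pagerank({"A": ["S"], "B": ["S"], "C": ["S"], "D": ["S"]})
split = pagerank({"A": ["S1"], "B": ["S1"], "C": ["S2"], "D": ["S2"]})

print(round(single["S"], 2), round(split["S1"], 2), round(split["S2"], 2))
```

In the single-URL graph the story ends up with a rank of about 0.52; in the split graph each duplicate gets about 0.29, so neither copy comes close to what one consolidated page would have earned.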

Setting aside for the moment whatever merits Mayer’s idea might have, imagine the broader implications. She’s encouraging newspapers to change not just their marketing or distribution strategies but their journalism because Google doesn’t have an algorithm smart enough to determine that they should share the “authoritativeness.”

At Talking Points Memo, Josh Marshall’s style of following a story over a string of blog posts, poking and prodding an issue from multiple angles, publishing those posts in a stream, and letting the story grow incrementally and cumulatively, might be disadvantaged because those posts are, naturally, found at different URLs. His posts would compete with one another for pagerank.

And maybe it would be better for journalism if bloggers adopted the “living story” model of reporting. Maybe journalism schools should start teaching it. Or maybe not—maybe there is something important about what the structure of content means for context. The point here isn’t to offer a substantive answer to this question, but rather to point out that Mayer seems unaware of the question in the first place. It’s natural that Mayer would think that what’s good for Google is good for Internet users at large. For most domestic Internet users, after all, Google, which serves about two-thirds of all searches, essentially is their homepage for news.

But most news articles, of course, simply aren’t like entries in an encyclopedia. An article of news—in both senses of the term—is substantially deeper than the facts it contains. An article of news, a human document, means substantially more to us than its literal words—or the pageranked bag of words that Google more or less regards it as.

Google can shine no small amount of light on whether we want to read an article of news. And, importantly, Google’s great at telling you when others have found an article of news to be valuable. But the tastes of anonymous crowds—of everyone—are not terribly good at determining whether we want to read some particular article of news, particularly situated, among all the very many alternatives, each particularly situated unto itself.

Maybe it all comes down to a battle between whether Google encourages “hit-and-run” visits or “qualified leads.” I don’t doubt that searchers from Google often stick around after they alight on a page. But I doubt they stick around sufficiently often. In that sense, I think Daniel Tunkelang is precisely correct: “Google’s approach to content aggregation and search encourages people to see news…through a very narrow lens in which it’s hard to tell things apart. The result is ultimately self-fulfilling: it becomes more important to publications to invest in search engine optimization than to create more valuable content.”

*    *    *

The future-of-news doomsayers are so often wrong. A lot of what they said at Kerry’s hearing was wrong. It’s woefully wrongheaded to call Google parasitic, if only because the Internet without it would be a distinctly worse place. There would be, I suspect, seriously fewer net pageviews for news. And so it’s easy to think that they’re wrong about everything—because it seems that they fundamentally misunderstand the Internet.

But they don’t hold a monopoly on misunderstanding. “When Google News lists one of our stories in a prominent position,” writes Henry Blodget, “we don’t wail and moan about those sleazy thieves at Google. We shout, ‘Yeah, baby,’ and start high-fiving all around.” To Blodget, “Google is advertising our stories for free.”

But life is about alternatives. There’s what is, and there’s what could be. And sometimes what could be is better than what is—sometimes realistically so. So however misguided some news executives may have been or may still be about their paywalls and buyouts, they also sense that Google’s approach to the Web can’t reproduce the important connection the news once had with readers. Google just doesn’t fit layered, subtle, multi-dimensional products—experience goods—like articles of serious journalism. Because news is an experience good, we need really good recommendations about whether we’re going to enjoy it. And the Google-centered link economy just won’t do. It doesn’t add quite enough value. We need to know more about the news before we sink our time into reading it than pagerank can tell us. We need the news organized not by links alone.

What we need is a search experience that lets us discover the news in ways that fit why we actually care about it. We need a search experience built around concretely identifiable sources and writers. We need a search experience built around our friends and, lest we dwell too snugly in our own comfort zones, other expert readers we trust. These are all people—and their reputations or degrees of authority matter to us in much the same ways.

We need a search experience built around beats and topics that are concrete—not hierarchical, but miscellaneous and semantically well defined. We need a search experience built around dates, events, and locations. We need a search experience that’s multi-faceted and persistent, a stream of news. Ultimately, we need a powerful, flexible search experience that merges automation and human judgment—that is sensitive to the very particular and personal reasons we care about news in the first place.

The people at Senator Kerry’s hearing last week seemed either to want to dam the river and let nothing through or to whip its flow up into a tidal wave. But the real problem is that they’re both talking about the wrong river. News has changed its course, to be sure, so in most cases, dams are moot at best. At the same time, though, chasing links and clicks, with everyone pouring scarce resources into an arms race of pagerank while aggregators direct traffic and skim a few page views, isn’t sufficiently imaginative either.

UPDATE: This post originally slipped out the door before it was fully dressed. Embarrassing, yes. My apologies to those who read the original draft of this thing and were frustrated by the unfinished sentences and goofy notes to self, and my thanks to those who read it all the same.

Whither Tag Clouds?

A few weeks ago, one could hardly click around the interwebs without noticing the rash of pretty tag clouds powered by Wordle. Bloggers of all stripes posted a wordle of their blog. Some, like Jeff Jarvis, mused about how the visualizations represent “another way to see hot topics and another path to them.”

For as long as tag clouds have been a feature of the web, they’ve also been an object of futurist optimism, kindling images of Edward Tufte and notions that if someone could just unlock all those dense far-flung pages of information, just present them correctly, illumed, people everywhere would nod and understand. Their eyes would grow bright, and they would smile at the sheer sense it all makes. The headiness of a folksonomy is sweet for an information junkie.

It’s in that vein that ReadWriteWeb mythologizes the tag cloud as “buffalo on the pre-Columbian plains of North America.” A reader willing to cock his head and squint hard enough at the image of tag clouds “roaming the social web” as “huge, thundering herds of keywords of all shades and sizes” realizes that Rob Cottingham would have us believe that tag clouds were graceful and defenseless beasts—and also now on the verge of extinction. He’s more or less correct.

I used to mythologize the tag cloud, but let’s be honest. They were never actually useful. You could never drag and drop one word in a tag cloud onto another to get the intersection or union of pages with those two tags. You could never really use a tag cloud to subscribe to RSS feeds of only the posts with a given set of tags.

A tag also never told you whether J.P. Morgan was a person or a bank. A tag cloud on a blog was never dynamic, never interactive. The tag cloud on one person’s blog never talked to the tag cloud on anyone else’s. I could never click on one tag and watch the cloud reform and show me only related tags, all re-sized and -colored to indicate their frequency or importance only in the part of the corpus in which the tag I clicked on is relevant.
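None of the missing interactions above is exotic. Here is a minimal Python sketch, assuming a hypothetical toy corpus in which each post carries a set of plain-string tags. It covers an intersection or union query over tags, and a cloud that re-forms around a clicked tag by recounting the other tags only in posts carrying it and rescaling their sizes. The function names, the corpus, and the linear 1-to-5 size scale are illustrative choices, not any blogging platform’s API. Plain strings also inherit the J.P. Morgan problem: nothing here says whether a tag is a person or a bank.

```python
from collections import Counter

# Hypothetical toy corpus: post ID -> set of tags.
POSTS = {
    "p1": {"banks", "bailout", "Congress"},
    "p2": {"banks", "bailout"},
    "p3": {"banks", "mortgages"},
    "p4": {"Congress", "health care"},
}


def posts_with(posts, tags, mode="and"):
    """IDs of posts carrying all of `tags` (mode="and", an intersection)
    or any of them (mode="or", a union)."""
    wanted = set(tags)
    if mode == "and":
        return {pid for pid, t in posts.items() if wanted <= t}
    if mode == "or":
        return {pid for pid, t in posts.items() if wanted & t}
    raise ValueError(f"unknown mode: {mode!r}")


def reform_cloud(posts, clicked, min_size=1, max_size=5):
    """Re-form the cloud around `clicked`: count the other tags only in posts
    tagged `clicked`, then scale the counts linearly to integer sizes."""
    subset = [posts[pid] for pid in posts_with(posts, [clicked])]
    counts = Counter(tag for tags in subset for tag in tags if tag != clicked)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {tag: min_size + (n - lo) * (max_size - min_size) // span
            for tag, n in counts.items()}


print(sorted(posts_with(POSTS, ["banks", "bailout"])))
print(reform_cloud(POSTS, "banks"))
```

Clicking “banks” drops “health care” from the cloud entirely and makes “bailout” the largest tag, since it co-occurs with “banks” in two posts while “Congress” and “mortgages” each do so in one.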

But there are also some cool-headed thoughts to have here. If tag clouds don’t work, what will? What is the best way to navigate around those groups of relatively many words called articles or posts? In the comments to Jarvis’s post, I asked a set of questions:

How will we know when we meet a visualization of the news that’s actually really useful? Can some visualization of the news lay not just another path to the “hot topic” but a better one? Or will headlines make a successful transition from the analog past of news to its digital future as the standard way we find what we want to read?

I believe the gut-level interest in tag clouds comes in part from the sense that headlines aren’t the best way to navigate around groups of articles much bigger than the number in a newspaper. There’s a real pain point there: scanning headlines doesn’t scale. Abstracting away from them, however, and focusing on topics and newsmakers in order to find what’s best to read or watch just might work.

I think there’s a very substantial market for a smarter tag cloud. It might look very different from what we’ve seen, but it will let us see lots of information at a glance and help us get to the best stuff faster. After all, the articles we want to read, the videos we want to watch, and the conversations we want to have around them are what’s actually important.

