Archive for the 'visualization' Category

Picture this! The news graphed.

slate news dotsSlate added a curious addition to its site last week. Its heart is in the right place, and this is a good experiment, not a silly one.

I believe there’s extraordinary value to be unlocked by mapping a world of articles onto the social graph that they describe textually. I’ve written about graphing the news when I awkwardly described a scheme here and geeked out over a pretty picture here. Yes, there is a funny thing about social networks: they often describe the real world as well as sometimes exist as their own worlds. Done right, Slate’s graph could be an eye-opening mechanism for aggregating, sorting, discovering, following, sharing, and discussing the news.

But I don’t think Slate has quite done it right. In short, there’s too much information in the nodes, or the “dots,” as Slate calls them, and there’s too little information in the edges, or the links that connect up those dots. So permit me a little rambling.

We don’t really care all that much about the differences between a person, a group, and company—not at this top level of navigation, anyhow. There are too many dots, and it’s too hard to keep them all the colors straight. Of course, it’s not that hard, but if Slate’s project is fueled by bold ambition rather than fleeting plaudits, it’s just not easy enough. They’re actors. They’re newsmakers. They are entities that can be said to have a unified will or agency. And that’s enough. Make up a fancy blanket term, or just call them “people” and let smart, interested users figure out the details as they dive in.

Moreover, assistant editor Chris Wilson confuses his own term “topic.” At first, he writes, “News Dots visualizes the most recent topics in the news as a giant social network.” Also, “Like a human social network, the news tends to cluster around popular topics.” In this sense, a “topic” is an emergent property of Slate’s visualization. It’s the thing that becomes apparent to the pattern-seeking, sense-making eyes of users. So “one clump of dots might relate to a flavor-of-the-week tabloid story” or “might center on Afghanistan, Iraq, and the military.”

But then Wilson makes a subtle but ultimately very confusing shift. Explaining how to use the visualization, he writes, “click on a circle to see which stories mention that topic and which other topics it connects to in the network.” Problem is, these “topics” are what he has just called “subjects.” As emergent things, or “clumps,” his original “topics” can’t be clicked on. On the contrary, “subjects—represented by the circles below—are connected to one another,” and they’re what’s clickable.

To make matters worse, Wilson then, below the visualization, introduces more confusing terms, as he describes the role played by Open Calais (which is awesome). It “automatically ‘tags’ content with all the important keywords: people, places, companies, topics, and so forth.” The folks at Thomson Reuters didn’t invent the term “tag,” of course; it’s a long-standing if slippery term that I’m not even going to try to explain (because it really, really is just one of cases in which “the meaning of the word presupposes our ability to use it”). At any rate, Wilson seems like he’s using it because Open Calais uses it. That’s fine, but a bit more clarity would be nice, given the soup of terms already around. And there’s really little excuse for dropping in the term “keywords” because, with his technical hat on, it’s just wrong.

I’m terribly sorry to drag you, dear reader, through that intensely boring mud puddle of terminology. But it’s for good reason, I think. Graphing the news is supposed to be intuitive. The human mind just gets it. A picture is worth a thousand words. Taken seriously, that notion is powerful. At a very optimistic level, it encourages us to let visualizations speak for themselves, stripped of language all too ready to mediate them. But at a basic level, it warns us writers not to trample all over information expressed graphically with thousand textual words that add up to very little—or, worse, confusion.

But, yes, okay, about those prenominate the edges, or the links that connect up those dots! I wrote about this long ago, and my intuition tells me that it doesn’t make sense to leave edges without their own substance. They need to express more than similarity; they can do more than connect like things. If they were to express ideas or events or locations while the nodes expressed topics, it seems to me that the picture would be much more powerful. Those ideas, events, or locations wouldn’t sit in light blue “Other” nodes, as Slate has them; instead they would directly link up the people and organizations. The social network would be more richly expressed. And topics, in Wilson’s original sense, wouldn’t be emergent “clumps” but actually obvious connections.

All in all, the visualization is “depressingly static,” as a friend of mine remarked. There may be two levels of zoom, but there’s no diving. There’s no surfing, no seeing a list of stories that relate to both topic x AND topic y. There’s no navigation, no browsing. There’s no search—and especially none involving interaction between the human and computer. There’s no news judgment beyond what newspaper editors originally add. And the corpus is small—tiny, really, representing only 500 articles each day, which isn’t so far from being a human-scale challenge. Visualizations hold the most promise for helping us grapple with truly internet-scale data sets—not 500 news articles a day but 500,000 news articles and blog posts.

It seems unfair to hold Slate to such a high standard, though. It’s very clear that they were shooting for something much more modest. All the same, maybe modesty isn’t what’s called for.

Advertisements

A Modest News Aggregator for the Win

To the extent that sites or services that present professional and amateur content together emerge and become successful, they will do so only after they figure out a way to give users simple, intuitive, and powerful filters that are themselves the channels that carry our conversation and shape our communities.

We will tolerate only the writing we love. Discovering what we love is a job to distribute across very large groups of users with weak ties and small groups of users with strong ties, all empowered by tools far more subtle than those that characterize current state of search. We will act mostly self-interestedly, choosing by facets, sifting, sorting, sharing, appropriating, connected to one another asymmetrically, mostly pulling not pushing, trusting when trustful. We will participate in a gift economy. Reputation will count. Attention is scarce. Something like tunkrank will help, I’m sure.

The nodes are people because people and other actors are central to what it means to be human regardless of whether we’re reading the news, writing the news, starring in it, or all of the above. The edges are the ideas that capture our common interest over time, location, and predilection. It is beautiful, Doc.

Getting in metadata game: Oh, money, that’s why!

Who’s mentioned in your article? What organizations does it talk about? Or what zip codes?

Answering these simple questions—in ways notoriously inflexible computers understand—can be like putting handles on your articles. It means aggregators and filterers like EveryBlock can grab on and give readers one more way to find what you have to say.

That’s what the New York Times is doing—in two stages, it appears. First its librarians encode the elected officials mentioned in its articles; mentioning them in the regular text of the article doesn’t cut it. Then its newly built web service, called Represent, figures out the geographic locations those officials represent. Meanwhile, Represent is also taking a computerized look at Congressional votes. When a politician votes, Represent says something like, “Oh, a person just voted in geographic area Y, and that person’s name is X.”

EveryBlock isn’t built for understanding much about people or names, but it is built for understanding locations and geographic areas. So Represent’s job is to translate from X to Y—from names to places.

Which brings us at long last to the metadata game. The historical problem is the way you have to answer these questions has been interminably dull and technical. So the historical result has been one big shoulder shrug: “Why bother?”

Well, people like Adrian Holovaty are starting to envision on answer “We have a number of ideas for sustaining our project,” he writes, “like building a local advertising engine.” That kind of engine might share ad revenue with the newspapers whose articles it incorporates. In order to claim a share, each newspaper must diligently prepare its articles for EveryBlock: there much be location handles that EveryBlock can grab. It’s highly unclear how much money EveryBlock’s hyperlocal ad targeting could generate, but if it’s enough, it will provide the kind of incentive publishers need to make boring metadata worth their while. EveryBlock might just unlock the ‘R’ in ROI. That could very well be a great reason to bother.

Epilogue    It’s notable that grant monies have helped solve this chicken-and-egg problem. I may have personal issues with the Knight News Challenge—I didn’t win and didn’t receive feedback promised on multiple occasions—but EveryBlock is quite justifiably the darling of the news innovation set.

What if the news looked like this?

news graph

See here (PDF). HT here.

PS. I wonder what enthusiastic answer David Weinberger might have to this question.

LATER: I’ve talked about a “news graph” before, mentioning “edges and nodes,” but I wasn’t yet thinking this fantastically. Still, I think it’s interesting stuff: “This kind of news graph would, at long last, make the bits of content contigent on the people and the issues they discuss. It’s the elegant organization for news.”

The Great Unbundling: A Reprise

This piece by Nick Carr, the author of the recently popular “Is Google Making Us Stupid?” in the Atlantic, is fantastic.

My summary: A print newspaper or magazine provides an array of content in one bundle. People buy the bundle, and advertisers pay to catch readers’ eyes as they thumb through the pages. But when a publication moves online, the bundle falls apart, and what’s left are just the stories.

This may no longer be revolutionary thought to anyone who knows that google is their new homepage, from which people enter their site laterally through searches. But that doesn’t mean it’s not the new gospel for digital content.

There’s only one problem with Carr’s argument, though. By focusing on the economics of production, I don’t think its observation of unbundling goes far enough. Looked at another way—from the economics of consumption and attention—not even stories are left. In actuality, there are just keywords entered into google searches. That’s increasingly how people find content, and in an age of abundance of content, finding it is what matters.

That’s where our under-wraps project comes into play. We formalize the notion of people finding content through simple abstractions of it. Fundamentally, from the user’s perspective, the value proposition lies with the keywords, or the persons of interest, not the piece of content, which is now largely commodified.

That’s why we think it’s a pretty big idea to shift the information architecture of the news away from focusing on documents and headlines and toward focusing on the newsmakers and tags. (What’s a newsmaker? A person, corporation, government body, etc. What’s a tag? A topic, a location, a brand, etc.)

The kicker is that, once content is distilled into a simpler information architecture like ours, we can do much more exciting things with it. We can extract much more interesting information from it, make much more valuable conclusions about it, and ultimately build a much more naturally social platform.

People will no longer have to manage their intake of news. Our web application will filter the flow of information based on their interests and the interests of their friends and trusted experts, allowing them to allocate their scarce attention most efficiently.

It comes down to this: Aggregating documents gets you something like Digg or Google News—great for attracting passive users who want to be spoon fed what’s important. But few users show up at Digg with a predetermined interest, and that predetermined interest is how google monetized search ads over display ads to bring yahoo to its knees. Aggregating documents make sense in a document-scarce world; aggregating the metadata of those documents makes sense in an attention-scarce world. When it comes to the news, newsmakers and tags comprise the crucially relevant metadata, which can be rendered in a rich, intuitive visualization.

Which isn’t to say that passive users who crave spoon-fed documents aren’t valuable. We can monetize those users too—by aggregating the interests of our active users and reverse-mapping them, so to speak, back onto a massive set of documents in order to find the most popular ones.

Whither Tag Clouds?

A few weeks ago, one could do relatively little clicking around the interwebs and notice the tear of pretty tag clouds powered by wordle. Bloggers of all stripes posted a wordle of their blog. Some, like Jeff Jarvis, mused about how the visualizations represent “another way way to see hot topics and another path to them.”

For as long as tag clouds have been a feature of the web, they’ve also been an object of futurist optimism, kindling images of Edward Tufte and notions that if someone could just unlock all those dense far-flung pages of information, just present them correctly, illumed, people everywhere would nod and understand. Their eyes would grow bright, and they would smile at the sheer sense it all makes. The headiness of a folksonomy is sweet for an information junkie.

It’s in that vein that ReadWriteWeb mythologizes the tag cloud as “buffalo on the pre-Columbian plains of North America.” A reader willing to cock his head and squint hard enough at the image of tag clouds “roaming the social web” as “huge, thundering herds of keywords of all shades and sizes” realizes that the Rob Cottingham would have us believe that tag clouds were graceful and defenseless beasts—and also now on the verge of extinction. He’s more or less correct.

I used to mythologize the tag cloud, but let’s be honest. They were never actually useful. You could never drag and drop one word in a tag cloud onto another to get the intersection or union of pages with those two tags. You could never really use a tag cloud to subscribe to RSS feeds of only the posts with a given set of tags.

A tag also never told you whether J.P. Morgan was a person or a bank. A tag cloud on a blog was never dynamic, never interactive. The tag cloud on one person’s blog never talked to the tag cloud on anyone else’s. I could never click on one tag and watch the cloud reform and show me only related tags, all re-sized and -colored to indicate their frequency or importance only in the part of the corpus in which the tag I clicked on is relevant.

But there’re also a cool-headed thoughts to have here. If tag clouds don’t work, what will? What is the best way to navigate around those groups of relatively many words called articles or posts? In the comments to Jarvis’s post, I asked a set of questions:

How will we know when we meet a visualization of the news that’s actually really useful? Can some visualization of the news lay not just another path to the “hot topic” but a better one? Or will headlines make a successful transition from the analog past of news to its digital future as the standard way we find what we want to read?

I believe the gut-level interest in tag clouds comes in part from the sense that headlines aren’t the best way to navigate around groups of articles much bigger than the number in a newspaper. There’s a real pain point there: scanning headlines doesn’t scale. Abstracting away from them, however, and focusing on topics and newsmakers in order to find what’s best to read or watch just might work.

I think there’s a very substantial market for a smarter tag cloud. They might look very different from what we’ve seen, but they will let us see at a glance lots of information and help us get to the best stuff faster. After all, the articles we want to read, the videos we want to watch, and the conversations we want to have around them are what’s actually important.

Spiffy Concept

Caveat user: RSS lava lamps

So be good at long-term trends, not just short-term ones. And situate your visualization in the user’s context—different users see different visualizations depending on their differences. Also, make it easy, not difficult, to combine different data sources. Finally, make them actually social and easy to share.

Wow, sounds hard.


Josh Young's Facebook profile

What I’m thinking

Error: Twitter did not respond. Please wait a few minutes and refresh this page.

What I'm saving.

RSS What I’m reading.

  • An error has occurred; the feed is probably down. Try again later.