Archive for the 'HCIR' Category

Picture this! The news graphed.

slate news dotsSlate added a curious addition to its site last week. Its heart is in the right place, and this is a good experiment, not a silly one.

I believe there’s extraordinary value to be unlocked by mapping a world of articles onto the social graph that they describe textually. I’ve written about graphing the news when I awkwardly described a scheme here and geeked out over a pretty picture here. Yes, there is a funny thing about social networks: they often describe the real world as well as sometimes exist as their own worlds. Done right, Slate’s graph could be an eye-opening mechanism for aggregating, sorting, discovering, following, sharing, and discussing the news.

But I don’t think Slate has quite done it right. In short, there’s too much information in the nodes, or the “dots,” as Slate calls them, and there’s too little information in the edges, or the links that connect up those dots. So permit me a little rambling.

We don’t really care all that much about the differences between a person, a group, and company—not at this top level of navigation, anyhow. There are too many dots, and it’s too hard to keep them all the colors straight. Of course, it’s not that hard, but if Slate’s project is fueled by bold ambition rather than fleeting plaudits, it’s just not easy enough. They’re actors. They’re newsmakers. They are entities that can be said to have a unified will or agency. And that’s enough. Make up a fancy blanket term, or just call them “people” and let smart, interested users figure out the details as they dive in.

Moreover, assistant editor Chris Wilson confuses his own term “topic.” At first, he writes, “News Dots visualizes the most recent topics in the news as a giant social network.” Also, “Like a human social network, the news tends to cluster around popular topics.” In this sense, a “topic” is an emergent property of Slate’s visualization. It’s the thing that becomes apparent to the pattern-seeking, sense-making eyes of users. So “one clump of dots might relate to a flavor-of-the-week tabloid story” or “might center on Afghanistan, Iraq, and the military.”

But then Wilson makes a subtle but ultimately very confusing shift. Explaining how to use the visualization, he writes, “click on a circle to see which stories mention that topic and which other topics it connects to in the network.” Problem is, these “topics” are what he has just called “subjects.” As emergent things, or “clumps,” his original “topics” can’t be clicked on. On the contrary, “subjects—represented by the circles below—are connected to one another,” and they’re what’s clickable.

To make matters worse, Wilson then, below the visualization, introduces more confusing terms, as he describes the role played by Open Calais (which is awesome). It “automatically ‘tags’ content with all the important keywords: people, places, companies, topics, and so forth.” The folks at Thomson Reuters didn’t invent the term “tag,” of course; it’s a long-standing if slippery term that I’m not even going to try to explain (because it really, really is just one of cases in which “the meaning of the word presupposes our ability to use it”). At any rate, Wilson seems like he’s using it because Open Calais uses it. That’s fine, but a bit more clarity would be nice, given the soup of terms already around. And there’s really little excuse for dropping in the term “keywords” because, with his technical hat on, it’s just wrong.

I’m terribly sorry to drag you, dear reader, through that intensely boring mud puddle of terminology. But it’s for good reason, I think. Graphing the news is supposed to be intuitive. The human mind just gets it. A picture is worth a thousand words. Taken seriously, that notion is powerful. At a very optimistic level, it encourages us to let visualizations speak for themselves, stripped of language all too ready to mediate them. But at a basic level, it warns us writers not to trample all over information expressed graphically with thousand textual words that add up to very little—or, worse, confusion.

But, yes, okay, about those prenominate the edges, or the links that connect up those dots! I wrote about this long ago, and my intuition tells me that it doesn’t make sense to leave edges without their own substance. They need to express more than similarity; they can do more than connect like things. If they were to express ideas or events or locations while the nodes expressed topics, it seems to me that the picture would be much more powerful. Those ideas, events, or locations wouldn’t sit in light blue “Other” nodes, as Slate has them; instead they would directly link up the people and organizations. The social network would be more richly expressed. And topics, in Wilson’s original sense, wouldn’t be emergent “clumps” but actually obvious connections.

All in all, the visualization is “depressingly static,” as a friend of mine remarked. There may be two levels of zoom, but there’s no diving. There’s no surfing, no seeing a list of stories that relate to both topic x AND topic y. There’s no navigation, no browsing. There’s no search—and especially none involving interaction between the human and computer. There’s no news judgment beyond what newspaper editors originally add. And the corpus is small—tiny, really, representing only 500 articles each day, which isn’t so far from being a human-scale challenge. Visualizations hold the most promise for helping us grapple with truly internet-scale data sets—not 500 news articles a day but 500,000 news articles and blog posts.

It seems unfair to hold Slate to such a high standard, though. It’s very clear that they were shooting for something much more modest. All the same, maybe modesty isn’t what’s called for.

Advertisements

A Modest News Aggregator for the Win

To the extent that sites or services that present professional and amateur content together emerge and become successful, they will do so only after they figure out a way to give users simple, intuitive, and powerful filters that are themselves the channels that carry our conversation and shape our communities.

We will tolerate only the writing we love. Discovering what we love is a job to distribute across very large groups of users with weak ties and small groups of users with strong ties, all empowered by tools far more subtle than those that characterize current state of search. We will act mostly self-interestedly, choosing by facets, sifting, sorting, sharing, appropriating, connected to one another asymmetrically, mostly pulling not pushing, trusting when trustful. We will participate in a gift economy. Reputation will count. Attention is scarce. Something like tunkrank will help, I’m sure.

The nodes are people because people and other actors are central to what it means to be human regardless of whether we’re reading the news, writing the news, starring in it, or all of the above. The edges are the ideas that capture our common interest over time, location, and predilection. It is beautiful, Doc.

Obstreperous Minnesota

Every once in a while—and maybe more often than I’d like to admit—I re-read Clay Shirky. Today, I re-read “Ontology Is Overrated.”

And today, I’m ready to disagree with it around the margins.

On fortune telling. Yes, Shirky’s correct that we will sometimes mis-predict the future, as when we infer that some text about Dresden is also about East Germany and will be forever. But, no, that doesn’t have to be a very strong reason for us not to have some lightweight ontology that then inferred something about a city and its country. We can just change the ontology when the Berlin Wall falls. It’s much easier than re-shelving books, after all; it’s just rewriting a little OWL.

On mind reading. Yes, Shirky’s correct that we will lose some signal—or increase entropy—when we mistake the degree to which users agree and mistakenly collapse categories. And, yes, it might be generally true about the world that we tend to “underestimate the loss from erasing difference of expression” and “overestimate loss from the lack of a thesaurus.” But it doesn’t have to be that way, and for two reasons.

First, why can’t we just get our estimations tuned? I’d think that the presumption would be that we could at least give a go and, otherwise, that the burden of demonstrating that we just cannot for some really deep reason falls on Shirky.

Second, we don’t actually need to collapse categories; we just need to build web services that recognize synonymy—and don’t shove them down our users’ throats. I take it to be a fact about the world that there are a non-trivial number of people in the world for whom ‘film’ and ‘movies’ and ‘cinema’ are just about perfect synonyms. At the risk of revealing some pretty embarrassing philistinism, I offer that I’m one of them, and I want my web service to let me know that I might care about this thing called ‘cinema’ when I show an interest in ‘film’ or ‘movies.’ I agree with Shirky that we can do this based solely on the fact that “tag overlap is in the system” while “the tag semantics are in the users” only. But why not also make put the semantics in the machine? Ultimately, both are amenable to probabilistic logic.

Google showed it is the very best at serving us information when we know we care about something fuzzy and obscure—like “obstreperous minnesota.” I don’t think Shirky would dispute this, but it’s important to bear in mind that we also want our web services to serve us really well when we don’t know we care about something (see especially Daniel Tunkelang on HCIR (@dtunkelang)). That something might be fuzzy or specific, obscure or popular, subject to disagreement or perfectly unambiguous.

People and organizations tend to be unambiguous. No one says this fine fellow Clay Shirky (@cshirky) is actually Jay Rosen (@jayrosen_nyu). That would be such a strange statement that many people wouldn’t even understand it in order to declare it false. No one says the National Basketball Association means the National Football League them. Or if someone were to say that J.P. Morgan is the same company as Morgan Stanley, we could correct him and explain how they’re similar but not identical.

Some facts about people and organization can be unambiguous some of the time, too. Someone could argue that President Obama’s profession is sports, but we could correct her and explain how it’s actually politics, which maybe sometimes works metaphorically like sports. That doesn’t mean that Obama doesn’t like basketball or that no one will ever talk about him in the context of basketball. There may be more than a few contexts in which many people think it makes little sense to think of him as a politician, like when he’s playing a game of pick-up ball. But I think we can infer pretty well ex ante that it makes lots of sense to think of Obama as a politician when he’s giving a big televised speech, signing legislation, or meeting with foreign leaders. After all, what’s the likelihood that Silvio Berlusconi or Hu Jintao would let himself get schooled on the court? Context isn’t always that dependent.


Josh Young's Facebook profile

What I’m thinking

Error: Twitter did not respond. Please wait a few minutes and refresh this page.

What I'm saving.

RSS What I’m reading.

  • An error has occurred; the feed is probably down. Try again later.