Programmable Information

From Tim O’Reilly:

But professional publishers definitely have an incentive to add semantics if their ultimate consumer is not just reading what they produce, but processing it in increasingly sophisticated ways.

Before the web, and still today, media publishers have competed on price. If your newspaper or book or CD was the cheapest, that was a reason for someone to buy it. As information becomes digital and the friction of exchange wears away, information will tend to be free. (See here, here, and here, and about a million other places.) That makes competing on price pretty tough.

Of course, publishers also competed, and still do, on quality. As they should. I suspect that readers will never stop wanting their newspaper articles well sourced, well argued, and well written. Partisan readers will never stop wanting their news to make the good guys look good and the bad guys look bad. That’s all in the data.

The nature of digital information, however, changes what information consumers will find high-quality. Now readers want much more: they want metadata. That’s what O’Reilly’s talking about. That’s what Reuters was thinking when it acquired ClearForest.

Readers won’t necessarily look at all the metadata the way they theoretically read an entire article. Instead readers might find the article because of its metadata, e.g., its issues, characters, organizations, or the neighborhood it was written about. Or they might find another article because it shares a given metadatum or because its set of metadata is similar. Or, another step out, they might find another reader who’s enjoyed lots of similar articles.
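
To make those two discovery paths concrete (sharing one metadatum, or having a similar set of metadata), here is a minimal Python sketch. It assumes each article has been reduced to a plain set of tag strings; the article IDs and tags are invented for illustration. Jaccard similarity (shared tags divided by all distinct tags) is just one simple stand-in for "similar metadata," not a method any particular news site is claimed to use.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical articles, each annotated with a set of metadata tags
# (issues, organizations, neighborhoods). Invented for illustration.
ARTICLES: Dict[str, Set[str]] = {
    "a1": {"housing", "zoning", "Brooklyn"},
    "a2": {"housing", "rent control", "Brooklyn"},
    "a3": {"transit", "MTA", "Queens"},
    "a4": {"zoning", "city council", "Brooklyn"},
}


def sharing(tag: str, articles: Dict[str, Set[str]]) -> List[str]:
    """Return the IDs of articles that carry a given metadatum, sorted."""
    return sorted(aid for aid, tags in articles.items() if tag in tags)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection size over union size; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(aid: str, articles: Dict[str, Set[str]], k: int = 2) -> List[Tuple[str, float]]:
    """Rank the other articles by how much their metadata overlaps with aid's."""
    target = articles[aid]
    scores = [(other, jaccard(target, tags))
              for other, tags in articles.items() if other != aid]
    # Highest score first; ties broken alphabetically by article ID.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


print(sharing("Brooklyn", ARTICLES))   # ['a1', 'a2', 'a4']
print(most_similar("a1", ARTICLES))    # [('a2', 0.5), ('a4', 0.5)]
```

The same set representation would extend to the "another step out" case: treat a reader as the union of the tags on articles they enjoyed and compare readers the same way.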

The point is that, if your newspaper has metadata that I can use, that is a reason for me to buy it (or at least to look at the ad next to it).

Actually, it’s not that simple. The New York Times annotates its articles with a few tags hidden in the HTML, and almost no one pays any attention to those tags. Few would even if the tags were surfaced on the page. Blogs have had tags for years, and no one’s really using that metadata, however meager, to great effect.

When blogs do have systematic tags, the way I take advantage of them is by way of an unrelated web application, namely, Google Reader. I can, for instance, subscribe to the RSS feed on this page, which aggregates all the posts tagged “Semantic Web” across ZDNet’s family of blogs. Without RSS and Google Reader, the tags just aren’t that useful. The metadata tells me something, but RSS and a feed reader allow me to lump and split accordingly.
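
The lumping and splitting can be sketched locally. The snippet below is a hypothetical Python sketch, not a description of how Google Reader works: it assumes tags arrive as RSS 2.0 `<category>` elements on each `<item>`, uses two made-up feed strings in place of real blogs, and pools every item from every feed under each of its categories (lumping). Looking up a single category key is the splitting.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Two tiny, made-up RSS 2.0 feeds standing in for separate blogs.
FEEDS = [
    """<rss version="2.0"><channel><title>Blog A</title>
    <item><title>Linked data primer</title><category>Semantic Web</category></item>
    <item><title>Quarterly earnings</title><category>Business</category></item>
    </channel></rss>""",
    """<rss version="2.0"><channel><title>Blog B</title>
    <item><title>Tagging at scale</title><category>Semantic Web</category><category>Tagging</category></item>
    </channel></rss>""",
]


def items_by_category(feeds: List[str]) -> Dict[str, List[str]]:
    """Pool item titles from many feeds into one index keyed by category."""
    index: Dict[str, List[str]] = {}
    for xml_text in feeds:
        channel = ET.fromstring(xml_text).find("channel")
        if channel is None:
            continue  # not an RSS 2.0 document; skip it
        for item in channel.findall("item"):
            title = item.findtext("title", default="")
            # An item with several categories is filed under each one.
            for cat in item.findall("category"):
                index.setdefault((cat.text or "").strip(), []).append(title)
    return index


index = items_by_category(FEEDS)
print(index["Semantic Web"])  # ['Linked data primer', 'Tagging at scale']
```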

Google Reader allows consumers to process ZDNet’s metadata in “sophisticated ways.” Consumers can’t do it alone, and there’s real opportunity in building the tools to process the metadata.

Without the tools to process the metadata, the added information isn’t terribly useful. That’s why it’s a big deal that Reuters has faith that, if it brings forth the metadata, someone will build an application that exploits them, slicing and dicing the news in interesting ways.

In fact, ClearForest already tried to entice developers with a contest in 2006. The winner was a web application called Optevi News Tracker, which isn’t very exciting to me for a number of reasons. Among them is that I don’t think it’s a good tool for exploiting metadata. I just don’t really get much more out of the news, although that might change if it used more than MSNBC’s feed of news.

My gut tells me that what lies at the heart of News Tracker’s lackluster operation is that it just doesn’t do enough with its metadata. I can’t really put my finger on it, and I could be wrong. Am I? Or should I trust my gut?

So what is the killer metadata-driven news application going to look like? What metadata are important, and what are not? How do we want to interact with our metadata?


1 Response to “Programmable Information”

  1. ttague, 2008 March 8 at 2:08 pm


    Tom Tague, head of the Reuters Calais project here.

    I think you and I are struggling with some of the same questions – OK, so we add all of this wonderful metadata to a news article or group of news articles – what’s next? The mind immediately goes to some flavor of faceted navigation or enhanced search and then tends to run dry of additional ideas.

    Over the last few weeks I’ve come to the conclusion (obvious in hindsight) that we run into this conceptual wall because we’re not thinking big enough. We’re stopping at “what do we do with metadata + the news” as if those two things existed in a vacuum.

    The next big step is, I believe, linked data. If we can extract high-fidelity semantic metadata from the news and – rather than stopping there – go on to link those metadata elements (people, places, organizations, etc.) unambiguously to other rich machine-accessible data sources, then we have a whole new toolkit.

    In this scenario I can – programmatically – read a news item, extract a person, go to Freebase and find that person’s educational affiliations, and hit a federal funding database to look up grant information for those institutions. Then I can mash it all together to show politicians’ impact on federal funding for their favorite schools. Or about 10,000 other things. What’s missing in all of this today is 1) adequate machine-accessible (e.g. API-based) data sources and 2) a great card catalog of those assets. Over the coming year you’ll see the Calais initiative starting to whittle away at those barriers.

    So, to summarize – we need to get out of the “news box” and start thinking about how we can dramatically enhance a piece of news by providing the full contextual landscape around it. A news article should catalyze our learning and exploration – not contain it.



Josh Young's Facebook profile
