From Tim O’Reilly:
But professional publishers definitely have an incentive to add semantics if their ultimate consumer is not just reading what they produce, but processing it in increasingly sophisticated ways.
In the past, and to this day, publishers on the web and in other media have competed on price. If your newspaper or book or CD was the cheapest, that was a reason for someone to buy it. As information becomes digital and the friction of exchange wears away, information will tend to be free. (See here, here, and here, and about a million other places.) That makes competing on price pretty tough.
Of course, publishers also competed, and still do, on quality. As they should. I suspect that readers will never stop wanting their newspaper articles well sourced, well argued, and well written. Partisan readers will never stop wanting their news to make the good guys look good and the bad guys look bad. That's all in the data.
The nature of digital information, however, changes what information consumers will find high-quality. Now readers want much more: they want metadata. That's what O'Reilly's talking about. That's what Reuters was thinking when it acquired ClearForest.
Readers won’t necessarily look at all the metadata the way they theoretically read an entire article. Instead readers might find the article because of its metadata, e.g., its issues, characters, organizations, or the neighborhood it was written about. Or they might find another article because it shares a given metadatum or because its set of metadata is similar. Or, another step out, they might find another reader who’s enjoyed lots of similar articles.
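To make the three kinds of discovery above a little more concrete, here is a minimal sketch in Python. It assumes each article's metadata is just a set of tags, and the article IDs, tags, and readers are all hypothetical. Finding articles that share a metadatum is an inverted-index lookup; "similar sets of metadata" is scored here with Jaccard similarity (overlap divided by union), which is only one of many possible measures; and the same scoring over reading histories gives a crude way to find a similar reader.

```python
from collections import defaultdict

# Hypothetical articles and their metadata (issues, people, organizations, places).
ARTICLES = {
    "a1": {"housing", "city council", "Brooklyn"},
    "a2": {"housing", "Brooklyn", "zoning"},
    "a3": {"elections", "city council"},
    "a4": {"baseball", "Bronx"},
}

# Hypothetical readers and the articles each one has enjoyed.
READERS = {
    "ann": {"a1", "a2"},
    "bob": {"a2", "a3"},
    "cy": {"a4"},
}


def build_index(items):
    """Invert item -> tags into tag -> set of item ids."""
    index = defaultdict(set)
    for item_id, tags in items.items():
        for tag in tags:
            index[tag].add(item_id)
    return index


def sharing_tag(index, tag):
    """All items that share one given metadatum, sorted by id."""
    return sorted(index.get(tag, set()))


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(items, item_id, k=2):
    """Rank the other items by how much their sets overlap this item's set."""
    target = items[item_id]
    scores = [
        (jaccard(target, tags), other_id)
        for other_id, tags in items.items()
        if other_id != item_id
    ]
    # Highest score first; ties broken by id so the output is deterministic.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other_id, round(score, 2)) for score, other_id in scores[:k]]
```

For example, `sharing_tag(build_index(ARTICLES), "Brooklyn")` returns `["a1", "a2"]`, `most_similar(ARTICLES, "a1")` returns `[("a2", 0.5), ("a3", 0.25)]`, and `most_similar(READERS, "ann", k=1)` returns `[("bob", 0.33)]`. Because both articles and readers are treated as plain sets, the same function serves both steps.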
The point is that, if your newspaper has metadata that I can use, that is a reason for me to buy it (or to look at the ad next to it).
Actually, it's not that simple. The New York Times annotates its articles with a few tags hidden in the HTML, and almost no one pays any attention to those tags. Few would, even if the tags were surfaced on the page. Blogs have had tags for years, and no one's really using that metadata, however meager, to great effect.
When blogs do have systematic tags, the way I take advantage of them is by way of an unrelated web application, namely, Google Reader. I can, for instance, subscribe to the RSS feed on this page, which aggregates all the posts tagged "Semantic Web" across ZDNet's family of blogs. Without RSS and Google Reader, the tags just aren't that useful. The metadata tells me something, but RSS and a feed reader allow me to lump and split accordingly.
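The lumping and splitting a feed reader makes possible can be sketched in a few lines of Python. This is not how Google Reader or ZDNet actually work; it only assumes an RSS 2.0 feed in which each post's tags appear as `<category>` elements, which is a common convention for blog feeds. The feed below is made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A tiny, made-up RSS 2.0 feed; the tags are carried as <category> elements.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example blog network</title>
  <item><title>Post A</title><category>Semantic Web</category><category>Reuters</category></item>
  <item><title>Post B</title><category>Open Source</category></item>
  <item><title>Post C</title><category>Semantic Web</category></item>
</channel></rss>"""


def items_by_tag(feed_xml):
    """Lump: map each tag to the titles of the items that carry it."""
    root = ET.fromstring(feed_xml)
    groups = defaultdict(list)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        for category in item.findall("category"):
            tag = (category.text or "").strip()
            if tag:  # skip empty <category/> elements
                groups[tag].append(title)
    return dict(groups)


def items_with_tag(feed_xml, tag):
    """Split: keep only the titles of items tagged with one particular tag."""
    return items_by_tag(feed_xml).get(tag, [])
```

Here `items_with_tag(SAMPLE_FEED, "Semantic Web")` returns `["Post A", "Post C"]`. The tags themselves say something, but it takes a tool like this (or a feed reader) to turn them into a view of the posts you actually want.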
Google Reader allows consumers to process ZDNet’s metadata in “sophisticated ways.” Consumers can’t do it alone, and there’s real opportunity in building the tools to process the metadata.
Without the tools to process the metadata, the added information isn't terribly useful. That's why it's a big deal that Reuters has faith that, if it brings forth the metadata, someone will build an application that exploits them, or that slices and dices them in interesting ways.
In fact, ClearForest already tried to entice developers with a contest in 2006. The winner was a web application called Optevi News Tracker, which isn't very exciting to me for a number of reasons. Among them is that I don't think it's a good tool for exploiting metadata. I just don't really get much more out of the news, although that might change if it used more than MSNBC's feed of news.
My gut tells me that what lies at the heart of News Tracker’s lackluster operation is that it just doesn’t do enough with its metadata. I can’t really put my finger on it, and I could be wrong. Am I? Or should I trust my gut?
So what is the killer metadata-driven news application going to look like? What metadata are important, and what are not? How do we want to interact with our metadata?