Archive for the 'semanticweb' Category

Not by Links Alone

At this unthinkably late hour, many of even the most recalcitrant journalists and newsy curmudgeons have given themselves over, painfully, to the fundamentally important fact that the economics of abundance now govern their world.

For many, of course, stemming that tide is still paramount. Their goal, as David Carr writes, is to squelch the “new competition for ads and minds.” Thus Walter Isaacson’s “E-ZPass digital wallet” and Alan Mutter’s “Original Sin.” Thus Michael Moran’s obnoxious “NOPEC.” Thus Journalism Online. And, of course, thus we have David Simon’s recent call for Congress to “consider relaxing certain anti-trust prohibitions” or this call in the Washington Post to rework fair use. I wish them all good luck, but mostly good night.

There are others, though, who think it’s great that the Internet and Google are opening up the news to competition. In fact, “Google is good” strikes me as nearly orthodox among the basically Internet-savvy set of news talkers. Marissa Mayer crows about how Google delivers newspapers’ Web sites one billion clicks a month, and Arianna Huffington insists that the future of news is to be found in a “linked economy” and “search engines” like Google.

In this narrative, Google’s the great leveler, ushering the world of journalism out of the dark, dank ages of monopoly and into the light, bright days of competition, where all news articles and blog posts stand on their own pagerank before the multitude of users who judge with their links and their clicks. Its ablest defender is probably Jeff Jarvis, author of What Would Google Do? Jarvis was relatively early in pointing out that “Google commodifies the world’s content by making it all available on a level playing field in its search.” In that and other posts at Buzz Machine, his widely read blog, Jarvis allows that Google “can make life difficult” but insists, “that’s not Google’s fault.” The reverence for Google is thick: “The smart guys are hiring search-engine optimization experts and trying to figure out how to get more people to their stuff thanks to Google.”

But defenders of Google’s influence on the broader market for news and newspapers themselves make a striking error in believing that the market for content is competitive. That belief is wrong—not just a little bit or on the margin, but fundamentally, and importantly, wrong.

Which is not to say that news publishers aren’t competing for readers’ eyeballs and attention. Publishers compete with one another all day long, every day—with some local exceptions, the news has always been competitive like a race, and is now more competitive like a market than ever before. But the market for that news—the place where consumers decide what to read, paying with their attention—is not competitive. Google may well be the great leveler, but down to how low a field?

To be very clear, this is far from a neo-classical purist’s critique that picks nits by abusing uselessly theoretical definitions. I am not a purist, an economist, or a jerk. This is reality, as best as I know it. Nevertheless, to say that the market for content is competitive is just to misunderstand what a competitive market actually entails. The market for news content as it currently stands, with Google in the middle, is a profoundly blurry, deeply uncompetitive space.

*    *    *

“The difficulty of distinguishing good quality from bad is inherent in the business world,” Nobel laureate George Akerlof wrote in the kicker of his most famous paper, published in 1970. “This may indeed explain many economic institutions and may in fact be one of the more important aspects of uncertainty.”

Akerlof fired an early shot in a scholarly marathon to study the effects of asymmetric information in markets. What do parties to a potential transaction do when they know different sets of facts? Maybe that seems like an obvious question, but economists in the middle of the twentieth century had been pretty busy worrying about perfecting complicated models despite their grossly simplistic assumptions.

So Akerlof set about to write about how markets can fail when some of those assumptions turn out to be bunk. The assumption he tested first, in “The Market for ‘Lemons,’” was certainty, and he showed that when sellers know more about the goods being sold than the buyers do, sellers abuse their privileged position and buyers leave the market.

Writing in the same year, the economist Phillip Nelson studied the differences between what he called “search goods” and “experience goods.” Search goods and experience goods express a certain kind of asymmetry. For search goods, consumers can overcome the asymmetry before the point of purchase by doing their homework. For experience goods, by contrast, consumers can judge quality only by investing the time to consume them.

A pair of pants, for instance, is a search good—you can try before you buy, and shop around for the pants that fit you best. An apple, on the other hand, is an experience good—you don’t know whether you’ll like one until you consume it, and you can’t really try before you buy.

News articles are experience goods. Just as with an apple, you need to consume the story, reading the article or watching the video or so on, in order to judge its quality. “Stories can vary in length, accuracy, style of presentation, and focus,” writes economist James Hamilton in All the News That’s Fit to Sell. “For a given day’s events, widely divergent news products are offered to answer the questions of who, what, where, when, and why.” We can’t know which one’s best till we’ve read them all, and who’s got time for that?

Moreover, a multitude of subjective editorial decisions produce the news. Each reporter’s practices and habits influence what’s news and what’s not. Their learned methods, their assigned beats, and even their inverted pyramids shape what we read and how. Reporters’ and editors’ tastes, their histories, or their cultures matter, as do their professional ethics. Each article of news is a nuanced human document—situated aesthetically, historically, culturally, and ethically.

Ultimately, the news is afflicted with the problem of being an experience good more than even apples are. At least Granny Smiths don’t vary wildly from farmer to farmer or from produce bin to produce bin. Sure, some may be organic, while others are conventional. One may be tarter or crispier than another, but tremendous differences from the mean are very unlikely. With the news, though, it’s hard even to think of what the mean might be. It may seem obvious, but articles, essays, and reports are complex products of complex writerly psychologies.

For a long time, however, as readers, we were unaware of these nuances of production. That was, in some sense, the upshot: our experience of this journalism was relatively uncomplicated. This profound lack of context mattered much less.

Call it the myth of objectivity maybe, but what NYU professor Jay Rosen has labeled the “mask of professional distance” meant that we didn’t have much of a chance to bother with a whole world of complexities. Because everyone usually wore a mask, and because everyone’s mask looked about the same, we ignored—indeed, we were largely necessarily ignorant of—all the unique faces.

For a long time, therefore, the orthodox goal of American newspapers virtually everywhere was news that really wasn’t an experience good. When news existed only on paper, it hardly mattered what news was, because we had so few, and such seemingly monochrome, choices about what to read. We returned to the same newspapers and reporters behind the same masks over and over again, and through that repetition, we came subtly to understand the meaning and implications of their limited degrees of “length, accuracy, style of presentation, and focus.”

As a result, we often grew to love our newspaper—or to love to hate it. But even if we didn’t like our newspaper, it was ours, and we accepted it, surrendering our affection either way, even begrudgingly. The world of news was just much simpler, a more homogeneous, predictable place—there were fewer thorny questions, fewer observable choices. There was less risk by design. Our news was simpler, or it seemed to be, and we had little choice but to become familiar with it anyhow. One benefit of the View from Nowhere, after all, is that basically everyone adopted it—that it basically became a standard, reducing risk.

But a funny thing happened in this cloistered world. Because it seemed only natural, we didn’t realize the accidental nature of the understanding and affection between readers and their newspapers. If, as the economists would have it, the cost of a thing is what we’ve sacrificed in order to achieve it, then our understanding and affection were free. We gave nothing up for them—for there was scarcely another alternative. As a result, both readers and publishers took those things for granted. This point is important because publishers are still taking those things for granted, assuming that all people of good faith still appreciate and love all the good things that a newspaper puts on offer.

*    *    *

But when our informational options explode, we can plainly, and sometimes painfully, see that our newspapers aren’t everything. Different newspapers are better at answering different questions, and some answers—some as simple as what we should talk about at work tomorrow—don’t come from newspapers at all. So we go hunting on the Internet. So we gather. So we Google.

We have now spent about a decade Googling. We have spent years indulging in information, and they have been wonderful years. We are overawed by our ability to answer questions online. Wikipedia has helped immensely in our efforts to answer those questions, but pagerank elevated even it. Newspapers are just one kind of Web site among the many that have plunged into the scrum of search engine optimization. Everyone’s hungry for links and clicks.

And Google represents the Internet at large for two reasons. For one, the engine largely structures our experience of the overall vehicle. More importantly, though, Google’s organization of the Internet changes the Internet itself. The Search Engine Marketing Professional Organization estimates, in this PDF report, that North American spending on organic SEO in 2008 was about $1.5 billion. But that number is surely just the tip of the iceberg. Google wields massive power over the shape and structure of the Internet’s general landscape of Web pages, Web applications, and the links among them. Virtually no one builds even a semi-serious Web site without considering whether it will be indexed optimally. For journalism, most of the time, the effects are either irrelevant or benign.

But think about Marissa Mayer’s Senate testimony about the “living story.” Newspaper Web sites, she said, “frequently publish several articles on the same topic, sometimes with identical or closely related content.” Because those similar pages split inbound links from around the Web, no single page earns the pagerank that one consolidated page would. Mayer would have news Web sites structure their content more like Wikipedia: “Consider how the authoritativeness of news articles might grow if an evolving story were published under a permanent, single URL as a living, changing, updating entity.”

Setting aside for the moment whatever merits Mayer’s idea might have, imagine the broader implications. She’s encouraging newspapers to change not just their marketing or distribution strategies but their journalism because Google doesn’t have an algorithm smart enough to determine that they should share the “authoritativeness.”

At Talking Points Memo, Josh Marshall’s style of following a story over a string of blog posts, poking and prodding an issue from multiple angles, publishing those posts in a stream, and letting the story grow incrementally and cumulatively, might be disadvantaged because those posts are, naturally, found at different URLs. His posts would compete with one another for pagerank.

And maybe it would be better for journalism if bloggers adopted the “living story” model of reporting. Maybe journalism schools should start teaching it. Or maybe not—maybe there is something important about what the structure of content means for context. The point here isn’t to offer a substantive answer to this question, but rather to point out that Mayer seems unaware of the question in the first place. It’s natural that Mayer would think that what’s good for Google is good for Internet users at large. For most domestic Internet users, after all, Google, which serves about two-thirds of all searches, essentially is their homepage for news.

But most news articles, of course, simply aren’t like entries in an encyclopedia. An article of news—in both senses of the term—is substantially deeper than the facts it contains. An article of news, a human document, means substantially more to us than its literal words—or the pageranked bag of words that Google more or less regards it as.

Google can shine no small amount of light on whether we want to read an article of news. And, importantly, Google’s great at telling you when others have found an article of news to be valuable. But the tastes of anonymous crowds—of everyone—are not terribly good at determining whether we want to read some particular article of news, particularly situated, among all the very many alternatives, each particularly situated unto itself.

Maybe it all comes down to a battle between whether Google encourages “hit-and-run” visits or “qualified leads.” I don’t doubt that searchers from Google often stick around after they alight on a page. But I doubt they stick around sufficiently often. In that sense, I think Daniel Tunkelang is precisely correct: “Google’s approach to content aggregation and search encourages people to see news…through a very narrow lens in which it’s hard to tell things apart. The result is ultimately self-fulfilling: it becomes more important to publications to invest in search engine optimization than to create more valuable content.”

*    *    *

The future-of-news doomsayers are so often wrong. A lot of what they said at Kerry’s hearing was wrong. It’s woefully wrongheaded to call Google parasitic, if only because the Internet without it would be a distinctly worse place. There would be, I suspect, seriously fewer net pageviews for news. And so it’s easy to think that they’re wrong about everything—because it seems that they fundamentally misunderstand the Internet.

But they don’t hold a monopoly on misunderstanding. “When Google News lists one of our stories in a prominent position,” writes Henry Blodget, “we don’t wail and moan about those sleazy thieves at Google. We shout, ‘Yeah, baby,’ and start high-fiving all around.” To Blodget, “Google is advertising our stories for free.”

But life is about alternatives. There’s what is, and there’s what could be. And sometimes what could be is better than what is—sometimes realistically so. So however misguided some news executives may have been or may still be about their paywalls and buyouts, they also sense that Google’s approach to the Web can’t reproduce the important connection the news once had with readers. Google just doesn’t fit layered, subtle, multi-dimensional products—experience goods—like articles of serious journalism. Because news is an experience good, we need really good recommendations about whether we’re going to enjoy it. And the Google-centered link economy just won’t do. It doesn’t add quite enough value. We need to know more about the news before we sink our time into reading it than pagerank can tell us. We need the news organized not by links alone.

What we need is a search experience that lets us discover the news in ways that fit why we actually care about it. We need a search experience built around concretely identifiable sources and writers. We need a search experience built around our friends and, lest we dwell too snugly in our own comfort zones, other expert readers we trust. These are all people—and their reputations or degrees of authority matter to us in much the same ways.

We need a search experience built around beats and topics that are concrete—not hierarchical, but miscellaneous and semantically well defined. We need a search experience built around dates, events, and locations. We need a search experience that’s multi-faceted and persistent, a stream of news. Ultimately, we need a powerful, flexible search experience that merges automatization and human judgment—that is sensitive to the very particular and personal reasons we care about news in the first place.

The people at Senator Kerry’s hearing last week seemed either to want to dam the river and let nothing through or to whip its flow up into a tidal wave. But the real problem is that they’re both talking about the wrong river. News has changed its course, to be sure, so in most cases, dams are moot at best. At the same time, though, chasing links and clicks, with everyone pouring scarce resources into an arms race of pagerank while aggregators direct traffic and skim a few page views, isn’t sufficiently imaginative either.

UPDATE: This post originally slipped out the door before it was fully dressed. Embarrassing, yes. My apologies to those who read the original draft of this thing and were frustrated by the unfinished sentences and goofy notes to self, and my thanks to those who read it all the same.

Questions for Open Calais?

So I’m interviewing the folks over at Thomson Reuters on Thursday for a piece that should be published at CJR. We’ll be talking about a relatively new service they’re providing freely. That service is called Open Calais, and it does some fancy stuff to plain text.

What fancy stuff? If you send it a news article, Open Calais will give you back the deets—and, way more importantly, it will make them obvious to your computer as well. That’s my description inspired by the Idiot’s guide, anyhow. (Yes, “deets” means “details” to cool kids, so get on board.)

<digression>Basically, the whole point of the semantic web is to make what’s obvious to you also obvious to your computer. For people who have always anthropomorphized their every laptop and piece of software—loved them when they just work, coaxed them when they slow to a crawl, and yelled at them when they grind to a halt—this can be a serious head-scratcher and a boring one at that. I blame Clippy the Microsoft Office Assistant. I also blame super-futuristic sci-fi movies that give us sugar-plum images of computers as pals—bright, sophisticated, and in possession of knowledge like we epistemologically gifted humans have. Screw Threepio. Finally, I blame that jerk Alan Turing, who fed us the unintuitive half-truth that a computer could be conscious.

So it feels really silly to say, again, but computers are ones and zeroes, NAND gates and NOR gates. They’re called computers because they do computation. They don’t do meaning as such. (Oh boy do I hope I get flamed in the comments by someone who knows his way around BsIV way better than I do.)</digression>

Open Calais will pick out people, companies, and places—these are called “named entities.” It will also identify facts and events in articles. Because Thomson Reuters is a finance-focused information provider, many of the facts and events it can recognize are about business relationships like subsidiaries and events like bankruptcies, acquisitions, and IPOs. The list goes on and on. Finally, Open Calais will identify very broad categories like politics, business, sports, or entertainment.

Open Calais will also associate these deets with further information on teh interwebs. So just for instance, if the web service identifies a person in your article, it will give you and your finicky, picky, and ultimately dumb computer a nice pointer to this computer-friendly version of wikipedia called dbpedia. Or if Calais identifies a movie, it will offer a similar linked-data pointer. Linked data, as far as I can tell, is still a pretty vague notion. It promises to deliver more than it has to date, and that’s not a derogation.
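To make that concrete, here is a minimal sketch of what asking Calais to tag an article might look like in Python. The endpoint, headers, and response shape below are assumptions for illustration, not gospel; check the Calais documentation for the real contract, and bring your own API key.

```python
# A minimal sketch of sending an article to Open Calais and collecting
# the named entities it finds. Endpoint, headers, and response shape
# are assumptions for illustration -- consult the Calais docs.
import requests

CALAIS_ENDPOINT = "https://api.opencalais.com/tag/rs/enrich"  # assumed URL
API_KEY = "your-api-key"  # hypothetical placeholder

def extract_entities(article_text):
    """Return (type, name) pairs for the entities Calais tags in the text."""
    response = requests.post(
        CALAIS_ENDPOINT,
        data=article_text.encode("utf-8"),
        headers={
            "x-ag-access-token": API_KEY,   # assumed auth header
            "Content-Type": "text/raw",
            "outputFormat": "application/json",
        },
    )
    response.raise_for_status()
    return [
        (item.get("_type"), item.get("name"))
        for item in response.json().values()
        if isinstance(item, dict) and item.get("_typeGroup") == "entities"
    ]

print(extract_entities("Thomson Reuters acquired ClearForest in 2007."))
```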

But why freely—or essentially so in most cases? If you keep within liberal limits, you owe Thomson Reuters no money in exchange. Correct me if I’m wrong, but all they want, more or less, is that you offer them attribution and use their linked-data pointers (they call them URIs). Ken Ellis, chief scientist at Daylife, which may be best known to journalists through its association with Jeff Jarvis, took a stab at answering the “why free?” question:

Thomson Reuters has a large collection of subscription data services. They eventually want to link to these services. Widespread use of Calais increases the ease with which customers can access these subscription data services, ultimately increasing their ability to extract revenue from them.

That sounds to me like Thomson Reuters is interested in making its standards the standards. And that bargain really does sound reasonable. I guess.

But journalists are a wildly skeptical bunch. They’re skeptical—aloof even, way too cool for school and ideology. Journalists have a pretty acute and chronic deficiency in a little thing called trust. Maybe it’s justified, or maybe it’s not. Maybe it’s mostly justified, or maybe it’s mostly unjustified.

Either way, my gut’s telling me that journalists are going to need a fuller narrative from Thomson Reuters about why they should rely on another news and information company. When I talk to Tom and Krista, that’s what I’ll be largely interested in.

And you? What do you want to know about Open Calais? Leave your questions in the comments, and I’ll be sure to ask them.

Getting in the metadata game: Oh, money, that’s why!

Who’s mentioned in your article? What organizations does it talk about? Or what zip codes?

Answering these simple questions—in ways notoriously inflexible computers understand—can be like putting handles on your articles. It means aggregators and filterers like EveryBlock can grab on and give readers one more way to find what you have to say.

That’s what the New York Times is doing—in two stages, it appears. First its librarians encode the elected officials mentioned in its articles; mentioning them in the regular text of the article doesn’t cut it. Then its newly built web service, called Represent, figures out the geographic locations those officials represent. Meanwhile, Represent is also taking a computerized look at Congressional votes. When a politician votes, Represent says something like, “Oh, a person just voted in geographic area Y, and that person’s name is X.”

EveryBlock isn’t built for understanding much about people or names, but it is built for understanding locations and geographic areas. So Represent’s job is to translate from X to Y—from names to places.
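In code, that translation step is almost embarrassingly simple once the metadata exists. Here is a toy sketch; the officials, districts, and lookup table are hypothetical stand-ins, not the Times’s actual data.

```python
# Toy sketch of Represent's name-to-place translation.
# Officials and districts are hypothetical examples.
OFFICIAL_TO_DISTRICT = {
    "Jane Smith": "NY-14",  # hypothetical representative
    "John Doe": "NY-15",
}

def districts_for_vote(voter_names):
    """Translate who voted (X) into the places they represent (Y)."""
    return {name: OFFICIAL_TO_DISTRICT.get(name) for name in voter_names}

# EveryBlock understands places, not names, so this mapping is the
# "handle" it can grab onto.
print(districts_for_vote(["Jane Smith", "John Doe"]))
```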

Which brings us at long last to the metadata game. The historical problem is that answering these questions has been interminably dull, technical work. So the historical result has been one big shoulder shrug: “Why bother?”

Well, people like Adrian Holovaty are starting to envision an answer. “We have a number of ideas for sustaining our project,” he writes, “like building a local advertising engine.” That kind of engine might share ad revenue with the newspapers whose articles it incorporates. In order to claim a share, each newspaper must diligently prepare its articles for EveryBlock: there must be location handles that EveryBlock can grab. It’s highly unclear how much money EveryBlock’s hyperlocal ad targeting could generate, but if it’s enough, it will provide the kind of incentive publishers need to make boring metadata worth their while. EveryBlock might just unlock the ‘R’ in ROI. That could very well be a great reason to bother.

Epilogue    It’s notable that grant monies have helped solve this chicken-and-egg problem. I may have personal issues with the Knight News Challenge—I didn’t win and didn’t receive feedback promised on multiple occasions—but EveryBlock is quite justifiably the darling of the news innovation set.

Obstreperous Minnesota

Every once in a while—and maybe more often than I’d like to admit—I re-read Clay Shirky. Today, I re-read “Ontology Is Overrated.”

And today, I’m ready to disagree with it around the margins.

On fortune telling. Yes, Shirky’s correct that we will sometimes mis-predict the future, as when we infer that some text about Dresden is also about East Germany and assume it always will be. But, no, that doesn’t have to be a very strong reason to forgo a lightweight ontology that infers something about a city and its country. We can just change the ontology when the Berlin Wall falls. It’s much easier than re-shelving books, after all; it’s just rewriting a little OWL.
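For the record, here is roughly how small that rewrite is. A minimal sketch in Python with rdflib, using a made-up namespace and made-up property names:

```python
# Updating a lightweight ontology when the world changes.
# The namespace and terms are made-up examples.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/geo#")
g = Graph()

# The old assertion: Dresden is located in East Germany.
g.add((EX.Dresden, EX.locatedIn, EX.EastGermany))

# The Berlin Wall falls: retract the stale triple, assert a new one.
g.remove((EX.Dresden, EX.locatedIn, EX.EastGermany))
g.add((EX.Dresden, EX.locatedIn, EX.Germany))

# Everything inferred downstream picks up the change -- no re-shelving.
print(g.serialize(format="turtle"))
```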

On mind reading. Yes, Shirky’s correct that we will lose some signal—or increase entropy—when we mistake the degree to which users agree and mistakenly collapse categories. And, yes, it might be generally true about the world that we tend to “underestimate the loss from erasing difference of expression” and “overestimate loss from the lack of a thesaurus.” But it doesn’t have to be that way, and for two reasons.

First, why can’t we just get our estimations tuned? I’d think that the presumption would be that we could at least give it a go and, otherwise, that the burden of demonstrating that we just cannot for some really deep reason falls on Shirky.

Second, we don’t actually need to collapse categories; we just need to build web services that recognize synonymy—and don’t shove the merged categories down our users’ throats. I take it to be a fact about the world that there are a non-trivial number of people for whom ‘film’ and ‘movies’ and ‘cinema’ are just about perfect synonyms. At the risk of revealing some pretty embarrassing philistinism, I offer that I’m one of them, and I want my web service to let me know that I might care about this thing called ‘cinema’ when I show an interest in ‘film’ or ‘movies.’ I agree with Shirky that we can do this based solely on the fact that “tag overlap is in the system” while “the tag semantics are in the users.” But why not also put the semantics in the machine? Ultimately, both are amenable to probabilistic logic.
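Here is a sketch of the kind of service I mean, with a made-up synonym table: recognize the synonymy, surface it as a suggestion, and never force the merge.

```python
# A service that recognizes synonymy without collapsing categories.
# The synonym sets are illustrative, not a real thesaurus.
SYNONYM_SETS = [
    {"film", "movies", "cinema"},
    {"soccer", "football"},  # a riskier merge -- hence suggest, don't collapse
]

def related_tags(tag):
    """Suggest tags a user might also care about; the user decides."""
    for synset in SYNONYM_SETS:
        if tag in synset:
            return sorted(synset - {tag})
    return []

# A user interested in 'film' gets a gentle pointer to 'cinema' and
# 'movies', but everyone's own tags stay exactly as they wrote them.
print(related_tags("film"))  # -> ['cinema', 'movies']
```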

Google showed it is the very best at serving us information when we know we care about something fuzzy and obscure—like “obstreperous minnesota.” I don’t think Shirky would dispute this, but it’s important to bear in mind that we also want our web services to serve us really well when we don’t know we care about something (see especially Daniel Tunkelang on HCIR (@dtunkelang)). That something might be fuzzy or specific, obscure or popular, subject to disagreement or perfectly unambiguous.

People and organizations tend to be unambiguous. No one says this fine fellow Clay Shirky (@cshirky) is actually Jay Rosen (@jayrosen_nyu). That would be such a strange statement that many people wouldn’t even understand it well enough to declare it false. No one says the National Basketball Association means the National Football League. And if someone were to say that J.P. Morgan is the same company as Morgan Stanley, we could correct him and explain how they’re similar but not identical.

Some facts about people and organizations can be unambiguous some of the time, too. Someone could argue that President Obama’s profession is sports, but we could correct her and explain how it’s actually politics, which maybe sometimes works metaphorically like sports. That doesn’t mean that Obama doesn’t like basketball or that no one will ever talk about him in the context of basketball. There may be more than a few contexts in which many people think it makes little sense to think of him as a politician, like when he’s playing a game of pick-up ball. But I think we can infer pretty well ex ante that it makes lots of sense to think of Obama as a politician when he’s giving a big televised speech, signing legislation, or meeting with foreign leaders. After all, what’s the likelihood that Silvio Berlusconi or Hu Jintao would let himself get schooled on the court? Things aren’t always that context-dependent.

Why Socialmedian, Twine, and Others Don’t Get the News

More than a year ago, I asked, “What Is Networked News?” I was thinking about how people really, actually want to get their news, and my answer came in three parts.

Let’s focus briefly on the first two. (1) People care about who writes it or creates it. In other words, people want their news from trusted publishers. (2) People also care about who likes it. In other words, people want their news from trusted consumers—their “friends.”

News in the modern era has naturally revolved around publishers. That part’s old-hat, and so people need little help from innovators in getting their news from publishers. But innovators have made tremendous accomplishments in helping people get their news from their friends. This is largely the story of the success of Web 2.0 so far, and many startups have engineered ingenious systems for delivering news to people because their friends like it.

FriendFeed is one such awesome story. Twitter’s another. Google Reader’s “share” feature and its openness, which has allowed others to build applications on top of it, make for another perfect example. The ethic of the link among bloggers is, in a very real way, central to this concept: one person referring others to someone else’s thoughts.

But I also wrote about a third way. (3) People want their news about what interests them. This may seem like a trivial statement, but it is deeply important. There is still tons of work to be done by innovators in engineering systems for actually delivering news to people because they want exactly what they want and don’t want any of the rest.

Twine’s “twines” come close. Socialmedian’s “news networks” come close. They’re both examples of innovation moving in the right direction.

But they don’t go nearly far enough. Twine looks like it’s got significant horsepower under the hood, but it lacks the intuitive tools to deliver. Frankly, it’s badly burdened by its overblown vision of a tricked-out Semantic Web application that’s everything to all people all the time. Twine is, as a result, an overcomplicated mess.

Socialmedian’s problems are worse, however. It’s simply underpowered. Nothing I’ve read, including its press release reproduced here, indicates the kind of truly innovative back-end that can revolutionize the news. Socialmedian wraps a stale social donut around Digg, and I’m afraid that’s about it.

When it comes to the news, people demand (1), (2), and (3). They want their most trusted publishers and their most trusted friends, and they want to personalize their interests with radical granularity. That takes an intense back-end, which Socialmedian simply lacks. That also takes an elegant user-facing information architecture, which Twine lacks.

We’ve had (1) for years, and I’m thrilled at the advances I see made seemingly every day toward a more perfect (2). But a killer news web application has yet to deliver on (3). When it does, we’ll have something that’s social and powerful and dead-simple too.

Give me tags, Calais!

Who needs to think about buying tags when Reuters and its newly acquired company are giving them away?

The web service is free for commercial and non-commercial use. We’ve sized the initial release to handle millions of requests per day and will scale it as necessary to support our users.

I mean, Jesus, it’s so exciting and scary (!) all at once:

This metadata gives you the ability to build maps (or graphs or networks) linking documents to people to companies to places to products to events to geographies to … whatever. You can use those maps to improve site navigation, provide contextual syndication, tag and organize your content, create structured folksonomies, filter and de-duplicate news feeds or analyze content to see if it contains what you care about. And, you can share those maps with anyone else in the content ecosystem.

More: “What Calais does sounds simple—what you do with it could be simply amazing.”
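Those “maps” are just graphs of documents and entities, and a few lines of code get you surprisingly far. A toy sketch with networkx, using made-up articles and entities:

```python
# Toy version of the document-entity "maps" described in the pitch.
# Articles and entities are made-up examples.
import networkx as nx

G = nx.Graph()
G.add_edge("article-001", "Barack Obama")
G.add_edge("article-001", "Washington, D.C.")
G.add_edge("article-002", "Barack Obama")
G.add_edge("article-002", "basketball")

def related_articles(article):
    """Other articles that share at least one entity with this one."""
    return {
        doc
        for entity in G.neighbors(article)
        for doc in G.neighbors(entity)
        if doc != article and doc.startswith("article-")
    }

# Contextual syndication, navigation, and de-duplication all start
# with traversals like this one.
print(related_articles("article-001"))  # -> {'article-002'}
```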

If the world were smart, there would be a gold rush to be first to build the killer app. Mine will be for serving the information needs of communities in a democracy—in a word, news. Who’s coming with me?

PS. Good for Reuters. May its bid to locate itself at the pulsing informational center of the semantic web and the future of news prove as ultimately lucrative as it is profoundly socially benevolent.

Sell me tags, Twine!

How much would, say, the New York Times have to pay to have the entirety of its newspaper analyzed and annotated every day?

The question is not hypothetical.

The librarians could go home, and fancy machine learning and natural language processing could step in and start extracting entities and tagging content. Hi, did you know Bill Clinton is William Jefferson Clinton but not Senator Clinton?! Hey there, eh, did you know that Harlem is in New York City?! Oh, ya, did you know that Republicans and Democrats are politicians, who are the silly people running around playing something called politics?!

Twine could tell you all that. Well, they say they can, but they won’t invite me to their private party! And maybe the librarians wouldn’t have to go home. Maybe they could monitor (weave?) the Twine and help it out when it falls down (frays?).

I want to buy Twine’s smarts, its fun tags. I’d pay a heckuva lot for really precociously smart annotation! They say, after all, that it will be an open platform from which we can all export our data. Just, please, bloat out all my content with as much metadata as you can smartly muster! Por favor, sir! You are my tagging engine—now get running!

What if Twine could tag all the news that’s fit to read? It would be a fun newspaper. Maybe I’d subscribe to all the little bits of content tagged both “Barack Obama” and “president.” Or maybe I’d subscribe to all the local blog posts and newspaper articles and videos tagged “Harlem” and “restaurant”—but only if those bits of content were already enjoyed by one of my two hundred closest friends in the world.
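That subscription is easy enough to state as a query, assuming the tags exist. A sketch with hypothetical items and friends:

```python
# Sketch of the tags-plus-friends subscription described above.
# Items and friend lists are hypothetical.
ITEMS = [
    {"id": 1, "tags": {"Harlem", "restaurant"}, "liked_by": {"alice"}},
    {"id": 2, "tags": {"Harlem", "politics"}, "liked_by": {"bob"}},
    {"id": 3, "tags": {"Harlem", "restaurant"}, "liked_by": {"carol"}},
]
MY_FRIENDS = {"alice", "bob"}

def my_feed(required_tags, friends):
    """Items carrying every required tag and already liked by a friend."""
    return [
        item for item in ITEMS
        if required_tags <= item["tags"] and item["liked_by"] & friends
    ]

print(my_feed({"Harlem", "restaurant"}, MY_FRIENDS))  # -> only item 1
```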

I’d need a really smart and intuitive interface to make sense of this new way of approaching the news. Some online form of newsprint just wouldn’t cut it. I’d need a news graph, for sure.

See TechCrunch’s write-up, Read/Write Web’s, and Nick Carr’s too.

PS. Or I’ll just build my own tagging engine. It’ll probably be better because I can specifically build it to reflect the nature of news.

