Archive for the 'folksonomy' Category

Obstreperous Minnesota

Every once in a while—and maybe more often than I’d like to admit—I re-read Clay Shirky. Today, I re-read “Ontology Is Overrated.”

And today, I’m ready to disagree with it around the margins.

On fortune telling. Yes, Shirky’s correct that we will sometimes mis-predict the future, as when we infer that some text about Dresden is also about East Germany and will be forever. But, no, that doesn’t have to be a very strong reason for us not to have some lightweight ontology that infers something about a city and its country. We can just change the ontology when the Berlin Wall falls. It’s much easier than re-shelving books, after all; it’s just rewriting a little OWL.
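To make that concrete, here is a minimal sketch in Python rather than OWL. The CITY_TO_COUNTRY table, the infer_topics function, and the sample cities are all invented for illustration, and a real ontology would be far richer. The only point is that the documents keep their tags while the inference rule is the thing that gets revised.

```python
# Hypothetical hand-built "ontology": each city maps to the country it is in.
CITY_TO_COUNTRY = {
    "Dresden": "East Germany",
    "Leipzig": "East Germany",
    "Munich": "West Germany",
}


def infer_topics(tags, city_to_country):
    """Return the original tags plus any country implied by a city tag."""
    inferred = set(tags)
    for tag in tags:
        country = city_to_country.get(tag)
        if country is not None:
            inferred.add(country)
    return inferred


# Before the Wall falls, a text tagged "Dresden" is also about East Germany.
before = infer_topics({"Dresden"}, CITY_TO_COUNTRY)

# Afterwards we revise the ontology, not the documents.
REUNIFIED = {city: "Germany" for city in CITY_TO_COUNTRY}
after = infer_topics({"Dresden"}, REUNIFIED)

print(sorted(before))  # ['Dresden', 'East Germany']
print(sorted(after))   # ['Dresden', 'Germany']
```

Nothing about the text tagged “Dresden” had to be touched; only the mapping changed.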

On mind reading. Yes, Shirky’s correct that we will lose some signal—or increase entropy—when we mistake the degree to which users agree and mistakenly collapse categories. And, yes, it might be generally true about the world that we tend to “underestimate the loss from erasing difference of expression” and “overestimate loss from the lack of a thesaurus.” But it doesn’t have to be that way, and for two reasons.

First, why can’t we just get our estimations tuned? I’d think that the presumption would be that we could at least give it a go and, otherwise, that the burden of demonstrating that we just cannot for some really deep reason falls on Shirky.

Second, we don’t actually need to collapse categories; we just need to build web services that recognize synonymy without shoving it down our users’ throats. I take it to be a fact about the world that there are a non-trivial number of people for whom ‘film’ and ‘movies’ and ‘cinema’ are just about perfect synonyms. At the risk of revealing some pretty embarrassing philistinism, I offer that I’m one of them, and I want my web service to let me know that I might care about this thing called ‘cinema’ when I show an interest in ‘film’ or ‘movies.’ I agree with Shirky that we can do this based solely on the fact that “tag overlap is in the system” while “the tag semantics are in the users” only. But why not also put the semantics in the machine? Ultimately, both are amenable to probabilistic logic.
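Here is a rough sketch of the overlap half of that idea. Jaccard similarity is my own choice of measure, not something Shirky specifies, and the TAGGED data, the suggest function, and the 0.5 threshold are all made up for illustration. Note that nothing gets collapsed: the service only suggests neighboring tags and leaves each one intact.

```python
def jaccard(a, b):
    """Share of items carrying either tag that carry both."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical data: each tag maps to the ids of the items users applied it to.
TAGGED = {
    "film": {1, 2, 3, 4},
    "movies": {2, 3, 4, 5},
    "cinema": {1, 3, 4},
    "minnesota": {6, 7},
}


def suggest(tag, tagged, threshold=0.5):
    """Suggest other tags whose items overlap heavily with this tag's items."""
    base = tagged[tag]
    scores = {
        other: jaccard(base, items)
        for other, items in tagged.items()
        if other != tag
    }
    hits = [other for other, score in scores.items() if score >= threshold]
    # Strongest overlap first; ties broken alphabetically.
    return sorted(hits, key=lambda other: (-scores[other], other))


print(suggest("film", TAGGED))  # ['cinema', 'movies']
```

A curated synonym list could be layered on top of the same function, which is all I mean by putting some of the semantics in the machine as well.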

Google showed it is the very best at serving us information when we know we care about something fuzzy and obscure—like “obstreperous minnesota.” I don’t think Shirky would dispute this, but it’s important to bear in mind that we also want our web services to serve us really well when we don’t know we care about something (see especially Daniel Tunkelang on HCIR (@dtunkelang)). That something might be fuzzy or specific, obscure or popular, subject to disagreement or perfectly unambiguous.

People and organizations tend to be unambiguous. No one says this fine fellow Clay Shirky (@cshirky) is actually Jay Rosen (@jayrosen_nyu). That would be such a strange statement that many people wouldn’t even understand it in order to declare it false. No one says the National Basketball Association means the National Football League. Or if someone were to say that J.P. Morgan is the same company as Morgan Stanley, we could correct him and explain how they’re similar but not identical.

Some facts about people and organizations can be unambiguous some of the time, too. Someone could argue that President Obama’s profession is sports, but we could correct her and explain how it’s actually politics, which maybe sometimes works metaphorically like sports. That doesn’t mean that Obama doesn’t like basketball or that no one will ever talk about him in the context of basketball. There may be more than a few contexts in which many people think it makes little sense to think of him as a politician, like when he’s playing a game of pick-up ball. But I think we can infer pretty well ex ante that it makes lots of sense to think of Obama as a politician when he’s giving a big televised speech, signing legislation, or meeting with foreign leaders. After all, what’s the likelihood that Silvio Berlusconi or Hu Jintao would let himself get schooled on the court? Context isn’t always that dependent.


Whither Tag Clouds?

A few weeks ago, one could do relatively little clicking around the interwebs and notice the spate of pretty tag clouds powered by Wordle. Bloggers of all stripes posted a Wordle of their blog. Some, like Jeff Jarvis, mused about how the visualizations represent “another way to see hot topics and another path to them.”

For as long as tag clouds have been a feature of the web, they’ve also been an object of futurist optimism, kindling images of Edward Tufte and notions that if someone could just unlock all those dense far-flung pages of information, just present them correctly, illumed, people everywhere would nod and understand. Their eyes would grow bright, and they would smile at the sheer sense it all makes. The headiness of a folksonomy is sweet for an information junkie.

It’s in that vein that ReadWriteWeb mythologizes the tag cloud as “buffalo on the pre-Columbian plains of North America.” A reader willing to cock his head and squint hard enough at the image of tag clouds “roaming the social web” as “huge, thundering herds of keywords of all shades and sizes” realizes that Rob Cottingham would have us believe that tag clouds were graceful and defenseless beasts, and also now on the verge of extinction. He’s more or less correct.

I used to mythologize the tag cloud, but let’s be honest. Tag clouds were never actually useful. You could never drag and drop one word in a tag cloud onto another to get the intersection or union of pages with those two tags. You could never really use a tag cloud to subscribe to RSS feeds of only the posts with a given set of tags.
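The underlying operation is not exotic. As a minimal sketch, assuming each post is just an id with a set of tags (the POSTS data and the function names here are invented for illustration), intersection and union fall straight out of ordinary set arithmetic over a tag index:

```python
from collections import defaultdict

# Hypothetical posts: post id -> the tags applied to it.
POSTS = {
    "post-1": {"folksonomy", "ontology"},
    "post-2": {"folksonomy", "tag clouds"},
    "post-3": {"tag clouds", "visualization"},
}


def build_index(posts):
    """Invert the posts into a mapping of tag -> set of post ids."""
    index = defaultdict(set)
    for post_id, tags in posts.items():
        for tag in tags:
            index[tag].add(post_id)
    return index


def with_all(index, tags):
    """Posts carrying every one of the given tags (intersection)."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


def with_any(index, tags):
    """Posts carrying at least one of the given tags (union)."""
    return set().union(*(index.get(tag, set()) for tag in tags))


index = build_index(POSTS)
print(sorted(with_all(index, ["folksonomy", "tag clouds"])))  # ['post-2']
print(sorted(with_any(index, ["folksonomy", "tag clouds"])))
# ['post-1', 'post-2', 'post-3']
```

A feed of only the posts with a given set of tags would be the same with_all call, re-run whenever a new post arrives.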

A tag also never told you whether J.P. Morgan was a person or a bank. A tag cloud on a blog was never dynamic, never interactive. The tag cloud on one person’s blog never talked to the tag cloud on anyone else’s. I could never click on one tag and watch the cloud re-form to show me only related tags, all resized and recolored to indicate their frequency or importance only in the part of the corpus in which the tag I clicked on is relevant.
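That last wish is easy enough to sketch. Assuming, again, that posts are just sets of tags (the POSTS data, related_cloud, and font_sizes are hypothetical names, and the linear sizing rule is only one of many possible choices), the re-formed cloud is a count of co-occurring tags within the posts that carry the clicked tag:

```python
from collections import Counter

# Hypothetical posts: post id -> the tags applied to it.
POSTS = {
    "p1": {"banking", "J.P. Morgan", "bailout"},
    "p2": {"banking", "bailout"},
    "p3": {"banking", "Morgan Stanley"},
    "p4": {"basketball", "Obama"},
}


def related_cloud(posts, clicked):
    """Count the tags that co-occur with the clicked tag, within only
    the slice of the corpus where that tag appears."""
    counts = Counter()
    for tags in posts.values():
        if clicked in tags:
            counts.update(tag for tag in tags if tag != clicked)
    return counts


def font_sizes(counts, smallest=10, largest=30):
    """Scale each count linearly so the most frequent tag gets `largest`."""
    if not counts:
        return {}
    top = max(counts.values())
    return {
        tag: smallest + (largest - smallest) * n // top
        for tag, n in counts.items()
    }


cloud = related_cloud(POSTS, "banking")
sizes = font_sizes(cloud)
print(sizes["bailout"], sizes["J.P. Morgan"])  # 30 20
```

Clicking “banking” drops “basketball” and “Obama” from the cloud entirely, and “bailout” is drawn largest because it is the most frequent tag among the banking posts, not across the whole blog.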

But there are also cool-headed thoughts to have here. If tag clouds don’t work, what will? What is the best way to navigate around those groups of relatively many words called articles or posts? In the comments to Jarvis’s post, I asked a set of questions:

How will we know when we meet a visualization of the news that’s actually really useful? Can some visualization of the news lay not just another path to the “hot topic” but a better one? Or will headlines make a successful transition from the analog past of news to its digital future as the standard way we find what we want to read?

I believe the gut-level interest in tag clouds comes in part from the sense that headlines aren’t the best way to navigate around groups of articles much bigger than the number in a newspaper. There’s a real pain point there: scanning headlines doesn’t scale. Abstracting away from them, however, and focusing on topics and newsmakers in order to find what’s best to read or watch just might work.

I think there’s a very substantial market for smarter tag clouds. They might look very different from what we’ve seen, but they will let us see at a glance lots of information and help us get to the best stuff faster. After all, the articles we want to read, the videos we want to watch, and the conversations we want to have around them are what’s actually important.


Josh Young's Facebook profile
