Obstreperous Minnesota

Every once in a while—and maybe more often than I’d like to admit—I re-read Clay Shirky. Today, I re-read “Ontology Is Overrated.”

And today, I’m ready to disagree with it around the margins.

On fortune telling. Yes, Shirky’s correct that we will sometimes mis-predict the future, as when we infer that some text about Dresden is also about East Germany and will be forever. But, no, that doesn’t have to be a very strong reason for us not to have some lightweight ontology that infers something about a city and its country. We can just change the ontology when the Berlin Wall falls. It’s much easier than re-shelving books, after all; it’s just rewriting a little OWL.
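To make the Dresden point concrete, here is a minimal sketch in Python rather than OWL. Everything in it is hypothetical: the `located_in` mapping, the `infer_topics` helper, and the sample text are illustrations, not any real ontology or library. The idea is only that the inference lives in a small table of facts, so changing the facts changes every inference drawn from them.

```python
# Hypothetical lightweight ontology: each city maps to its country.
# "located_in" is an illustrative name, not a term from any standard vocabulary.
located_in = {"Dresden": "East Germany", "Leipzig": "East Germany"}


def infer_topics(text, located_in):
    """Return the cities mentioned in text plus the countries inferred from them."""
    topics = set()
    for city, country in located_in.items():
        if city in text:
            topics.update({city, country})
    return topics


text = "A walking tour of Dresden"
before = infer_topics(text, located_in)

# The Wall falls: update the affected facts in place. The text itself is untouched.
for city, country in list(located_in.items()):
    if country == "East Germany":
        located_in[city] = "Germany"

after = infer_topics(text, located_in)
```

The same text is about East Germany before the update and about Germany after it, and no book had to move.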

On mind reading. Yes, Shirky’s correct that we will lose some signal—or increase entropy—when we mistake the degree to which users agree and mistakenly collapse categories. And, yes, it might be generally true about the world that we tend to “underestimate the loss from erasing difference of expression” and “overestimate loss from the lack of a thesaurus.” But it doesn’t have to be that way, and for two reasons.

First, why can’t we just get our estimations tuned? I’d think that the presumption would be that we could at least give it a go and, otherwise, that the burden of demonstrating that we just cannot for some really deep reason falls on Shirky.

Second, we don’t actually need to collapse categories; we just need to build web services that recognize synonymy, and don’t shove them down our users’ throats. I take it to be a fact about the world that there are a non-trivial number of people in the world for whom ‘film’ and ‘movies’ and ‘cinema’ are just about perfect synonyms. At the risk of revealing some pretty embarrassing philistinism, I offer that I’m one of them, and I want my web service to let me know that I might care about this thing called ‘cinema’ when I show an interest in ‘film’ or ‘movies.’ I agree with Shirky that we can do this based solely on the fact that “tag overlap is in the system” while “the tag semantics are in the users” only. But why not also put the semantics in the machine? Ultimately, both are amenable to probabilistic logic.
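Here is a rough sketch of how the two sources of evidence might sit side by side. The bookmarks, the synonym set, the `overlap_scores` and `suggest` helpers, and the 0.2 threshold are all made up for illustration, and a simple Jaccard overlap score stands in for anything properly probabilistic. Tags are never merged; the service only suggests.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bookmarks, each with the tags its users gave it.
bookmarks = [
    {"film", "noir"},
    {"film", "movies"},
    {"cinema", "festival"},
    {"cooking", "pasta"},
]

# Semantics "in the machine": a small curated synonym set (an assumption).
synonyms = [{"film", "movies", "cinema"}]


def overlap_scores(bookmarks):
    """Jaccard overlap for every pair of tags that co-occur on a bookmark."""
    tag_counts = Counter(tag for tags in bookmarks for tag in tags)
    pair_counts = Counter()
    for tags in bookmarks:
        for a, b in combinations(sorted(tags), 2):
            pair_counts[(a, b)] += 1
    return {
        (a, b): n / (tag_counts[a] + tag_counts[b] - n)
        for (a, b), n in pair_counts.items()
    }


def suggest(tag, bookmarks, synonyms, threshold=0.2):
    """Suggest related tags from tag overlap and from declared synonymy."""
    related = set()
    for (a, b), score in overlap_scores(bookmarks).items():
        if score >= threshold and tag in (a, b):
            related.add(b if a == tag else a)
    for group in synonyms:
        if tag in group:
            related |= group - {tag}
    return related


from_overlap_only = suggest("film", bookmarks, synonyms=[])
from_both = suggest("film", bookmarks, synonyms)
```

In this toy data nobody happened to tag the same bookmark with both ‘film’ and ‘cinema,’ so overlap alone never connects them; the declared synonymy fills that gap while the users’ own tags stay exactly as they wrote them.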

Google showed it is the very best at serving us information when we know we care about something fuzzy and obscure—like “obstreperous minnesota.” I don’t think Shirky would dispute this, but it’s important to bear in mind that we also want our web services to serve us really well when we don’t know we care about something (see especially Dan Tunkelang on HCIR (@dtunkelang)). That something might be fuzzy or specific, obscure or popular, subject to disagreement or perfectly unambiguous.

People and organizations tend to be unambiguous. No one says this fine fellow Clay Shirky (@cshirky) is actually Jay Rosen (@jayrosen_nyu). That would be such a strange statement that many people wouldn’t even understand it in order to declare it false. No one says the National Basketball Association means the National Football League. Or if someone were to say that J.P. Morgan is the same company as Morgan Stanley, we could correct him and explain how they’re similar but not identical.

Some facts about people and organizations can be unambiguous some of the time, too. Someone could argue that President Obama’s profession is sports, but we could correct her and explain how it’s actually politics, which maybe sometimes works metaphorically like sports. That doesn’t mean that Obama doesn’t like basketball or that no one will ever talk about him in the context of basketball. There may be more than a few contexts in which many people think it makes little sense to think of him as a politician, like when he’s playing a game of pick-up ball. But I think we can infer pretty well ex ante that it makes lots of sense to think of Obama as a politician when he’s giving a big televised speech, signing legislation, or meeting with foreign leaders. After all, what’s the likelihood that Silvio Berlusconi or Hu Jintao would let himself get schooled on the court? Context isn’t always that dependent.

2 Responses to “Obstreperous Minnesota”


  1. Daniel Tunkelang, 2009 January 25 at 10:11 pm

    Nice post, and thanks for the link. But I have mixed feelings in this debate.

    Shirky is right to point out the failure of traditional taxonomic categorization (which is what I believe he means by ontology)–in fact, his argument is a key part of the reason for using a faceted classification scheme rather than a single hierarchical taxonomy. I’m surprised he doesn’t talk about faceted classification, especially in the context of library catalogs.

    But he takes his argument too far. It may be hard to represent meaning, but that doesn’t mean there’s no value in doing so. And he overstates the value of del.icio.us, which, at least as far as I can tell, hasn’t gained much traction. I think collaborative tagging is a great idea, much like the collaborative editing of Wikipedia, but it only works if there is a mechanism to harmonize the multitude of individual efforts. A personal information management system to create personal tags is cute, but the killer app I want is one that crowd-sources tagging to help me explore the world’s information.

    And here I strongly agree with you–a lot of reality is objective and can be represented in a tagging system. In my experience, subjective tags are the exception, not the rule.

  2. Joshua Young, 2009 January 26 at 1:06 am

    Thanks for stopping by, Daniel. Let’s leave aside for the moment whether delicious has gained sufficient traction or reached sufficient size. I’m a decently serious user of delicious; it’s where I save my bookmarks, with mostly tidy summaries and decent attempts at tags that I return to again and again. But the broader view of delicious always strikes me as mostly noise. I don’t ever explore popular bookmarks, and I rarely explore other users’ bookmarks and only do so in cases where I have some context about them upfront. I don’t know precisely why it doesn’t seem to work, though inordinate amounts of synonymy and polysemy seem like serious (obvious and natural) culprits.

    I also really appreciate your referring to Wikipedia’s kind of collaborative editing as supplying the harmonizing that delicious lacks. Corralling might be more like it. That’s certainly why delicious recommends tags, which is why I found the conclusion of “Understanding the Efficiency of Social Tagging Systems Using Information Theory” pretty surprising and unintuitive.

