Archive for January, 2009

What we talk about when we talk about community news

I hate to bicker with someone who’s obviously on the good guys’ team, but I also find that being clear, and only as clear as the facts allow, is a pretty full-on insatiable desire of mine.

And thus I submit that “intimacy” doesn’t work as a term to describe what geography-based community news sites need in order to work or what they need to aim to achieve. I further submit that this isn’t just an academic exercise. It matters because folks into the future of news have a tough enough time relating the importance of what we’re talking about. It’s easy for curmudgeons to brush off “intimacy” because it rings of hype to them; it fits neatly into their mindset of techno-utopians offering false promises.

I’m not nit-picking words here, either, or being uncharitable. I don’t take intimacy to mean something super deep or profound like love or friendship or piety. Following @lisawilliams, I take it to mean something like what the members of an Elks lodge share. They all know one another. They all share a pretty serious mission. Leaving aside the standard language used by fraternal organizations (“inculcation” and “indoctrination”), members of the Elks share a significantly thicker bond than neighbors who vote in the same mayoral race, cheer for the same sports teams, suffer the same air pollution, enjoy the same parks, or send their kids to the same school.

Members of a geography-based news community don’t need to know one another. They just need to believe that there’s a pretty good chance that they know someone who knows them both. Or they need to believe that there’s a pretty good chance that they might want to know one another. (These beliefs probably don’t even need to be justified or true, in the epistemic jargon.)

Members of a geography-based community need only trust and have some regard for one another. I suspect that this trust and regard require more than a thin explanation like liberal cosmopolitanism’s basic respect for persons. There’s got to be a neighborly connection—a sense of common civic purpose and a sense of shared space, resources, and destiny. But I do not believe that this trust and regard require a thick explanation as implied by a notion of “intimacy.”

Getting in the metadata game: Oh, money, that’s why!

Who’s mentioned in your article? What organizations does it talk about? Or what zip codes?

Answering these simple questions—in ways notoriously inflexible computers understand—can be like putting handles on your articles. It means aggregators and filterers like EveryBlock can grab on and give readers one more way to find what you have to say.

That’s what the New York Times is doing—in two stages, it appears. First its librarians encode the elected officials mentioned in its articles; mentioning them in the regular text of the article doesn’t cut it. Then its newly built web service, called Represent, figures out the geographic locations those officials represent. Meanwhile, Represent is also taking a computerized look at Congressional votes. When a politician votes, Represent says something like, “Oh, a person just voted in geographic area Y, and that person’s name is X.”

EveryBlock isn’t built for understanding much about people or names, but it is built for understanding locations and geographic areas. So Represent’s job is to translate from X to Y—from names to places.
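The translation from names to places can be pictured as a plain lookup. What follows is only a rough sketch of the idea, not how Represent actually works: the table, the placeholder names, and the function `areas_for_article` are all invented for illustration.

```python
# Hypothetical lookup table: official name -> geographic area represented.
# These entries are invented placeholders, not real Represent data.
OFFICIAL_TO_AREA = {
    "Official X": "Area Y",
    "Official Z": "Area W",
}


def areas_for_article(tagged_officials):
    """Return the areas for the officials tagged on an article.

    Officials missing from the table are skipped rather than guessed at,
    and each area is listed once, in the order first seen.
    """
    areas = []
    for name in tagged_officials:
        area = OFFICIAL_TO_AREA.get(name)
        if area is not None and area not in areas:
            areas.append(area)
    return areas
```

The point of the sketch is the dependency: the lookup only works if someone has already tagged the officials as structured data, which is exactly the librarians' first stage.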

Which brings us at long last to the metadata game. The historical problem is that the way you have to answer these questions has been interminably dull and technical. So the historical result has been one big shoulder shrug: “Why bother?”

Well, people like Adrian Holovaty are starting to envision an answer. “We have a number of ideas for sustaining our project,” he writes, “like building a local advertising engine.” That kind of engine might share ad revenue with the newspapers whose articles it incorporates. In order to claim a share, each newspaper must diligently prepare its articles for EveryBlock: there must be location handles that EveryBlock can grab. It’s highly unclear how much money EveryBlock’s hyperlocal ad targeting could generate, but if it’s enough, it will provide the kind of incentive publishers need to make boring metadata worth their while. EveryBlock might just unlock the ‘R’ in ROI. That could very well be a great reason to bother.

Epilogue: It’s notable that grant monies have helped solve this chicken-and-egg problem. I may have personal issues with the Knight News Challenge (I didn’t win and didn’t receive feedback promised on multiple occasions), but EveryBlock is quite justifiably the darling of the news innovation set.

Age-Old Questions about BWBX

What’s BWBX? It’s Business Week’s new social network for users to discover and share business-related content. It resembles web services like socialmedian and twine.

As Paul Miller explains, “Members can access background material on stories, submit additional resources of their own, and comment on the content they find.” The central unit of organization is the “topic,” which both the BX staff and members of the community can create. Miller writes that he gets “the impression that topics tend to be approved” if they’re “in-scope” and “actively discussed out on the open Web.”

Given that these are the interwebs we’re talking about here, my mind immediately races to worries about spam. Does BWBX have controls to disincentivize and sideline spam? How do they work? Are they effective?

I’ve had these questions for a while now, but I’ve kept them to myself while observing BWBX’s initial growth. Today, I saw that Paul Miller, the widely respected Semantic Web evangelist, wrote a post praising the news platform. So I pinged him on twitter:

@PaulMiller Great write-up of #bxbw! Curious about how articles get assigned to topics. Users push articles to topics? Isn’t that spammy?

Then he forwarded the question:

@jny2cornell Thanks Joshua. :-) Yes, users assign articles to topics. COULD be spammy. Doesn’t seem to be. Comment, @bwbx @roncasalotti

The folks at BWBX tweeted that they answered the question in the comments on Miller’s post. I’ve excerpted the relevant parts of the comment:

We track several user actions on each item and use a weighted algorithm to score both users and the articles/blog posts. We monitor those scores to not only determine top users or most valuable items in a topic … but also to determine gaming within the system. We also crowd-source user activity via a full reporting system and back-office moderation team.

Now, I’m no expert on “back-office moderation,” but that answer left me scratching my head. So I pinged again:

@PaulMiller What do you make of @bwbx’s comment on your post? http://bit.ly/hTL1 I must admit, I’m having a difficult time parsing it.

Miller answered my question quite aptly, I think:

@jny2cornell seems clear… “back office magic keeps it clean”… ;-) You should try #BWBX, and see how the site performs to your own needs

Yes, it does seem clear—clear as mud. And that strikes me as a problem. If I’m thinking about joining BWBX, I’d like some assurance that all my effort poured into it isn’t going to go to waste as usage scales up and inevitable abuse creeps, or floods, in. I’d be worried, for instance, if I knew that the “back office moderation” is mostly human. Of course, I’d also obviously be worried if I knew that the automated processes were quite simply unfit for the job.
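To make the worry concrete, here is one guess at what a “weighted algorithm” over user actions could look like. It is a sketch under loud assumptions: BWBX has not published its actions, weights, or thresholds, so every name and number below is hypothetical.

```python
# Hypothetical action weights; BWBX has not published its real ones.
ACTION_WEIGHTS = {"read": 1.0, "comment": 3.0, "share": 2.0, "report": -5.0}


def score_item(action_counts):
    """Weighted sum of the user actions recorded on one item."""
    return sum(
        ACTION_WEIGHTS.get(action, 0.0) * count
        for action, count in action_counts.items()
    )


def flag_for_review(items, threshold=0.0):
    """Return ids of items whose score falls below the threshold.

    In this sketch, flagged items would go to the human moderators.
    """
    return [
        item_id
        for item_id, counts in items.items()
        if score_item(counts) < threshold
    ]
```

Even in this toy version the open questions are obvious: who picks the weights, how easily can a spammer inflate the positive actions, and how much of the flagged pile lands on humans?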

Peer-to-peer moderation doesn’t work magically. Take the quintessential case of wikipedia. It’s got a small and hierarchical army of editors. More importantly, though, it’s perhaps the first human community in which vandalism is cheaper to clean up than it is to create. That ain’t trivial. It’s arguably not just important but an utterly critical disincentive against spam.

I wouldn’t have this level of concern were it not apparent that “push” logic drives BWBX. Consider a contrasting example: twitter works by “pull” logic and is therefore mercifully free of spam. I don’t worry about spammy content ending up wasting my attention because you can’t get content before me unless I invite it. And I can un-invite, or un-follow, very easily. This isn’t earth-shattering thinking here; it’s virtually as old as the internet, as old as spam itself.

So if we’re still getting it wrong, why? And if we’re getting it right, why can’t we be more transparent about it? We know how pagerank is the beating heart of google’s effort to out-engineer spam, and some argue that’s not even enough.

In fact, I encourage the folks at BWBX to give a close read to Daniel Tunkelang’s post, which asks, “Is there a way we can give control to users and thus make the search engines objective referees rather than paternalistic gatekeepers?” What goes for search engines ought to go for back office magicians as well.

Obstreperous Minnesota

Every once in a while—and maybe more often than I’d like to admit—I re-read Clay Shirky. Today, I re-read “Ontology Is Overrated.”

And today, I’m ready to disagree with it around the margins.

On fortune telling. Yes, Shirky’s correct that we will sometimes mis-predict the future, as when we infer that some text about Dresden is also about East Germany and will be forever. But, no, that doesn’t have to be a very strong reason for us not to have some lightweight ontology that then inferred something about a city and its country. We can just change the ontology when the Berlin Wall falls. It’s much easier than re-shelving books, after all; it’s just rewriting a little OWL.
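As a toy illustration of how cheap that revision can be, here is the Dresden inference as a plain Python mapping. This is not OWL, and the function names are made up for the example; it only shows that correcting the inference is a one-line change.

```python
def infer_country(ontology, city):
    """Look up the country a city is inferred to belong to, if any."""
    return ontology.get(city)


def revise(ontology, city, new_country):
    """Return a copy of the ontology with one city's country changed."""
    updated = dict(ontology)
    updated[city] = new_country
    return updated


# Before the Wall falls, text about Dresden is inferred to be about East Germany.
before_1989 = {"Dresden": "East Germany"}

# Afterwards, one small rewrite fixes every future inference.
after_1990 = revise(before_1989, "Dresden", "Germany")
```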

On mind reading. Yes, Shirky’s correct that we will lose some signal—or increase entropy—when we mistake the degree to which users agree and mistakenly collapse categories. And, yes, it might be generally true about the world that we tend to “underestimate the loss from erasing difference of expression” and “overestimate loss from the lack of a thesaurus.” But it doesn’t have to be that way, and for two reasons.

First, why can’t we just get our estimations tuned? I’d think that the presumption would be that we could at least give it a go and, otherwise, that the burden of demonstrating that we just cannot for some really deep reason falls on Shirky.

Second, we don’t actually need to collapse categories; we just need to build web services that recognize synonymy, and that don’t shove it down our users’ throats. I take it to be a fact about the world that there are a non-trivial number of people in the world for whom ‘film’ and ‘movies’ and ‘cinema’ are just about perfect synonyms. At the risk of revealing some pretty embarrassing philistinism, I offer that I’m one of them, and I want my web service to let me know that I might care about this thing called ‘cinema’ when I show an interest in ‘film’ or ‘movies.’ I agree with Shirky that we can do this based solely on the fact that “tag overlap is in the system” while “the tag semantics are in the users” only. But why not also put the semantics in the machine? Ultimately, both are amenable to probabilistic logic.
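Here is a minimal sketch of the “tag overlap is in the system” half of that idea: suggest related tags purely from how much their tagged items overlap, without merging any categories. Jaccard similarity is my choice of overlap measure, not something Shirky specifies, and the tag data and the 0.5 cutoff are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_tags(tag, tag_to_items, min_overlap=0.5):
    """Suggest other tags whose tagged items overlap heavily with `tag`'s.

    Nothing is collapsed: every tag keeps its own items, and the
    result is only a list of suggestions, sorted alphabetically.
    """
    base = tag_to_items.get(tag, set())
    return sorted(
        other
        for other, items in tag_to_items.items()
        if other != tag and jaccard(base, items) >= min_overlap
    )


# Invented example data: tag -> ids of items carrying that tag.
EXAMPLE_TAGS = {
    "film": {1, 2, 3, 4},
    "movies": {2, 3, 4, 5},
    "cinema": {1, 2, 3},
    "basketball": {9, 10},
}
```

With this data, `related_tags("film", EXAMPLE_TAGS)` suggests both ‘cinema’ and ‘movies’ and leaves ‘basketball’ alone. A user for whom ‘cinema’ means something distinct loses nothing.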

Google showed it is the very best at serving us information when we know we care about something fuzzy and obscure—like “obstreperous minnesota.” I don’t think Shirky would dispute this, but it’s important to bear in mind that we also want our web services to serve us really well when we don’t know we care about something (see especially Daniel Tunkelang on HCIR (@dtunkelang)). That something might be fuzzy or specific, obscure or popular, subject to disagreement or perfectly unambiguous.

People and organizations tend to be unambiguous. No one says this fine fellow Clay Shirky (@cshirky) is actually Jay Rosen (@jayrosen_nyu). That would be such a strange statement that many people wouldn’t even understand it in order to declare it false. No one says the National Basketball Association means the National Football League. Or if someone were to say that J.P. Morgan is the same company as Morgan Stanley, we could correct him and explain how they’re similar but not identical.

Some facts about people and organizations can be unambiguous some of the time, too. Someone could argue that President Obama’s profession is sports, but we could correct her and explain how it’s actually politics, which maybe sometimes works metaphorically like sports. That doesn’t mean that Obama doesn’t like basketball or that no one will ever talk about him in the context of basketball. There may be more than a few contexts in which many people think it makes little sense to think of him as a politician, like when he’s playing a game of pick-up ball. But I think we can infer pretty well ex ante that it makes lots of sense to think of Obama as a politician when he’s giving a big televised speech, signing legislation, or meeting with foreign leaders. After all, what’s the likelihood that Silvio Berlusconi or Hu Jintao would let himself get schooled on the court? Context isn’t always that dependent.

Freemium News

I came across two great examples of freemium news. One was a reminder, and the other felt familiar but was a bolt from the blue.

First, the one. Blodget really does an admirable job digging into the fundamental economics of the WSJ’s porous paywall. (Cf. this naive version at CJR.)

Second, the other. Mitch Ratcliffe drills deep into the economics of news on both the supply and demand sides of the equation. The supply side—what reporters need to report—is interesting. It asks, “How much money do journalists need to give scarce journalistic value to readers?”

But for my money, I like thinking about the demand side of the equation. Here the relevant (and symmetrical) question is, “How much scarce journalistic value do readers need to give money to journalists?”

What Ratcliffe’s and Blodget’s answers have in common is, essentially, price discrimination and luxury. In other words, make it easier or make it better (as in more value-added).

The WSJ’s habit of forcing me to jump through hoops to read its full articles is price discrimination at its heart. I have to pay with my time (instead of money) by copying the paywalled article’s headline and pasting it into a google search (generally adding “google news” as well) and then clicking back to wsj.com. Then I’m behind the paywall, and not a drop of google juice is spilt.

Ratcliffe proposes “added convenience or increased interaction” in the form of twitter access to the reporter, more timely alerts, or a “social page of your own” for giving feedback to the journalist. “It doesn’t need any new tech — all the pieces are there,” he tweeted (@godsdog). “Yes, integration is hard, but it’s good not to have to invent.”

These are great thoughts, focused sharply on the economics of news, not BS about who’s a reporter and who’s not or what’s legitimately Web 2.0 and what’s not.

This is the future of news. This is networked news. Above all, this is the power of the interwebs: connecting unique buyers and sellers of information as individuals with diverse interests. Expect more soon.

What would a post-print Times look like?

My reaction, whenever I read stuff like this great piece from Michael Hirschorn, is frustratingly simple.

It’s about trading analog dollars for digital pennies. Or, to put it another way, even if we cut out all the overhead of paper and presses and delivery trucks, we can’t pay our existing writers and editors with only our revenue from online ads.

So what’s my reaction? Up your revenue from online ads.

Maybe my reaction’s not that helpful, or maybe it’s a needed slap in the face. A wake-up call.

Newspapers need to be way more imaginative than starting with the assumption that making “a Web-based strategy profitable” must involve the fearsome numbers we see today. If you don’t like today’s numbers, change them. Newspapers need to think about how they can quadruple their online ad revenue per reader.

I believe much of the answer lies in smarter advertising: make it fit the content contextually and make it fit the reader personally. These aren’t new ideas at all. They’re just important to bear in mind because, as worthwhile as Hirschorn’s piece is, it focuses its energy on the editorial side of the operation.

Which isn’t surprising at all, and that’s the point. Even in the best, most insightful posts on the future of news, writers who cut their teeth in a newspapering world in which editorial and business sat on different floors of the office still seem to forget that they really can try to reach into the business model and rejigger the numbers if they really want to.

If you can trade your analog dollars for digital dimes, after all, things don’t look so grim.

LATE UPDATE: Bringing thinking that’s a couple notches smarter, Felix Salmon begs to differ pretty seriously with Hirschorn on NYT.


Josh Young's Facebook profile
