Archive for the 'spam' Category

Curating the News Two Ways

There are two relatively new efforts to curate the best links from twitter. They’re both very simple tools, and their simplicity is powerful.

As with any good filter of information, there’s a simple, socially networked mechanism at play, and analyzing how that mechanism works helps us predict whether a tool will thrive. The name of the game is whether a mechanism fits social dynamics and harnesses self-interest but also protects against too much of it. (This kind of analysis has a storied history, btw.)

First came 40 twits, Dave Winer’s creation, with instances for himself, Jay Rosen, Nieman Lab, and others. It’s powered by clicks—but not just any clicks on any links. First Dave or Jay picks which links to tweet, and then you and I and everyone else pick which links to click. There are two important layers there.

Like the others, Dave’s instance of 40 twits ranks his forty most recent tweets by the number of clicks on the links those tweets contain. (Along the way, retweets that keep the original short URL presumably count.) The result is a simple list of tweets with links. But if you’re reading Dave’s links, you know Dave likes the links by the simple fact that he tweeted them. So the real value added comes from how much you trust the folks who are following Dave to choose what’s interesting.
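To make the two layers concrete, here is a minimal Python sketch of the ranking step described above. It assumes we already have the account's recent tweets and a click count per short URL; the `Tweet` class, `rank_forty`, and the click data are hypothetical illustrations, not 40 twits' actual code.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    short_url: str  # hypothetical: the shortened link the tweet carries
    posted_at: int  # hypothetical: posting time as a Unix timestamp


def rank_forty(tweets, clicks_by_url, n=40):
    """Keep the n most recent tweets, then order them by clicks on their links.

    Clicks are keyed by short URL, so a retweet that reuses the original
    short URL adds to the same count (an assumption about how counting works).
    """
    recent = sorted(tweets, key=lambda t: t.posted_at, reverse=True)[:n]
    return sorted(recent, key=lambda t: clicks_by_url.get(t.short_url, 0), reverse=True)


tweets = [Tweet("a", "u1", 1), Tweet("b", "u2", 2), Tweet("c", "u3", 3)]
clicks = {"u1": 10, "u2": 50, "u3": 5}
print([t.text for t in rank_forty(tweets, clicks, n=2)])  # ['b', 'c']
```

Note that tweet "a" drops out despite its ten clicks: in this sketch, recency decides which tweets are eligible, and clicks only decide their order.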

Note well, though, that those self-selected folks click before they read the thing to which the link points. They make some judgment based on the snippet of text accompanying the link, but they may be terribly, horribly disappointed by the results. Of course, this presumably doesn’t happen too too much since folks would just unfollow Dave in the longer term. In equilibrium, then, a click on a link roughly expresses both an interest generated by the snippet of text and a judgment about the long-term average quality of the pages to which Dave’s or Jay’s links point. Dave adds the data (the links), and his followers add the metadata (clicks reveal popularity and trust).

Are there features Dave could add? Or that anyone could add, once Dave releases the source? Sure there are. For one, it doesn’t have to be the case that all clicks are created equal. I’d like to know which of those clicks are from people I follow, for instance. I might also like to know which of those clicks are from people Dave follows or from people Jay follows. Their votes could count twice as much, for instance. This isn’t a democracy, after all; it’s a webapp.
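One way to make clicks unequal is a per-click weight. The sketch below assumes we know which users clicked a link and have a set of "trusted" users (say, people I follow plus people Dave or Jay follow); `weighted_clicks`, `trusted`, and the factor of two are illustrative assumptions drawn from the paragraph's own example.

```python
def weighted_clicks(clickers, trusted, bonus=2.0):
    """Score one link from the users who clicked it.

    Each user counts at most once, so repeat clicks add nothing.
    Clicks from `trusted` users count `bonus` times; all others count once.
    """
    return sum(bonus if user in trusted else 1.0 for user in set(clickers))


trusted = {"me", "jayrosen_nyu"}  # hypothetical: people I or Dave follow
print(weighted_clicks(["me", "me", "stranger", "jayrosen_nyu"], trusted))  # 5.0
```

Counting each user once is a design choice: without it, one enthusiastic (or self-interested) reader could click a link up the list single-handedly.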

But think a bit more abstractly. What we’re really saying is that someone’s position in the social graph—maybe relative to mine or yours or Dave’s—could weight their click. Maybe that weighting comes from tunkrank. Or maybe that weighting comes from something like it. For instance, if tunkrank indicates the chance that a random person will see a tweet, then I might be interested in the chance that some particular person will see a tweet. Maybe everyone could have a score based on the chance that their tweet will find its way to Dave or to me.
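For a sense of what a graph-based weight might look like, here is a sketch of TunkRank as I understand Daniel Tunkelang's proposal: a user's influence is the sum, over each follower, of (1 + p × that follower's influence) divided by the number of accounts that follower follows, where p is the assumed probability that a follower retweets. The iteration count, the value of `p`, and the `click_score` helper that turns influence into click weights are my own illustrative assumptions.

```python
def tunkrank(following, p=0.05, iterations=50):
    """Iteratively compute TunkRank-style influence.

    following: dict mapping each user to the set of users they follow.
    influence(x) = sum over followers y of (1 + p * influence(y)) / len(following[y])
    """
    users = set(following) | {u for follows in following.values() for u in follows}
    followers = {u: set() for u in users}
    for y, follows in following.items():
        for x in follows:
            followers[x].add(y)

    influence = {u: 0.0 for u in users}
    for _ in range(iterations):
        # Each follower y spreads its attention evenly over everyone it follows.
        influence = {
            x: sum((1 + p * influence[y]) / len(following[y]) for y in followers[x])
            for x in users
        }
    return influence


def click_score(clickers, influence):
    """Hypothetical: weight each distinct clicker by 1 + their influence."""
    return sum(1 + influence.get(user, 0.0) for user in set(clickers))


graph = {"a": {"dave"}, "b": {"dave", "jay"}, "dave": {"jay"}}
scores = tunkrank(graph)
print(round(scores["dave"], 3), round(scores["jay"], 3))  # 1.5 1.575
```

Because p is below 1, each pass shrinks the remaining error, so the loop settles quickly on small graphs; a real deployment would more likely stop at a convergence threshold than after a fixed count.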

Second came the Hourly Press, with an instance Lyn Headley calls “News about News.” It’s powered not by clicks—but by tweets. And, again, not just any tweets. Headley picked a set of six twitter users, called “editors,” including C.W. Anderson, Jay Rosen, and others. And those six follow very many “sources,” including one another. There are two important layers there, though they overlap in that “editors” are also “sources.”

“News about News,” a filter after my own heart, looks back twelve hours and ranks links both by how many times they appear in the tweets posted by a source and also by the “authority” of each source. Sources gain authority by having more editors follow them. “If three editors follow a source,” the site reads, “that source has an authority of 3” rather than just 1. So, in total, a link “receives a score equal to the number of sources it was cited by multiplied by their average authority.” Note that what this does, in effect, is rank links by how many times they appear before the eyes of an editor, assuming all editors are always on twitter.
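The quoted scoring rule is simple enough to write out. This sketch assumes the twelve-hour filtering has already happened, so `citations` maps each link to the set of sources who tweeted it within the window; the function and parameter names are hypothetical, not the Hourly Press's own code.

```python
def hourly_press_scores(citations, editors_following):
    """Score links by (number of citing sources) * (their average authority).

    citations: dict mapping link -> set of sources that cited it in the window.
    editors_following: dict mapping editor -> set of sources that editor follows.
    A source's authority is the number of editors who follow it.
    """
    authority = {}
    for follows in editors_following.values():
        for source in follows:
            authority[source] = authority.get(source, 0) + 1

    scores = {}
    for link, sources in citations.items():
        if not sources:
            scores[link] = 0.0
            continue
        average = sum(authority.get(s, 0) for s in sources) / len(sources)
        scores[link] = len(sources) * average
    return scores


editors = {"cw": {"s1", "s2"}, "jay": {"s1"}, "lyn": {"s1", "s3"}}
cited = {"x": {"s1", "s2"}, "y": {"s2", "s3"}}
print(hourly_press_scores(cited, editors))  # {'x': 4.0, 'y': 2.0}
```

Multiplying the count by the average simply yields the sum of the citing sources' authorities, which is the number of (editor, source) follow pairs behind a link. That is why the score amounts to how many times a link crossed an editor's timeline.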

The result is a page of headlines and snippets, each flanked by a score and other statistics, like how many total sources tweeted the link and who was first to do so. If you’re already following the editors, as I am, you know the links they like by the simple fact that they tweeted them. But no editor need have tweeted any of the links for them to show up on the Hourly Press. Their role is just to look at the links—to spend their scarce time and energy following the best sources and unfollowing the rest. There are incredible stores of value locked up in twitter’s asymmetrical social graph, and the Hourly Press very elegantly taps them.

Note well, though, that editors choose to follow sources before those sources post the tweets that appear on the Hourly Press. Editors may be terribly, horribly disappointed by the link that any given tweet contains. But again, this presumably doesn’t happen too too much since those editors would unfollow the offending sources. In equilibrium, then, a tweet by a source roughly expresses the source’s own interest and the editors’ judgment about the long-term average quality of the pages to which the source’s links point. Sources add the data (the links), and editors add the metadata (attention reveals popularity and trust).

There’s so much room for the Hourly Press to grow. Users could choose arbitrary editors and create pages of all kinds. There’s a tech page just waiting to happen, for instance. Robert Scoble, Marshall Kirkpatrick, and others would flip their lids to see themselves as editors—headliners passively curating wave after hourly wave of tweets.

But again, I think there’s a more abstract and useful way to think about this. Why only one level of sources? Why not count the sources of sources? Those further-out, or second-level, contributing sources might have considerably diminished “authority” relative to the first-level sources. But not everyone can be on twitter all the time. I’m not always around to retweet great links to my followers, the editors, and giving some small measure of authority to the folks I follow (reflecting the average chance of retweet, e.g.) makes some sense.
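One way to count sources of sources, sketched under assumptions: a first-level source earns 1 point per editor who follows it, as before, and each account that source follows earns a smaller `decay` per path, standing in for the average chance of a retweet. The `decay` value and the function name are illustrative, not anything the Hourly Press actually does.

```python
def two_level_authority(editors_following, following, decay=0.1):
    """Authority from editors (weight 1) plus sources of sources (weight decay).

    editors_following: dict editor -> set of first-level sources they follow.
    following: dict user -> set of users they follow (the wider graph).
    decay: assumed chance that a first-level source passes a link along.
    """
    authority = {}
    for first_level in editors_following.values():
        for source in first_level:
            authority[source] = authority.get(source, 0.0) + 1.0
            # Each path editor -> source -> second-level account adds `decay`.
            for second in following.get(source, set()):
                authority[second] = authority.get(second, 0.0) + decay
    return authority


editors = {"e1": {"s1"}, "e2": {"s1", "s2"}}
graph = {"s1": {"t1"}, "s2": {"t1", "s1"}}
auth = two_level_authority(editors, graph)
print({k: round(v, 2) for k, v in sorted(auth.items())})  # {'s1': 2.1, 's2': 1.0, 't1': 0.3}
```

Here `s1` earns credit both directly and as a source of `s2`, which mirrors the overlap the post notes between editors, sources, and their followings.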

But also, editors themselves could be more or less important relative to one another, so we could weight them differently, in proportion to the curatorial powers we take them to have. And those editors follow different numbers of sources. It means one thing when one user of twitter follows only fifty others, and it means something else altogether when another user follows five hundred. The first user is, on average, investing greater attention into each user followed, while the second is investing less. Again, this is the attention economics that twitter captures so elegantly and richly.
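A sketch of both refinements together, under assumptions: each editor gets a weight (default 1) for curatorial importance and spreads that weight evenly across everyone they follow, so a follow from someone who follows fifty accounts is worth ten times one from someone who follows five hundred. The weights and names here are hypothetical.

```python
def attention_authority(editors_following, editor_weight=None):
    """Authority where each editor splits their weight over the sources they follow.

    editors_following: dict editor -> set of sources followed.
    editor_weight: optional dict editor -> importance (missing editors default to 1.0).
    """
    editor_weight = editor_weight or {}
    authority = {}
    for editor, follows in editors_following.items():
        if not follows:
            continue  # an editor who follows no one contributes nothing
        share = editor_weight.get(editor, 1.0) / len(follows)
        for source in follows:
            authority[source] = authority.get(source, 0.0) + share
    return authority


editors = {"focused": {"a", "b"}, "broad": {"a", "b", "c", "d"}}
print(sorted(attention_authority(editors, {"focused": 2.0}).items()))
# [('a', 1.25), ('b', 1.25), ('c', 0.25), ('d', 0.25)]
```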

But it’s worth circling back to a key observation. In both apps, there are two necessary groups. One is small, and one is large. One adds data, and the other adds metadata. The job of the builder of these apps is to arrive at a good filter of information—powered by a simple, socially networked mechanism. That power must come from some place, from some fact or other phenomenon. The trick, then, is choosing wisely. Social mechanisms that work locally often fail miserably globally, once there’s ample incentive to game the system, spam its users, or troll its community.

But not all filters need to work at massive scale either. Some are meant to be personal. 40 twits strikes me as fitting this mold. I love checking out Dave’s and Jay’s pages, making sure I didn’t miss anything, but if I thought tens of thousands of others were also doing the same, I might feel tempted to click a few extra times on links I want to promote. I don’t think a 40 twits app will work for a page with serious traffic. And, ultimately, that’s because it gets its metadata from the wrong source: clicks that anyone can contribute. If the clicks were somehow limited to a trusted group, or if the clicks weren’t clicks at all but attention, then maybe 40 twits could scale sky-high.

Hourly Press—which I don’t think is terribly well suited to being called a “newspaper,” because the moniker obscures more than it adds—doesn’t face this limitation. The fact that Hourly Press is powered by attention, which, unlike clicks, is inherently scarce, is terribly powerful, just as the fact that twitter is powered by attention is terribly powerful. Writ large, both are incredibly wise, and they contain extraordinarily important lessons in the mechanism design of social filters of information.

Taking Twitter Seriously: What if it really were a really big deal?

Maybe @davewiner does wring his hands too violently about twitter’s recommended users. Maybe it is too early to worry about unintended consequences.

But maybe not. Either way, if we take a slightly different view of his worries, I think we can take them to heart much more easily. If we can shift tenses, it might help.

When @davewiner talks about twitter, he may be talking about it in the present tense. Let’s try another: a kind of conditional tense. Let’s try a counterfactual conditional: Would this thing work if it were the case that…?

After all, to detect a problem in any system, we’ve got to imagine that system working at full scale. Whether it’s a database, a message board, or a social network like twitter, we’ve got to imagine its ideal—when everyone’s using it for any purpose that’s difficult to police cheaply.

When @davewiner worries about twitter’s editorial adventures, as he does here and here in conversation with @jayrosen_nyu, he’s taking it extraordinarily seriously. It’s a great compliment, I think. He sees a twitter that’s currently critical to very many people. That’s the present tense.

OK, so some of us don’t yet share that view. But I bet we can offer our own great compliment and imagine very many people using it—or maybe even virtually everyone using it. With every passing day, I think many of us have less and less trouble imagining that.

So, if virtually everyone were using twitter—if it really were the “Future News System of the World,” again, as difficult as that might be to imagine—we might really insist that it refrain from the editorial business. If twitter really were that big, then it really would be critical. And if it really were critical, its closed nature would probably violate all kinds of praise-Murphy rules about leaving our data, our businesses, and our lives in the hands of a for-profit company, its secret business plan, and its fallible servers.

We’re not casting aspersions on what most everyone regards as an essentially fair and just company. Of course, that goes for @me too; I love twitter.

This is simply why we have the notion of a “common carriage.” For centuries, we’ve demanded ultra-reliable commodity transportation services. We’ve been so insistent on the reliability and the even-handedness of transportation that we’ve often saddled the carrier with the de facto burden of liability for losses, which raises its price to us. This is why we care about network neutrality.

If we really take twitter seriously, then we think it’s possible that twitter could be the next big deal. The trouble is that, at scale, big deals attract all manner of mischief, with potentially everyone using them for all things selfish and spammy.

If twitter could be the next big deal, we need to start thinking about safeguarding it now.

PS. That’s what tunkrank, which was conceived by @dtunkelang, is for.

In the news, is context possible?

As I’ve claimed before, a sure-fire way to think about the future of news is to think about the fundamentals. I discussed at some length how a more trusting relationship between creators and users can unlock serious value. (Thanks for the promotion, @jayrosen_nyu!)

Matt Thompson writes brilliantly about news and understanding. (See here too.) When we read the news, are we looking for understanding broader than the set of facts and overlaid analysis contained in a traditional article?

My sense is that the answer is something like, “Of course we are!”

“A focus on delivering context means that the news is never the endpoint,” he writes. “The trail of a story doesn’t end with the passage of a bill or the resignation of an official. It doesn’t end at all. It merely connects with more and more dots that form an ever-clearer picture of a better society.”

To some extent, this kind of context can be naively delivered through topic pages, which seem to me to be little more than a cluster of dots only faintly connected when what we’re looking for is a colorful picture.

One alternative is a kind of broad summary of the issue, place, person, company, etc. I’m still not certain whether newspapers are the best economic, social, and cultural structures in which to locate that job. Of course, the obvious point of comparison or target is the Great, Awesome, and Meritorious wikipedia. No sitting duck, that.

For one, there may be severe duplication of effort for any topic that’s not local. There’s only one wikipedia.

For two, it’s not clear that paying writers (per unit of time or per unit of understanding, as knol would do) will generate better results than not paying, because of worries about gaming, spam, and other potentially perverse incentives we can’t even predict. That’s one catch about “understanding” as an end: it’s potentially so high-minded (in a good way) that the market, even buttressed by genuinely high-minded journalistic ideals, may not provide a great solution. That’s not to say that we humans are only competitive, adversarial, and conniving when the coin of the realm is in fact money. We also strive for status and influence, but they don’t seem to conflict with veritas as blatantly.

A virtue of one-off articles is that they’re relatively easy to verify. You can check the facts, for instance. And, as an editor or reader, your worry about whether the author is being appropriately focused or wide-rangingly ambitious is easier to allay because it’s easier to compare the piece against a necessarily narrower slice of reality. Likewise for your worry about whether the author is being appropriately stingy with or indulgent of sources in pursuit of balance.

I’m not pretending to offer any solutions here. I just want to point out that there’s a reason journalism’s basic unit of information started as the article. Time-discrete units of information usually created by single authors are radically simpler things than infinitely relevant units of understanding created by teams.

Age-Old Questions about BWBX

What’s BWBX? It’s Business Week’s new social network for users to discover and share business-related content. It resembles web services like socialmedian and twine.

As Paul Miller explains, “Members can access background material on stories, submit additional resources of their own, and comment on the content they find.” The central unit of organization is the “topic,” which both the BX staff and members of the community can create. Miller writes that he gets “the impression that topics tend to be approved” if they’re “in-scope” and “actively discussed out on the open Web.”

Given that these are the interwebs we’re talking about here, my mind immediately races to worries about spam. Does BWBX have controls to disincentivize and sideline spam? How do they work? Are they effective?

I’ve had these questions for a while now, but I’ve kept them to myself while observing BWBX’s initial growth. Today, I saw that Paul Miller, the widely respected Semantic Web evangelist, wrote a post praising the news platform. So I pinged him on twitter:

@PaulMiller Great write-up of #bxbw! Curious about how articles get assigned to topics. Users push articles to topics? Isn’t that spammy?

Then he forwarded the question:

@jny2cornell Thanks Joshua. :-) Yes, users assign articles to topics. COULD be spammy. Doesn’t seem to be. Comment, @bwbx @roncasalotti

The folks at BWBX tweeted that they answered the question in the comments on Miller’s post. I’ve excerpted the relevant parts of the comment:

We track several user actions on each item and use a weighted algorithm to score both users and the articles/blog posts. We monitor those scores to not only determine top users or most valuable items in a topic … but also to determine gaming within the system. We also crowd-source user activity via a full reporting system and back-office moderation team.

Now, I’m no expert on “back-office moderation,” but that answer left me scratching my head. So I pinged again:

@PaulMiller What do you make of @bwbx’s comment on your post? I must admit, I’m having a difficult time parsing it.

Miller answered my question quite aptly, I think:

@jny2cornell seems clear… “back office magic keeps it clean”… ;-) You should try #BWBX, and see how the site performs to your own needs

Yes, it does seem clear—clear as mud. And that strikes me as a problem. If I’m thinking about joining BWBX, I’d like some assurance that all my effort poured into it isn’t going to go to waste as usage scales up and inevitable abuse creeps, or floods, in. I’d be worried, for instance, if I knew that the “back office moderation” is mostly human. Of course, I’d also obviously be worried if I knew that the automated processes were quite simply unfit for the job.

Peer-to-peer moderation doesn’t work magically. Take the quintessential case of wikipedia. It’s got a small and hierarchical army of editors. Perhaps more importantly, though, it may be the first human community in which vandalism is cheaper to clean up than it is to create. That ain’t trivial. It’s arguably not just important but an utterly critical disincentive against spam.

I wouldn’t have this level of concern were it not apparent that “push” logic drives BWBX. Consider a contrasting example: twitter works by “pull” logic and is therefore mercifully free of spam. I don’t worry about spammy content ending up wasting my attention because you can’t get content before me unless I invite it. And I can un-invite, or un-follow, very easily. This isn’t earth-shattering thinking here; it’s virtually as old as the internet—as old as spam itself.
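The push/pull distinction can be sketched in a few lines. The data model is a simplifying assumption: each post has an `author` and a set of `topics` that any user may assign, roughly as Miller describes BWBX. Neither function is BWBX's or twitter's actual code.

```python
def pull_timeline(posts, follows):
    """Pull: a reader sees only posts whose authors they chose to follow."""
    return [post for post in posts if post["author"] in follows]


def push_topic_feed(posts, topic):
    """Push: anyone can attach a post to a topic, and every topic reader sees it."""
    return [post for post in posts if topic in post["topics"]]


posts = [
    {"author": "friend", "topics": {"business"}},
    {"author": "spammer", "topics": {"business"}},  # self-assigned to a popular topic
]
print(len(pull_timeline(posts, {"friend"})))  # 1: the spammer never reaches me
print(len(push_topic_feed(posts, "business")))  # 2: the spammer reaches every topic reader
```

In the pull model, un-following is just removing a name from `follows`; in the push model, keeping the spammer out requires someone else (moderators, or a scoring system) to act.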

So if we’re still getting it wrong, why? And if we’re getting it right, why can’t we be more transparent about it? We know that pagerank is the beating heart of google’s effort to out-engineer spam, and some argue that’s not even enough.

In fact, I encourage the folks at BWBX to give a close read to Daniel Tunkelang’s post, which asks, “Is there a way we can give control to users and thus make the search engines objective referees rather than paternalistic gatekeepers?” What goes for search engines ought to go for back office magicians as well.

Josh Young's Facebook profile
