Curating the News Two Ways

There are two relatively new efforts to curate the best links from twitter. They’re both very simple tools, and their simplicity is powerful.

As with any good filter of information, there’s a simple, socially networked mechanism at play, and analyzing how that mechanism works helps us predict whether a tool will thrive. The name of the game is whether a mechanism fits social dynamics and harnesses self-interest but also protects against too much of it. (This kind of analysis has a storied history, btw.)

First came 40 twits, Dave Winer’s creation, with instances for himself, Jay Rosen, Nieman Lab, and others. It’s powered by clicks—but not just any clicks on any links. First Dave or Jay picks which links to tweet, and then you and I and everyone picks which links to click. There are two important layers there.

Like the others, Dave’s instance of 40 twits ranks his forty most recent tweets by the number of clicks on the links those tweets contain. (Along the way, retweets that keep the original short URL presumably count.) The result is a simple list of tweets with links. But if you’re reading Dave’s links, you know Dave likes the links by the simple fact that he tweeted them. So the real value added comes from how much you trust the folks who are following Dave to choose what’s interesting.
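To make the mechanism concrete, here’s a toy sketch of the ranking step, assuming we already have a click count for each of the most recent tweets. The data shapes and numbers are invented for illustration; this isn’t Dave’s actual code.

```python
# Hypothetical (tweet_text, click_count) pairs for the most recent tweets.
tweets = [
    ("New post on curation", 120),
    ("RT: interesting piece on filters", 45),
    ("Morning links", 310),
]

def rank_by_clicks(tweets):
    """Sort tweets, most-clicked first, as 40 twits appears to do."""
    return sorted(tweets, key=lambda t: t[1], reverse=True)

for text, clicks in rank_by_clicks(tweets):
    print(clicks, text)
```

All the value-adding judgment happens outside this trivial sort: Dave chose what to tweet, and his followers chose what to click.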

Note well, though, that those self-selected folks click before they read the thing to which the link points. They make some judgment based on the tweet’s snippet of text accompanying the link, but they may be terribly, horribly disappointed by the results. Of course, this presumably doesn’t happen too too much since folks would just unfollow Dave in the longer term. In equilibrium, then, a click on a link roughly expresses both an interest generated by the snippet of text and a judgment about the long-term average quality of the pages to which Dave’s or Jay’s links point. Dave adds the data (the links), and his followers add the metadata (clicks reveal popularity and trust).

Are there features Dave could add? Or that anyone could add, once Dave releases the source? Sure there are. For one, it doesn’t have to be the case that all clicks are created equal. I’d like to know which of those clicks are from people I follow, for instance. I might also like to know which of those clicks are from people Dave follows or from people Jay follows. Their votes could count twice as much, for instance. This isn’t a democracy, after all; it’s a webapp.

But think a bit more abstractly. What we’re really saying is that someone’s position in the social graph—maybe relative to mine or yours or Dave’s—could weight their click. Maybe that weighting comes from tunkrank. Or maybe that weighting comes from something like it. For instance, if tunkrank indicates the chance that a random person will see a tweet, then I might be interested in the chance that some particular person will see a tweet. Maybe everyone could have a score based on the chance that their tweet will find its way to Dave or to me.
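Here’s a hypothetical sketch of that kind of weighting, where a click from someone I follow counts double. Everything below, from the boost factor to the names, is invented for illustration; it’s one possible feature, not how 40 twits actually works.

```python
# People I follow (invented handles).
i_follow = {"jay", "nieman"}

# Each click records which tweet was clicked and who clicked it.
clicks = [
    ("tweet-a", "jay"),
    ("tweet-a", "stranger"),
    ("tweet-b", "stranger"),
]

def weighted_scores(clicks, i_follow, boost=2.0):
    """Tally clicks per tweet, boosting clicks from people I follow."""
    scores = {}
    for tweet, clicker in clicks:
        weight = boost if clicker in i_follow else 1.0
        scores[tweet] = scores.get(tweet, 0.0) + weight
    return scores

print(weighted_scores(clicks, i_follow))
```

A tunkrank-style score could replace the flat `boost` with a per-clicker weight derived from the graph, but the tallying logic would look the same.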

Second came the Hourly Press, with an instance Lyn Headley calls “News about News.” It’s powered not by clicks—but by tweets. And, again, not just any tweets. Headley picked a set of six twitter users, called “editors,” including C.W. Anderson, Jay Rosen, and others. And those six follow very many “sources,” including one another. There are two important layers there, though they overlap in that “editors” are also “sources.”

“News about News,” a filter after my own heart, looks back twelve hours and ranks links both by how many times they appear in the tweets posted by a source and also by the “authority” of each source. Sources gain authority by having more editors follow them. “If three editors follow a source,” the site reads, “that source has an authority of 3” rather than just 1. So, in total, a link “receives a score equal to the number of sources it was cited by multiplied by their average authority.” Note that what this does, in effect, is rank links by how many times they appear before the eyes of an editor, assuming all editors are always on twitter.
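The scoring rule, as the site describes it, is simple enough to sketch. The editors and sources below are made up, and this is a reconstruction from the site’s description, not the Hourly Press’s actual code.

```python
# Which sources each (invented) editor follows.
editors = {
    "rosen":    {"srcA", "srcB"},
    "anderson": {"srcA"},
    "headley":  {"srcA", "srcC"},
}

def authority(source):
    """A source's authority is the number of editors who follow it."""
    return sum(1 for follows in editors.values() if source in follows)

def link_score(citing_sources):
    """Score = (number of citing sources) x (their average authority)."""
    n = len(citing_sources)
    avg = sum(authority(s) for s in citing_sources) / n
    return n * avg

# srcA has authority 3 and srcB has authority 1, so a link cited by both
# scores 2 sources x average authority 2 = 4.
print(link_score({"srcA", "srcB"}))
```

Notice that multiplying count by average authority is the same as summing each citing source’s authority, which is why the rule amounts to counting how many editor-eyeballs the link passed in front of.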

The result is a page of headlines and snippets, each flanked by a score and other statistics, like how many total sources tweeted the link and who was first to do so. If you’re already following the editors, as I am, you know the links they like by the simple fact that they tweeted them. But no editor need have tweeted any of the links for them to show up on the Hourly Press. Their role is just to look at the links—to spend their scarce time and energy following the best sources and unfollowing the rest. There are incredible stores of value locked up in twitter’s asymmetrical social graph, and the Hourly Press very elegantly taps them.

Note well, though, that editors choose to follow sources before those sources post the tweets on the Hourly Press. Editors may be terribly, horribly disappointed by the link that any given tweet contains. But again, this presumably doesn’t happen too too much since those editors would unfollow the offending sources. In equilibrium, then, a tweet by a source roughly expresses the source’s own interest and the editor’s judgment about the long-term average quality of the pages to which the source’s links point. Sources add the data (the links), and editors add the metadata (attention reveals popularity and trust).

There’s so much room for the Hourly Press to grow. Users could choose arbitrary editors and create pages of all kinds. There’s a tech page just waiting to happen, for instance. Robert Scoble, Marshall Kirkpatrick, and others would flip their lids to see themselves as editors—headliners passively curating wave after hourly wave of tweets.

But again, I think there’s a more abstract and useful way to think about this. Why only one level of sources? Why not count the sources of sources? Those further-out, or second-level, contributing sources might have considerably diminished “authority” relative to the first-level sources. But not everyone can be on twitter all the time. I’m not always around to retweet great links to my followers, the editors, and giving some small measure of authority to the folks I follow (reflecting the average chance of retweet, e.g.) makes some sense.
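Here’s a minimal sketch of what one extra hop might look like, with an invented damping factor standing in for the diminished authority of second-level sources. The numbers and names are hypothetical.

```python
# Authority of first-level sources, as conveyed by editor follows.
first_level = {"srcA": 3, "srcB": 1}

# Who each first-level source follows (the second-level sources).
follows = {"srcA": ["subX", "subY"], "srcB": ["subX"]}

# Invented damping factor, standing in for, e.g., an average retweet chance.
DECAY = 0.25

def second_level_authority():
    """Give each second-level source a decayed share of its followers' authority."""
    auth = {}
    for src, subs in follows.items():
        for sub in subs:
            auth[sub] = auth.get(sub, 0.0) + DECAY * first_level[src]
    return auth

print(second_level_authority())
```

Iterating this propagation to a fixed point is essentially pagerank over the follow graph; one hop with decay is the cheap approximation.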

But also, editors themselves could be more or less relatively important, so we could weight them differently, proportionally to the curatorial powers we take them to have. And those editors follow different numbers of sources. It means one thing when one user of twitter follows only fifty others, and it means something else altogether when another user follows five hundred. The first user is, on average, investing greater attention into each user followed, while the second is investing less. Again, this is the attention economics that twitter captures so elegantly and richly.
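A toy sketch combining both ideas: an explicit per-editor weight, diluted by how many sources that editor follows, so that a follow from a focused editor counts for more than a follow from a promiscuous one. All numbers are invented.

```python
# Hypothetical curatorial weights assigned to each editor.
editor_weight = {"rosen": 2.0, "anderson": 1.0}

# How many accounts each editor follows in total.
follow_count = {"rosen": 500, "anderson": 50}

# Which of our sources each editor follows.
followed_by = {"rosen": {"srcA"}, "anderson": {"srcA"}}

def diluted_authority(source):
    """Sum each following editor's weight, spread across all their follows."""
    total = 0.0
    for editor, srcs in followed_by.items():
        if source in srcs:
            total += editor_weight[editor] / follow_count[editor]
    return total

# anderson's follow (1.0/50 = 0.02) outweighs rosen's (2.0/500 = 0.004)
# despite rosen's higher weight, because anderson's attention is less divided.
print(diluted_authority("srcA"))
```

The division by follow count is exactly the attention-economics intuition: following fifty people invests more per person than following five hundred.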

But it’s important to circle back to an important observation. In both apps, there are two necessary groups. One is small, and one is large. One adds data, and the other adds metadata. The job of the builder of these apps is to arrive at a good filter of information—powered by a simple, socially networked mechanism. That power must come from some place, from some fact or other phenomenon. The trick, then, is choosing wisely. Social mechanisms that work locally often fail miserably globally, once there’s ample incentive to game the system, spam its users, or troll its community.

But not all filters need to work at massive scale either. Some are meant to be personal. 40 twits strikes me as fitting this mold. I love checking out Dave’s and Jay’s pages, making sure I didn’t miss anything, but if I thought tens of thousands of others were also doing the same, I might feel tempted to click a few extra times on links I want to promote. I don’t think a 40 twits app will work for a page with serious traffic. And, ultimately, that’s because it gets its metadata from the wrong source: clicks that anyone can contribute. If the clicks were somehow limited to coming from only a trusted group, or if the clicks weren’t clicks at all but attention, then maybe 40 twits could scale sky-high.

Hourly Press—which I don’t think is terribly well suited to being called a “newspaper,” because the moniker obscures more than it adds—doesn’t face this limitation. The fact that Hourly Press is powered by attention, which is inherently scarce, unlike clicks, is terribly powerful, just as the fact that twitter is powered by attention is terribly powerful. Writ large, both are incredibly wise, and they contain extraordinarily important lessons in mechanism design of social filters of information.


17 Responses to “Curating the News Two Ways”

  1. 1 Zach Seward 2009 September 4 at 8:34 pm

    I’m as smitten by these apps as you are, and I think you’re right to identify the small group/large group dynamic. To me, 40 Twits and The Hourly Press work because their data is mostly or entirely unrelated to the app. They’re relying on ambient data (a.k.a. organic activity or social-networking byproducts).

    The contrast is to most social-bookmarking sites, where voting up a story is a direct act, susceptible to the Hawthorne effect (and, by extension, gaming). Digg, for instance, is supposed to measure popularity, but all it could ever measure is diggs.

    By taking a step back from direct action, The Hourly Press is able to capture something better. Attention is a fair way of putting it. Whatever the name, it’s not so much a metric as a measure.

    40 Twits, meanwhile, is valuable in two other ways. To the individual user, of course, it’s a simple method of tracking the popularity of your tweets. That’s how I use it for @NiemanLab. But on a conceptual level, I also just like how 40 Twits demonstrates that much as Twitter works best in a stream, it can also be shuffled around and presented in a way that mixes chronology with other factors. Ditto the Hourly Press.

    • 2 Josh Young 2009 September 4 at 9:01 pm

      I worry that you’re exaggerating the difference between digg and 40twits. Let me put it this way, *if* 40twits *were* very serious, it would likely be gamed much the way digg is. Those perverse incentives are poised to take over 40twits; they’re only waiting till it makes sense. Which is to say nothing at all about its usefulness at a smaller scale. If no one but you ever cared about the popularity of @NiemanLab’s tweets, then 40twits would be completely resistant to gaming. I love how “why bother?” so dramatically obviously makes things work.

      • 3 Zach Seward 2009 September 4 at 9:15 pm

        No, you’re right. 40 Twits is totally susceptible to gaming at scale. In contrasting with Digg, I was thinking more of The Hourly Press (where this post has shot up to #4 in the five o’clock edition on account of some high-authority tweets, heh).

        • 4 Zach Seward 2009 September 4 at 9:43 pm

          But I guess I’m not really concerned about gaming (since, as you say, the why-bother factor is strong here) so much as understanding what’s being measured. Diggs for the sake of popularity seem meaningless compared to clicks for the sake of clicks (40 Twits) and links for the sake of links (Hourly Press). Their strength isn’t that they can’t be gamed but that the game is something distinct from the result.

    • 5 Lyn Headley 2009 September 4 at 10:33 pm

      I’m interested in the ambient side of things too, and the issue of gaming.

      One thing about getting popular using an aggregator like this is that once the eyeballs are focused on your stories, the gaming question comes to the fore. Perhaps the ambient side of things is an initial stage in this question, and the gaming side is the second stage. Is there a way to design an algorithm that remains ambient even after it has become popular? Would we even want to? Or is the ambient status largely an artifact of the lack of popularity? The ambient data is somehow pure because it reflects “natural” behavior rather than “intentional”. What google, e.g. has done is keep their algorithm secret, thus prohibiting intentionality. But what if everybody knew the algorithm and it was also popular? Then you’d have a kind of intentionality, but would it necessarily be a bad thing? One of my questions for the Hourly Press is what this might look like. I kind of want to call it “consciousness.”

      • 6 Josh Young 2009 September 5 at 2:36 am

        I’m sympathetic with a term like “ambient,” but I’m not sure it gets the issue quite right. I don’t think it’s possible to design a system that remains ambient even when popular.

        The question, rather, is one of apportioning costs correctly. There must be downside to gaming, and it’s best if that downside is baked into the system.

        With google, for instance, the downside to blackhat SEO must be punished externally. That is, google is the judge outside the system looking in and dinging your pagerank if you’re unscrupulous. It works well enough for google, I suppose, but there are serious problems as well, and they only start with the fact that it is enormously expensive for google.

        The powerful thing about twitter, though, is that the downside to gaming is internal. Take me. Yes, I tweeted a link to this post three times, and with all six editors following me, I was able to do most of the work necessary to get this thing up to slot number four in a hurry. (I wanted to see whether the Hourly Press would count multiple tweets.) But I may also have irked some of the editors who follow me. Over time, if I were to tweet each of my posts a half dozen times over the course of an hour, an editor would decide that I’m forcing myself on too great a share of his attention and unfollow me–and for reasons totally unrelated to the Hourly Press. This is, of course, what we mean when we contrast “diggs for the sake of popularity” with “links for the sake of links” and “natural behavior” with “intentional behavior.”

        But, again, these two conceptions don’t quite hit the mark. My three links today were for the sake of the Hourly Press, not twitter (or let’s suppose so, anyhow). My links were intentional, in Lyn’s sense. What makes the Hourly Press work nevertheless, even as it scales, is that I bear the burden of risking the unfollow by tweeting too much or too hard.

        I’m going to keep this turning in my head. I suspect the best way to codify the dynamics here is with terms like “data” versus “metadata,” “internal” versus “external,” “scarce” versus “abundant,” and “attention” versus “action.”

        BTW, google’s algorithm is far from a total secret, and what we do know about it has led to gaming. See the google bomb, e.g.

        PS. You two should really be reading The Noisy Channel by Daniel Tunkelang:

  2. 7 spf73 2009 September 5 at 12:56 am

    The Hourly is designed to be resilient to gaming. A story gets to the top when many people with high authority talk about it. The authority, in turn, is conveyed by the follow choices made by the editors. And the editors are chosen by the publisher (a role we have not yet introduced – one that, in the case of News about News, was played by Lyn).

    Each layer represents a delegation of trust. If you break that trust, you can get “fired” (unfollowed).

    So take the scenario of a clear spammer. First of all, he’d never get followed in the first place. Even if he did, as soon as he started spamming he’d get fired.

    Now, take the more subtle scenario of self-promotion. Let’s use the present article as an example: Josh himself tweeted a link to this story 3 times (btw, that only counts once) and we ourselves retweeted it and so did Dave Winer. I did it twice (once as payyattention and once as spf2). All of us had some element of self-promotion in mind when we did that. We also picked up a handful of others, one of whom is helping us out on architecture (Elias Torres). And we got to slot #3 in this community. I’m pleased with this result: there was enough basis in the community to get this story on the top 10, and I think it deserves that slot by-and-large. (Do you agree?)

    Say we wanted to game it to get more attention… how would we do that? I suppose I could dm people and say “please retweet this!” I.e., I could do more work to get it in front of people who could then make their own decision about whether it was interesting or not. It would probably be somewhat effective, but would be constrained by their own judgment, and everyone’s reputation.

    Josh, I’ve been giving a lot of thought to looking deeper in the graph since we got started, even doing page/tunkrank on some discovered subgraph near the “editors”. One argument against that approach is that, by reducing accountability and increasing scale, it would introduce new opportunities for gaming.

    And Lyn, Google’s algorithm is not secret, and its susceptibility to gaming is a whole industry.

    • 8 Josh Young 2009 September 5 at 8:03 pm

      Sorry, Stephen, your thoughtful comment got caught up in my spam filter.

      So, two questions. (1) Why not count my retweets? Or why not count them partially? Like I explained in my post or in the comments, I bear the risk of getting “fired,” as you put it. Since I internalize the cost of each tweet, I should be able to benefit from them as well, without too many worries about gaming.

      Here’s what might happen. People would see that no one unfollowed me because I tweeted my post three times. Maybe that’s because my other tweets are valuable enough that people are willing to put up with occasional shameless self-promotion, or maybe that’s because people actually enjoyed the three different ways of summing up the post. Also, many people, some editors included, never had any intention of clicking on any of the three links and presumed that they all pointed to something different. In all likelihood, some combination of these and other reasons kept people from unfollowing me.

      So then others looking to shoot up on the Hourly’s board would copy me. Everyone would tweet their own post three times or more. Maybe someone would try five tweets, or maybe eight. But at some point, it really would get obnoxious, and some editor would not appreciate having his attention abused. He might DM the self-promoter a warning to cool it or be unfollowed, or he might just unfollow. People would push the limit and see what makes sense.

      But it’s not at all the case that everyone would have the same limit. As a source, Jay could probably tweet something ten times and not be unfollowed by any of the other editors. I’m not saying that’s good or bad; I’m just saying it’s probably true. My point, then, is that there’s real value in allowing the Hourly to reflect or express the fact that editors really get a lot of value out of following him.

      So, sure, there’d be some inflation. It would be harder to get on the board–which is to say the lowest slot would have more points versus how it is now, all else equal. Moreover, I’m not at all denying that some other gaming methods might crop up out of the blue. I can’t think of any now, but I would never rule them out.

      Next question. (2) What kind of gaming are you worried about with respect to taking more of the graph into consideration?

      The one thing I can see is that maybe a source would create a ton of dummy accounts (call them “sub-sources”) that he would follow and then have all of them tweet the link. There’d be relatively little downside for that source because only he would have his page deluged. No one else would see the spam.

      But note that tunkrank-like logic would mitigate the effectiveness of a tactic like this, because the marginal sub-source would inherit less rank as the attention the source has to offer it shrinks. I like the idea of taking into consideration the attention editors have to offer sources, btw, although I also like the idea of publishers being able to assign arbitrary “authority” scores to editors. Those seem at cross-purposes, alas, and I don’t have a great sense of where I shake out.

    • 9 Lyn Headley 2009 September 5 at 8:33 pm

      From Wikipedia:

      “Google is known to penalize link farms and other schemes designed to artificially inflate PageRank. In December 2007 Google started actively penalizing sites selling paid text links. How Google identifies link farms and other PageRank manipulation tools are among Google’s trade secrets.”

      I take it that these practices have had some impact on the algorithm google uses to rank search results, making it a secret algorithm.

  3. 10 Lyn Headley 2009 September 5 at 1:34 am

    You mentioned that 40 twits probably counts retweets to the original short link. What are the implications of this in terms of what is being measured? Perhaps in addition to a measure of long-term trust in Dave or Jay, it’s a measure of long-term trust in Dave or Jay + long-term trust in those retweeters of Dave or Jay who use the same short links. I take it these are different measures, since there could be a highly authoritative retweeter whose audience clicks based on their trust in that retweeter rather than Dave or Jay.

  4. 11 Daniel Tunkelang 2009 September 5 at 8:44 pm

    40 Twits strikes me as being a glorified way of following @davewiner, even if it is further filtered by his readers. That’s cute, but to me it smacks of cult of personality. If I could pick the seeds (e.g., the people whose blogs I read or whom I follow on Twitter), then it would be interesting.

    The Hourly Press is potentially more interesting. First, I love the phrase “authoritative social filter”–I wish I’d thought of it first! Second, I agree with spf73 that it seems resistant to gaming–I certainly can’t make an editor follow me. I’m not thrilled with the arbitrary choice of roots in the trust tree–I’d rather pick those roots myself–which is to say I’d like to be the root for my own tree. But I’m not clear on how it propagates authority beyond the editors.

    In summary, my first concern with both of these approaches is that they hard-code the authority roots. But that seems easily fixed. The second concern is about how they model propagation through the social network–I think that’s where this space gets much more interesting.

  5. 12 spf73 2009 September 6 at 12:11 am

    Daniel – the choice is not arbitrary, but the result of deliberate consideration based on Lyn’s experience with this community. We’ll be starting to open up the ability to create these filters in the near future. The goal is to enable those who are knowledgeable about other communities (tech, photography, lolcats, whatever) to exercise their judgement similarly. This is part of why we call it a “newspaper” (a much-debated word choice around here, but I’ll stick with it for a moment), because it is tailored to a topic or point of view and it is a shared entity. You see the value of that in these comments – we can all talk about #3 on NaN and know what each other is talking about. We’re taking a retro slant: a bit of rejection of hyper-personalization, and a relatively slow news cycle (hourly).

    Having said that, we’re also considering the opposite choices. For example, you could make a “newspaper” where you’re the only editor. This would be news in your universe of interests. It might be miscellaneous and only make sense to you, or it might be something worth sharing.

    An advantage of our current approach is that we disconnect the choice of what to read from the choice of who to follow. This means you could benefit from the collective choices of the lolcat community without necessarily being a member. In our view, a huge advantage to this approach is that it enables people who don’t use or understand twitter to benefit from it.

    As for how far to explore the graph, or how many levels of the graph to project into a hierarchy… it’s a fascinating topic. One could imagine, for instance, adding another level that represents your personal interests. Kind of a personalized meta-newspaper. This is compelling, but raises the question of normalization between the communities. I might be more interested in journalism than lolcats, but if the lolcat community is bigger then it could crowd-out the journalism stuff. Sounds solvable, though, by weighing them differently or through automatic normalization.

    Josh, that’s a good question about gaming and bigger graphs. I think my point is that it opened up new opportunities for gaming, but I didn’t explain what they were and whether the benefits outweigh the risks. Here’s what I was thinking: the larger the scale and the more hops through the graph, the harder it is for an editor to understand the consequences of his decision to follow/unfollow someone. For example, if the editor thinks that sourceA is great but no one else knows about sourceA, but sourceB is well-known and a self-promoter, then sourceB would have more authority, even if the editor prefers sourceA.

    I agree this bears much more examination, though. One kind of experiment we’re keen to run is to look at the same data with a palette of algorithms, and see how they differ.

    Lyn and I debated the point about whether to let one source count more than once. I was making much the same arguments as you, and Lyn disagreed. Eventually I came to think that community resonance was more important than self-promotion and we ended up with Lyn’s approach. (Though there is a bug and the ‘top sources’ is computed the other way.)

  6. 13 Garrett French 2009 September 23 at 3:38 pm

    Could the system be insulated against gaming if it were hyper-niched?

    For example, these people are great thinkers in the CRM space, and many focus on social CRM, an emerging thought and technology area:

    If you hand-select trusted curators of a narrow niche, perhaps that could keep gaming to a minimum. Plus it would ensure that enthusiasts could have well groomed information.

    Of course that’s not massively scalable. But massively scalable is what bugs me about Digg and even something like TweetMeme. I don’t care what everyone thinks is great, only some people :)

    I’m not a developer and I’m a total newb to this conversation, so if anything I’ve said is borderline trollish, or even an oblivious redescription of pre-solved issues forgive me and respond with education ;)


  1. 1 links for 2009-09-04 « Blarney Fellow Trackback on 2009 September 5 at 1:06 am
  2. 2 links for 2009-09-07 « David Black Trackback on 2009 September 7 at 8:06 am
  3. 3 Curating the News Two Ways « Networked News « sull is vocally active Trackback on 2009 September 7 at 6:10 pm
  4. 4 Curating the News Two Ways « Predicate, LLC | Editorial + Content Strategy Trackback on 2009 September 21 at 3:06 pm
