April 18, 2024

Political Information Overload and the New Filtering

[We’re pleased to introduce Luis Villa as a guest blogger. Luis is a law student at Columbia Law School, focusing on law and technology, including intellectual property, telecommunications, privacy, and e-commerce. Outside of class he serves as Editor-in-Chief of the Science and Technology Law Review. Before law school, Luis did great work on open source projects, and spent some time as “geek in residence” at the Berkman Center. — Ed]

[A big thanks to Ed, Alex, and Tim for the invitation to participate at Freedom To Tinker, and the gracious introduction. I’m looking forward to my stint here. — Luis]

A couple of weeks ago at the Web 2.0 Expo NY, I more-or-less stumbled into a speech by Clay Shirky titled “It’s Not Information Overload, It’s Filter Failure.” Clay argues that there has always been a lot of information, so our modern complaints about information overload are more properly ascribed to a breakdown in the filters (physical, economic, and social) that used to keep information at bay. This isn’t exactly a shockingly new observation, but now that Clay has put it in my head I’m seeing filters (or their absence) everywhere.

In particular, I’m seeing lots of great examples in online politics. We’ve probably never been so deluged by political information as we are now, but Clay would argue that this is not because there is more information; after all, virtually everyone has had political opinions for ages. Instead, he’d say that the old filters that kept those opinions private have become less effective. For example, social standards used to say ‘no politics at the dinner table’, and economics used to keep every Luis, Ed, and Alex from starting a newspaper with an editorial page. This has changed: social norms about politics have been relaxed, and ‘net economics have allowed for the blooming of a million blogs and a billion tweets.

Online political filtering dates back at least to Slashdot’s early attempts to moderate commenters, and criticism of them stretches back nearly as far. But the new deluge of political commentary from everyone you know (and everyone you don’t) rarely has filtering mechanisms, norms, or economics baked in yet. To a certain extent, we’re witnessing the birth of those new filters right now. Among the attempts at a ‘new filtering’ that I’ve seen lately:

  • The previously linked election.twitter.com. This is typical of the Twitter ‘ambient intimacy’ approach to filtering: everything is so short and so transient that your brain does the filtering for you (or so it is claimed), giving you a 100,000-foot view of the mumblings and grumblings of a previously unfathomably vast number of people.
  • fivethirtyeight.com: an attempt to filter the noise of the thousands of polls into one or two meaningful numbers by applying mathematical techniques originally developed for analysis of baseball players. The exact algorithms aren’t disclosed, but the general methodologies have been discussed.
  • The C-Span Debate Hub: this has not reached its full potential yet, but it uses some Tufte-ian tricks to pull data out of the debates, and (in theory) their video editing tool could allow for extensive discussion of any one piece of the debate instead of the debate as a whole, which is surely a way to get some interesting collection and filtering.
  • Google’s ‘In Quotes’: this takes one first step in filtering (gathering all candidate quotes in one place, from disparate, messy sources) but then doesn’t build on that.

Unfortunately, I have no deep insights to add here. Some shallow observations and questions, instead:

  • All filters have impacts: keeping politics away from the dinner table tended to mute objections to the status quo, the ‘objectivity’ of the modern news media filter may have its own pernicious effects, and arguably information mangled by PowerPoint can blow up Space Shuttles. Have the designers of these new political filters thought about the information they are and are not presenting? What biases are being introduced? How can those be reduced or made more transparent?
  • In at least some of these examples the mechanisms by which the filtering occurs are not a matter of record (538’s math) or are not well understood (Twitter’s crowd/minimal attention psychology). Does/should that matter? What if these filters became ‘dominant’ in any sense? Should we demand the source for political filtering algorithms?
  • The more ‘fact-based’ filters (538, In Quotes) seem more successful, or at least more coherent and comprehensive. Are opinions still just too hard to filter with software, or are there other factors at work here?
  • Slashdot’s nearly ten-year-old comment moderation system is still quite possibly the least bad filter out there. None of the ‘new’ politics-related filters (that I know of) pulls together reputation, meta-moderation, and filtering like Slashdot does. Are there systemic reasons (usability, economics, etc.?) why these new tools seem so (relatively) naive?

We’re entering an interesting time. Our political process is becoming both less and more mediated, and more ‘susceptible to software’ in Dave Weinberger’s phrase. Computer scientists, software interaction designers, and policy/process wonks would all do well to think early and often about the filters and values embedded in this software, and how we can (and can’t) ‘tinker’ with them to get the results we’d like to see.

How Yahoo could have protected Palin's email

Last week I criticized Yahoo for their insecure password recovery mechanism that allowed an intruder to take control of Sarah Palin’s email account. Several readers asked me the obvious follow-up question: What should Yahoo have done instead?

Before we discuss alternatives, let’s take a minute to appreciate the delicate balance involved in designing a password recovery mechanism for a free, mass-market web service. On the one hand, users lose their passwords all the time; they generally refuse to take precautions in advance against a lost password; and they won’t accept being locked out of their own accounts because of a lost password. On the other hand, password recovery is an obvious vector for attack — and one exploited at large scale, every day, by spammers and other troublemakers.

Password recovery is especially challenging for email accounts. A common approach to password recovery is to email a new password (or a unique recovery URL) to the user, which works nicely if the user has a stable email address outside the service — but there’s no point in sending email to a user who has lost the password to his only email account.
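To make the "unique recovery URL" idea concrete, here is a minimal sketch of how a service might issue and redeem a single-use, expiring recovery token. This is an illustration only, not a description of how Yahoo or any other provider actually does it: the class name `RecoveryTokens`, the one-hour lifetime, and the in-memory dictionary are all assumptions made for the example.

```python
import secrets
from datetime import datetime, timedelta

# Hypothetical lifetime for a recovery link; a real service would tune this.
TOKEN_LIFETIME = timedelta(hours=1)


class RecoveryTokens:
    """In-memory store of single-use recovery tokens (illustrative only)."""

    def __init__(self):
        # Maps token -> (account name, expiry time).
        self._tokens = {}

    def issue(self, account, now):
        """Create an unguessable token to embed in the emailed recovery URL."""
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (account, now + TOKEN_LIFETIME)
        return token

    def redeem(self, token, now):
        """Return the account if the token is valid, else None.

        The token is removed on any redemption attempt, so it works at most once.
        """
        entry = self._tokens.pop(token, None)
        if entry is None:
            return None
        account, expires_at = entry
        return account if now <= expires_at else None
```

The point of the sketch is that the secret travels through the external mailbox, so it only helps users who have one. For someone whose only email account is the one they are locked out of, the token has nowhere to go.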

Still, Yahoo could be doing more to protect their users’ passwords. They could allow users to make up their own security questions, rather than offering only a fixed set of questions. They could warn users that security questions are a security risk and that users with stable external email addresses might be better off disabling the security-question functionality and relying instead on email for password recovery.

Yahoo could also have followed Gmail’s lead, and disabled the security-question mechanism unless no logged-in user had accessed the account for five days. This clever trick prevents password “recovery” when there is evidence that somebody who knows the password is actively using the account. If the legitimate user loses the password and doesn’t have an alternative email account, he has to wait five days before recovering the password, but this seems like a small price to pay for the extra security.
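The five-day rule is simple enough to sketch in a few lines. The following is a rough illustration of the idea as described above, not Gmail's actual implementation; the function name `security_question_allowed` and the `last_login` parameter are hypothetical.

```python
from datetime import datetime, timedelta

# Quiet period taken from the five-day figure described above.
QUIET_PERIOD = timedelta(days=5)


def security_question_allowed(last_login, now):
    """Allow security-question recovery only if nobody has logged in recently.

    A recent successful login is evidence that someone who knows the
    password is using the account, so "recovery" is refused.
    """
    return now - last_login >= QUIET_PERIOD
```

Under this rule, an attacker who guesses the security answer still cannot reset the password while the legitimate owner is logging in regularly.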

Finally, Yahoo would have been wise, at least from a public-relations standpoint, to give extra protection to high-profile accounts like Palin’s. The existence of these accounts, and even the email addresses, had already been published online. And the account signup at Yahoo asks for a name and postal code, so Yahoo could have recognized that this suddenly important public figure had an account on their system. (It seems unlikely that Palin gave a false name or postal code in signing up for the account.) Given the public allegations that Palin had used her Yahoo email accounts for state business, these accounts would have been obvious targets for freelance “investigators”.

Some commenters on my previous post argued that all of this is Palin’s fault for using a Yahoo mail account for Alaska state business. As I understand it, the breached account included some state business emails along with some private email. I’ll agree that it was unwise for Palin to put official state email into a Yahoo account, for security reasons alone, not to mention the state rules or laws against doing so. But this doesn’t justify the break-in, and I think anyone would agree that it doesn’t justify publishing non-incriminating private emails taken from the account.

Indeed, the feeding frenzy to grab and publish private material from the account, after the intruder had published the password, is perhaps the ugliest aspect of the whole incident. I don’t know how many people participated — and I’m glad that at least one Good Samaritan tried to re-lock the account — but I hope the republishers get at least a scary visit from the FBI. It looks like the FBI is closing in on the initial intruder. I assume he is facing a bigger punishment.

Palin's email breached through weak Yahoo password recovery mechanism

This week’s breach of Sarah Palin’s Yahoo Mail account has been much discussed. One aspect that has gotten less attention is how the breach occurred, and what it tells us about security and online behavior.

(My understanding of the facts is based on press stories, and on reading a forum post written by somebody claiming to be the perpetrator. I’m assuming the accuracy of the forum post, so take this with an appropriate grain of salt.)

The attacker apparently got access to the account by using Yahoo’s password reset mechanism, that is, by following the same steps Palin would have followed had she forgotten her own password.

Yahoo’s password reset mechanism is surprisingly weak and easily attacked. To simulate the attack on Palin, I performed the same “attack” on a friend’s account (with the friend’s permission, of course). As far as I know, I followed the same steps that the Palin attacker did.

First, I went to Yahoo’s web site and said I had forgotten my password. It asked me to enter my email address. I entered my friend’s address. It then gave me the option of emailing a new password to my friend’s alternate email address, or doing an immediate password reset on the site. I chose the latter. Yahoo then prompted me with my friend’s security question, which my friend had previously chosen from a list of questions provided by Yahoo. It took me six guesses to get the right answer. Next, Yahoo asked me to confirm my friend’s country of residence and zip code — it displayed the correct values, and I just had to confirm that they were correct. That’s all! The next step had me enter a new password for my friend’s account, which would have allowed me to access the account at will.

The only real security mechanism here is the security question, and it’s often easy to guess the right answer, especially given several tries. Reportedly, Palin’s question was “Where did you meet your spouse?” and the correct answer was “Wasilla high”. Wikipedia says that Palin attended Wasilla High School and met her husband-to-be in high school, so “Wasilla high” is an easy guess.
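One obvious mitigation for the "several tries" problem is to cap the number of wrong answers. Here is a minimal sketch of that idea, assuming a hypothetical limit of three attempts; the names `RecoveryChallenge`, `MAX_ATTEMPTS`, and `normalize` are invented for the example, and nothing here reflects Yahoo's actual code.

```python
import hmac
from dataclasses import dataclass

# Hypothetical cap on wrong answers before the challenge locks.
MAX_ATTEMPTS = 3


def normalize(answer):
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(answer.lower().split())


@dataclass
class RecoveryChallenge:
    expected_answer: str
    failed_attempts: int = 0

    def check(self, guess):
        """Return True for a correct answer, unless the challenge is locked."""
        if self.failed_attempts >= MAX_ATTEMPTS:
            return False  # locked: even the right answer is refused
        matches = hmac.compare_digest(
            normalize(guess).encode(),
            normalize(self.expected_answer).encode(),
        )
        if matches:
            return True
        self.failed_attempts += 1
        return False
```

With a cap like this, the six guesses it took me would not have been enough. It would not have helped in Palin's case, though, if the attacker's first guess was drawn straight from Wikipedia: limiting attempts only protects answers that are hard to look up.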

This attack was not exactly rocket science. Contrary to some news reports, the attacker did not display any particular technical prowess, though he did display stupidity, ethical blindness, and disrespect for the law — for which he will presumably be punished.

Password recovery is often the weakest link in password-based security, but it’s still surprising that Yahoo’s recovery scheme was so weak. In Yahoo’s defense, it’s hard to verify that somebody is really the original account holder when you don’t have much information about who the original account holder is. It’s not like Sarah Palin registered for the email account by showing up at a Yahoo office with three forms of ID. All Yahoo knows is that the original account holder claimed to have the name Sarah Palin, claimed to have been born on a particular date and to live in a particular zip code, and claimed to have met his/her spouse at “Wasilla high”. Since this information was all in the public record, Yahoo really had no way to be sure who the account holder was — so it might have seemed reasonable to give access to somebody who showed up later claiming to have the same name, email address, and spouse-meeting place.

Still, we shouldn’t let Yahoo off the hook completely. Millions of Yahoo customers who are not security experts (or are security experts but want to delegate security decisions to someone else) entrusted the security of their email accounts to Yahoo on the assumption that Yahoo would provide reasonable security. Palin probably made this assumption, and Yahoo let her down.

If there’s a silver lining in this ugly incident, it is the possibility that Yahoo and other sites will rethink their password recovery mechanisms, and that users will think more carefully about the risk of email breaches.