January 23, 2022

Archives for March 2005

Cornell Researchers on P2P Quality Control

Kevin Walsh and Emin Gün Sirer, of Cornell University, have a new paper on Credence, a system for detecting unwanted files in P2P networks. It’s a kind of reputation system for files, designed to detect in advance that certain files are not what they claim to be. One use of this technology is to detect spoofed files inserted into P2P nets by copyright owners.

Credence is really a reputation system for files. Users cast votes, which are simple thumbs-up or thumbs-down verdicts on particular files, saying whether a file is what it claims to be. Every vote is digitally signed by the user who cast it, so that recipients can verify the authenticity of votes they are given. If you’re not sure whether a file is genuine, you can ask people to send you votes, and then you can combine the votes in a special way to yield a quality score for the file.

P2P systems are open, and they generally don’t register their users (or at least they don’t prevent fraudulent or repeated registrations) so users cannot reliably be identified by their real names. Instead, users make up pseudonyms for themselves. Suppose Alice wants to join the system. She makes a pseudonym for herself, by generating a cryptographic key-pair, consisting of a private key and a public key. Alice keeps the private key secret, and uses it to put digital signatures on the votes she casts. Along with each vote, she distributes a copy of the public key, which anybody can use to verify the digital signature. The public key also serves as a “name” for Alice’s pseudonym. The key attribute of this system is that if Bob wants to forge a vote, that is, to create a vote that appears to have come from Alice’s pseudonym, he must somehow determine Alice’s private key, which is essentially impossible if Alice does her cryptography correctly. In short, the cryptography ensures that anybody can make a pseudonym, but only the creator of a pseudonym can cast votes on its behalf.

This only solves half of the problem, because an adversary can create as many pseudonyms as he likes, and have them cast false votes (i.e, votes in favor of the validity of files that are actually invalid, or against the validity of files that are actually valid). So you can’t just add up all of the votes you receive; you need some way to tell whose votes to trust and whose to ignore. Here Credence uses a simple rule – trust people who tend to vote the same way that you do. Suppose Alice knows that files A, B, and C are valid, and that files X, Y, and Z are not valid. If some pseudonym “Bob” has votes in favor of A, B, and C, and against X, Y, and Z, then Alice concludes that “Bob” tends to vote accurately. If another pseudonym “Charlie” votes the opposite way on those six files, then Alice concludes that votes from “Charlie” tend to be the opposite of the truth. So if she sees some new file that “Bob” says is valid and “Charlie” says is invalid, Alice will conclude that the file is valid. Each party’s vote on the new file gets a weight, equal to the correlation between that party’s votes and Alice’s votes on other files. (The paper hints at further mechanisms that assign trust to people whose votes correlate with those of other people Alice trusts.)

This scheme presents would-be adversaries with a dilemma. If Alice’s votes are truthful, then if you want to mislead Alice about one file, you have to earn her trust by telling her the truth about other files. You can tell occasional lies, but on the whole you have to be a truth-teller. (You can achieve the same effect by lying about almost everything, and telling the truth about just one file. Then Alice will conclude that you are a habitual liar, and will count your votes with negative weight, giving credence to the opposite of what you say. Again, you have to provide Alice with many useful-to-her votes in order to trick her once.)

It looks like this method will work, if it can be implemented efficiently in a real network. The real question, I think, is whether it will scale up to enormous P2P networks containing huge numbers of files. Here I have serious doubts. The paper’s authors don’t claim that users have to know about all of the votes cast in the system. But they’re not entirely clear on how individual users can efficiently get the votes they need to make good decisions, if the network is very large. In the long run, I don’t think this scaling problem is insurmountable; but more research is required to solve it.

Godwin's Law, Updated

One of the most famous observations about online discussions is Godwin’s Law:

As an online discussion grows longer, the probability of a comparison involving Nazis or Hitler approaches one.

When it comes to copyright policy, a related law seems to hold:

As a copyright policy discussion grows longer, the probability of pornography being invoked approaches one.

What’s really interesting is the corollary:

When the topic of a copyright policy discussion switches to pornography, each side suddenly adopts the other side’s arguments.

For example, Hollywood argues that filesharing will lead to a shortage of movies, because nobody will make movies they can’t sell. But when the topic switches to pornographic movies, suddenly they start arguing that filesharing increases the creation and availability of content.

Similarly, some P2P vendors who say they can’t possibly filter or block copyrighted content, suddenly decide, when the topic switches to porn, that they can provide effective blocking. See, for example, a recent letter from the Distributed Computing Industry Association (a group of mostly P2P companies) to the Senate:

It is a fact that no industry – including the entertainment industry that cynically hatched the strategy of wrongly equating P2P with risks to children – has been more responsive than ours to concerns about the exposure of young people to inappropriate material. For example, by simply using the password-protected family filter included at no charge with leading P2P software programs, a parent can ensure that NO pornographic images or videos will be returned in response to any searches, including those of known child-pornography keywords.

The assertion that “NO pornographic images or videos will be returned in response to any searches”, can’t possibly be true. Content-based porn filtering will do just as poorly on content received via P2P as it does on content received via the web. These filters will be just as leaky as everybody else’s, and of course they’ll only operate for users who choose to turn them on.

I guess porn really does turn your brain to mush.

Pharm Policy

I wrote Monday about pharming attacks, in which a villain corrupts the DNS system, which translates textual names (like “www.freedom-to-tinker.com”) into the IP addresses (like “”) that are used to route traffic on the Internet. By doing this, the villain can impersonate an Internet site convincingly. Today I want to talk about how to address this problem.

The best approach would be to secure the DNS system. We know how to do this. Solutions involve having authoritative DNS servers put some kind of digital signature on the information they give out, so that a computer receiving DNS translation information can verify that the information is endorsed by an authoritative server. Such a system, if universally deployed, would put the pharmers out of business. Unfortunately, secure DNS is not widely deployed.

A partial solution, for web access at least, is to access websites via secure (HTTPS) connections. The user, on seeing a valid site, would notice the lock icon on his browser, and would know that his machine was connected to the legitimate owner of the URL that his browser was displaying. A pharmer could make accesses to “www.citibank.com” go to his evil site, but he couldn’t fool the secure-connection mechanism, so he could not make the lock icon on the user’s browser light up.

This approach works fine, as long as users notice the lock icon and refuse to deal with sites that don’t use secure connections. Will users be so vigilant? Probably not. In practice, many sites fail to use secure connections, and browsers give subtle indications of whether a connection is secure but don’t scream about insecure connections. (How could they, when insecure connections are so common?)

One drawback of relying on secure web connections is that it doesn’t protect other communication services, such as email and instant messaging. Pharmers might try to attract a user’s email or IM connections to hostile servers. We know how to secure email, by assigning encryption keys to individuals and having them encrypt and digitally sign their email. Standard email programs even know how to handle encryption and signing. But, again, few people use these facilities.

You may have noticed a common pattern here: each of these mechanisms would be effective if widely adopted, but aren’t used much in practice. In each case, we have a collective action problem. If nearly everybody adopted one of these technologies, then the holdouts would have an incentive to adopt it too; but until a critical mass of adoption is reached, there is little incentive for others to join.

Consider secure web connections. If nearly every website used secure connections, then insecure connections would be rare enough that browsers could issue prominent warnings whenever they saw an insecure connection. This would give legitimate websites a strong incentive to use secure connections, in order to avoid scaring users away. Today, insecure connections are so common that they don’t attract any suspicion. (An online banking site that used insecure connections would be odd, and might arouse suspicion from alert users; but we’re far from the point when browsers expect secure connections from everybody.)

A similar problem holds for secure email. I could digitally sign my outgoing email, but this wouldn’t do much to prevent forged messages in practice. A forged message would of course be unsigned, but unless unsigned messages were rare, nobody would be taken aback on seeing one. But if almost all messages were digitally signed, than an unsigned message would be rare enough to arouse suspicion, and might trigger a prominent warning from the user’s email program.

In all of these cases, there is a tipping point, where the authentication technology is used so widely that failing to use it attracts suspicion. Once the tipping point is reached, the remaining holdouts will switch to using the technology. Assuming we agree that it would be good to adopt one of these technologies, how can we get to the tipping point?