Christin, Weigend, and Chuang have an interesting new paper on corruption of files in P2P networks. Some files are corrupted accidentally (they call this “pollution”), and some might be corrupted deliberately (“poisoning”) by copyright owners or their agents. The paper measures the availability of popular, infringing files on the eDonkey, Overnet, Gnutella, and FastTrack networks, and simulates the effect of different pollution strategies that might be used.
The paper studied a few popular files for which corruption efforts were not occurring (or at least not succeeding). Polluted versions of these files do turn up, especially on FastTrack, but they aren’t a barrier to user access: non-corrupted versions tend to have more replicas available than polluted ones, and the systems list files with more replicas first.
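To make the replica-ranking point concrete, here is a minimal sketch. It assumes a client that simply sorts the distinct versions of a file by how many peers share them. The `SearchResult` record, the `rank_by_replicas` function and the replica counts are hypothetical illustrations, not code or data from any of the clients the paper measured.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """One distinct version of a file, as a search might report it (hypothetical)."""
    content_hash: str
    replicas: int    # number of peers currently sharing this version
    corrupted: bool  # ground truth; a real client cannot see this before downloading


def rank_by_replicas(results):
    """Return the results ordered with the most-replicated versions first."""
    return sorted(results, key=lambda r: r.replicas, reverse=True)


results = [
    SearchResult("polluted-a", replicas=3, corrupted=True),
    SearchResult("genuine", replicas=40, corrupted=False),
    SearchResult("polluted-b", replicas=7, corrupted=True),
]
top = rank_by_replicas(results)[0]
print(top.content_hash, top.corrupted)  # genuine False
```

As long as the genuine version has the most replicas, a user who takes the first result gets a working file, however many polluted versions exist.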
They then simulate the effect of various pollution strategies. They conclude that a sufficiently sophisticated pollution strategy, one that injects different decoy versions of a file at different times and many replicas of the same decoy at once, would significantly reduce user access to targeted files.
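Why rotating decoys matters can be seen in a toy model. This is an illustrative assumption, not the simulator used in the paper. A client repeatedly downloads the most-replicated version it hasn't yet found to be corrupt, and blacklists any version that turns out to be corrupt. A single decoy is defeated after one failed download, but a fresh, heavily replicated decoy before each attempt keeps the genuine file out of reach. The function `attempts_until_genuine` and all replica counts are hypothetical.

```python
def attempts_until_genuine(genuine_replicas, decoy_schedule, max_attempts=10):
    """Toy model: return the download attempt on which the genuine file is
    obtained, or None if it is not obtained within max_attempts.

    decoy_schedule maps an attempt number to a list of (decoy_id, replicas)
    pairs that the attacker injects just before that attempt.
    """
    available = {"genuine": genuine_replicas}  # version -> replica count
    blacklist = set()  # versions the client has already found to be corrupt
    for attempt in range(1, max_attempts + 1):
        for decoy_id, replicas in decoy_schedule.get(attempt, []):
            available[decoy_id] = replicas
        # Pick the most-replicated version not yet known to be corrupt.
        candidates = {v: n for v, n in available.items() if v not in blacklist}
        choice = max(candidates, key=candidates.get)
        if choice == "genuine":
            return attempt
        blacklist.add(choice)  # corrupt download: never pick this version again
    return None


# One heavily replicated decoy, injected once: costs the user one failed download.
print(attempts_until_genuine(40, {1: [("decoy-1", 100)]}))  # 2

# A fresh heavily replicated decoy before every attempt: the genuine file never wins.
rotating = {i: [(f"decoy-{i}", 100)] for i in range(1, 11)}
print(attempts_until_genuine(40, rotating))  # None
```

In this model it is the combination that hurts: many replicas put each decoy at the top of the list, and rotation means that remembering past failures never helps.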
Some P2P programs use simple reputation systems to try to distinguish corrupted files from non-corrupted ones; the paper argues that these will be ineffective against their best pollution strategy. But they also note that better reputation systems could detect their sophisticated poisoning strategy.
They don’t say anything more about the arms race between reputation technologies and pollution technologies. My guess is that in the long run reputation systems will win, and poisoning strategies will lose their viability. In the meantime, though, it looks like copyright owners have much to gain from poisoning.
[UPDATE (6:45 PM): I changed the second paragraph to eliminate an error that was caused by my misreading of the paper. Originally I said, incorrectly, that the study found little if any evidence of pollution for the files they studied. In fact, they chose those files because they were not subject to pollution. Thanks to Cypherpunk, Joe Hall, and Nicolas Christin for pointing out my error.]
“Goombas”?
Pay download services will require a truly acceptable online micropayments system, like an “online Interac”. It will need strong authentication before it lets money be transferred out of an account, strong fraud protection (like current credit cards), accounts open to anyone (unlike current credit cards), and near-zero transaction costs, so that pennies, or even fractions of a penny, can be transferred efficiently. And there will have to be multiple vendors of such services (likely the banks) using a common standard (like Interac), so that they remain compatible while competing on service and the like. In other words, the anti-PayPal.
I think that the inherent arrogance of the music industry will continue to boost usage of p2p networks. What we need to do is build a kind of ramp to “legal” or pay download services, by offering incentives and making the process easier. And over time, competition should lower download prices. Still, if you seek truly rare tracks, where else can you find those obscure files but the p2ps? I hope that you were able to watch the excellent program on C-SPAN today about digital intellectual property. It was great to see some of the “combined media” that are being brilliantly composed today. Canada seems to have a fairly rational approach to this issue, with fees levied on digital media to help compensate artists. Some similar (hopefully voluntary) system might be an answer. But I think we have a basic right to trade, as long as artists are adequately compensated. So let’s propose some creative incentives, spur technological growth, and then “everybody wins!” Except, of course, for the industry goombas…
Chunk corruption seems to explain the Limewire problems Cypher describes.
There needs to be a system for detecting a “rogue” peer within a file’s “mesh” and kicking it out: for instance, a peer whose version of a chunk differs from everyone else’s.
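One way such rogue detection could work is a hash vote. This sketch assumes that honest peers form a strict majority of those sharing a given chunk. It hashes each peer's copy and flags the peers whose hash disagrees with the majority. The `find_rogue_peers` function and the sample data are hypothetical.

```python
import hashlib
from collections import Counter


def find_rogue_peers(chunks_by_peer):
    """Given {peer_id: chunk_bytes} for one chunk index, return the peers whose
    copy differs from the strict-majority version (hypothetical helper).

    Returns an empty set when there is no strict majority, since then the vote
    cannot tell honest peers from rogue ones.
    """
    if not chunks_by_peer:
        return set()
    digests = {peer: hashlib.sha256(data).hexdigest()
               for peer, data in chunks_by_peer.items()}
    majority_digest, majority_count = Counter(digests.values()).most_common(1)[0]
    if majority_count * 2 <= len(digests):
        return set()
    return {peer for peer, d in digests.items() if d != majority_digest}


copies = {
    "peer-a": b"chunk 7 data",
    "peer-b": b"chunk 7 data",
    "peer-c": b"corrupted bytes",
    "peer-d": b"chunk 7 data",
}
print(find_rogue_peers(copies))  # {'peer-c'}
```

A client could then stop requesting that chunk from the flagged peers. The majority assumption is the weak point: a poisoner running many peers in the same mesh would win the vote.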
Thanks to all three of you for pointing out my error. I will revise the post to reflect the true situation.
I am one of the authors of the paper, and I just wanted to clarify a small point.
We certainly don’t claim that there is no pollution/poisoning on peer-to-peer networks. On the contrary, there are significant levels of poisoning and pollution as reported in Jian Liang et al.’s study (ref. [17] in our paper).
What we did, though, was to pick songs/movies/etc. that were not poisoned (or for which a potentially ongoing poisoning attack had no noticeable effect). Our goal was to contrast what the network properties look like in the absence and in the presence of some poisoning strategies.
Cypherpunk is absolutely right: poisoning can be very effective at present, as we show in Section 5. There are some more advanced techniques we did not talk about in the paper (chunk corruption, for instance), which can also pose significant problems for currently deployed p2p software.
Our study is only a first step, in which we try to get an idea of what the networks look like in terms of content from a user’s perspective, and how this can be exploited by poisoning attacks.
Thank you for your comments on the paper!
I don’t think that’s what they say, Cipher… in fact, they state: “We […] show that the injection of a small number of decoys can seriously impact the users’ perception of content availability.”
I’m baffled by their claim that at present, poisoning isn’t having much effect. Go to the Limewire forum at gnutellaforums.com, and look at Download/Upload Problems. Some sample comments:
“I saw that other ppl up top had problems with corrupt files, but not as bad as me. Sometimes i download every song that appears on the search, and every songle one is corrupt. Obviously they aren’t corrupt, what have i done, or need to do to fix this problem.”
“I am having the same problem…. No matter which song I download it comes up File Corrupt. Anyone know why this happens?”
“does anyone know what it means when it comes up file corrupt because it comes up on almost all of the files i try to download”
“Why can’t I find a phone number to call and complain? For example, I have downloaded every single song from one singer and NONE OF THEM WORKED.”
“I’m only getting about 10% or less of the files I want. The rest are corrupt. LimeWire used to be so good, but now I cant get hardly anything none corrupt…”
“I seem to have the same problem. 9 out of 10 are corrupted. I know how to delete them and everything, that is not the problem. What I would like to know is how come we have so much damage files? I even started to suspect my computer, firewall etc. because I can’t believe that most of my downloads are damaged. Does anyone know the answer? I have used limewire in the past and have nothing but love for the program, but if I keep getting all these corrupted files I don’t know if I can stay this loving”
“I click CTRL + A to select all available songs for download and click on download, and every single one of them EVERY SINGLE ONE is corrupt.”
“I find that if I download anything older than say six months it’s completely possible — I’ve downloaded almost every song Disturbed has ever made and it worked. I downloaded recently about 15 Beatles songs and htey all worked just fine. It’s teh new stuff, the stuff that’ll really sell that isn’t working..”
“I’ve been trying for about a week now to download two files (primarily): mp3s of [Edit] and [Edit], and *every* copy I try and download, no matter the size, the bitrate, the number of people sharing the file, every last one of them arrives and I get the ‘file corrupt’ message.”
I know someone who downloads a lot and they are having the same experience. It looks to me like poisoning is being extremely effective, at least on the Gnutella network.