December 26, 2024

Cost Tradeoffs of P2P

On Thursday, I jumped in to a bloggic discussion of the tradeoffs between centrally-controlled and peer-to-peer design strategies in distributed systems. (See posts by Randy Picker (with comments from Tim Wu and others), Lior Strahilevitz, me, and Randy Picker again.)

We’ve agreed, I think, that large-scale online services will be designed as distributed systems, and the basic design choice is between a centrally-controlled design, where most of the work is done by machines owned by a single entity, and a peer-to-peer design, where most of the work is done by end users’ machines. Google is a typical centrally-controlled design. BitTorrent is a typical P2P design.

The question in play at this point is when the P2P design strategy has a legitimate justification. Which justifications are “legitimate”? This is a deep question in general, but for our purposes it’s enough to say that improving technical or economic efficiency is a legitimate justification, but frustrating enforcement of copyright is not. Actions that have legitimate justifications may also have harmful side-effects. For now I’ll leave aside the question of how to account for such side-effects, focusing instead on the more basic question of when there is a legitimate justification at all.

Which design is more efficient? Compared to central control, P2P has both disadvantages and advantages. The main disadvantage is that in a P2P design, the computers participating in the system are owned by people who have differing incentives, so they cannot necessarily be trusted to work toward the common good of the system. For example, users may disconnect their machines when they’re not using the system, or they may “leech” off the system by using the services of others but refusing to provide services. It’s generally harder to design a protocol when you don’t trust the participants to play by the protocol’s rules.

On the other hand, P2P designs have three main efficiency advantages. First, they use cheaper resources. Users pay about the same price per unit of computing and storage as a central provider would pay. But the users’ machines a sunk cost – they’re already bought and paid for, and they’re mostly sitting idle. The incremental cost of assigning work to one of these machines is nearly zero. But in a centrally controlled system, new machines must be bought, and reserved for use in providing the service.

Second, P2P deals more efficiently with fluctuations in workload. The traffic in an online system varies a lot, and sometimes unpredictably. If you’re building a centrally-controlled system, you have to make sure that extra resources are available to handle surges in traffic; and that costs money. P2P, on the other hand, has the useful property that whenever you have more users, you have more users’ computers (and network connections) to put to work. The system’s capacity grows automatically whenever more capacity is needed, so you don’t have to pay extra for surge-handling capacity.

Third, P2P allows users to subsidize the cost of running the system, by having their computers do some of the work. In theory, users could subsidize a centrally-controlled system by paying money to the system operator. But in practice, monetary transfers can bring significant transaction costs. It can be cheaper for users to provide the subsidy in the form of computing cycles than in the form of cash. (A full discussion of this transaction cost issue would require more space – maybe I’ll blog about it someday – but it should be clear that P2P can reduce transaction costs at least sometimes.)

Of course, this doesn’t prove that P2P is always better, or that any particular P2P design in use today is motivated only by efficiency considerations. What it does show, I think, is that the relative efficiency of centrally-controlled and P2P designs is a complex and case-specific question, so that P2P designs should not be reflexively labeled as illegitimate.

"Centralized" Sites Not So Centralized After All

There’s an conversation among Randy Picker, Tim Wu, and Lior Strahilevitz over the U. Chicago Law School Blog about the relative merits of centralized and peer-to-peer designs for file distribution. (Picker post with Wu comments; Strahilevitz post) Picker started the discussion by noting that photo sharing sites like Flickr use a centralized design, rather than peer-to-peer. He questioned whether P2P design made sense, except as a way to dodge copyright enforcement. Wu pointed out that P2P designs can distribute large files more efficiently, as in BitTorrent. Strahilevitz pointed out that P2P designs resist censorship more effectively than centralized ones.

There’s a subtlety hiding here, and in most cases where people compare centralized services to distributed ones: from a technology standpoint, the “centralized” designs aren’t really centralized.

A standard example is Google. It’s presented to users as a single website, but if you look under the hood you’ll see that it’s really implemented by a network of hundreds of thousands of computers, distributed in data centers around the world. If you direct your browser to www.google.com, and I direct my browser to the same URL, we’ll almost certainly interact with entirely different sets of computers. The unitary appearance of the Google site is an illusion maintained by technical trickery.

The same is almost certainly true of Flickr, though on a smaller scale. Any big service will have to use a distributed architecture of some sort.

So what distinguishes “centralized” sites from P2P designs? I see two main differences.

(1) In a “centralized” site, all of the nodes in the distributed system are controlled by the same entity; in a P2P design, most nodes are controlled by end users. There is a technical tradeoff here. Centralized control offers some advantages, but they sacrifice the potential scalability that can come from enlisting the multitude of end user machines. (Users own most of the machines in the world, and those machines are idle most of the time – that’s a big untapped resource.) Depending on the specific application, one strategy or the other might offer better reliability.

(2) In a “centralized” site, the system interacts with the user through browser technologies; in a P2P design, the user downloads a program that offers a more customized user interface. There is another technical tradeoff here. Browsers are standardized and visiting a website is less risky for the user than downloading software, but a custom user interface sometimes serves users better.

The Wu and Strahilevitz argument focused on the first difference, which does seem the more important one these days. The bottom line, I think, is that P2P-style designs that involve end users’ machines make the most sense when scalability is at a premium, or when such designs are more robust.

But it’s important to remember that the issue isn’t whether the services uses lots of distributed computers. The issue is who controls those computers.

eDonkey Seeks Record Industry Deal

Derek Slater points to last week’s Senate hearing testimony by Sam Yagan, President of MetaMachine, the distributor of the popular eDonkey peer-to-peer file sharing software.

The hearing’s topic was “Protecting Copyright and Innovation in a Post-Grokster World”. Had the Supreme Court drawn a clearer legal line in its Grokster decision, we wouldn’t have needed such a hearing. But the Court instead chose to create a vague new inducement standard that will apparently ensnare Grokster, but that leaves us in the dark about the boundaries of copyright liability for distributors of file sharing technologies.

It has long been rumored that the record and movie industries avoided dealmaking with P2P companies during the Grokster case, because deals would undercut the industry’s efforts to paint P2P as an outlaw technology. Yagan asserts that these rumors are true:

[MetaMachine] held multiple meetings with major music labels and publishers as well as movie studios, and at one point, received verbal commitments from major entertainment firms to proceed with proof-of-concept technical testing and market trials.

The firms later rescinded these approvals, however, with the private explanation that to proceed in collaboration with eDonkey on a business solution, or even to appear to be doing so, could jeopardize the case of the petitioners in the pending MGM v. Grokster litigation.

An obvious question now is whether the record industry will sue MetaMachine on a Grokster-based inducement theory. The industry did send a cease-and-desist letter to MetaMachine, along with several other P2P vendors. Yagan asserted that MetaMachine could successfully defend a recording industry lawsuit. I don’t know whether that’s right – I don’t have access to the facts upon which a court would decide whether MetaMachine has induced infringement – but it’s at least plausible.

Whether MetaMachine could actually win such a suit is irrelevant, though, because the company can’t afford to fight a suit, and can’t afford to risk the very high statutory damages it would face if it lost. So, Yagan said, MetaMachine has no choice but to make a deal now, on the record industry’s terms.

Because we cannot afford to fight a lawsuit – even one we think we would win – we have instead prepared to convert eDonkey’s user base to an online content retailer operating in a “closed” P2P environment. I expect such a transaction to take place as soon as we can reach a settlement with the RIAA. We hope that the RIAA and other rights holders will be happy with our decision to comply with their request and will appreciate our cooperation to convert eDonkey users to a sanctioned P2P environment.

MetaMachine has decided, in other words, that it is infeasible to sell P2P tools without the record industry’s blessing. The Supreme Court said pretty clearly in its Grokster decision that record industry approval is not a necessary precondition for a P2P technology to be legal. But record industry approval may be a practical necessity nonetheless. Certainly, the industry is energetically spreading the notion that when it comes to P2P systems, “legitimate” is a synonym for “approved by the record industry”.

But just when we’re starting to feel sympathy for Yagan and MetaMachine as victims of copyright overreaching, he does some overreaching of his own. eDonkey faces competition from a compatible, free program called eMule; and Yagan wants eMule shut down.

Not only have the eMule distributors adopted a confusingly similar name, but they also designed their application to communicate with our eDonkey clients using our protocol.

In other words, eMule clients basically camouflage themselves as eDonkey clients in order to download files from eDonkey users. As a result, eMule computers actually usurp some of the bandwidth that should be allocated to eDonkey file transfers, degrading the experience of eDonkey users.

Ignoring the loaded language, what’s happening here is that the eMule program is compatible with eDonkey, so that eMule users and eDonkey users can share files with each other. This isn’t illegal, and Yagan offers no argument that it is. Indeed, his testimony is artfully worded to give the impression, without actually saying so, that creating compatible software without permission is clearly illegal. I guess he figures that if we’re going to have copyright maximalism, we might as well have it for everybody.

There’s more interesting stuff in Yagan’s testimony, but I’m out of space here. Mark Lemley’s testimony is interesting too, offering some thoughful suggestions.

P2P Still Growing; Traffic Shifts to eDonkey

CacheLogic has released a new report presentation on peer-to-peer traffic trends, based on measurement of networks worldwide. (The interesting part starts at slide 5.)

P2P traffic continued to grow in 2005. As expected, there was no dropoff after the Grokster decision.

Traffic continues to shift away from the FastTrack network (used by Kazaa and others), mostly toward eDonkey. BitTorrent is still quite popular but has lost some usage share. Gnutella showed unexpected growth in the U.S., though its share is still small.

CacheLogic speculates, plausibly, that these trends reflect a usage shift away from systems that faced heavier legal attacks. FastTrack saw several legal attacks, including the Grokster litigation, along with many lawsuits against individual users. BitTorrent itself didn’t come under legal attack, but some sites directories of (mostly) infringing BitTorrent traffic were shut down. eDonkey came in for fewer legal attacks, and the lawyers mostly ignored Gnutella as insignificant; these systems grew in popularity. So far in 2005, legal attacks have shifted users from one system to another, but they haven’t reduced overall P2P activity.

Another factor in the data, which CacheLogic doesn’t say as much about, is a possible shift toward distribution of larger files. The CacheLogic traffic data count the total number of bytes transferred, so large files are weighted much more heavily than small files. This factor will tend to inflate the apparent importance of BitTorrent and eDonkey, which transfer large files efficiently, at the expense of FastTrack and Gnutella, which don’t cope as well with large files. Video files, which tend to be large, are more common on BitTorrent and eDonkey. Overall, video accounted for about 61% of P2P traffic, and audio for 11%. Given the size disparity between video and audio, it seems likely that the majority of content (measured by number of files, or by dollar value, or by minutes of video/audio content) was still audio.

The report closes by predicting the continued growth of P2P, which seems like a pretty safe bet. It notes that copyright owners are now jumping on the P2P bandwagon, having learned the lesson of BitTorrent, which is that P2P is a very efficient way to distribute files, especially large files. As for users,

End users love P2P as it gives them access to the media they want, when they want it and at high speed …

Will the copyright owners’ authorized P2P systems give users the access and flexibility they have come to expect? If not, users will stick with other P2P systems that do.

Aussie Judge Tweaks Kazaa Design

A judge in Australia has found Kazaa and associated parties liable for indirect copyright infringement, and has tentatively imposed a partial remedy that requires Kazaa to institute keyword-based filtering.

The liability finding is based on a conclusion that Kazaa improperly “authorized” infringement. This is roughly equivalent to a finding of indirect (i.e. contributory or vicarious) infringement under U.S. law. I’m not an expert in Australian law, so on this point I’ll refer you to Kim Weatherall’s recap.

As a remedy, the Kazaa parties will have to pay the 90% of the copyright owners’ trial expenses, and will have to pay damages for infringement, in an amount to be determined by future proceedings. (According to Kim Weatherall, Australian law does not allow the copyright owners to reap automatic statutory damages as in the U.S. Instead, they must prove actual damages, although the damages are boosted somehow for infringements that are “flagrant”.)

More interestingly, the judge has ordered Kazaa to change the design of their product, by incorporating keyword-based filtering. Kazaa allows users to search for files corresponding to certain artist names and song titles. The required change would disallow search terms containing certain forbidden patterns.

Designing such a filter is much harder than it sounds, because there are so many artist names and song names. These two namespaces are so crowded that a great many common names given to non-infringing recordings are likely to contain forbidden patterns.

The judge’s order uses the example of the band Powderfinger. Presumably the modified version of Kazaa would ban searches with “Powderfinger” as part of the artist name. This is all well and good when the artist name is so distinctive. But what if the artist name is a character string that occurs frequently in names, such as “beck”, “smiths”, or “x”? (All are names of artists with copyrighted recordings.) Surely there will be false positives.

It’s even worse for song names. You would have to ban simple words and phrases, like “Birthday”, “Crazy”, “Morning”, “Sailing”, and “Los Angeles”, to name just a few. (All are titles of copyrighted recordings.)

The judge’s order asks the parties to agree on the details of how a filter will work. If they can’t agree on the details, the judge will decide. Given the enormous number of artist and song names, and the crowded namespace, there are a great many details to decide, balancing over- and under-inclusiveness. It’s hard to see how the parties can agree on all of the details, or how the judge can impose a detailed design. The only hope is to appoint some kind of independent arbiter to make these decisions.

Ultimately, I think the tradeoff between over- and under-inclusiveness will prove too difficult – the filters will either fail to block many infringing files, or will block many non-infringing files, or both.

This is the same kind of filtering that Judge Patel ordered Napster to use, after she found Napster liable for indirect infringement. It didn’t work for Napster. Users just changed the spelling of artist and song names, adopting standard misspellings (e.g., “Metallica” changed to “Metalica” or “MetalIGNOREica” or the Pig Latin “Itallicamay”), or encoding the titles somehow. Napster updated its filters to compansate, but was always one step behind. And Napster’s job was easier, because the filtering was done on Napster’s own computers. Kazaa will have to try to download updates to users’ computers every time it changes its filters.

To the judge’s credit, he acknowledges that filtering will be imprecise and might even fail miserably. So he orders only that Kazaa must use filtering, but not that the filtering must succeed in stopping infringement. As long as Kazaa makes its best effort to make the agreed-upon (or ordered) filtering scheme work, it will have have satisfied the order, even if infringement goes on.

Kim Weatherall calls the judge’s decision “brave”, because it wades into technical design and imposes a remedy that requires an ongoing engagement between the parties, two things that courts normally try to avoid. I’m not optimistic about this remedy – it will impose costs on both sides and won’t do much to stop infringement. But at least the judge didn’t just order Kazaa to stop all infringement, an order with which no general-purpose communication technology could ever hope to comply.

In the end, the redesign may be moot, as the prospect of financial damages may kill Kazaa before the redesign must occur. Kazaa is probably dying anyway, as users switch to newer services. From now on, the purpose of Kazaa, in the words of the classic poster, may be to serve as a warning to others.