There’s an conversation among Randy Picker, Tim Wu, and Lior Strahilevitz over the U. Chicago Law School Blog about the relative merits of centralized and peer-to-peer designs for file distribution. (Picker post with Wu comments; Strahilevitz post) Picker started the discussion by noting that photo sharing sites like Flickr use a centralized design, rather than peer-to-peer. He questioned whether P2P design made sense, except as a way to dodge copyright enforcement. Wu pointed out that P2P designs can distribute large files more efficiently, as in BitTorrent. Strahilevitz pointed out that P2P designs resist censorship more effectively than centralized ones.
There’s a subtlety hiding here, and in most cases where people compare centralized services to distributed ones: from a technology standpoint, the “centralized” designs aren’t really centralized.
A standard example is Google. It’s presented to users as a single website, but if you look under the hood you’ll see that it’s really implemented by a network of hundreds of thousands of computers, distributed in data centers around the world. If you direct your browser to www.google.com, and I direct my browser to the same URL, we’ll almost certainly interact with entirely different sets of computers. The unitary appearance of the Google site is an illusion maintained by technical trickery.
The same is almost certainly true of Flickr, though on a smaller scale. Any big service will have to use a distributed architecture of some sort.
So what distinguishes “centralized” sites from P2P designs? I see two main differences.
(1) In a “centralized” site, all of the nodes in the distributed system are controlled by the same entity; in a P2P design, most nodes are controlled by end users. There is a technical tradeoff here. Centralized control offers some advantages, but they sacrifice the potential scalability that can come from enlisting the multitude of end user machines. (Users own most of the machines in the world, and those machines are idle most of the time – that’s a big untapped resource.) Depending on the specific application, one strategy or the other might offer better reliability.
(2) In a “centralized” site, the system interacts with the user through browser technologies; in a P2P design, the user downloads a program that offers a more customized user interface. There is another technical tradeoff here. Browsers are standardized and visiting a website is less risky for the user than downloading software, but a custom user interface sometimes serves users better.
The Wu and Strahilevitz argument focused on the first difference, which does seem the more important one these days. The bottom line, I think, is that P2P-style designs that involve end users’ machines make the most sense when scalability is at a premium, or when such designs are more robust.
But it’s important to remember that the issue isn’t whether the services uses lots of distributed computers. The issue is who controls those computers.