October 22, 2020

"Centralized" Sites Not So Centralized After All

There’s a conversation among Randy Picker, Tim Wu, and Lior Strahilevitz on the U. Chicago Law School Blog about the relative merits of centralized and peer-to-peer designs for file distribution. (Picker post with Wu comments; Strahilevitz post) Picker started the discussion by noting that photo sharing sites like Flickr use a centralized design rather than peer-to-peer. He questioned whether P2P design made sense, except as a way to dodge copyright enforcement. Wu pointed out that P2P designs can distribute large files more efficiently, as in BitTorrent. Strahilevitz pointed out that P2P designs resist censorship more effectively than centralized ones.

There’s a subtlety hiding here, and in most cases where people compare centralized services to distributed ones: from a technology standpoint, the “centralized” designs aren’t really centralized.

A standard example is Google. It’s presented to users as a single website, but if you look under the hood you’ll see that it’s really implemented by a network of hundreds of thousands of computers, distributed in data centers around the world. If you direct your browser to www.google.com, and I direct my browser to the same URL, we’ll almost certainly interact with entirely different sets of computers. The unitary appearance of the Google site is an illusion maintained by technical trickery.
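The simplest version of this trickery is round-robin DNS: successive lookups of the same hostname return different server addresses, spreading users across the fleet. Here's a minimal sketch in Python; the hostname and addresses are hypothetical (real deployments layer on GeoDNS, anycast, and load balancers, so this is only the basic idea):

```python
from itertools import cycle

# Hypothetical pool of front-end addresses for one hostname
# (203.0.113.0/24 is a reserved documentation range).
SERVER_POOL = {
    "www.example-search.com": ["203.0.113.10", "203.0.113.11", "203.0.113.12"],
}

# One rotating iterator per hostname, advanced on each lookup.
_rotations = {host: cycle(ips) for host, ips in SERVER_POOL.items()}

def resolve(hostname: str) -> str:
    """Return the next address in the pool, so successive lookups
    (e.g. from different users) land on different machines."""
    return next(_rotations[hostname])

print(resolve("www.example-search.com"))  # 203.0.113.10
print(resolve("www.example-search.com"))  # 203.0.113.11
```

Two users who "go to the same site" can thus be served by entirely different machines without ever noticing.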

The same is almost certainly true of Flickr, though on a smaller scale. Any big service will have to use a distributed architecture of some sort.

So what distinguishes “centralized” sites from P2P designs? I see two main differences.

(1) In a “centralized” site, all of the nodes in the distributed system are controlled by the same entity; in a P2P design, most nodes are controlled by end users. There is a technical tradeoff here. Centralized control offers some advantages, but it sacrifices the potential scalability that can come from enlisting the multitude of end user machines. (Users own most of the machines in the world, and those machines are idle most of the time – that’s a big untapped resource.) Depending on the specific application, one strategy or the other might offer better reliability.

(2) In a “centralized” site, the system interacts with the user through browser technologies; in a P2P design, the user downloads a program that offers a more customized user interface. There is another technical tradeoff here. Browsers are standardized and visiting a website is less risky for the user than downloading software, but a custom user interface sometimes serves users better.
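The untapped resource mentioned in point (1) is easy to estimate on the back of an envelope. Here's a quick sketch in Python; every number in it is an illustrative assumption, not a measurement:

```python
# Every number here is an illustrative assumption, not a measurement.
user_machines = 100_000_000     # end-user machines on the network
idle_fraction = 0.9             # fraction of time each one sits idle
unit_capacity = 1.0             # capacity of one machine, arbitrary units

data_center_machines = 200_000  # one operator's fleet (assumed)

idle_capacity = user_machines * idle_fraction * unit_capacity
operator_capacity = data_center_machines * unit_capacity

print(idle_capacity / operator_capacity)  # 450.0
```

Even with crude numbers, the aggregate idle capacity of end-user machines dwarfs a single operator’s data centers – which is the scalability appeal of enlisting them.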

The Wu and Strahilevitz argument focused on the first difference, which does seem the more important one these days. The bottom line, I think, is that P2P-style designs that involve end users’ machines make the most sense when scalability is at a premium, or when such designs are more robust.
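The robustness point can be made concrete with some simple availability arithmetic. A sketch in Python, using illustrative per-node uptime numbers (and assuming node failures are independent, which real networks only approximate):

```python
def availability(per_node_uptime: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent
    copies of the data is reachable, given each node's uptime."""
    return 1.0 - (1.0 - per_node_uptime) ** replicas

# A couple of well-maintained, centrally managed servers...
print(round(availability(0.99, 2), 6))   # 0.9999
# ...versus many flaky end-user machines holding the same data.
print(round(availability(0.30, 10), 6))  # 0.971752
```

Individually reliable servers win with far fewer copies, but enough cheap, flaky replicas can approach the same overall availability – which is why the answer depends on the specific application.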

But it’s important to remember that the issue isn’t whether a service uses lots of distributed computers. The issue is who controls those computers.

Comments

  1. Depending on the specific application, one strategy or the other might offer better reliability.

    At the risk of merely recapitulating the discussion of P2P systems that followed your posting on the Computer Scientists’ Grokster brief, I would argue that the P2P approach almost never offers better reliability. In fact, it almost always–and usually quite explicitly–sacrifices reliability for economy.

    First, of course, we have to define “P2P”. After all, as you point out, large “centralized” services are often composed of numerous redundant components that might be described as “peers”. Even the telephone network consists of multiple “peer” telephone companies. Yet these are normally not included under the rubric, “P2P”.

    What characterizes P2P architectures, I would argue, is the explicit rejection of an a priori distinction between “client” and “server” nodes. In a P2P network, any node can step up and declare itself to be a server, whereas in traditional client-server systems, servers must be specifically designated as such.

    This is precisely what makes P2P systems inherently less reliable. In a system where servers have to be assigned that role, they can be selected for adherence to predefined standards of reliability (as well as availability, trustworthiness, and so on). When client nodes can decide to become servers, on the other hand, nothing at all can be expected of them.

    Of course, designated servers are unlikely to meet meaningful predefined standards of reliability for free, and the cost of maintaining the reliability of these servers has to be borne by someone. On the other hand, clients may in some cases be willing to take on the role of highly unreliable server literally for free–specifically, if they have otherwise unused resources to spare. This is the underlying rationale for the supposed economic promise of P2P: the fact that donated resources, however unreliable, are also dirt-cheap.

    As for what applications P2P is “good for”, I don’t know of a single legal application for which P2P, as I define it, is widely preferred over client-server architectures. The Internet, the Ur-P2P architecture, is today most certainly not P2P in practice–clients must enter into elaborate agreements with their servers/ISPs, and usually pay a chunk of extra money, in order to become servers/ISPs in their own right. Email servers are all frantically looking for ways to distinguish “real” servers from everyone else–since the latter are primarily vehicles for spam. Even storage and distribution of large (legal) files–P2P’s supposed “killer app”–is dominated by high-bandwidth servers and centralized distribution networks. So far, at least, P2P’s most productive application seems to be the generation of glitzy-but-useless academic research.

  2. I agree with many of Dan Simon’s points.

    If P2P has any meaning at all it cannot be applied to server farms; the distinction lies in the service model at the network layer and above, not implementation details hidden behind a single-server HTTP model.

    In particular, the claim Tim Wu makes that email operates on “peer to peer” principles is stretching it quite a bit. Not everything distributed can be dubbed P2P, and both SMTP and DNS are much more like centralized services (even if it’s lots of little centralized services, some of which talk to each other) than like Grokster or Tapestry or Chord.

    BGP is an interesting case, because it demonstrates one case where P2P is useful (and perhaps necessary, although it has its own reliability, scalability, and security problems, due in part to its P2P nature.) But there are significant differences even there, between engineered and explicitly peered networks and “open” peering networks. NTP is another interesting example of P2P technology, but one that in practice is used in a centralized fashion.

    There is still a lot of content being served by “traditional” web servers. Anyone who is selling something probably has a budget sufficient to serve large files, or at least contract with someone to do so.

    The argument that P2P is better suited seems, in my mind, to boil down to the contention that it is more economically efficient to “buy” lots of small pipes than to buy a big pipe. (Unless you get some savings from communication with nearby neighbors, which many P2P systems don’t.) This seems suspicious to me, because little pipes are fed by big pipes.

    But, for file sharing services, the ‘payment’ to use the small pipes generally consists of content not owned by the service provider, which makes the economics very attractive for the small-pipe case. (Network providers have a limited appetite for MP3s.) In the long haul I think this is just shifting some costs to the ISPs, who end up either blocking P2P traffic or charging a bit more. I’m not convinced this is a reasonable equilibrium to end up at; I’d prefer to pay for my content directly and have the content provider buy a big pipe rather than have my ISP charge me more because of costs associated with P2P traffic.

  3. Ed, don’t forget about the original Napster. It used a client that the user downloaded, yet it was centralized. That, and there are browsers that are talking about implementing a “bittorrent://” protocol, so the days of having to download a P2P client may also be numbered.

    Your point #1 is clearly the primary distinguisher between centralized and P2P. While the bandwidth advantages are clear, P2P is also primarily about protocols (Gnutella, FastTrack, Bittorrent), even though each kind may start with a single client; that is, it’s also about openness in the face of closed, centralized networks. This puts some measure of control (but a different kind of control than you mention) in the hands of the user.

  4. Remind me not to post so soon after awaking. Clearly Napster was primarily P2P, with only the searching being centralized. All other sentences stand as written.

  5. Randy Picker says:

    Ed,

    I address some of what you say in a further post; if you are interested, see http://uchicagolaw.typepad.com/faculty/2005/10/more_peertopeer.html

    Randy

  6. Mark Christiansen says:

    What a pile of sophistry I am reading here. All this silver shovel technical verbiage to dodge the real issue. Should individuals and organizations of limited means be able to publish to large audiences? When you write there is no legitimate use for peer to peer you answer that question as no. Peer to peer offers the cost sharing which can allow someone to pay for a small amount of net bandwidth yet reach a large audience of people willing to share the load. It is not really about technical issues. It is about control. By painting peer to peer as something dishonest the goal is to limit the number of people who can publish and so make them more controllable. Why not just force every blog and web site to register with the government the way China does? I’ll bet a police state can do a fine job of protecting copyright.

  7. Rob Simmons says:

    Two quick comments, mostly to Dan Simon’s post:

    1) I question Dan Simon’s notion of reliability. As Picker’s response reads, “If the boys at Google—Larry Page and Sergey Brin—get up one day and decide to flick a switch, they can turn off Google.” – I’m not sure I call that reliable, especially if we’re not talking about Google but about a small, vulnerable company.

    2) Skype telephony and Bittorrent downloads of new-release Knoppix/Linux/etc CDs are two instances where, in my understanding at least, P2P is a widely accepted method of distribution. Correct me if I’m wrong.

  8. Larry Page and Sergey Brin are not going to turn off a listed advertising company!