
Archives for 2009

What Economic Forces Drive Cloud Computing?

You know a technology trend is all-pervasive when you see New York Times op-eds about it — and this week saw the first Times op-ed about cloud computing, by Jonathan Zittrain. I hope to address JZ’s argument another day. Today I want to talk about a more basic issue: why we’re moving toward the cloud.

(Background: “Cloud computing” refers to the trend away from services provided by software running on standalone personal computers (“clients”), toward services provided across the Net with data stored in centralized data centers (“servers”). GMail and HotMail provide email in the cloud, Flickr provides photo albums in the cloud, and so on.)

The conventional wisdom is that functions are moving from the client to the server because server-side computing resources (storage, computation, and data transfer) are falling in cost, relative to the cost of client-side resources. Basic economics says that if a product uses two inputs, and the relative costs of the inputs change, production will shift to use more of the newly-cheap input and less of the newly-expensive one — so as server-side resources get relatively cheaper, designs will start to use more server-side resources and fewer client-side resources. (In fact, both server- and client-side resources are getting cheaper, but the argument still works as long as the cost of server-side resources is falling faster, which it probably is.)
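
To make the substitution story concrete, here is a toy calculation, with prices and resource mixes invented purely for illustration (they are not measurements from any real system): two designs do the same total work but split it differently between client and server, and the cheaper design flips as the server-side price falls faster.

```python
# Toy illustration of input substitution; all numbers are invented.
def cost_per_user(server_units, client_units, server_price, client_price):
    """Cost of a design using the given mix of server and client resources."""
    return server_units * server_price + client_units * client_price

client_heavy = (2, 8)   # (server units, client units)
server_heavy = (8, 2)

# Both prices fall over time, but the server-side price falls faster.
for year, server_price, client_price in [(2000, 1.00, 0.80),
                                          (2005, 0.40, 0.60),
                                          (2010, 0.10, 0.45)]:
    costs = {
        "client-heavy": cost_per_user(*client_heavy, server_price, client_price),
        "server-heavy": cost_per_user(*server_heavy, server_price, client_price),
    }
    print(year, costs, "cheapest:", min(costs, key=costs.get))
```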

This argument seems reasonable — and smart people have repeated it — but I think it misses the most important factors driving us into the cloud.

For starters, the standard argument assumes that a move into the cloud simply relocates functions from client to server — we’re consuming the same resources, just consuming them in a place where they’re cheaper. But if you dig into the details, it looks like the cloud approach may use a lot more resources.

Rather than storing data on the client, the cloud approach often replicates data, storing the data on both server and client. If I use GMail on my laptop, my messages are stored on Google’s servers and on my laptop. Beyond that, some computation is replicated on both client and server, and we mustn’t forget that it’s less resource-efficient to provide computing inside a web browser than on the raw hardware. Add all of this up, and we might easily find that a cloud approach uses more client-side resources than a client-only approach.

Why, then, are we moving into the cloud? The key issue is the cost of management. Thus far we have focused only on computing resources such as storage, computation, and data transfer; but the cost of managing all of this — making sure the right software version is installed, that data is backed up, that spam filters are updated, and so on — is a significant part of the picture. Indeed, as the cost of computing resources, on both client and server sides, continues to fall rapidly, management becomes a bigger and bigger fraction of the total cost. And so we move toward an approach that minimizes management cost, even if that approach is relatively wasteful of computing resources. The key is not that we’re moving computation from client to server, but that we’re moving management to the server, where a team of experts can manage matters for many users.
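
A back-of-the-envelope model, with every number invented for illustration, makes the point: even if the cloud design burns more hardware per user, it wins once the expensive input, management, is supplied by one expert team whose cost is shared across many users.

```python
# Toy cost model; every number here is invented for illustration.
def self_managed_cost(hw_per_user, mgmt_per_user):
    # Each user buys their own hardware and does their own management.
    return hw_per_user + mgmt_per_user

def cloud_cost(hw_per_user, team_cost, users):
    # The cloud wastes some hardware (replication, browser overhead),
    # but one operations team's cost is amortized over all users.
    return hw_per_user + team_cost / users

hw_self, hw_cloud = 10.0, 15.0   # cloud uses more raw resources per user
mgmt_per_user = 50.0             # value of each user's own management effort
team_cost = 1_000_000.0          # cost of a professional operations team

for users in (10_000, 100_000, 1_000_000):
    print(users,
          "self:", self_managed_cost(hw_self, mgmt_per_user),
          "cloud:", round(cloud_cost(hw_cloud, team_cost, users), 2))
```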

This is still a story about shifts in the relative costs of inputs. The cost of computing is getting cheaper (wherever it happens), so we’re happy to use more computing resources in order to use our relatively expensive management inputs more efficiently.

What does this tell us about the future of cloud services? That question will have to wait for another day.

Lessons from Amazon's 1984 Moment

Amazon got some well-deserved criticism for yanking copies of Orwell’s 1984 from customers’ Kindles last week. Let me spare you the copycat criticism of Amazon — and the obvious 1984-themed jokes — and jump right to the most interesting question: What does this incident teach us?

Human error was clearly part of the problem. Somebody at Amazon decided that repossessing purchased copies of 1984 would be a good idea. They were wrong about this, as both the public reaction and the company’s later backtracking confirm. But the fault lies not just with the decision-maker, but also with the factors that made the decision more likely, including some aspects of the technology itself.

Some put the blame on DRM, but that’s not the problem here. Even if the Kindle used open formats and let you export and back up your books, Amazon could still have made 1984 disappear from your Kindle. Yes, some users might have had backups of 1984 stored elsewhere, but most users would have lost their only copy.

Some blame cloud computing, but that’s not precisely right either. The Kindle isn’t really a cloud device — the primary storage, computing and user interface for your purchased books are provided by your own local Kindle device, not by some server at Amazon. You can disconnect your Kindle from the network forever (by flipping off the wireless network switch on the back), and it will work just fine.

Some blame the fact that Amazon controls everything about the Kindle’s software, which is a better argument but still not quite right. Most PCs are controlled by a single company, in the sense that that company (Microsoft or Apple) can make arbitrary changes to the software on the PC, including (in principle) deleting files or forcibly removing software programs.

The problem, more than anything else, is a lack of transparency. If customers had known that this sort of thing was possible, they would have spoken up against it — but Amazon had not disclosed it, and generally does not offer clear descriptions of how the product works or what kinds of control the company retains over users’ devices.

Why has Amazon been less transparent than other vendors? I’m not sure, but let me offer two conjectures. It might be because Amazon controls the whole system. Systems that can run third-party software have to be more open, in the sense that they have to tell the third-party developers how the system works, and they face some pressure to avoid gratuitous changes that might conflict with third-party applications. Alternatively, the lack of transparency might be because the Kindle offers less functionality than (say) a PC. Less functionality means fewer security risks, so customers don’t need as much information to protect themselves.

Going forward, Amazon will face more pressure to be transparent about the Kindle technology and the company’s relationship with Kindle buyers. It seems that e-books really are more complicated than dead-tree books.

A Freedom-of-Speech-based Approach To Limiting Filesharing – Part III: Smoke, smoke!

Over the past two days we have seen that filesharing is vulnerable to spamming, and that as a defense, the filesharers have used the IP block list to exclude the spammers from sharing files. Today I discuss how I think lawyers and laypeople should look at the legal issues. Since I am most decidedly not a lawyer, nothing I say here should be considered definitive. Hopefully, it is at least interesting.

An analogy:

Washington Square, in New York City, was for many years a place where drugs were sold. A fellow would stand around quietly saying to passersby “Smoke, smoke!” However, this so-called “steerer” held no drugs. His role was simply to direct the buyer to the “pitcher”, who had the drugs somewhere nearby, and who kept silent.

Even the strongest defender of free-speech rights understands that the “steerer’s” words are not just speech. His words are not similar to those of this article, though both simply say that someone in the park is selling. He is as legally responsible for the sale as the “pitcher”, because they are, according to legal terminology, “acting in concert”. He is a drug dealer who may never touch any drugs. Note also that the “steerer” receives payments from the illegal transactions – though it is not in fact legally necessary to be able to prove the payments to establish that he’s “acting in concert”. All that’s required is that the “steerer” and the “pitcher” share “community of purpose” in facilitating the illegal transaction.

In the Napster case, the court held that Napster, even though it did not have any copyrighted data on its servers, was liable for contributory infringement. To use Napster, a downloader would log in to Napster’s central server, which connected the user to another user who had the file being searched for. Since it was Napster’s role to hook up the parties illegally exchanging files, it is reasonable to see it as analogous to the “steerer” in Washington Square – Napster didn’t have the infringing materials, but that really isn’t a defense.

The gnutella network is decentralized to solve the legal problem presented by the Napster decision. Nonetheless, there is something still centralized in gnutella: the IP block list. Users of LimeWire get their block list from LimeWire and only from LimeWire. Accordingly, if Napster was like the “steerer” in Washington Square, LimeWire furthers the “community of purpose” in a different way: it supplies negative information rather than affirmative. It is like someone paid to stand in the park pointing out which sellers are cheaters selling bad drugs, allowing the purchasers to find the good stuff.

What is a legitimate P2P spam filtering authority versus one that shares “community of purpose” with infringers? The former could legitimately act to keep the network from being flooded by those selling weight loss drugs, without facilitating infringing. There is probably no bright-line rule, but it is reasonably clear that LimeWire is well on the wrong side of any possible grey area.

It’s useful to compare gnutella spam cop LimeWire with e-mail spam cop AOL.

LimeWire does not clearly advertise its spam cop role as a feature of its software, and does not discuss its block list. (The LimeWire web site has only the cryptic description “We’re always working to protect you from viruses and unwanted sharing.”) There is no discussion anywhere of what sorts of sites and files it blocks, or why. LimeWire gives no notification to a site when it is blocked, nor is there any way to contact LimeWire to be removed from the block list.

In comparison, blocking e-mail spam is, for AOL, a major selling point. AOL does not block bulk e-mailers (many of which are legitimate) on a whim. Every e-mail rejected by AOL is bounced with a notification to the sender, and there are detailed instructions to bulk e-mailers as to what they need to do to avoid running afoul of AOL’s filters. There is a way to contact AOL to remove oneself from the block list, if one is legitimate. The whole process is transparent.

It is clear that a legitimate spam cop would have no reason to block spoofers, since any search for a non-infringing file would be unmolested by spoofs; yet it appears that LimeWire does block MediaDefender. In fact, LimeWire appears to be quietly promising to do so when it says that it protects against “unwanted sharing”, whatever that is.

Lastly, it appears that LimeWire’s statements in court conceal what it is doing.

As we mentioned in the first post, there is an ongoing case, Arista v Lime Group. In its motion for summary judgment, LimeWire states:

Likewise, LW does not have the ability to control the manner in which users employ the LimeWire software. Unlike the Napster defendants, LW does not maintain central servers containing files or indices of files. … LW’s system is like that analysed by the Ninth Circuit in Grokster, “truly decentralized”. … LW no more controls the actions of its customers than do any of the thousands of companies that provide hardware or other software used in connection with the internet.

This omits any discussion of LimeWire’s centralized block list. LW assuredly does control the manner in which LimeWire users employ the LimeWire software, because if a site is added to the IP block list, it is no longer visible to most LimeWire users. This is very far from the normal situation applying in other software used in connection with the internet.

Moreover, the plaintiffs’ attorneys appear to be unaware of the blocking of spoofs, as their reply motion makes no mention of it (nor of the other hidden features of the LimeWire software discussed yesterday).

While it might be possible to run a legitimate spam-blocking service for P2P networks, it would look rather different from what LimeWire is doing.

Conclusion

The best way to regulate filesharing effectively is to analyze the various players’ roles on free-speech grounds. The individual filesharers (when they share infringing material) are certainly violating the law, but in a small way that probably can’t be reasonably controlled. The publishers of the software that allows the network to run (including LimeWire) are exercising free speech – the fact that their code can be made to do something illegal should be irrelevant. However, LimeWire is facilitating infringing because of the way it runs its IP block list. If LimeWire were shut down, the gnutella network would become useless for downloading infringing music. Because of its actions to keep the network safe for infringers – its “acting in concert” – LimeWire should be liable for contributory infringement.

This course will avoid free speech restrictions that trouble many. In terms of preventing infringing, it also will be far more productive than trying to target the small fish. It is an effective measure that respects rights.

[This series of posts has been a somewhat shortened version of an article here.]

A Freedom-of-Speech-based Approach To Limiting Filesharing – Part II: The Block List

On Wednesday we discussed the open structure of filesharing and its resulting vulnerability to spam. While there are some similarities between e-mail and gnutella spam, the spoof files have no analogue in e-mail. When MediaDefender puts up spoofs for Rihanna’s Disturbia, unless you are using gnutella to search for Disturbia – which you cannot legally do – the spam has no effect on you. But of course, if MediaDefender is allowed to persist in doing this successfully, gnutella would lose much of its appeal.

The solution that has traditionally been adopted is an IP block list. When MediaDefender puts up spoof files, they come from the IP addresses of MediaDefender’s computers. While MediaDefender could (and doubtless would have to) use several computers to perform the spoofing, they all access the internet through a single ISP. Therefore, when an ISP is found to be hosting a spoofing operation such as MediaDefender’s, the entire range of IP addresses owned by that ISP is added to the filesharing program’s IP block list. When an IP address is on the block list, other computers will refuse to connect to it, thereby preventing it from filesharing.

Because filesharing becomes useless without something to stop spoof files, IP block lists are a common part of P2P sharing programs. Generally, they are posted on web sites and downloaded by the P2P program, at the direction of the user. The program is generally configurable to download the block list from a site of the user’s choosing, and the block list file is stored in a known location and is readable and editable by interested users. For example, this forum discussion describes how to download the block file for the P2P client eMule.
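
The mechanics are easy to sketch. The snippet below is a generic illustration, not the file format or URL of any particular client’s list: it fetches a user-configured block list of IP ranges and consults it before a connection is attempted.

```python
# Generic sketch of an IP block list; the URL and file format are hypothetical.
import ipaddress
import urllib.request

BLOCK_LIST_URL = "https://example.org/blocklist.txt"  # chosen by the user

def load_block_list(url=BLOCK_LIST_URL):
    """Fetch a plain-text list of CIDR ranges, one per line."""
    with urllib.request.urlopen(url) as resp:
        lines = resp.read().decode("utf-8").splitlines()
    return [ipaddress.ip_network(line.strip(), strict=False)
            for line in lines if line.strip() and not line.startswith("#")]

def is_blocked(peer_ip, block_list):
    """True if the peer's address falls inside any blocked range."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in block_list)

# Usage: skip blocked peers instead of connecting to them.
# blocked = load_block_list()
# if is_blocked("203.0.113.42", blocked):
#     pass  # refuse the connection
```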

What is not broadly appreciated is the role that LimeWire the corporation plays in the gnutella network. LimeWire is not merely a provider of software (and there are non-LimeWire gnutella clients, not as popular as LimeWire). LimeWire’s client software, aside from supporting the gnutella protocol, receives from LimeWire a cryptographically signed file, called simpp.xml. This file contains a number of parameters for the operation of the client, including its IP block list. Because the file is cryptographically signed by the LimeWire corporation, no one else can send a list that the clients will accept. LimeWire can therefore, at its sole discretion, block hosts from sending data to essentially all of its clients. Anyone putting up files that LimeWire deems unsuitable is knocked off in a matter of hours, and, since LimeWire is by far the most popular gnutella client, the spoofer is effectively shut down.
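
The pattern of a vendor-signed settings file is easy to illustrate in general terms. The sketch below is not LimeWire’s actual code, key type, or file layout (none of which are given here); it just shows a client that refuses any pushed settings, such as a new block list, unless the signature verifies against the vendor’s pinned public key.

```python
# Generic sketch of "apply settings only if the vendor signed them".
# This is not LimeWire's actual mechanism; key type and layout are assumed.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

# The vendor's public key ships inside the client ("pinned"), so only
# settings signed with the matching private key will ever be accepted.
with open("vendor_pub.pem", "rb") as f:
    VENDOR_KEY = serialization.load_pem_public_key(f.read())

def apply_settings_if_signed(settings_bytes: bytes, signature: bytes) -> bool:
    """Install a pushed settings file (e.g. a block list) only if it
    verifies against the vendor key; otherwise ignore it."""
    try:
        VENDOR_KEY.verify(signature, settings_bytes,
                          padding.PKCS1v15(), hashes.SHA256())
    except InvalidSignature:
        return False  # not signed by the vendor: reject
    # ...parse settings_bytes and install the new block list here...
    return True
```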

The LimeWire P2P clients are unusual in that there is nothing configurable about the choice of block list. Moreover, unlike other programs, there is no way for anyone other than LimeWire to send it, and no way for a non-technical user to examine its contents – in fact, the typical non-technical user would not even know that blocking is going on. (The only way to turn off blocking is on an advanced configuration panel.)

(One other interesting feature is also revealed from looking at the simpp.xml file: LimeWire has added a facility that allows its server, and only its server, to contact a running LimeWire client and ask it various questions about what the client is doing. This feature allows LimeWire to phone up LimeWire clients and inspect them, thereby gathering information about its network. This feature could be used as a sort of mini-spyware, though it is not clear exactly what LimeWire does with it.)

Tomorrow we shall see one way to interpret the legal significance of these behaviors on LimeWire corporation’s part.

A Freedom-of-Speech Approach To Limiting Filesharing – Part I: Filesharing and Spam

[Today we kick off a series of three guest posts by Mitch Golden. Mitch was a professor of physics when, in 1995, he was bitten by the Internet bug and came to New York to become an entrepreneur and consultant. He has worked on a variety of Internet enterprises, including one in the filesharing space. As usual, the opinions expressed in these posts are Mitch’s alone. — Ed]

The battle between the record labels and filesharers has been somewhat out of the news of late, but it rages on still. There is an ongoing court case, Arista Records v LimeWire, in which a group of record labels are suing to have LimeWire held accountable for the copyright infringing done by its users. Though this case has attracted less attention than similar cases before it, it may raise interesting issues not addressed in previous cases. Though I am a technologist, not a lawyer, this series of posts will advocate a way of looking at the issues, including the legal ones, using a freedom-of-speech based approach, which leads to some unusual conclusions.

Let’s start by reviewing some salient features of filesharing.

Filesharing is a way for a group of people – who generally do not know one another – to allow one another to see what files they collectively have on their machines, and to exchange desired files with each other. There are at least two components to a filesharing system: one that allows a user who is looking for a particular file to see if someone has it, and another that allows the file to be transferred from one machine to the other.
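
In code terms, those two components might be captured by an interface like the hypothetical one below (the names are mine, not part of any real protocol): one call to find out who has a file, another to move it.

```python
# Hypothetical interface for the two halves of a filesharing system.
from dataclasses import dataclass
from typing import List

@dataclass
class FileListing:
    name: str
    size: int
    peer_address: str  # where the file can be fetched from

class Peer:
    def search(self, query: str) -> List[FileListing]:
        """The 'find it' half: ask the network who has matching files."""
        raise NotImplementedError

    def fetch(self, listing: FileListing) -> bytes:
        """The 'move it' half: transfer the file from the peer that has it."""
        raise NotImplementedError
```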

One of the most popular filesharing programs in current use is LimeWire, which uses a protocol called gnutella. Gnutella is decentralized, in the sense that neither the search nor the exchange of files requires any central server. It is possible, therefore, for people to exchange copyrighted files – in violation of the law – without creating any log of the search or exchange in a central repository.

The gnutella protocol was originally created by developers from Nullsoft, the company that had developed the popular music player WinAmp, shortly after it was acquired by AOL. AOL was at that time merging with Time Warner, a huge media company, and so the idea that they would be distributing a filesharing client was quite unamusing to management. Work was immediately discontinued; however, the source for the client and the implementation of the protocol had already been released under the GPL, and so development continued elsewhere. LimeWire made improvements both to the protocol and the interface, and their client became quite popular.

The decentralized structure of filesharing does not serve a technical purpose. In general, centralized searching is simpler, quicker and more efficient, which is why, for example, we search the web with Google or Yahoo, which maintain gigantic central repositories. In filesharing, the decentralized search structure instead serves a legal purpose: to diffuse the responsibility so that no particular individual or organization can be held accountable for promoting the illegal copying of copyrighted materials. At the time the original development was going on, the Napster case was in the news, in which the first successful filesharing service was being sued by the record labels. That case ended a few months later with Napster being shut down, as the US courts held it – a centralized search repository – responsible for the copyright infringing file sharing its users were doing.

Whatever their legal or technical advantages, decentralized networks, by virtue of their openness, are vulnerable to a common problem: spam. For example, because anyone may send anyone else an e-mail, we are all subject to a deluge of messages trying to sell us penny stocks and weight loss remedies. Filesharing too is subject to this sort of cheating. If someone is looking for, say, Rihanna’s recording Disturbia, and downloads an mp3 file that purports to be it, what’s to stop a spammer from instead serving a file with an audio ad for a Canadian pharmacy?

Spammers on the filesharing networks, however, have more than just the usual commercial motivations in mind. In general, there are four categories of fake files that find their way onto the network.

  • Commercial spam
  • Pornography and ads for pornography
  • Viruses and trojans
  • Spoof files

The last of these has no real analogue to anything people receive in e-mail. It works as follows: if, for example, Rihanna’s record label wants to prevent you from downloading Disturbia, they might hire a company called MediaDefender. MediaDefender’s business is to put as many spoof files as possible on gnutella that purport to be Disturbia but instead contain useless noise. If MediaDefender can succeed in flooding the network so that the real Disturbia is a needle in a haystack, then the record label has thwarted gnutella’s users from violating its copyright.
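
A quick back-of-the-envelope calculation (with ratios invented purely for illustration) shows why flooding works: once spoofs heavily outnumber genuine copies, a user who tries a few random downloads will probably get nothing but noise.

```python
# Toy needle-in-a-haystack calculation; ratios are invented for illustration.
def chance_of_real_file(real, spoofs, downloads):
    """Probability that at least one of `downloads` random picks is genuine,
    treating each pick as an independent draw from the pool."""
    p_fake = spoofs / (real + spoofs)
    return 1 - p_fake ** downloads

for spoofs_per_real in (1, 10, 100):
    p = chance_of_real_file(1, spoofs_per_real, 3)
    print(f"{spoofs_per_real} spoofs per real copy: {p:.0%} chance in 3 tries")
```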

Since people are still using filesharing, clearly a workable solution has been found to the problem of spoof files. In tomorrow’s post, I discuss this solution, and in the following post, I suggest its legal ramifications.