December 12, 2024

Cloud(s), Hype, and Freedom

Richard Stallman’s recent description of ‘the cloud’ as ‘hype’ and a ‘trap’ seems to have stirred up a lot of commentary, but not a lot of clear discussion of the problems Stallman raised. This isn’t surprising- the term ‘the cloud’ has always been vague. (It was hard to resist saying ‘cloudy.’ 😉) When people say ‘the cloud’ they are really lumping at least four ‘cloud types’ together.

traditional applications, hosted elsewhere

Probably the most common type of ‘cloud’ is a service that takes traditional software functionality and moves it to remotely hosted (and typically web-delivered) servers. Gmail and salesforce.com are like this- fairly traditional email and CRM applications, ‘just’ moved to the web.

If Stallman’s ‘hype’ claim is valid anywhere, it is here. Administration and maintenance costs are definitely lower when an expert like Google funds and runs the server, and reliability may improve as well. But the core functionality of these apps, and the ability to access data over a network, have been present since the dawn of networked computing. On average, this is undoubtedly a significant change in quality, but only rarely a change in type- making the buzz much harder to justify.

Stallman’s ‘trap’ charge is more complex. Computer users have long compromised on personal control by storing data remotely but accessing it via standardized protocols. This introduced risks (you had to trust the data host and couldn’t tinker with the server) but kept some controls (you could switch clients, and typically you could export the data). Some web apps still strike that balance- for example, most Gmail features are accessible via good old POP and IMAP. But others don’t.
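To make the ‘switch clients’ point concrete, here is a minimal sketch (my illustration, nothing Stallman or Google ships) that pulls the last few message headers out of a Gmail inbox with Python’s standard imaplib. The account name and app password are placeholders, and the same handful of lines would work against any other standards-compliant IMAP server.

    import imaplib

    HOST = "imap.gmail.com"          # any IMAP host works; Gmail is just the example
    USER = "you@example.com"         # placeholder account
    APP_PASSWORD = "app-password"    # placeholder credential

    with imaplib.IMAP4_SSL(HOST) as conn:
        conn.login(USER, APP_PASSWORD)
        conn.select("INBOX", readonly=True)
        _, data = conn.search(None, "ALL")
        for num in data[0].split()[-5:]:   # peek at the five most recent messages
            _, msg = conn.fetch(num, "(BODY.PEEK[HEADER.FIELDS (FROM SUBJECT)])")
            print(msg[0][1].decode(errors="replace"))

The point isn’t the code itself; it’s that any IMAP client, including one you write yourself, can get the data back out.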

Getting your data out of a service like Salesforce can be a ‘hidden cost’ of an apparently free service, and even with a relatively standards-based service like Gmail you have no freedom to make changes to the server. These risks are what Stallman means when he talks about a ‘trap’, and regardless of your conclusion about them, understanding them is important.

services involving data that can’t (yet) be managed locally

Google Maps and Google Search are the canonical examples of this type of cloud service- heaps of data so large that you would need a large data center to host your own copy and a very, very fat pipe to keep it up to date.

Hype-wise, these are a mixed bag. These services definitely bring radical new functionality that couldn’t exist under the traditional model- I can’t store all of Google Maps on my phone. That hype is justified. At the same time, our personal ability to store and process data is still growing quickly, so claims that this type of cloud service will always ‘require’ remote servers may be overblown.

‘Trap’-wise? Dependence on these services reminds me of ‘dependence’ on a library before the internet- you can work to make sure your library respects your privacy, prefer public libraries to private ones, or establish a personal library if your reading interests are narrow, but in the end eschewing large libraries is likely to be a case of cutting off your nose to spite your face. We’re in the same state with this type of cloud service. You can avoid them, but those concerned with freedom might be better off understanding and fixing them than condemning them altogether.

services that make creation of new data technically or economically feasible

Facebook and Wikipedia are the canonical examples here. Unlike the first two types of cloud, where the data was available (if inconvenient to get at) before it ended up in the cloud, this class of cloud applications creates information that wasn’t previously feasible to collect at all.

There may well not be enough hype around this type of cloud. Replicating web-scale collaborative facilities like these in a p2p fashion will be very difficult, and the impact of the creation of new information (even when it is as mundane as Facebook’s data often is) is hard to overstate.

As with the previous type of cloud, it is hard to call these a trap per se- they do make it hard to leave, but they do so by providing new functionality that is very hard to get with any traditional software model.

services offering computing and storage, rather than data

The most recent type of cloud service is remotely provisioned computing and storage, like Amazon’s EC2/S3 and Google’s App Engine. This is perhaps the most purely generative type of cloud, allowing individuals to create new services and scale them out to serve millions of people without having to invest in their own physical infrastructure. It is hard to see any way in which this can reasonably be called ‘hype,’ given the reach it gives individuals and small or transient groups- reach that would otherwise cost them many thousands of dollars.
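To give a feel for what ‘remotely provisioned storage’ means in practice, here is a rough sketch of my own using Amazon’s boto3 SDK for Python; the bucket name is a placeholder, AWS credentials are assumed to be configured locally, and outside us-east-1 you would also pass a CreateBucketConfiguration.

    import boto3

    BUCKET = "example-scratch-bucket"   # placeholder; bucket names must be globally unique

    s3 = boto3.client("s3")
    s3.create_bucket(Bucket=BUCKET)                        # provision storage on demand
    s3.put_object(Bucket=BUCKET, Key="hello.txt",
                  Body=b"served from rented storage")      # store a file
    obj = s3.get_object(Bucket=BUCKET, Key="hello.txt")    # read it back from anywhere
    print(obj["Body"].read().decode())

A few lines like these, plus a credit card, stand in for the racks of hardware a small or transient group would otherwise have to buy and run.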

From a freedom perspective, these can be both the best and worst of the cloud types. On the plus side, these services can be incredibly transparent- developers who use them directly have access to their own source code, and end users may not know they are using them at all. On the down side, especially for proprietary platforms like App Engine, these can have very deep lock-in- it is complicated, expensive, and risky to switch deployment platforms after achieving success. And they replace traditional, very open platforms- a tradeoff that isn’t always appreciated.

takeaways

‘The cloud’ isn’t going away, but we can clarify our thinking about it by distinguishing the different types of clouds. I hope this post is a useful step in that direction.

[This post is an extension of some ideas I’ve been playing around with on my own blog and at the autonomo.us group blog; readers curious about these issues may want to read further in those places. I also recommend reading this piece, which set me on the (very long) road to this particular post.]

More on Berman-Coble's Peer-to-Peer Definition

In a previous posting, I remarked on the overbreadth of the Berman-Coble bill’s definition of “peer to peer file trading network”. The definition has another interesting quirk, which looks to me like an error by the bill’s drafters.

Here is the definition:

‘peer to peer file trading network’ means two or more computers which are connected by computer software that–
(A) [is designed to support file sharing]; and
(B) does not permanently route all file or data inquiries or searches through a designated, central computer located in the United States;

Last time I dissected (A). Now let’s look at (B). I read (B) as exempting a network only if it permanently routes all file or data inquiries or searches through a single, designated computer located in the U.S.

Some people speculate that this exception is supposed to protect AOL Instant Messenger and similar systems. Others surmise that it is meant to exclude “big central server” systems like Napster, on the theory that the central server can be sued out of existence so no hacking attacks on it are necessary.

In either case, the exception fails to achieve its aim. In fact, it’s hard to see how any popular file sharing system could possibly be covered by (B).

The reason is simple. Big sites don’t use a single server computer. They tend to use a cluster of computers, routing each incoming request to one or another of them. This is done because the load on a big site is simply too large for any single computer to handle, and because it allows the service to keep going despite the crash of any individual machine.

A really big site might use a hundred or more computers, and they might not all be in the same physical location. (Spreading them out increases fault tolerance and allows requests to be routed to a nearby server for faster service.)
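One way to see this for yourself (my illustration, not anything from the bill): many big sites publish several addresses for a single hostname, and clients get spread across them. A few lines of Python show how many machines answer for a given name; depending on the site and your resolver you may see one address or many, since load can also be spread behind a single published address.

    import socket

    HOST = "www.google.com"   # any large site will do

    infos = socket.getaddrinfo(HOST, 443, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    print(f"{HOST} currently resolves to {len(addresses)} address(es):")
    for addr in addresses:
        print(" ", addr)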

Sites that implement advanced functions need even more computers. For example, Google uses more than 10,000 computers to provide its service.

Some small file sharing systems might be able to function with a single computer, but as soon as such a system became popular, it would have to switch to multiple computers and so the exception would no longer protect it.

It seems unlikely that the exception was intended to cover only small, unpopular systems. More likely, the authors of the bill, and the people who vetted it for them, simply missed this point.

Dornseif: Technological Definitions in the Law

Maximillian Dornseif offers some comments following up on my previous posts about Source vs. Object Code, and definitions in the Berman-Coble bill. A brief excerpt:

The court system and legal doctrine is built all arround definitions. While defining things like cruelty, carelessness and such stuff is a well understood problem for lawmakers and courts, technical circumstances seem to be a major problem.

By way of example he quotes a 124-word definition of “railway” from a German law.

He considers several explanations for this phenomenon (lawyers’ technophobia, techies’ lawphobia, technological change outracing the legal process, etc.) and finds them all valid, but not sufficient to explain the size and scope of the problem.

The Other Digital Divide

Long and well-written article by Drew Clark and Bara Vaida in the National Journal’s Tech Daily, about the history of the current Hollywood vs. Silicon Valley battle over copy protection. If you’re still coming up to speed on this issue, the article is a great scene-setter. Even if you know the issue well, you still might learn a thing or two.

My favorite telling detail:

Valenti warned that the Hollings approach “might be what had to happen.”

No, the tech executives said, a process to resolve differences between the two industries was already in place: the technical working group formed in 1996. But Valenti wanted a CEO-level dialogue, not another meeting of the engineers.

Dilbert fans will recognize this as a classic Pointy-Haired Boss tactic: “We can’t solve this engineering problem. Maybe if we kick the engineers out of the room we can solve it faster.”

"Peer to Peer" in the Berman-Coble Bill

Yesterday’s defense of the Berman-Coble bill resurrected the argument that the bill only hurts the bad guys, because it authorizes hacking only of peer to peer file trading networks. And we all know that “Decentralized P2P networks were designed specifically (and ingeniously) to thwart suits for copyright infringement by ensuring there is no central service to sue.”

Let’s look at the bill’s definition:

‘peer to peer file trading network’ means two or more computers which are connected by computer software that–
(A) is primarily designed to – (i) enable the connected computers to transmit files or data to other connected computers; (ii) enable the connected computers to request the transmission of files or data from other connected computers; and (iii) enable the designation of files or data on the connected computers as available for transmission; and
(B) does not permanently route all file or data inquiries or searches through a designated, central computer located in the United States;

The definition clearly includes non-controversial technologies, such as the Web itself, that were not designed with copyright infringement in mind.
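To see how little it takes, consider a sketch of my own (nothing in the bill, just the web server in Python’s standard library pointed at a directory of files), which arguably satisfies every clause of the definition:

    import http.server
    import socketserver

    # Clause (A)(iii): dropping files into the served directory "designates" them
    # as available. Clauses (A)(i)-(ii): the server transmits files on request,
    # and any browser or HTTP client on another connected computer can request
    # them. Nothing routes inquiries through a designated, central computer, so
    # the exception in (B) does not apply either.
    PORT = 8000

    with socketserver.TCPServer(("", PORT), http.server.SimpleHTTPRequestHandler) as httpd:
        print(f"Sharing the current directory at http://localhost:{PORT}/")
        httpd.serve_forever()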

This is not just an easily-fixed bug in the bill’s definition. Instead, it reflects the fact that the Internet’s design philosophy is based on a peer to peer model in which anyone can send anything to anybody. The big-central-server design of a system like Napster is the historical exception; peer to peer is the rule.

I don’t see an easy way to rewrite the definition to draw a clear technical line between “bad” peer to peer technologies and “good” ones.