November 21, 2024

Cloud(s), Hype, and Freedom

Richard Stallman’s recent description of ‘the cloud’ as ‘hype’ and a ‘trap’ seems to have stirred up a lot of commentary, but not a lot of clear discussion of the problems Stallman raised. This isn’t surprising- the term ‘the cloud’ has always been vague. (It was hard to resist saying ‘cloudy.’ 😉) When people say ‘the cloud’ they are really lumping at least four ‘cloud types’ together.

traditional applications, hosted elsewhere

Probably the most common type of ‘cloud’ is a service that takes traditional software functionality and moves it to remotely hosted, (typically) web-delivered servers. Gmail and salesforce.com are like this- fairly traditional email and CRM applications, ‘just’ moved to the web.

If Stallman’s ‘hype’ claim is valid anywhere, it is here. Administration and maintenance costs are definitely lower when an expert like Google funds and runs the server, and reliability may improve as well. But the core functionality of these apps, and the ability to access data over a network, have been present since the dawn of networked computing. On average, moving them to the web is undoubtedly a significant change in quality, but only rarely a change in type- which makes the buzz much harder to justify.

Stallman’s ‘trap’ charge is more complex. Computer users have long compromised on personal control by storing data remotely but accessing it via standardized protocols. This introduced risks (you had to trust the data host and couldn’t tinker with the server) but kept some controls: you could switch clients, and typically you could export the data. Some web apps still strike that balance- for example, most Gmail features are accessible via good old POP and IMAP. But others don’t.

Getting your data out of a service like Salesforce can be a ‘hidden cost’ of an apparently free service, and even with a relatively standards-based service like Gmail you have no freedom to make changes to the server. These risks are what Stallman means when he talks about a ‘trap’, and regardless of your conclusion about them, understanding them is important.
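
To make the ‘you could export the data’ control concrete, here is a rough sketch of pulling every message out of a mailbox over IMAP with Python’s standard imaplib module. It is an illustration under assumptions, not a recipe for any particular provider: the function names export_mailbox and connect are invented for this example, the host and credentials are placeholders you would supply yourself, and a real service may require extra setup before it accepts an IMAP login.

```python
import imaplib


def connect(host, user, password):
    """Open an IMAP-over-TLS session (host and credentials are placeholders)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    return conn


def export_mailbox(conn, mailbox="INBOX"):
    """Return every message in `mailbox` as raw RFC 822 bytes.

    `conn` is an already logged-in imaplib connection (or anything that
    offers the same select/search/fetch methods).
    """
    # Open read-only so exporting does not change message flags.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot open mailbox {mailbox!r}")

    # Ask the server for the sequence numbers of all messages.
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")

    messages = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        # A successful fetch returns a list whose first item is a
        # (header, body) tuple; the body is the full raw message.
        if status == "OK" and parts and isinstance(parts[0], tuple):
            messages.append(parts[0][1])
    return messages
```

The point of the sketch is the design property rather than the code: because the protocol is standard, the same dozen lines work against any host that speaks it, which is exactly the control that disappears when a service offers no standard way out.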

services involving data that can’t (yet) be managed locally

Google Maps and Google Search are the canonical examples of this type of cloud service- heaps of data so large that you would need a large data center to host your own copy and a very, very fat pipe to keep it up-to-date.

Hype-wise, these are a mixed bag. These services definitely bring radical new functionality that couldn’t exist in traditional software- I can’t store all of Google Maps on my phone. That hype is justified. At the same time, our personal ability to store and process data is still growing quickly, so the claims that this type of cloud service will always ‘require’ remote servers may be overblown.

‘Trap’-wise? Dependence on these services reminds me of ‘dependence’ on a library before the internet- you can work to make sure your library respects your privacy, prefer public libraries to private ones, or establish a personal library if your reading interests are narrow, but in the end eschewing large libraries is likely to be a case of cutting off your nose to spite your face. We’re in the same state with this type of cloud service. You can avoid them, but those concerned with freedom might be better off understanding and fixing them than condemning them altogether.

services that make creation of new data technically or economically feasible

Facebook and Wikipedia are the canonical examples here. Unlike the first two types of cloud, where data was available but inconvenient before it ended up in the cloud, this class of cloud applications creates information that wasn’t previously feasible to collect at all.

There may well not be enough hype around this type of cloud. Replicating web-scale collaborative facilities like these will be very difficult to do in a p2p fashion, and the impact of the creation of new information (even when it is as mundane as Facebook’s data often is) is hard to overstate.

Like the previous type of cloud, it is hard to call these a trap per se- they do make it hard to leave, but they do so by providing new functionality that is very hard to get with any traditional software model.

services offering computing and storage, rather than data

The most recent type of cloud service is remotely provisioned computing and storage, like Amazon’s EC2/S3 and Google’s App Engine. This is perhaps the most purely generative type of cloud, allowing individuals to create new services and scale them out to service millions of people without having to invest in their own physical infrastructure. It is hard to see any way in which this can reasonably be called ‘hype,’ given that it gives individuals and small or transient groups a reach that might otherwise cost them many thousands of dollars.

From a freedom perspective, these can be both the best and worst of the cloud types. On the plus side, these services can be incredibly transparent- developers who use them directly have access to their own source code, and end users may not know they are using them at all. On the down side, especially for proprietary platforms like App Engine, these can have very deep lock-in- it is complicated, expensive, and risky to switch deployment platforms after achieving success. And they replace traditional, very open platforms- a tradeoff that isn’t always appreciated.

takeaways

‘The cloud’ isn’t going away, but we can clarify our thinking about it by talking about the different types of clouds. I hope this post is a useful step in that direction.

[This post is an extension of some ideas I’ve been playing around with on my own blog and at the autonomo.us group blog; readers curious about these issues may want to read further in those places. I also recommend reading this piece, which set me on the (very long) road to this particular post.]

Comments

  1. “Anonymous (ahem) is, I think, a bit optimistic about what you can do with distributed crypto, albeit provably secure storage and evaluation are nice things to aim at.”

    This stuff is mathematically proven to be feasible, so there’s no “optimistic” about it.

  2. rp: the biggest problem with distributed crypto is search. I can put all that data out there (see e.g. allmydata.org) but searching it requires sucking it all down, decrypting, and then searching. Which obviously is not ideal.

  3. Anonymous (ahem) is, I think, a bit optimistic about what you can do with distributed crypto, albeit provably secure storage and evaluation are nice things to aim at. But the question is how much overhead you’re willing to pay for that security, since cost is supposedly a big factor for going out into the cloud in the first place…

  4. Hal wrote: “What inefficiencies would result in letting people put up the equivalent of their Facebook page, hosted on their own server? With links to their friends pages, and crawlers and protocols so that various pages update appropriately when remote pages change? Or how about Ebay, couldn’t that be done with a distributed system in the same way? There would be no service charge or commissions, no forced advertising, greater control over privacy and information sharing.”

    Why do you think most ISPs now have (and didn’t used to have, 10 years ago) a “no running your own servers” clause in their terms of service? Because their biggest business customers don’t want every Tom, Dick, and Harry going into business as their competitors and they let their ISPs know of their wishes, that’s why!

  5. rp wrote: “The discussion of lock-in for cloud computing power and data storage is interesting in significant part because it raises the question of how many such services there can profitably be. Is that market a natural oligopoly, or is it (like regular hosting) something that eventually becomes a commodity market full of sellers and resellers?”

    It’s like regular hosting. Storage and computing are fungible, endlessly subdividable, and don’t require physical rights-of-way or similar, so anyone can get into the business with (relatively) low barriers to entry. The same factors that make hosting commoditized will therefore commoditize cloud storage and computing. In particular, the barriers to entry to providing (and scaling) all of these services are comparable.

  6. rp wrote: “Google (currently) has to collect that data and store it for months or longer to stay in business selling targeted ads and other services. And once the information is held, it can be gotten at.”

    PaulV wrote: “By moving your computing and your data to the cloud, the problem isn’t that it’s a pain to move, the problem is that you have no idea who has access to it (now or at any point in the future).”

    The two of you might be interested in checking out some current concepts in cryptography, including zero-knowledge proofs and secure function evaluation.

    In particular, cloud storage could be done storing encrypted files and keeping the key local, or storing two files on two different services that xor with each other to produce the real file but are essentially 100% random individually. (Each can be thought of as a cyphertext with the other a one-time pad!) An adversary has to find both files, and know which two of your files to xor together.

    There’s also Freenet for cloud storage of anything multiple people want access to, but where they want to publish and peruse anonymously and untraceably.

    And cloud computing could use SFE to avoid those providing the service being able to know anything about the computations they’re doing!

  7. rp wrote: “The library analogy is an interesting one. Libraries structure their information systems to retain a minimum amount of information about their users; the “cloud” equivalents seem to do the opposite, and to derive a fair amount of their revenue from that difference. Thus far Google et al seem to have been fairly successful at deflecting broad requests for user records, but what will happen when they aren’t?”

    It’s interesting that Google has put an “anonymous mode” into their Chrome browser, which could be used when accessing Google’s own sites…

  8. I think a big problem with Amazon/Google style compute clouds is that you are no longer in possession of your data. If the NSA convinces Amazon to give access, there’s no way for you to know, and nothing you can do about it. By moving your computing and your data to the cloud, the problem isn’t that it’s a pain to move, the problem is that you have no idea who has access to it (now or at any point in the future).

  9. Just pointing out that one of the things that has historically made libraries attractive is their relative anonymity and lack of detailed record-keeping.

    I’m curious to what extent this is actually true- how old is the tradition of library privacy? Libraries have been around a long time (nearly a couple millennia); privacy as a politically-respected concept, not so long (arguably less than a couple centuries.)

    re: oligopoly: certainly there are economies of scale in some of these areas (particularly hosting of large data centers) that suggest that oligopoly would be likely, at least until technology (connection speeds, efficacy of p2p, reliability of local software) changes significantly.

    Hal: there are a large number of factors- reliability, installation time, maintenance, distributed search, distributed notification- all of which are biased towards centralized services in these kinds of cases. None are insurmountable, but there isn’t much incentive to overcome them when the current solution works very well from every perspective other than autonomy/privacy.

  10. I’ve never fully understood why services like Facebook could not be implemented in a distributed P2P fashion. Suppose we overcame the hurdle of everyone having their own personal web server. What inefficiencies would result in letting people put up the equivalent of their Facebook page, hosted on their own server? With links to their friends pages, and crawlers and protocols so that various pages update appropriately when remote pages change? Or how about Ebay, couldn’t that be done with a distributed system in the same way? There would be no service charge or commissions, no forced advertising, greater control over privacy and information sharing.

  11. Luis:

    I agree completely about the practicality side. Just pointing out that one of the things that has historically made libraries attractive is their relative anonymity and lack of detailed record-keeping. (Which in turn is why removing “inappropriate” stuff from library shelves is considered such a big deal by arguers pro and con.) As long as the business model of the online analogs relies so heavily on tracking and identification of users, there’s going to be some tension there, and simply saying that we might get Google or whoever to adopt strong privacy policies misses the point a little, because Google (currently) has to collect that data and store it for months or longer to stay in business selling targeted ads and other services. And once the information is held, it can be gotten at.

    The discussion of lock-in for cloud computing power and data storage is interesting in significant part because it raises the question of how many such services there can profitably be. Is that market a natural oligopoly, or is it (like regular hosting) something that eventually becomes a commodity market full of sellers and resellers?

  12. Joe: you’re right that I used ‘proprietary’ very loosely there. Only so much you can do when you’re trying to force yourself to stay under 1K words 😉 Appdrop is very, very interesting, I hadn’t seen that. Disappointing that they ‘don’t claim to scale’; I wonder if it only duplicates the API without doing the backend scaling bits? (Tim Bray has some words about this exact issue today, by the way.)

    rp: I certainly wasn’t trying to claim that these digital ‘libraries’ have the same privacy protections as traditional libraries, just that they contain data that we would find very difficult to keep ‘at home’ (like a traditional library). They of course *could* get the same privacy protections as traditional libraries if we decided to push for them.

  13. The library analogy is an interesting one. Libraries structure their information systems to retain a minimum amount of information about their users; the “cloud” equivalents seem to do the opposite, and to derive a fair amount of their revenue from that difference. Thus far Google et al seem to have been fairly successful at deflecting broad requests for user records, but what will happen when they aren’t?

  14. “On the down side, especially for proprietary platforms like App Engine…”

    While the infrastructure that App Engine runs on is proprietary, the SDK is open source (http://code.google.com/p/googleappengine/) and it has been ported to other back-ends, such as AppDrop (http://appdrop.com/) which ported it over to work on EC2.

    -joe

    —-
    Developer Advocate
    Google App Engine

  15. My employer switched from in-house-hosted to GMAIL-hosted email, and I am not happy with the experience. I *average* five IMAP-server failures *per*day*. In contrast to that, my university email account (where I am adjunct faculty) has had two IMAP failures in the last six years, and my home ISP has had three IMAP failures in the last eight years.

    Google’s IMAP service *stinks* !!

  16. Thought-provoking post, Luis, and thanks for drawing my attention to the Stallman story.
    It’s interesting seeing this from an ed tech perspective, where most schools remain happy using desktop-based applications; it’s the cutting-edge few who are heading in the web 2.0 direction of cool, externally hosted tools, ranging from google docs through niche to flickr, youtube and the rest. The gain in connected learning is a tremendous one, but at what expense? Using open source tools and bringing similar functionality inside the school makes it easier to purpose the creativity and communication to educational ends as well as doing much to address e-safety issues out there on the wild web.
    A few thoughts of my own at http://milesberry.net/?p=352
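
A footnote to comment 6: the two-file XOR idea suggested there can be sketched in a few lines of Python. This is only an illustration of the idea, not the commenter’s code. The names split_into_shares and combine_shares are invented for this example, and the random pad comes from Python’s standard secrets module. One share is a random pad as long as the data; the other is the data XORed with that pad.

```python
import secrets
from typing import Tuple


def split_into_shares(data: bytes) -> Tuple[bytes, bytes]:
    """Split `data` into two shares that each look random on their own."""
    # A fresh random pad, the same length as the data, used only once.
    pad = secrets.token_bytes(len(data))
    # XOR every data byte with the matching pad byte.
    masked = bytes(d ^ p for d, p in zip(data, pad))
    return pad, masked


def combine_shares(share_a: bytes, share_b: bytes) -> bytes:
    """Recover the original data by XORing the two shares together."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must be the same length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))
```

In use, each share would be uploaded to a different storage service and the original recovered by downloading both and calling combine_shares. The costs are visible in the code: each share is as large as the original, so total storage doubles, and losing either share loses the file.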