October is “cyber security awareness month”. Among other notable announcements, Google just rolled out “advanced protection” — free for any Google account. So, in the spirit of offering pragmatic advice to real users, I wrote a short document that’s meant not for the usual Freedom to Tinker audience but rather for the sort of person running a […]
Sidekick Users' Data Lost: Blame the Cloud?
Users of Sidekick mobile phones saw much of their data disappear last week due to engineering problems at a Microsoft data center. Sidekick devices lose the contents of their memory when they don’t have power (e.g. when the battery is being changed), so all data is transmitted to a data center for permanent storage — which turned out not to be so permanent.
(The latest news is that some of the data, perhaps most of it, may turn out to be recoverable.)
A common response to this story is that this kind of danger is inherent in “cloud” computing services, where you rely on some service provider to take care of your data. But this misses the point, I think. Preserving data is difficult, and individual users tend to do a mediocre job of it. Admit it: You have lost your own data at some point. I know I have lost some of mine. A big, professionally run data center is much less likely to lose your data than you are.
It’s worth noting, too, that many cloud services face lower risk of this sort of problem. My email, for example, lives in the cloud: the “official copy” is on a central server, and copies are downloaded frequently to my desktop and laptop computers. If the server were to go up in flames, along with all of the server backups, I would still be in good shape, because I would still have copies of everything on my desktop and laptop.
For my email and similar services, the biggest risk to data integrity is not that the server will disappear altogether, but that the server will misbehave in subtle ways, causing my stored data to be corrupted over time. Thanks to the automatic synchronization between the server and my two clients (desktop and laptop), bad data could be replicated silently into all copies. In principle, some of the damage could be repaired later, using the server’s backups, but that’s a best-case scenario.
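To make the failure mode concrete, here is a toy sketch in Python of how naive last-writer-wins synchronization can silently replicate a corrupted record into every copy. All of the names are hypothetical; this is not any real mail protocol, just an illustration of the mechanism.

```python
# Toy model of last-writer-wins sync: a buggy server write gets copied
# over every client's good data, with no integrity check to stop it.
# Hypothetical names throughout; this is not any real mail protocol.

def sync(server: dict, client: dict) -> None:
    """Overwrite the client's copy of each record with the server's copy."""
    for msg_id, body in server.items():
        client[msg_id] = body

server = {"msg-1": "Meeting at noon."}
desktop: dict = {}
laptop: dict = {}

sync(server, desktop)
sync(server, laptop)        # all three copies agree, as intended

server["msg-1"] = "Me\x00ting a"   # a server-side bug corrupts the stored record

sync(server, desktop)
sync(server, laptop)        # the corruption now spreads to both clients

assert server["msg-1"] == desktop["msg-1"] == laptop["msg-1"]
print("every replica holds the corrupted copy:", repr(desktop["msg-1"]))
```

A synchronizer that kept per-record version history, or verified checksums before overwriting, could at least detect the divergence; without that, the server’s backups really are the last line of defense.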
This risk, of buggy software corrupting data, has always been with us. The question is not whether problems will happen in the cloud — in any complex technology, trouble comes with the territory — but whether the cloud makes a problem worse.
What Economic Forces Drive Cloud Computing?
You know a technology trend is all-pervasive when you see New York Times op-eds about it — and this week saw the first Times op-ed about cloud computing, by Jonathan Zittrain. I hope to address JZ’s argument another day. Today I want to talk about a more basic issue: why we’re moving toward the cloud.
(Background: “Cloud computing” refers to the trend away from services provided by software running on standalone personal computers (“clients”), toward services provided across the Net with data stored in centralized data centers (“servers”). GMail and HotMail provide email in the cloud, Flickr provides photo albums in the cloud, and so on.)
The conventional wisdom is that functions are moving from the client to the server because server-side computing resources (storage, computation, and data transfer) are falling in cost, relative to the cost of client-side resources. Basic economics says that if a product uses two inputs, and the relative costs of the inputs change, production will shift to use more of the newly-cheap input and less of the newly-expensive one — so as server-side resources get relatively cheaper, designs will start to use more server-side resources and fewer client-side resources. (In fact, both server- and client-side resources are getting cheaper, but the argument still works as long as the cost of server-side resources is falling faster, which it probably is.)
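In textbook terms, this is a standard cost-minimization story; here is the generic sketch (ordinary producer theory, not anything specific to the op-ed). Choose server-side and client-side input quantities $x_s$ and $x_c$, at prices $p_s$ and $p_c$, to minimize total cost:

$$\min_{x_s,\,x_c} \; p_s x_s + p_c x_c \quad \text{subject to} \quad f(x_s, x_c) = Q,$$

where $f$ is the production function and $Q$ is the fixed level of service. At an interior optimum the ratio of marginal products equals the price ratio,

$$\frac{\partial f/\partial x_s}{\partial f/\partial x_c} = \frac{p_s}{p_c},$$

so when $p_s$ falls relative to $p_c$, the cost-minimizing mix shifts toward more $x_s$ and less $x_c$.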
This argument seems reasonable — and smart people have repeated it — but I think it misses the most important factors driving us into the cloud.
For starters, the standard argument assumes that a move into the cloud simply relocates functions from client to server — we’re consuming the same resources, just consuming them in a place where they’re cheaper. But if you dig into the details, it looks like the cloud approach may use a lot more resources.
Rather than storing data on the client, the cloud approach often replicates data, storing the data on both server and client. If I use GMail on my laptop, my messages are stored on Google’s servers and on my laptop. Beyond that, some computation is replicated on both client and server, and we mustn’t forget that it’s less resource-efficient to provide computing inside a web browser than on the raw hardware. Add all of this up, and we might easily find that a cloud approach uses more client-side resources than a client-only approach.
Why, then, are we moving into the cloud? The key issue is the cost of management. Thus far we have focused only on computing resources such as storage, computation, and data transfer; but the cost of managing all of this — making sure the right software version is installed, that data is backed up, that spam filters are updated, and so on — is a significant part of the picture. Indeed, as the cost of computing resources, on both client and server sides, continues to fall rapidly, management becomes a bigger and bigger fraction of the total cost. And so we move toward an approach that minimizes management cost, even if that approach is relatively wasteful of computing resources. The key is not that we’re moving computation from client to server, but that we’re moving management to the server, where a team of experts can manage matters for many users.
This is still a story about shifts in the relative costs of inputs. Computing is getting cheaper (wherever it happens), so we’re happy to spend more computing resources in order to use our relatively expensive management inputs more efficiently.
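To see how this plays out, here is a back-of-the-envelope sketch with made-up numbers (purely illustrative; none of these figures come from the post). The cloud design deliberately wastes raw computing resources, storing data twice and paying browser overhead, but it amortizes expert management across many users.

```python
# Back-of-the-envelope comparison with made-up, purely illustrative numbers.

self_management   = 50.0   # hypothetical yearly cost of managing your own machine
expert_management = 2.0    # hypothetical per-user share of centralized expert management
base_resources    = 10.0   # hypothetical yearly resource cost of a client-only design
cloud_overhead    = 2.5    # assume the cloud burns ~2.5x the raw resources (replication, browser)

for year in range(5):
    resources = base_resources * 0.6 ** year      # resource costs fall ~40% per year
    client_only = resources + self_management
    cloud = cloud_overhead * resources + expert_management
    print(f"year {year}: client-only ≈ {client_only:6.2f}, cloud ≈ {cloud:6.2f}")

# Under these assumptions the cloud total is lower from year 0, and the gap
# widens every year: as resources get cheaper, the flat management cost
# comes to dominate the client-only total.
```

The exact numbers are invented, but the shape of the result is the point: a design that is wasteful of computing resources can still minimize total cost once management is the expensive input.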
What does this tell us about the future of cloud services? That question will have to wait for another day.
Lessons from Amazon's 1984 Moment
Amazon got some well-deserved criticism for yanking copies of Orwell’s 1984 from customers’ Kindles last week. Let me spare you the copycat criticism of Amazon — and the obvious 1984-themed jokes — and jump right to the most interesting question: What does this incident teach us?
Human error was clearly part of the problem. Somebody at Amazon decided that repossessing purchased copies of 1984 would be a good idea. They were wrong about this, as both the public reaction and the company’s later backtracking confirm. But the fault lies not just with the decision-maker, but also with the factors that made the decision more likely, including some aspects of the technology itself.
Some put the blame on DRM, but that’s not the problem here. Even if the Kindle used open formats and let you export and back up your books, Amazon could still have made 1984 disappear from your Kindle. Yes, some users might have had backups of 1984 stored elsewhere, but most users would have lost their only copy.
Some blame cloud computing, but that’s not precisely right either. The Kindle isn’t really a cloud device — the primary storage, computing and user interface for your purchased books are provided by your own local Kindle device, not by some server at Amazon. You can disconnect your Kindle from the network forever (by flipping off the wireless network switch on the back), and it will work just fine.
Some blame the fact that Amazon controls everything about the Kindle’s software, which is a better argument but still not quite right. Most PCs are controlled by a single company, in the sense that that company (Microsoft or Apple) can make arbitrary changes to the software on the PC, including (in principle) deleting files or forcibly removing software programs.
The problem, more than anything else, is a lack of transparency. If customers had known that this sort of thing was possible, they would have spoken up against it — but Amazon had not disclosed it, and generally does not offer clear descriptions of how the product works or what kinds of control the company retains over users’ devices.
Why has Amazon been less transparent than other vendors? I’m not sure, but let me offer two conjectures. It might be because Amazon controls the whole system. Systems that can run third-party software have to be more open, in the sense that they have to tell the third-party developers how the system works, and they face some pressure to avoid gratuitous changes that might conflict with third-party applications. Alternatively, the lack of transparency might be because the Kindle offers less functionality than (say) a PC. Less functionality means fewer security risks, so customers don’t need as much information to protect themselves.
Going forward, Amazon will face more pressure to be transparent about the Kindle technology and the company’s relationship with Kindle buyers. It seems that e-books really are more complicated than dead-tree books.
Satyam and the Inadvertent Web
Satyam is one of the handful of large companies that dominate the IT outsourcing market in India. A week ago today, B. Ramalinga Raju, the company’s chairman, confessed to a years-long accounting fraud. More than a billion dollars of cash the company claimed to have on hand, and the business success that putatively generated those dollars, now appear to have been fictitious.
There are many tech policy issues here. For one, frauds this massive in high tech environments are a challenge and opportunity for computer forensics. For another, though we can hope this situation is unique, it may turn out to be the tip of an iceberg. If Satyam turns out to be part of a pattern of lax oversight and exaggerated profits across India’s high tech sector, it might alter the way we look at high tech globalization, forcing us to revise downward our estimates of high tech’s benefits in India. (I suppose it could be construed as a silver lining that such news might also reveal America, and other western nations, to be more globally competitive in this arena than we had believed them to be.)
But my interest in the story is more personal. I met Mr. Raju in early 2007, when Satyam helped organize and sponsor a delegation of American journalists to India. (I served as Managing Editor of The American at the time.) India’s tech sector wanted good press in America, a desire perhaps increased by the fact that Democrats who were sometimes skeptical of free trade had just assumed control of the House. It was a wonderful trip—we were treated well at others’ expense and got to see, and learn about, the Indian tech sector and the breathtaking city of Hyderabad. I posted pictures of the trip on Flickr, mentioning “Satyam” in the description, showed the pics to a few friends, and moved on with life.
Then came last week’s news. Here’s what the traffic graph for my Flickr account showed: a spike representing several thousand people suddenly viewing my pictures of Satyam’s pristine campus.
When I think about the digital “trails” I leave behind—the Flickr, Facebook and Twitter ephemera that define me by implication—there are some easy presumptions about what the future will hold. Evidence of raw emotions, the unmediated anger, romantic infatuation, depression or exhilaration that life sometimes holds, should generally be kept out of the record, since the social norms that govern public display of such phenomena are still evolving. While those in their twenties may consider such material normal, it reflects a life-in-the-fishbowl style of conduct that older people can find untoward, a style that would years ago have counted as exhibitionistic or otherwise misguided.
I would never, however, have guessed that a business trip to a corporate office park might one day be a prominent part of my online persona. In this case, I happen to be perfectly comfortable with the result—but that feels like luck. A seemingly innocuous trace I leave online, one that later becomes salient, might just as easily prove problematic for me, or for someone else. There seems to be a larger lesson here: anything we leave online could, for reasons we can’t guess at today, turn out to be important later. The inadvertent web—the set of seemingly trivial web content that exists today and will later prove important—may become a powerful force in favor of limiting what we put online.