Last week Google introduced its video service, which lets users download free or paid-for videos. The service’s design is distinctive in many ways, not all of them desirable. One of the distinctive features is a DRM (anti-infringement) mechanism which is applied if the copyright owner asks for it. Today I want to discuss the design of Google Video’s DRM, and especially its privacy implications.
First, some preliminaries. Google’s DRM, like everybody else’s, can be defeated without great difficulty. Like all DRM schemes that rely on encrypting files, it is vulnerable to capture of the decrypted file, or to capture of the keying information, either of which will let an adversary rip the video into unprotected form. My guess is that Google’s decision to use DRM was driven by the insistence of copyright owners, not by any illusion that the DRM would stop infringement.
The Google DRM system works by trying to tether every protected file to a Google account, so that the account’s username and password have to be entered every time the file is viewed. From the user’s point of view, this has its pros and cons. On the one hand, an honest user can view his video on any Windows PC anywhere; all he has to do is move the file and then enter his username and password on the new machine. On the other hand, the system works only when connected to the net, and it carries privacy risks.
The magnitude of privacy risk depends on the details of the design. If you’re going to have a DRM scheme that tethers content to user accounts, there are three basic design strategies available, which differ according to how much information is sent to Google’s servers. As we’ll see, Google apparently chose the design that sends the most information and so carries the highest privacy risk for users.
The first design strategy is to encrypt files so that they can be decrypted without any participation by the server. You create an encryption key that is derived from the username and password associated with the user’s Google account, and you encrypt the video under that key. When the user wants to play the video, software on the user’s own machine prompts for the username and password, derives the key, decrypts the video, and plays it. The user can play the video as often as she likes, without the server being notified. (The server participates only when the user initially buys the video.)
This design is great from a privacy standpoint, but it suffers from two main drawbacks. First, if the user changes the password in her Google account, there is no practical way to update the user’s video files. The videos can only be decrypted with the user’s old password (the one that was current when she bought the videos), which will be confusing. Second, there is really no defense against account-sharing attacks, where a large group of users shares a single Google account, and then passes around videos freely among themselves.
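To make the first design and its password-change problem concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration, not a description of Google's software: the function names, the use of PBKDF2 with the username as salt, and especially the toy XOR cipher, which stands in for a real cipher only so that the example runs with the standard library.

```python
import hashlib


def derive_key(username: str, password: str) -> bytes:
    """Derive a 32-byte key from the account credentials.

    Assumption: PBKDF2-HMAC-SHA256 with the username as the salt.
    """
    salt = username.encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    A toy stand-in for a real cipher; do not use it to protect anything.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


# At purchase time the video is encrypted under the key for the current password.
video = b"frame data for a purchased video"
protected = toy_cipher(derive_key("alice", "old-password"), video)

# Playback happens entirely on the user's machine; no server is contacted.
played = toy_cipher(derive_key("alice", "old-password"), protected)

# After a password change, the new key no longer matches the old file.
after_change = toy_cipher(derive_key("alice", "new-password"), protected)
```

Here `played` equals the original video, while `after_change` is garbage: the file stays locked to whatever password was current at purchase, which is exactly the confusion described above.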
The second design tries to address both of these problems. In this design, a user’s files are encrypted under a key that Google knows. Before the user can watch videos on a particular machine, she has to activate her account on that machine, by sending her username and password to a Google server, which then sends back a key that allows the unlocking of that user’s videos on that machine. Activation of a machine can last for days, or weeks, or even forever.
This design addresses the password-change problem, because the Google server always knows the user’s current password, so it can require the current password to activate an account. It also addresses the account-sharing attack, because a widely-shared account will be activated on a suspiciously large number of machines. By watching where and how often an account is activated, Google can spot sharing of the account, at least if it is shared widely.
In this second design, more information flows to Google’s servers: Google learns which machines the user watches videos on, and when the user first uses each of the machines. But it doesn’t learn which videos were watched when, or which videos were watched on which machine, or exactly when the user watches videos on a given machine (after the initial activation). This design does have privacy drawbacks for users, but I think few users would complain.
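The second design can be sketched as a small activation server. This is a hypothetical model under my own assumptions: the class and method names, the sharing threshold of five machines, and the plaintext password storage (a real server would store salted hashes) are all invented for illustration.

```python
from dataclasses import dataclass, field

SHARING_THRESHOLD = 5  # hypothetical: more machines than this looks like sharing


@dataclass
class Account:
    password: str        # simplification: a real server would store a salted hash
    content_key: bytes   # the key that unlocks this user's videos
    machines: set = field(default_factory=set)


class ActivationServer:
    def __init__(self):
        self.accounts = {}
        self.log = []  # everything the server learns: (when, username, machine)

    def register(self, username, password, content_key):
        self.accounts[username] = Account(password, content_key)

    def change_password(self, username, new_password):
        # The stored content key is untouched, so existing files keep working.
        self.accounts[username].password = new_password

    def activate(self, username, password, machine_id, when):
        """Return the content key if the current password is supplied."""
        account = self.accounts.get(username)
        if account is None or account.password != password:
            return None
        account.machines.add(machine_id)
        self.log.append((when, username, machine_id))
        return account.content_key

    def looks_shared(self, username):
        return len(self.accounts[username].machines) > SHARING_THRESHOLD


server = ActivationServer()
server.register("alice", "pw1", b"k" * 32)
key = server.activate("alice", "pw1", "laptop", "day 1")
server.change_password("alice", "pw2")
rejected = server.activate("alice", "pw1", "desktop", "day 2")
accepted = server.activate("alice", "pw2", "desktop", "day 2")
```

Note what the log contains: one entry per machine activation, and nothing about individual videos or viewing times. After activation, playback on that machine needs no further contact with the server.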
In the third design, the user’s computer contacts Google’s server every time the user wants to watch a protected video, transmitting the username and password, and possibly the identity of the video being watched. The server then provides the decryption key needed to watch that particular video; after showing the video the software on the user’s computer discards the key, so that another handshake with the server is needed if the user wants to watch the same video later.
Google hasn’t revealed whether or not they send the identity of the video to the server. There are two pieces of evidence to suggest that they probably do send it. First, sending it is the simplest design strategy, given the other things we know about Google’s design. Second, Google has not said that they don’t send it, despite some privacy complaints about the system. It’s a bit disappointing that they haven’t answered this question one way or the other, either to disclose what information they’re collecting, or to reassure their users. I’d be willing to bet that they do send the identity of the video, but that bet is not a sure thing. [See update below.]
This third design is the worst one from a privacy standpoint, giving the server a full log of exactly where and when the user watches videos, and probably which videos she watches. Compared to the second design, this one creates more privacy risk but has few if any advantages. The extra information sent to the server seems to have little if any value in stopping infringement.
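For comparison, here is a sketch of the third design, again under assumptions of my own: the names are hypothetical, the per-video key is derived with HMAC from a made-up master secret, and the client is assumed to send the video's identity with each request (which the update at the end of this post confirms for the current version).

```python
import hashlib
import hmac


class PerViewServer:
    def __init__(self, master_secret: bytes):
        self.master_secret = master_secret
        self.passwords = {}   # simplification: plaintext, for illustration only
        self.view_log = []    # one entry per viewing

    def add_user(self, username, password):
        self.passwords[username] = password

    def request_key(self, username, password, video_id, machine_id, when):
        """Check credentials, log the viewing, and return a per-video key."""
        if self.passwords.get(username) != password:
            return None
        self.view_log.append((when, username, machine_id, video_id))
        # Assumption: each video's key is derived from a server-side secret.
        return hmac.new(self.master_secret, video_id.encode("utf-8"),
                        hashlib.sha256).digest()


def watch(server, username, password, video_id, machine_id, when):
    """Client side: fetch the key, play, then discard the key."""
    key = server.request_key(username, password, video_id, machine_id, when)
    if key is None:
        return False
    # ... decrypt and play the video with `key` (omitted) ...
    del key  # discarded, so the next viewing needs another handshake
    return True


server = PerViewServer(b"hypothetical-master-secret")
server.add_user("alice", "pw")
first = watch(server, "alice", "pw", "video-42", "laptop", "Mon 20:00")
second = watch(server, "alice", "pw", "video-42", "laptop", "Tue 09:30")
```

Watching the same video twice leaves two entries in `view_log`, each recording the time, account, machine, and video. That per-viewing record is the difference between this design and the activation-based one, whose log grows only when a new machine is activated.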
So why did Google choose a less privacy-friendly solution, even though it provided no real advantage over a more privacy-friendly one? Here I can only speculate. My guess is that Google is not as attuned to this kind of privacy issue as it should be. The company is used to logging lots of information about how customers use its services, so a logging-intensive solution would probably seem natural, or at least less unnatural, to its engineers.
In this regard, Google’s famous “don’t be evil” motto, and customers’ general trust that the company won’t be evil, may get Google into trouble. As more and more data builds up in the company’s disk farms, the temptation to be evil only increases. Even if the company itself stays non-evil, its data trove will be a massive temptation for others to do evil. A rogue employee, an intruder, or just an accidental data leak could cause huge problems. And if customers ever decide that Google might be evil, or cause evil, or carelessly enable evil, the backlash would be severe.
Privacy is for Google what security is for Microsoft. At some point Microsoft realized that a chain of security disasters was one of the few things that could knock the company off its perch. And so Bill Gates famously declared security to be job one, thousands of developers were retrained, and Microsoft tried to change its culture to take security more seriously.
It’s high time for Google to figure out that it is one or two privacy disasters away from becoming just another Internet company. The time is now for Google to become a privacy leader. Fixing the privacy issues in its video DRM would be a small step toward that goal.
[Update (Feb. 9): A Google representative confirms that in the current version of Google Video, the identity of the video is sent to their servers. They have updated the service’s privacy policy to disclose this clearly.]