November 23, 2024

Google Video and Privacy

Last week Google introduced its video service, which lets users download free or paid-for videos. The service’s design is distinctive in many ways, not all of them desirable. One of the distinctive features is a DRM (anti-infringement) mechanism which is applied if the copyright owner asks for it. Today I want to discuss the design of Google Video’s DRM, and especially its privacy implications.

First, some preliminaries. Google’s DRM, like everybody else’s, can be defeated without great difficulty. Like all DRM schemes that rely on encrypting files, it is vulnerable to capture of the decrypted file, or to capture of the keying information, either of which will let an adversary rip the video into unprotected form. My guess is that Google’s decision to use DRM was driven by the insistence of copyright owners, not by any illusion that the DRM would stop infringement.

The Google DRM system works by trying to tether every protected file to a Google account, so that the account’s username and password have to be entered every time the file is viewed. From the user’s point of view, this has its pros and cons. On the one hand, an honest user can view his video on any Windows PC anywhere; all he has to do is move the file and then enter his username and password on the new machine. On the other hand, the system works only when connected to the net, and it carries privacy risks.

The magnitude of privacy risk depends on the details of the design. If you’re going to have a DRM scheme that tethers content to user accounts, there are three basic design strategies available, which differ according to how much information is sent to Google’s servers. As we’ll see, Google apparently chose the design that sends the most information and so carries the highest privacy risk for users.

The first design strategy is to encrypt files so that they can be decrypted without any participation by the server. You create an encryption key that is derived from the username and password associated with the user’s Google account, and you encrypt the video under that key. When the user wants to play the video, software on the user’s own machine prompts for the username and password, derives the key, decrypts the video, and plays it. The user can play the video as often as she likes, without the server being notified. (The server participates only when the user initially buys the video.)

This design is great from a privacy standpoint, but it suffers from two main drawbacks. First, if the user changes the password in her Google account, there is no practical way to update the user’s video files. The videos can only be decrypted with the user’s old password (the one that was current when she bought the videos), which will be confusing. Second, there is really no defense against account-sharing attacks, where a large group of users shares a single Google account, and then passes around videos freely among themselves.
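The first design and its password-change drawback can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the choice of PBKDF2, the use of the username as a salt, the iteration count, and the `derive_key` helper are all hypothetical, and nothing here describes how any real Google software derives keys.

```python
import hashlib


def derive_key(username: str, password: str, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key from account credentials, entirely on the client.

    PBKDF2-HMAC-SHA256 and the username-as-salt choice are assumptions
    made for this sketch, not details from the actual system.
    """
    salt = username.encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


# Key under which the video was encrypted at purchase time.
key_at_purchase = derive_key("alice@example.com", "old-password")

# Playing later with the same password re-derives the same key, with no server involved.
key_same_password = derive_key("alice@example.com", "old-password")

# After a password change, the current password derives a different key.
key_after_change = derive_key("alice@example.com", "new-password")

can_play_with_old_password = key_at_purchase == key_same_password
can_play_with_new_password = key_at_purchase == key_after_change
```

In this sketch the old password still unlocks the file and the new one does not, which is exactly the confusing behavior described above.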

The second design tries to address both of these problems. In this design, a user’s files are encrypted under a key that Google knows. Before the user can watch videos on a particular machine, she has to activate her account on that machine, by sending her username and password to a Google server, which then sends back a key that allows the unlocking of that user’s videos on that machine. Activation of a machine can last for days, or weeks, or even forever.

This design addresses the password-change problem, because the Google server always knows the user’s current password, so it can require the current password to activate an account. It also addresses the account-sharing attack, because a widely-shared account will be activated on a suspiciously large number of machines. By watching where and how often an account is activated, Google can spot sharing of the account, at least if it is shared widely.
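Here is a rough sketch of how a server in the second design might behave. The `ActivationServer` class, its method names, the machine identifiers, and the sharing threshold are all hypothetical; a real server would store password hashes rather than passwords and would use a more careful sharing heuristic.

```python
import secrets


class ActivationServer:
    """Toy model of per-machine activation (all names are illustrative)."""

    def __init__(self, sharing_threshold: int = 5):
        self.passwords = {}      # username -> current password (a real server would store a hash)
        self.content_keys = {}   # username -> key that unlocks that user's videos
        self.activations = {}    # username -> set of machine ids activated so far
        self.sharing_threshold = sharing_threshold

    def register(self, username: str, password: str) -> None:
        self.passwords[username] = password
        self.content_keys[username] = secrets.token_bytes(32)
        self.activations[username] = set()

    def change_password(self, username: str, new_password: str) -> None:
        # The content key is unchanged, so existing video files keep working.
        self.passwords[username] = new_password

    def activate(self, username: str, password: str, machine_id: str):
        """Return the user's content key if the current password is supplied, else None."""
        if self.passwords.get(username) != password:
            return None
        self.activations[username].add(machine_id)
        return self.content_keys[username]

    def looks_shared(self, username: str) -> bool:
        # Flag accounts activated on more machines than the threshold allows.
        return len(self.activations[username]) > self.sharing_threshold


server = ActivationServer(sharing_threshold=3)
server.register("alice", "pw1")
key_on_laptop = server.activate("alice", "pw1", "laptop")

server.change_password("alice", "pw2")
rejected = server.activate("alice", "pw1", "desktop")        # old password no longer works
key_on_desktop = server.activate("alice", "pw2", "desktop")  # same key, new password

for i in range(5):
    server.activate("alice", "pw2", f"machine-{i}")
shared = server.looks_shared("alice")
```

Note what the server records in this sketch: which machines were activated, but nothing about individual viewings.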

In this second design, more information flows to Google’s servers – Google learns which machines the user watches videos on, and when the user first uses each of the machines. But they don’t learn which videos were watched when, or which videos were watched on which machine, or exactly when the user watches videos on a given machine (after the initial activation). This design does have privacy drawbacks for users, but I think few users would complain.

In the third design, the user’s computer contacts Google’s server every time the user wants to watch a protected video, transmitting the username and password, and possibly the identity of the video being watched. The server then provides the decryption key needed to watch that particular video; after showing the video the software on the user’s computer discards the key, so that another handshake with the server is needed if the user wants to watch the same video later.

Google hasn’t revealed whether or not they send the identity of the video to the server. There are two pieces of evidence to suggest that they probably do send it. First, sending it is the simplest design strategy, given the other things we know about Google’s design. Second, Google has not said that they don’t send it, despite some privacy complaints about the system. It’s a bit disappointing that they haven’t answered this question one way or the other, either to disclose what information they’re collecting, or to reassure their users. I’d be willing to bet that they do send the identity of the video, but that bet is not a sure thing. [See update below.]

This third design is the worst one from a privacy standpoint, giving the server a full log of exactly where and when the user watches videos, and probably which videos she watches. Compared to the second design, this one creates more privacy risk but has few if any advantages. The extra information sent to the server seems to have little if any value in stopping infringement.
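To see why, consider a toy model of the third design. Everything in it is an assumption for illustration: the `PerViewKeyServer` class, the record fields, and the use of an IP address as a stand-in for "where" are hypothetical, and the password check is omitted to keep the sketch short.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
import secrets


@dataclass
class ViewRecord:
    username: str
    video_id: str
    ip_address: str
    timestamp: datetime


class PerViewKeyServer:
    """Toy model of a server contacted on every viewing (illustrative only)."""

    def __init__(self):
        self.video_keys = {}  # video_id -> decryption key
        self.view_log = []    # one ViewRecord per key request

    def add_video(self, video_id: str) -> None:
        self.video_keys[video_id] = secrets.token_bytes(32)

    def request_key(self, username: str, video_id: str, ip_address: str) -> bytes:
        # Authentication is omitted here; the point is what gets logged.
        record = ViewRecord(username, video_id, ip_address, datetime.now(timezone.utc))
        self.view_log.append(record)
        return self.video_keys[video_id]

    def viewing_locations(self, username: str) -> Counter:
        # How often the user watched from each network address.
        return Counter(r.ip_address for r in self.view_log if r.username == username)


server = PerViewKeyServer()
server.add_video("video-123")

# The client discards the key after each viewing, so every viewing is a new request.
server.request_key("alice", "video-123", "203.0.113.7")
server.request_key("alice", "video-123", "203.0.113.7")
server.request_key("alice", "video-123", "198.51.100.4")

locations = server.viewing_locations("alice")
```

Because the key is discarded after each viewing, the log grows by one entry per viewing, and a simple query over it reveals where and how often a given user watches.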

So why did Google choose a less privacy-friendly solution, even though it provided no real advantage over a more privacy-friendly one? Here I can only speculate. My guess is that Google is not as attuned to this kind of privacy issue as they should be. The company is used to logging lots of information about how customers use its services, so a logging-intensive solution would probably seem natural, or at least less unnatural, to its engineers.

In this regard, Google’s famous “don’t be evil” motto, and customers’ general trust that the company won’t be evil, may get Google into trouble. As more and more data builds up in the company’s disk farms, the temptation to be evil only increases. Even if the company itself stays non-evil, its data trove will be a massive temptation for others to do evil. A rogue employee, an intruder, or just an accidental data leak could cause huge problems. And if customers ever decide that Google might be evil, or cause evil, or carelessly enable evil, the backlash would be severe.

Privacy is for Google what security is for Microsoft. At some point Microsoft realized that a chain of security disasters was one of the few things that could knock the company off its perch. And so Bill Gates famously declared security to be job one, thousands of developers were retrained, and Microsoft tried to change its culture to take security more seriously.

It’s high time for Google to figure out that it is one or two privacy disasters away from becoming just another Internet company. The time is now for Google to become a privacy leader. Fixing the privacy issues in its video DRM would be a small step toward that goal.

[Update (Feb. 9): A Google representative confirms that in the current version of Google Video, the identity of the video is sent to their servers. They have updated the service’s privacy policy to disclose this clearly.]

Comments

  1. Interesting post. Two points to note, though:
    1. While it is true that Netflix, eBay, and other businesses (in particular credit card companies) do end up collecting significant information about individual buying behavior, it appears less threatening because they do not use that information on an individual basis (they do sell aggregated information based on buying behavior). The issue is partly perception and partly intent.

    2. With an ad-driven model that has a maniacal focus on relevance, Google is absolutely an information hog. If there ever is an opportunity to collect information that can be leveraged into serving more targeted ads, you can bet that Google will collect it.

  2. In another twist, Google Video’s stated terms of service prohibit films involving “invasions of personal privacy.” (from http://www.guardian.co.uk/uk_news/story/0,,1692690,00.html) It will be interesting to see what comes of the user-end privacy issue in addition to privacy issues related to content.

  3. Google is aligned with DivX and is using DivX 4 for its video downloads, so if it is going to lock itself in with any vendor, it seems to be DivX, which develops cross-platform functionality for its product. If they want to lock in with DivX, I have no problems …

    DivX DRM is account-based, not device-based, and if you have a DivX-certified device you can play protected DivX content on your set-top box.
    You still need to connect to the internet to get your initial key, and I’m sure this is what Google is working towards so they can achieve convergence …

  4. A small issue (not concerning privacy) with requiring an internet connection to start watching a movie: the content is unusable if the connection is nonexistent or not working. What if you wanted to entertain yourself on a train, bus, or ship by watching a video on your laptop? Even if you have wireless via GSM, there are “empty areas”.

  5. Not mentioned in the analysis is the cost of storage and CPU power on the above three models. As I read it scenario #3 would win handily on those two marks – probably Google’s most significant cost on this project.

  6. Don’t forget about the strategic issues with DRM — the one thing for which it is really effective is vendor lock-in, so Google absolutely had to succeed in this project or it would have been forced to grant that lock-in to Microsoft. It was in a weak position, so probably gave away quite a lot.

  7. Google can ensure it ‘does no evil’ and avoid privacy concerns quite simply by publishing all the otherwise private information. It can keep its own information private of course (such as private keys).

  8. Just to clarify, Google has had the video service for months; downloading said videos was only introduced a few days ago.

  9. Supercat said:

    “I don’t particularly like the privacy issues raised by all this, but provided Google’s software stays within bounds and doesn’t install non-removable garbage the way XCP/Mediamax do, Google has the right to decide on what terms it will release its content (and I as a consumer have the right to decide on what terms I will buy it).”

    I agree that Google and the consumer have the right to decide; however, I was hoping that Google would have the foresight and courage to take the truly ‘no evil’ approach where everyone could trade real rights to these intellectual works without the need for DRM and the associated privacy and control issues.

    As I have said here: http://groups.google.com/group/DRM-Copyright-and-Google?lnk=li

  10. Ed, I think this is a non-issue as it’s nothing new or unusual, and even you’re guilty of it to some extent. After all, every time I leave a comment on here you can work out where I am by comparing comment timestamps with server logs; you can then work out what sort of time I get to work in the morning, and all sorts of other things. In fact any website where users are identified in some way can track the movement of those users, so why pick on Google?

  11. How long until Google becomes evil?…

    Ed Felten (Freedom to Tinker) made me really think with this comment:

    It’s high time for Google to figure out that it is one or two privacy disasters away from becoming just another Internet company.
    He’s talking here specificall…

  12. “I don’t know if we ever fully recovered from the label of spying.”

    It’s not just RealJukebox that is a problem, regular Realplayer plants a persistent process every time it runs and stays there until you have to manually turn it off. I’m sure it’s some sort of update thing but my Zonealarm goes off randomly when that rogue process is running on my PC. You have to kill the process manually *and* edit the startup config because it sets itself to start up upon a reboot. IMHO, that is not cool at all and is the reason I completely stopped using real player or any real products. So as for recovering from a “label of spying,” perhaps Real Networks need to clean up their act and not do things that appear to be spying. All reputable applications will close themselves down completely upon a user’s request to do so. Real player, to this day, does not.

  13. “Knowing that (Mary) once watched a video at Bob’s house is one thing; knowing that she does it twice a week, and often late at night, is another thing entirely. Netflix just doesn’t get this kind of information.”

    Again, there is nothing new here. So if Mary logs into her wsj.com subscription at Bob’s house twice a week, then the WSJ knows she’s at Bob’s.

    If Mary checks her Yahoo email at Bob’s house, Yahoo knows Mary is at Bob’s.

    Of course none of this is nearly as much of a privacy problem for Mary as the frequent phone calls she is making to Bob’s cell phone to arrange these late-night video-watching liaisons…

    I’m just puzzled at what’s new here – you have to log in to get access to Internet content you paid for – people have been doing this with their email and their online subscriptions for over 10 years now.

  14. As someone who worked for RealNetworks for many years, I cannot stress enough the importance of avoiding even one privacy-related incident. We had an issue with RealJukebox which was never intended to be invasive or malicious, but when it was discovered, it became a publicity nightmare. We learned a very hard lesson very quickly. From that moment on, we were overly careful about anything that in any way could infringe on a user’s privacy. We were probably the best in the business at that. But on the other hand, I don’t know if we ever fully recovered from the label of spying.

  15. While I agree with your complaints about Sony, I’m not sure your complaints about Google are so well-founded. If Alice is visiting Bob, Bob’s computer is only authorized to play the content Alice has paid for while Alice is logged in. Therefore, Bob’s computer cannot contain on any persistent medium all of the information necessary to view Alice’s videos or else Bob could view the information at will without authorization.

    I would expect that every time the viewer is fired up, it randomly generates a decryption key via some method; when a logged-in user selects a video to view, this key is sent (encrypted) to Google, which then encrypts the master key for the movie so that the supplied decryption key can process it.

    Using an approach like this means that not only is the master decryption key for the movie never stored in any persistent medium, but there’s nothing stored in any persistent medium *or communicated* that will allow the master key to be reconstructed. If the player didn’t generate a new random decryption key when it fired up, capturing the communications would allow for a replay attack; randomizing that aspect of the player makes such an attack useless.

    I don’t particularly like the privacy issues raised by all this, but provided Google’s software stays within bounds and doesn’t install non-removable garbage the way XCP/Mediamax do, Google has the right to decide on what terms it will release its content (and I as a consumer have the right to decide on what terms I will buy it).

  16. Anonymous,

    Netflix knows which videos I have watched, but not when and where. The when and where data conveys information about my movements that I might not want to reveal.

    Suppose that Alice frequently watches Google videos on Bob’s home computer. That tells you something about her relationship to Bob that she might not want just anybody to know. Knowing that she once watched a video at Bob’s house is one thing; knowing that she does it twice a week, and often late at night, is another thing entirely. Netflix just doesn’t get this kind of information.

  17. “This third design is the worst one from a privacy standpoint, giving the server a full log of exactly where and when the user watches videos, and probably which videos she watches.”

    I don’t understand how DRM matters here…the server will have a log of all videos watched for any given IP address and cookie…the presence of DRM doesn’t change the fact that a server is giving you video and a log of that exists somewhere. Sure if you provide a credit card and log in, there’s a record of that – but there’s a record of that anytime you buy anything, so what’s the issue?

    How is this any different than Netflix knowing all the videos you ordered?

    How is this worse than an Internet company having all your personally identifiable information in your email? Email seems far more revealing than what movies you watch.

    This argument seems like a red herring…am I missing something?

  18. Google has been doing a lot of work with DivX, and DivX also uses an account-based DRM that needs to connect to a central server for verification.

    DivX and Google are also planning to get video from the computer to a set-top box or portable device, something that needs authentication via an IP or a CD in the case of a DivX-certified video player.

    Your site also needs verification of email to post, Ed….

  19. I’m a fan of Google, I think they make good products, and while normally I don’t think trusting people to do the right thing is enough I think it’s safe to say that they’ve been loyal to the “Do no evil” concept so far.

    Unfortunately, the US government hasn’t.

    http://www.msnbc.msn.com/id/10925344/

    The good news is, Google is resisting attempts to abuse their records. May we note, this case is not a question of privacy invasion. It’s about a study on population for legislative purposes, not seeking out individuals for prosecution. Google’s strict refusal is very pro-consumer/pro-privacy, but what if the courts force them to turn over their records?

    If Google loses, it sends a clear message: mass-accumulated data on user activities is NOT safe in anyone’s hands.

  20. The more data Google collects, the more tempting it is not only to Google and hackers, but to other industries as well. I can certainly see the MPAA wanting the information if they suspect there is some infringement going on. Once the MPAA decides it “must” have something, be it a new law or simply ISP logs, they have proven they are willing to spend whatever it takes on lawyers, lobbyists, and even media campaigns to get it.

  21. Google and Privacy…

    Professor Edward Felten accurately recognizes that privacy is Google’s Achilles Heel. He writes, “Privacy is for Google what security is for Microsoft. At some point Microsoft realized that a chain of security disasters was one of the few things that…

  22. Foolish Jordan says

    Internal employees and hackers aren’t the only ones who might want Google’s treasure trove of data. The gubmnt is after it as well: http://news.bbc.co.uk/1/hi/technology/4630694.stm

  23. “It’s high time for Google to figure out that it is one or two privacy disasters away from becoming just another Internet company. The time is now for Google to become a privacy leader. Fixing the privacy issues in its video DRM would be a small step toward that goal.”

    Unfortunately, I think that since the gov’t has realized just how much information gets stored, they will keep Google going. Google has turned into Big Business, at least as Evil as any. The U.S. gov’t used IBM’s factories to fight Germany (side-stepping a whole mess of things I could also bring up, I’ll actually try to reach my point), because of how useful they saw them. Different war, different technology: same story.

    I think the comparison you gave to Microsoft does have a certain ring to it. However, Microsoft’s security issues come from a massive failing and negligence, while Google’s privacy issues stem from active data harvesting and processing.

    And so ends my two groggy cents to add…