November 29, 2024

Google Video and Privacy

Last week Google introduced its video service, which lets users download free or paid-for videos. The service’s design is distinctive in many ways, not all of them desirable. One of the distinctive features is a DRM (anti-infringement) mechanism which is applied if the copyright owner asks for it. Today I want to discuss the design of Google Video’s DRM, and especially its privacy implications.

First, some preliminaries. Google’s DRM, like everybody else’s, can be defeated without great difficulty. Like all DRM schemes that rely on encrypting files, it is vulnerable to capture of the decrypted file, or to capture of the keying information, either of which will let an adversary rip the video into unprotected form. My guess is that Google’s decision to use DRM was driven by the insistence of copyright owners, not by any illusion that the DRM would stop infringement.

The Google DRM system works by trying to tether every protected file to a Google account, so that the account’s username and password have to be entered every time the file is viewed. From the user’s point of view, this has its pros and cons. On the one hand, an honest user can view his video on any Windows PC anywhere; all he has to do is move the file and then enter his username and password on the new machine. On the other hand, the system works only when connected to the net, and it carries privacy risks.

The magnitude of privacy risk depends on the details of the design. If you’re going to have a DRM scheme that tethers content to user accounts, there are three basic design strategies available, which differ according to how much information is sent to Google’s servers. As we’ll see, Google apparently chose the design that sends the most information and so carries the highest privacy risk for users.

The first design strategy is to encrypt files so that they can be decrypted without any participation by the server. You create an encryption key that is derived from the username and password associated with the user’s Google account, and you encrypt the video under that key. When the user wants to play the video, software on the user’s own machine prompts for the username and password, derives the key, decrypts the video, and plays it. The user can play the video as often as she likes, without the server being notified. (The server participates only when the user initially buys the video.)
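The first design can be sketched in a few lines of Python. This is a toy illustration, not Google's actual scheme: the key derivation and the SHA-256-based stream cipher are stand-ins I chose for self-containedness, and all names are hypothetical. The point is structural: the key depends only on the credentials, so playback never contacts the server.

```python
import hashlib

def derive_key(username: str, password: str) -> bytes:
    # The key is a pure function of the account credentials, so no
    # server round-trip is needed at playback time (the core of design 1).
    return hashlib.pbkdf2_hmac("sha256", password.encode(), username.encode(), 100_000)

def keystream(key: bytes, length: int) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode (illustration only --
    # a real system would use a vetted cipher).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(video: bytes, username: str, password: str) -> bytes:
    key = derive_key(username, password)
    return bytes(a ^ b for a, b in zip(video, keystream(key, len(video))))

decrypt = encrypt  # XOR with the same keystream is its own inverse
```

Notice that the ciphertext is tied to the password that was current at purchase time, which is exactly the source of the password-change problem discussed next.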

This design is great from a privacy standpoint, but it suffers from two main drawbacks. First, if the user changes the password in her Google account, there is no practical way to update the user’s video files. The videos can only be decrypted with the user’s old password (the one that was current when she bought the videos), which will be confusing. Second, there is really no defense against account-sharing attacks, where a large group of users shares a single Google account, and then passes around videos freely among themselves.

The second design tries to address both of these problems. In this design, a user’s files are encrypted under a key that Google knows. Before the user can watch videos on a particular machine, she has to activate her account on that machine, by sending her username and password to a Google server, which then sends back a key that allows the unlocking of that user’s videos on that machine. Activation of a machine can last for days, or weeks, or even forever.

This design addresses the password-change problem, because the Google server always knows the user’s current password, so it can require the current password to activate an account. It also addresses the account-sharing attack, because a widely-shared account will be activated on a suspiciously large number of machines. By watching where and how often an account is activated, Google can spot sharing of the account, at least if it is shared widely.
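The activation flow in this second design can be sketched as follows. Again this is a minimal illustration under my own assumptions (the class names, the per-account content key, and the crude machine-count threshold are all hypothetical), but it shows why the server can both enforce the current password and spot widely-shared accounts.

```python
class ActivationServer:
    """Sketch of design 2: the server holds the keys and tracks activations."""

    def __init__(self):
        self.accounts = {}      # username -> current password
        self.content_keys = {}  # username -> key unlocking that user's videos
        self.activations = {}   # username -> set of activated machine ids

    def activate(self, username, password, machine_id, max_machines=5):
        # The server always knows the *current* password, so a password
        # change immediately invalidates the old one (unlike design 1).
        if self.accounts.get(username) != password:
            return None
        machines = self.activations.setdefault(username, set())
        machines.add(machine_id)
        # Crude sharing detector: an account activated on suspiciously
        # many machines is refused further activations.
        if len(machines) > max_machines:
            return None
        return self.content_keys[username]
```

After activation, playback happens locally with the returned key; the server learns only which machines were activated and when, not what was watched.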

In this second design, more information flows to Google’s servers – Google learns which machines the user watches videos on, and when the user first uses each of the machines. But they don’t learn which videos were watched when, or which videos were watched on which machine, or exactly when the user watches videos on a given machine (after the initial activation). This design does have privacy drawbacks for users, but I think few users would complain.

In the third design, the user’s computer contacts Google’s server every time the user wants to watch a protected video, transmitting the username and password, and possibly the identity of the video being watched. The server then provides the decryption key needed to watch that particular video; after showing the video the software on the user’s computer discards the key, so that another handshake with the server is needed if the user wants to watch the same video later.
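A sketch of the third design makes the privacy difference concrete. The class and field names are my own invention, and I include the video's identity in the request (the post's update confirms Google sent it in the then-current version). The key point is the log: because every playback is a server round-trip, the server accumulates a record of who watched what, and when.

```python
import time

class LicenseServer:
    """Sketch of design 3: a key request on every playback."""

    def __init__(self, accounts, video_keys):
        self.accounts = accounts      # username -> password
        self.video_keys = video_keys  # video_id -> per-video decryption key
        self.log = []                 # the privacy-sensitive part

    def issue_key(self, username, password, video_id):
        if self.accounts.get(username) != password:
            return None
        # The server now knows exactly who watched which video, and when.
        self.log.append((username, video_id, time.time()))
        return self.video_keys.get(video_id)
```

The client discards the key after playback, so even re-watching the same video adds another entry to the server's log.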

Google hasn’t revealed whether or not they send the identity of the video to the server. There are two pieces of evidence to suggest that they probably do send it. First, sending it is the simplest design strategy, given the other things we know about Google’s design. Second, Google has not said that they don’t send it, despite some privacy complaints about the system. It’s a bit disappointing that they haven’t answered this question one way or the other, either to disclose what information they’re collecting, or to reassure their users. I’d be willing to bet that they do send the identity of the video, but that bet is not a sure thing. [See update below.]

This third design is the worst one from a privacy standpoint, giving the server a full log of exactly where and when the user watches videos, and probably which videos she watches. Compared to the second design, this one creates more privacy risk but has few if any advantages. The extra information sent to the server seems to have little if any value in stopping infringement.

So why did Google choose a less privacy-friendly solution, even though it provided no real advantage over a more privacy-friendly one? Here I can only speculate. My guess is that Google is not as attuned to this kind of privacy issue as they should be. The company is used to logging lots of information about how customers use its services, so a logging-intensive solution would probably seem natural, or at least less unnatural, to its engineers.

In this regard, Google’s famous “don’t be evil” motto, and customers’ general trust that the company won’t be evil, may get Google into trouble. As more and more data builds up in the company’s disk farms, the temptation to be evil only increases. Even if the company itself stays non-evil, its data trove will be a massive temptation for others to do evil. A rogue employee, an intruder, or just an accidental data leak could cause huge problems. And if customers ever decide that Google might be evil, or cause evil, or carelessly enable evil, the backlash would be severe.

Privacy is for Google what security is for Microsoft. At some point Microsoft realized that a chain of security disasters was one of the few things that could knock the company off its perch. And so Bill Gates famously declared security to be job one, thousands of developers were retrained, and Microsoft tried to change its culture to take security more seriously.

It’s high time for Google to figure out that it is one or two privacy disasters away from becoming just another Internet company. The time is now for Google to become a privacy leader. Fixing the privacy issues in its video DRM would be a small step toward that goal.

[Update (Feb. 9): A Google representative confirms that in the current version of Google Video, the identity of the video is sent to their servers. They have updated the service’s privacy policy to disclose this clearly.]

How Would Two-Tier Internet Work?

The word is out now that residential ISPs like BellSouth want to provide a kind of two-tier Internet service, where ordinary Internet services get one level of performance, and preferred sites or services, presumably including the ISPs’ own services, get better performance. It’s clear why ISPs want to do this: they want to charge big web sites for the privilege of getting preferred service.

I should say up front that although the two-tier network is sometimes explained as if there were two tiers of network infrastructure, the obvious and efficient implementation in practice would be to have a single fast network, and to impose deliberate delay or bandwidth throttling on non-preferred traffic.
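One standard way to implement that kind of throttling on a single fast network is a token bucket applied only to non-preferred traffic. The sketch below is mine, not a description of any ISP's actual equipment; the rate and burst parameters are arbitrary.

```python
import time

class TokenBucket:
    """Rate-limits non-preferred traffic; preferred traffic bypasses it."""

    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, nbytes):
        # Refill tokens in proportion to elapsed time, up to the burst cap.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

def forward(packet: bytes, is_preferred: bool, bucket: TokenBucket) -> bool:
    # Preferred traffic skips the limiter entirely; everything else
    # competes for the bucket's tokens (and is delayed or dropped when
    # they run out).
    return is_preferred or bucket.allow(len(packet))
```

Note that the "two tiers" here are purely a policy decision in the forwarding path; the underlying network is the same for everyone.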

Whether ISPs should be allowed to do this is an important policy question, often called the network neutrality issue. It’s a harder issue than advocates on either side admit. Regular readers know that I’ve been circling around this issue for a while, without diving into its core. My reason for shying away from the main issue is simply that I haven’t figured it out yet. Today I’ll continue circling.

Let’s think about the practical aspects of how an ISP would present the two-tier Internet to customers. There are basically two options, I think. Either the ISP can create a special area for preferred sites, or it can let sites keep their ordinary URLs. As we’ll see, either option leads to problems.

The first option is to give the preferred sites special URLs. For example, if this site had preferred status on AcmeISP, its URL for AcmeISP customers would be something like freedom-to-tinker.preferred.acmeisp.com. This has the advantage of telling customers clearly which sites are expected to have preferred-level performance. But it has the big disadvantage that URLs are no longer portable from one ISP to another. Portability of URLs – the fact that a URL means the same thing no matter where you use it – is one of the critical features that makes the web work, and makes sites valuable. It’s hard to believe that sites and users will be willing to give it up.

The second option is for users to name sites using ordinary names and URLs. For example, this site would be called freedom-to-tinker.com, regardless of whether it had preferred status on your ISP. In this scenario, the only difference between preferred and ordinary sites is that users would see much better performance for preferred sites.

To an ordinary user, this would look like a network that advertises high peak performance but often has lousy performance in practice. If you’ve ever used a network whose performance varies widely over time, you know how aggravating it can be. And it’s not much consolation to learn that the poor performance only happens when you’re trying to use that great video site your friend (on another ISP) told you about. You assume something is wrong, and you blame the ISP.

In this situation, it’s hard to believe that a complaining user will be impressed by an explanation that the ISP could have provided higher performance, but chose not to because the site didn’t pay some fee. Users generally expect that producers will provide the best product they can at a given cost. Business plans that rely on making products deliberately worse, without reducing the cost of providing them, are widely seen as unfair. Given that explanation, users will still blame the ISP for the performance problems they see.

The basic dilemma for ISPs is pretty simple. They want to segregate preferred sites in users’ minds, so that users will blame the site rather than the ISP for the poor performance of non-preferred sites; but segregating the preferred sites makes the sites much less valuable because they can no longer be named in the same way on different ISPs.

How can ISPs escape this dilemma? I’m not sure. It seems to me that ISPs will be driven to a strategy of providing Internet service alongside exclusive, only-on-this-ISP content. That’s a strategy with a poor track record.

Clarification (3:00 PM EST): In writing this post, I didn’t mean to imply that web sites were the only services among which providers wanted to discriminate. I chose to use web sites because they’re useful in illustrating the issues. I think many of the same issues would arise with other types of services, such as VoIP. In particular, there will be real tension between an ISP’s desire to label preferred VoIP services as strongly associated with, and supported by, that particular ISP, and the VoIP services’ strong incentive to portray themselves as being the same service everywhere.

CGMS-A + VEIL = SDMI ?

I wrote last week about the Analog Hole Bill, which would require almost all devices that handle analog video signals to implement a particular anti-copying scheme called CGMS-A + VEIL. Today I want to talk about how that scheme works, and what we can learn from its design.

CGMS-A + VEIL is, not surprisingly, a combination of two discrete signaling technologies called CGMS-A and VEIL. Both allow information to be encoded in an analog video signal, but they work in different ways.

CGMS-A stores a few bits of information in a part of the analog video signal called the vertical blanking interval (VBI). Video is transmitted as a series of discrete frames that are displayed one by one. In analog video signals, there is an empty space between the frames. This is the VBI. Storing information there has the advantage that it doesn’t interfere with any of the frames of the video, but the disadvantage that the information, being stored in part of the signal that nobody much cares about, is easily lost. (Nowadays, closed captioning information is stored in the VBI; but still, VBI contents are easily lost.) For example, digital video doesn’t have a VBI, so straight analog-to-digital translation will lose anything stored in the VBI. The problem with CGMS-A, then, is that it is too fragile and will often be lost as the signal is stored, processed, and translated.
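The fragility point can be made concrete with a small sketch. The frame structure below is my own simplification (real analog video is not a Python object, and the field names are hypothetical), but it captures the essential asymmetry: the VBI rides alongside the picture, and a straight analog-to-digital conversion keeps only the picture.

```python
from dataclasses import dataclass, field

@dataclass
class AnalogFrame:
    pixels: bytes                              # the visible picture
    vbi: dict = field(default_factory=dict)    # side data between frames,
                                               # e.g. {"cgms_a": 0b110}

def digitize(frame: AnalogFrame) -> bytes:
    # A straight A/D conversion captures only the picture. Digital video
    # has no VBI, so the CGMS-A bits stored there are silently dropped.
    return frame.pixels
```

Anything that survives only in the `vbi` field is gone after `digitize` runs, which is exactly why CGMS-A alone cannot be trusted to persist through storage, processing, and translation.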

There’s one other odd thing about CGMS-A, at least as it is used in the Analog Hole Bill. It’s remarkably inefficient in storing information. The version of CGMS-A used there (with the so-called RCI bit) stores three bits of information (if it is present), so it can encode eight distinct states. But only four distinct states are used in the bill’s design. This means that it’s possible, without adding any bits to the encoding, to express four more states that convey different information about the copyright owner’s desires. For example, there could be a way for the copyright owner to signal that the customer was free to copy the video for personal use, or even that the customer was free to retransmit the video without alteration. But our representatives didn’t see fit to support those options, even though there are unused states in their design.
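The arithmetic behind that inefficiency is simple enough to write down. The bit patterns and state labels below are illustrative, not the bill's actual encoding; only the counts matter: three bits give eight states, the design uses four, and four go to waste.

```python
# Three signaling bits => 2**3 = 8 possible states. Suppose the design
# assigns meanings to only four of them (these labels and bit values are
# hypothetical, chosen just to illustrate the counting):
USED_STATES = {
    0b000: "copy freely",
    0b010: "copy once",
    0b011: "copy never",
    0b111: "copy never, redistribution restricted",
}

# The remaining states are available at zero cost in encoding space --
# e.g. "personal-use copying permitted" or "unaltered retransmission
# permitted" -- but go unused.
UNUSED_STATES = [s for s in range(2**3) if s not in USED_STATES]
```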

The second technology, VEIL, is a watermark that is inserted into the video itself. VEIL was originally developed as a way for TV shows to send signals to toys. If you pointed the toy at the TV screen, it would detect any VEIL information encoded into the TV program, and react accordingly.

Then somebody got the idea of using VEIL as a “rights signaling” technology. The idea is that whenever CGMS-A is signaling restrictions on copying, a VEIL watermark is put into the video. Then if a signal is found to have a VEIL watermark, but no CGMS-A information, this is taken as evidence that CGMS-A information must have been lost from that signal at some point. When this happens, the bill requires that the most restrictive DRM rules be applied, allowing viewing of the video and nothing else.
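The decision rule the bill mandates can be sketched directly. The function and return values here are my own paraphrase of the scheme, not statutory language:

```python
def required_treatment(has_veil_watermark: bool, cgms_a_bits):
    """cgms_a_bits is None when no CGMS-A information is present."""
    if cgms_a_bits is not None:
        # Explicit CGMS-A information present: obey it.
        return cgms_a_bits
    if has_veil_watermark:
        # Watermark survived but the fragile CGMS-A bits did not:
        # presume the bits were lost (or stripped) and apply the most
        # restrictive rules -- viewing only.
        return "view-only (most restrictive)"
    # Neither signal present: treat as unrestricted.
    return "unrestricted"
```

The watermark, in other words, acts purely as a tamper-evidence signal for the fragile CGMS-A bits; it carries no rights information of its own.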

Tellingly, advocates of this scheme do their best to avoid calling VEIL a “watermark”, even though that’s exactly what it is. A watermark is an imperceptible (or barely perceptible) component added to an audio or video signal to convey information. That’s a perfect description of VEIL.

Why don’t they call it a watermark? Probably because watermarks have a bad reputation as DRM technologies, after the Secure Digital Music Initiative (SDMI). SDMI used two signals, one of which was a “robust” watermark, to encode copy control information in content. If the robust watermark was present but the other signal was absent, this was taken as evidence that something was wrong, and strict restrictions were to be enforced. Sound familiar?

SDMI melted down after its watermark candidates – all four of them – were shown to be removable by an adversary of modest skill. And an adversary who could remove the watermark could then create unprotected copies of the content.

Is the VEIL watermark any stronger than the SDMI watermarks? I would expect it to be weaker, since the VEIL technology was originally designed for an application where accidental loss of the watermark was a problem, but deliberate removal by an adversary was not an issue. So how does VEIL work? I’ll write about that soon.

UPDATE (23 Jan): An industry source tells me that one factor in the decision not to call VEIL a watermark is that some uses of watermarks for DRM are patented, and calling it a watermark might create uncertainty about whether it was necessary to license watermarking patents. Some people also assert (incorrectly, in my view) that a watermark must encode some kind of message, beyond just the presence of the watermark. My view is still that VEIL is accurately called a watermark.

The Professional Device Hole

Any American parent with kids of a certain age knows Louis Sachar’s novel Holes, and the movie made from it. It’s set somewhere in the Texas desert, at a boot camp for troublemaking kids. The kids are forced to work all day in the scorching sun, digging holes in the rock-hard ground then re-filling them. It seems utterly pointless but the grown-ups say it builds character. Eventually we learn that the holes aren’t pointless but in fact serve the interests of a few nasty grown-ups.

Speaking of holes, and pointless exercises, last month Reps. Sensenbrenner and Conyers introduced a bill, the Digital Transition Content Security Act, also known as the Analog Hole Bill.

“Analog hole” is an artfully chosen term, referring to the fact that audio and video can be readily converted back and forth between digital and analog formats. This is just a fact about the universe, but calling it a “hole” makes it sound like a problem that might possibly be solved. The last large-scale attack on the analog hole was the Secure Digital Music Initiative (SDMI) which went down in flames in 2002 after its technology was shown to be ineffective (and after SDMI famously threatened to sue researchers for analyzing the technology).

The Analog Hole Bill would mandate that any devices that can translate certain types of video signals from analog to digital form must comply with a Byzantine set of design restrictions that talk about things like “certified digital content rights protection output technologies”. Let’s put aside for now the details of the technology design being mandated; I’ll critique them in a later post. I want to write today about the bill’s exemption for “professional devices”:

PROFESSIONAL DEVICE.—(A) The term “professional device” means a device that is designed, manufactured, marketed, and intended for use by a person who regularly employs such a device for lawful business or industrial purposes, such as making, performing, displaying, distributing, or transmitting copies of audiovisual works on a commercial scale at the request of, or with the explicit permission of, the copyright owner.

(B) If a device is marketed to or is commonly purchased by persons other than those described in subparagraph (A), then such device shall not be considered to be a “professional device”.

Tim Lee at Tech Liberation Front points out one problem with this exemption:

“Professional” devices, you see, are exempt from the restrictions that apply to all other audiovisual products. This raises some obvious questions: is it the responsibility of a “professional device” maker to ensure that too many “non-professionals” don’t purchase their product? If a company lowers its price too much, thereby allowing too many of the riffraff to buy it, does the company become guilty of distributing a piracy device? Perhaps the government needs to start issuing “video professional” licenses so we know who’s allowed to be part of this elite class?

I think this legislative strategy is extremely revealing. Clearly, Sensenbrenner’s Hollywood allies realized that all this copy-protection nonsense could cause problems for their own employees, who obviously need the unfettered ability to create, manipulate, and convert analog and digital content. This is quite a reasonable fear: if you require all devices to recognize and respect encoded copy-protection information, you might discover that content which you have a legitimate right to access has been locked out of reach by over-zealous hardware. But rather than taking that as a hint that there’s something wrong with the whole concept of legislatively-mandated copy-protection technology, Hollywood’s lobbyists took the easy way out: they got themselves exempted from the reach of the legislation.

In fact, the professional device hole is even better for Hollywood than Tim Lee realizes. Not only will it protect Hollywood from the downside of the bill, it will also create new barriers to entry, making it harder for amateurs to create and distribute video content – and just at the moment when technology seems to be enabling high-quality amateur video distribution.

The really interesting thing about the professional device hole is that it makes one provision of the bill utterly impossible to put into practice. For those reading along at home, I’m referring to the robustness rulemaking of section 202(1), which requires the Patent and Trademark Office (PTO) to establish technical requirements that (among other things) “can only with difficulty be defeated or circumvented by use of professional tools or equipment”. But there’s a small problem: professional tools are exempt from the technical requirements.

The robustness requirements, in other words, have to stop professional tools from copying content – and they have to do that, somehow, without regulating what professional tools can do. That, as they say, is a tall order.

That’s all for today, class. Here’s the homework, due next time:
(1) Table W, the most technical part of the bill, contains an error. (It’s a substantive error, not just a typo.) Explain what the error is.
(2) How would you fix the error?
(3) What can we learn from the fact that the error is still in the bill at this late date?

Predictions for 2006

Each January, I have offered predictions for the upcoming year. This year, Alex and I put our heads together to come up with a single list of predictions. Having doubled the number of bloggers making predictions, we seem to have doubled the number of predictions, too. Each prediction is supported by at least one of us, except the predictions that turn out to be wrong, which must have slipped in by mistake.

And now, our predictions for 2006:

(1) DRM technology will still fail to prevent widespread infringement. In a related development, pigs will still fail to fly.

(2) The RIAA will quietly reduce the number of lawsuits it files against end users.

(3) Copyright owners, realizing that their legal victory over Grokster didn’t solve the P2P problem, will switch back to technical attacks on P2P systems.

(4) Watermarking-based DRM will make an abortive comeback, but will still be fundamentally infeasible.

(5) Frustrated with Apple’s market power, the music industry will try to cozy up to Microsoft. Afraid of Microsoft’s market power, the movie industry will try to cozy up to Washington.

(6) The Google Book Search case will settle. Months later, everybody will wonder what all the fuss was about.

(7) A major security and/or privacy vulnerability will be found in at least one more major DRM system.

(8) Copyright issues will still be stalemated in Congress.

(9) Arguments based on national competitiveness in technology will have increasing power in Washington policy debates.

(10) Planned incompatibility will join planned obsolescence in the lexicon of industry critics.

(11) There will be broad consensus on the need for patent reform, but very little consensus on what reform means.

(12) Attention will shift back to the desktop security problem, and to the role of botnets as a tool of cybercrime.

(13) It will become trendy to say that the Internet is broken and needs to be redesigned. This meme will be especially popular with those recommending bad public policies.

(14) The walls of wireless providers’ “walled gardens” will get increasingly leaky. Providers will eye each other, wondering who will be the first to open their network.

(15) Push technology (remember PointCast and the Windows Active Desktop?) will return, this time with multimedia, and probably on portable devices. People won’t like it any better than they did before.

(16) Broadcasters will move toward Internet simulcasting of free TV channels. Other efforts to distribute authorized video over the net will disappoint.

(17) HD-DVD and Blu-ray, touted as the second coming of the DVD, will look increasingly like the second coming of the Laserdisc.

(18) “Digital home” products will founder because companies aren’t willing to give customers what they really want, or don’t know what customers really want.

(19) A name-brand database vendor will go bust, unable to compete against open source.

(20) Two more significant desktop apps will move to an Ajax/server-based design (as email did in moving toward Gmail). Office will not be one of them.

(21) Technologies that frustrate discrimination between different types of network traffic will grow in popularity, backed partly by application service providers like Google and Yahoo.

(22) Social networking services will morph into something actually useful.

(23) There will be a felony conviction in the U.S. for a crime committed entirely in a virtual world.