
The Slingbox Pro: Information Leakage and Variable Bitrate (VBR) Fingerprints

[Today’s guest blogger is Yoshi Kohno, a Computer Science prof at the University of Washington who has done interesting work on security and privacy topics including e-voting. – Ed]

If you follow technology news, you might be aware of the buzz surrounding technologies that mate the Internet with your TV. The Slingbox Pro and the Apple TV are two commercial products leading this wave. The Slingbox Pro and the Apple TV system are a bit different, but the basic idea is that they can stream videos over a network. For example, you could hook the Slingbox Pro up to your DVD player or cable TV box, and then wirelessly watch a movie on any TV in your house (via the announced Sling Catcher). Or you could watch a movie or TV show on your laptop from across the world.

Privacy is important for these technologies. For example, you probably don’t want someone sniffing at your ISP to figure out that you’re watching a pirated copy of Spiderman 3 (of course, we don’t condone piracy). You might not want your neighbor, who likes to sniff 802.11 wireless packets, to be able to figure out what channel, movie, or type of movie you’re watching. You might not want your hotel to figure out what movie you’re watching on your laptop in order to send you targeted ads. The list goes on…

To address viewer privacy, the Slingbox Pro uses encryption. But does the use of encryption fully protect the privacy of a user’s viewing habits? We studied this question at the University of Washington, and we found that the answer to this question is No – despite the use of encryption, a passive eavesdropper can still learn private information about what someone is watching via their Slingbox Pro.

The full details of our results are in our Usenix Security 2007 paper, but here are some of the highlights.

First, in order to conserve bandwidth, the Slingbox Pro uses something called variable bitrate (VBR) encoding. VBR is a standard approach for compressing streaming multimedia. At a very abstract level, the idea is to only transmit the differences between frames. This means that if a scene changes rapidly, the Slingbox Pro must still transmit a lot of data. But if the scene changes slowly, the Slingbox Pro will only have to transmit a small amount of data – a great bandwidth saver.

Now notice that different movies have different visual effects (e.g., some movies have frequent and rapid scene changes, others don’t). The use of VBR encodings therefore means that the amount of data transmitted over time can serve as a fingerprint for a movie. And, since encryption alone won’t fully conceal the number of bytes transmitted, this fingerprint can survive encryption!
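
To make the idea concrete, here is a minimal sketch (in Python, with an invented packet-capture format) of what an eavesdropper could compute. Encryption hides the packet contents, but not packet sizes and arrival times, so binning the observed bytes into fixed time windows yields a throughput trace – the raw material for a fingerprint. This is only an illustration of the idea, not the measurement code from our study.

    # Sketch only: turn sniffed (timestamp, size) pairs into a throughput trace.
    # Payloads may be encrypted; only arrival times and sizes are used.

    def throughput_trace(packets, window=1.0):
        """packets: iterable of (timestamp_seconds, size_bytes) pairs.
        Returns bytes observed per window - the raw fingerprint material."""
        packets = sorted(packets)
        if not packets:
            return []
        start = packets[0][0]
        trace = []
        for ts, size in packets:
            idx = int((ts - start) // window)
            while len(trace) <= idx:
                trace.append(0)
            trace[idx] += size
        return trace

    # A slow, talky scene and a rapid action scene produce visibly different
    # traces, even though the eavesdropper never decrypts a single frame.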

We experimented with fingerprinting encrypted Slingbox Pro movie transmissions in our lab. We took 26 of our favorite movies (we tried to pick movies from the same director, or multiple movies in a series), and we played them over our Slingbox Pro. Sometimes we streamed them to a laptop attached to a wired network, and sometimes we streamed them to a laptop connected to an 802.11 wireless network. In all cases the laptop was one hop away.

We trained our system on some of those traces. We then took new query traces for these movies and tried to match them to our database. For over half of the movies, we were able to correctly identify the movie over 98% of the time. This is well above the less than 4% accuracy that one would get by random chance.
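
The matching algorithm we actually used is described in the paper; the toy matcher below (with a hypothetical database dictionary) just conveys the flavor. Normalize each throughput trace so that only its shape matters, then pick the stored movie whose trace lies closest to the query. With 26 movies, random guessing succeeds only 1/26 of the time, about 3.8%.

    import math

    def normalize(trace):
        """Rescale a throughput trace to zero mean and unit variance, so the
        comparison depends on the shape of the bandwidth curve, not its level."""
        n = len(trace)
        mean = sum(trace) / n
        var = sum((x - mean) ** 2 for x in trace) / n or 1.0
        return [(x - mean) / math.sqrt(var) for x in trace]

    def distance(a, b):
        """Euclidean distance over the overlapping prefix of two traces."""
        m = min(len(a), len(b))
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(m)))

    def identify(query_trace, database):
        """database: hypothetical dict mapping movie title -> training trace.
        Returns the title whose stored trace is nearest to the query."""
        q = normalize(query_trace)
        return min(database, key=lambda t: distance(q, normalize(database[t])))

This is not the classifier from the paper, but it captures why distinctive bandwidth shapes are enough to tell one movie from another.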

What does all this mean? First and foremost, this research result provides further evidence that critical information can leak out through encrypted channels; see our paper for related work. In the case of encrypted streaming multimedia, one might wonder how our results scale since we only tested 26 movies. Addressing the scalability question for our new VBR-based fingerprinting approach is a subject of future research; but, as cryptanalysts like to say, attacks only get better. Moreover, if the makers of movies wanted to, they could potentially make the VBR fingerprints for their movies even stronger and more uniquely identifying.

(This note is not meant to criticize the makers of the Slingbox Pro. In fact, we were very pleased to learn that the Slingbox Pro uses encryption, which does raise the bar against a privacy attacker. Rather, this note describes new research results and fundamental challenges for privacy and streaming multimedia.)

Why So Many False Positives on the No-Fly List?

Yesterday I argued that Walter Murphy’s much-discussed encounter with airport security was probably just a false positive in the no-fly list matching algorithm. Today I want to talk about why false positives (ordinary citizens triggering mistaken “matches” with the list) are so common.

First, a preliminary. It’s often argued that the high false positive rate proves the system is poorly run or even useless. This is not necessarily the case. In running a system like this, we necessarily trade off false positives against false negatives. We can lower either kind of error, but doing so will increase the other kind. The optimal policy will balance the harm from false positives against the harm from false negatives, to minimize total harm. If the consequences of a false positive are relatively minor (brief inconvenience for one traveler), but the consequences of a false negative are much worse (non-negligible probability of multiple deaths), then the optimal choice is to accept many false positives in order to drive the false negative rate way down. In other words, a high false positive rate is not by itself a sign of bad policy or bad management. You can argue that the consequences of error are not really so unbalanced, or that the tradeoff is being made poorly, but your argument can’t rely only on the false positive rate.
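
A purely illustrative calculation may help; the harms and error curves below are invented numbers, not estimates of the real screening system. The point is only that when the harm from a false negative dwarfs the harm from a false positive, the harm-minimizing operating point tolerates a very high false positive rate.

    # Illustrative only: every number below is invented.
    HARM_FP = 1.0          # harm from one false positive (a delayed traveler)
    HARM_FN = 1_000_000.0  # harm from one false negative (a missed attacker)

    def expected_harm(threshold):
        """Toy model: raising the match threshold cuts false positives
        but lets more true matches slip through."""
        p_fp = 1.0 - threshold        # false-positive rate falls as threshold rises
        p_fn = 1e-4 * threshold ** 4  # false-negative rate rises as threshold rises
        return p_fp * HARM_FP + p_fn * HARM_FN

    best = min((expected_harm(t / 100), t / 100) for t in range(1, 100))
    print(best)  # the optimum sits at a low threshold, i.e. it accepts many false positives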

Having said that, the system’s high false positive rate still needs explaining.

The fundamental reason for the false positives is that the system matches names, and names are a poor vehicle for identifying people, especially in the context of air travel. Names are not as unique as most people think, and names are frequently misspelled, especially in airline records. Because of the misspellings, you’ll have to do approximate matching, which will make the nonuniqueness problem even worse. The result is many false positives.
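
A small sketch shows why approximate matching amplifies the problem. It uses the similarity scorer in Python’s standard difflib, not whatever algorithm the airlines or TSA actually use, and the list entries and threshold are invented.

    from difflib import SequenceMatcher

    # Invented list entries and threshold; the real list and matcher are secret.
    NO_FLY = ["Walter Murphee", "W. Murfy", "Gualtiero Murfi"]
    THRESHOLD = 0.75   # loosened to tolerate misspellings in airline records

    def similarity(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def hits(passenger_name):
        """Every list entry that approximately matches the passenger's name."""
        return [e for e in NO_FLY if similarity(passenger_name, e) >= THRESHOLD]

    print(hits("Walter Murphy"))  # a correctly spelled, common name still "hits"

Loosen the threshold enough to catch real misspellings and you also catch innocent travelers whose names merely resemble an entry.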

Why not use more information to reduce false positives? Why not, for example, use the fact that the Walter Murphy who served in the Marine Corps and used to live near Princeton is not a threat?

The reason is that using that information would have unwanted consequences. First, the airlines would have to gather much more private information about passengers, and they would probably have to verify that information by demanding documentary proof of some kind.

Second, checking that private information against the name on the no-fly list would require bringing together the passenger’s private information with the government’s secret information about the person on the no-fly list. Either the airline can tell the government what it knows about the passenger’s private life, or the government can tell the airline what it knows about the person on the no-fly list. Both options are unattractive.

A clumsy compromise – which the government is apparently making – is to provide a way for people who often trigger false positives to supply more private information, and if that information distinguishes the person from the no-fly list entry, to give the person some kind of “I’m not really on the no-fly list” certificate. This imposes a privacy cost, but only on people who often trigger false positives.

Once you’ve decided to have a no-fly list, a significant false positive rate is nearly inevitable. The bigger policy question is whether, given all of its drawbacks, we should have a no-fly list at all.

Walter Murphy Stopped at Airport: Another False Positive

Blogs are buzzing about the story of Walter Murphy, a retired Princeton professor who reported having triggered a no-fly list match on a recent trip. Prof. Murphy suspects this happened because he has given speeches criticizing the Bush Administration.

I studied the no-fly list mechanism (and the related watchlist) during my service on the TSA’s Secure Flight Working Group. Based on what I learned about the system, I am skeptical of Prof. Murphy’s claim. I think he reached, in good faith, an incorrect conclusion about why he was stopped.

Based on Prof. Murphy’s story, it appears that when his flight reservation was matched against the no-fly list, the result was a “hit”. This is why he was not allowed to check in at curbside but had to talk to an airline employee at the check-in desk. The employee eventually cleared him and gave him a boarding pass.

(Some reports say Prof. Murphy might have matched the watchlist, a list of supposedly less dangerous people, but I think this is unlikely. A watchlist hit would have caused him to be searched at the security checkpoint but would not have led to the extended conversation he had. Other reports say he was chosen at random, which also seems unlikely – I don’t think no-fly list challenges are issued randomly.)

There are two aspects to the no-fly list, one that puts names on the list and another that checks airline reservations against the list. The two parts are almost entirely separate.

Names are put on the list through a secret process; about all we know is that names are added by intelligence and/or law enforcement agencies. We know the official standard for adding a name requires that the person be a sufficiently serious threat to aviation security, but we don’t know what processes, if any, are used to ensure that this standard is followed. In short, nobody outside the intelligence community knows much about how names get on the list.

The airlines check their customers’ reservations against the list, and they deal with customers who are “hits”. Most hits are false positives (innocent people who trigger mistaken hits), who are allowed to fly after talking to an airline customer service agent. The airlines aren’t told why any particular name is on the list, nor do they have special knowledge about how names are added. An airline employee, such as the one who told Prof. Murphy that he might be on the list for political reasons, would have no special knowledge about how names get on the list. In short, the employee must have been speculating about why Prof. Murphy’s name triggered a hit.

It’s well known by now that the no-fly list has many false positives. Senator Ted Kennedy and Congressman John Lewis, among others, seem to trigger false positives. I know a man living in Princeton who triggers false positives every time he flies. Having many false positives is inevitable given that (1) the list is large, and (2) the matching algorithm requires only an approximate match (because flight reservations often have misspelled names). An ordinary false positive is by far the most likely explanation for Prof. Murphy’s experience.
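
As a back-of-the-envelope illustration (the numbers are invented, since the list’s size and matching rules are secret): if each entry on a list of N names has some small chance p of fuzzy-matching a given passenger, the chance of at least one hit is 1 - (1 - p)^N, which climbs quickly as the list grows.

    # Back-of-envelope only: p and N are invented, not real figures.
    def chance_of_a_hit(p_per_entry, n_entries):
        return 1 - (1 - p_per_entry) ** n_entries

    print(chance_of_a_hit(1e-5, 1_000))    # ~0.01: rare hits on a small list
    print(chance_of_a_hit(1e-5, 100_000))  # ~0.63: routine hits on a large one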

Note, too, that Walter Murphy is a relatively common name, making it more likely that Prof. Murphy was being confused with somebody else. Lycos PeopleSearch finds 181 matches for Walter Murphy and 307 matches for W. Murphy in the U.S. And of course the name on the list could be somebody’s alias. Many false positive stories involve people with relatively common names.

Given all of this, the most likely story by far is that Prof. Murphy triggered an ordinary false positive in the no-fly system. These are very annoying to the affected person, and they happen much too often, but they aren’t targeted at particular people. We can’t entirely rule out the possibility that the name “Walter Murphy” was added to the no-fly list for political reasons, but it seems unlikely.

(The security implications of the false positive rate, and how the rate might be reduced, are interesting issues that will have to wait for another post.)

Viacom, YouTube, and Privacy

Yesterday’s top tech policy story was the copyright lawsuit filed by Viacom, the parent company of Comedy Central, MTV, and Paramount Pictures, against YouTube and its owner Google. Viacom’s complaint accuses YouTube of direct, contributory, and vicarious copyright infringement, and inducing infringement. The complaint tries to paint YouTube as a descendant of Napster and Grokster.

Viacom argues generally that YouTube should have done more to help it detect and stop infringement. Interestingly, Viacom points to the privacy features of YouTube as part of the problem, in paragraph 43 of the complaint:

In addition, YouTube is deliberately interfering with copyright owners’ ability to find infringing videos even after they are added to YouTube’s library. YouTube offers a feature that allows users to designate “friends” who are the only persons allowed to see videos they upload, preventing copyright owners from finding infringing videos with this limitation…. Thus, Plaintiffs cannot necessarily find all infringing videos to protect their rights through searching, even though that is the only avenue YouTube makes available to copyright owners. Moreover, YouTube still makes the hidden infringing videos available for viewing through YouTube features like the embed, share, and friends functions. For example, many users are sharing full-length copies of copyrighted works and stating plainly in the description “Add me as a friend to watch.”

Users have many good reasons to want to limit access to noninfringing uploaded videos, for example to make home movies available to family members but not to the general public. It would be a shame, and YouTube would be much less useful, if there were no way to limit access. Equivalently, if any copyright owner could override the limits, there would be no privacy anymore – remember that we’re all copyright owners.

Is Viacom really arguing that YouTube shouldn’t let people limit access to uploaded material? Viacom doesn’t say this directly, though it is one plausible reading of their argument. Another reading is that they think YouTube should have an extra obligation to police and/or filter material that isn’t viewable by the public.

Either way, it’s troubling to see YouTube’s privacy features used to attack the site’s legality, when we know those features have plenty of uses other than hiding infringement. Will future entrepreneurs shy away from providing private communication, out of fear that it will be used to brand them as infringers? If the courts aren’t careful, that will be one effect of Viacom’s suit.

Soft Coercion and the Secret Ballot

Today I want to continue our discussion of the secret ballot. (Previous posts: 1, 2.) One purpose of the secret ballot is to prevent coercion: if ballots are strongly secret, then the voter cannot produce evidence of how he voted, allowing him to lie safely to the would-be coercer about how he voted.

Talk about coercion usually centers on lead-pipe scenarios, where somebody issues a direct threat to a voter. Nice kneecaps you have there … be a shame if something unfortunate should happen to them.

But coercion needn’t be so direct. Consider this scenario: Big Johnny is a powerful man in town. Disturbing rumors swirl around him, but nothing has ever been proven. Big Johnny is pals with the mayor, and it’s no secret that Big Johnny wants the mayor reelected. The word goes around town that Big Johnny can tell how you vote, though nobody is quite sure how he does it. When you get to the polling place, Big Johnny’s cousin is one of the poll workers. You’re no fan of the mayor, but you don’t know much about his opponent. How do you vote?

What’s interesting about this scenario is that it doesn’t require Big Johnny to do anything. No lawbreaking is necessary, and the scheme works even if Big Johnny can’t actually tell how you vote, as long as the rumor that he can is at all plausible. You’re free to vote for the other guy, but Big Johnny’s influence will tend to push your vote toward the mayor. It’s soft coercion.

This sort of scheme would work today. E-voting systems are far from transparent. Do you know what is recorded in the machine’s memory cartridge? Big Johnny’s pals can get the cartridge. Is your vote time-stamped? Big Johnny’s cousin knows when you voted. Are the votes recorded in the order they were cast? Big Johnny’s cousin knows that you were the 37th voter today.
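
Here is a hypothetical illustration of why the stored vote order matters; it does not describe any real machine’s record format. If ballots sit in the cartridge in the order they were cast, linking them to the sign-in order from the poll book is trivial.

    # Hypothetical data: no real poll book or memory cartridge looks like this.
    sign_in_order = ["Alice", "Bob", "Carol", "You"]   # from the poll book
    ballots_in_cast_order = ["Mayor", "Challenger",    # from the cartridge
                             "Mayor", "Challenger"]

    # De-anonymizing the ballots takes nothing more than zipping the two lists.
    for voter, ballot in zip(sign_in_order, ballots_in_cast_order):
        print(voter, "voted for", ballot)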

Paper ballots aren’t immune to such problems, either. Are you sure the blank paper ballot they gave you wasn’t marked? Remember: scanners can see things you can’t. And high-res scanners might be able to recognize tiny imperfections in that sheet of paper, or distinctive ink-splatters in its printing. Sure, the ballots are counted by hand, right there in the precinct, but what happens to them afterward?

There’s no perfect defense against this problem, but a good start is to insist on transparency in the election technology, and to research useful technologies and procedures. It’s a hard problem, and we have a long way to go.