A while back I wrote about a method (well known to cryptography nerds) to frustrate network monitoring. It works by breaking up a file into several shares, in such a way that any individual share, and indeed any partial subset of the shares, looks entirely random, but if you have all of the shares then you can add them together to get back the original file. Today I want to develop this idea a bit further.
The trick I discussed before sent the shares one at a time, perhaps interspersed with other traffic, so that a network monitor would have to gather and store all of the shares, and know that they were supposed to go together, in order to know what was really being transmitted. The network monitor couldn’t just look at one message at a time. In other words, the shares were transmitted from the same place, but at different times.
It turns out that you can also transmit the shares from different places. The idea is to divide a file into shares, and put one share on server A, another on server B, another on server C, and so on. Somebody who wanted the file (and who knew how it was distributed) would go to all of the servers and get one share from each, and then combine them. To figure out what was going on, a network monitor would have to be monitoring the traffic from all of the servers, and it would have to know how to put the seemingly random shares together. The network monitor would have to gather information from many places and bring it together. That’s difficult, especially if there are many servers involved.
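To make the splitting concrete, here is a minimal sketch in Python. It uses XOR as the "addition" (any invertible combining operation works, and XOR is its own inverse); the function names are mine, purely for illustration.

```python
import os
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split(data: bytes, n: int) -> list[bytes]:
    """Split data into n shares; any proper subset looks uniformly random."""
    shares = [os.urandom(len(data)) for _ in range(n - 1)]
    # Choose the last share so that all n shares XOR back to the data.
    shares.append(reduce(xor_bytes, shares, data))
    return shares

def combine(shares: list[bytes]) -> bytes:
    """Recover the original data by XORing all the shares together."""
    return reduce(xor_bytes, shares)

shares = split(b"attack at dawn", 3)   # one share each for servers A, B, C
assert combine(shares) == b"attack at dawn"
```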
If the network monitor did figure out what is going on, then it would know which servers are participating in the scheme. If Alice and Bob were both publishing shares of the file, then the network monitor would blame them both.
Congratulations on making it this far. Now here’s the cool part. Suppose that Alice is already publishing some file A that looks random. Now Bob wants to publish a file F using two-way splitting; so Bob publishes B = F-A, so that people can add A and B to get back F. Now suppose the network monitor notices that the two random-looking files A and B add up to F; so the network monitor tries to blame Alice and Bob. But Alice says no – she was just publishing the random-looking file A, and Bob came along later and published F-A. Alice is off the hook.
But note that Bob can use the same excuse. He can claim that he published the random-looking file B, and Alice came along later and published F-B. To the monitor, A and B look equally random. So the monitor can’t tell who is telling the truth. Both Alice and Bob have plausible deniability – Alice because she is actually innocent, and Bob because the network monitor can’t distinguish his situation from Alice’s. (Of course, this also works with more than two people. There could be many innocent Alices and one tricky Bob.)
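Here is a toy sketch of that symmetry (again with XOR standing in for addition and subtraction; all of the names are mine):

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

A = os.urandom(16)               # Alice's file: genuinely random
F = b"the secret file!"          # Bob's real payload (same length as A)
B = xor_bytes(F, A)              # Bob publishes B = F - A

assert xor_bytes(A, B) == F      # anyone with both shares recovers F
assert xor_bytes(F, B) == A      # but F - B = A, so Bob's story ("I
                                 # published random B first, and someone
                                 # else published F - B") checks out
                                 # exactly as well as Alice's
```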
Bob faces some difficulties in pulling off this trick. For example, the network monitor might notice that Alice published her file before Bob published his. Bob doesn’t have a foolproof scheme for distributing files anonymously – at least not yet. But stay tuned for my next post about this topic.
This looks similar to how Distributed Hash Tables and Key Based Routing work. The file is split up into multiple blocks, and each block is inserted into the network. Since each block has a hash that probably looks random, it is inserted into different parts of the network. (The network usually forms a pseudo-ring, mapping hashes to various parts of the ring.)
Adding a bit of crypto to the split files would make it almost impossible to determine what’s in the blocks until they are all retrieved.
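As a toy illustration of that mapping, here is roughly how random-looking shares end up scattered around a ring; the 160-bit Chord-style identifier space and the choice of hash are my assumptions, not any specific DHT's parameters:

```python
import hashlib

RING_SIZE = 2 ** 160  # identifier space of a Chord-style pseudo-ring

def ring_position(block: bytes) -> int:
    """Map a block to a point on the ring by hashing its contents."""
    return int.from_bytes(hashlib.sha256(block).digest(), "big") % RING_SIZE

# Shares of one file hash to unrelated positions, so they get stored
# on different, unrelated nodes of the network.
for share in (b"share-1", b"share-2", b"share-3"):
    print(hex(ring_position(share)))
```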
Seems to me a key thing here is to make automated discovery of infringers impossible, say by making robotic detection of infringing traffic require a solution to the NLP. Then the RIAA has to employ actual human beings to discover how and from whom mp3z are being distributed, which hopefully makes leaving p2p alone cost less for them than stopping p2p and suing infringers, even after any settlement or judgment awards are factored in. At that point the p2p users have won — either the fooAA organizations leave them alone, or they don’t out of sheer spite and fritter away their money and go bankrupt. Or someone solves the NLP, which would be a good outcome too I suppose. 🙂
The issue here isn’t security; it’s obscuring the fact that anything needing security is even going on in the first place.
If you monitor a connection, and all that is happening is P2P downloads and generic web-browsing, then nothing stands out and you move on to monitor someone else. Everything going in or out of the computer is innocuous, especially with the rise of “spoofed” P2P files.
Who is to say that the corrupt copy of “Let It Be” they downloaded wasn’t spoofed by the RIAA, instead of being a “payload” file? And surely no one looks suspicious reading articles from CNN?
There is no evidence that, once the data is on the user’s computer, it can be combined to produce (whatever).
That’s the beauty of the system. Never mind the “backchannel” – that’s nearly a non-issue. “Hey Bob, it’s Dave. Have you seen that article on FDA researchers on CNN’s site?” “Hey Bob, it’s Carol. I saw that song you were looking for from [userx] on Kazaa”
There’s a forest/trees disconnect in these comments. It’s not meant to be “uncrackable”, or “perfectly secure”, or “the perfect solution to one-time pads”, or even “something truly new and unique”. It’s a great run at security through obscurity, and hiding things in plain sight.
Been there, done that. In order to pull this off you need a widely distributed shared data pool. “Way back when…” we designed this sort of system as an anonymous email system (e.g. mailboxes that were not on your system until you actively went and retrieved them, instead of just messages that get forwarded to you) and realized that the first component you needed to create is this shared data pool. So we created one (called Mojo Nation, from whence BitTorrent sprung at a later point) and discovered just how hard that particular task was to actually pull off… Getting to phase 2 of the plan ended up getting postponed indefinitely.
The sub-genre in the crypto literature that anyone really interested in this should check out is called “private information retrieval.”
It sounds so easy when you draw up little mental diagrams of how it should work, but it is fiendishly difficult to actually implement without leaking information elsewhere in the system, and it is next to impossible to create a shared incentive structure that prevents attackers from filling up the data pool with random data.
It turns out that the simpler, albeit less secure, systems, where encrypted mail is forwarded to you and you share files with people you trust, are a lot easier to actually implement.
I think the point of this was not so much secrecy as denial of liability. Ed focuses on who is to blame, not on hiding the existence of the file.
Another problem though is in communicating the fact that these files available from Alice and Bob have the necessary relationship. Someone who wants to effectively share the data must include, in effect, a pair of URLs for the Alice and Bob files. Anyone who distributes this pair might also be said to be to blame, if it comes down to legal remedies and assigning liability.
The major sticking point you mention relates directly to my note about one time pads. You’ll need some sub-channel communications (via Tor as you mention, or stego, or encrypted and hidden in packet headers, et cetera) to signal A, B, C and D about the various cyphertext and secret files they want to share. It then gets complex, which defeats the purpose of the simplicity of the underlying system. The sub-channel becomes a weak point — sub-channel distribution relies on knowledge that the sub-band is not being intercepted, and that A, B, C and D are all trusted parties. Once one untrusted party E has either the sub-channel information or the secret, the entire structure falls apart.
The idea seems pretty neat, though, especially the exploitation of the ambiguity of a cyphertext that yields different plaintexts under different keys.
There’s one question that needs to be answered: what do you want it to be for? If this is for widespread P2P distribution, simple XORing with one or more pads or keys might be just another minor bump in the P2P “arms race” (but it won’t stop serious monitoring by copyright holders or our friends off Interstate 295). Moreover, for widespread P2P distribution this won’t work, because knowing the secret or the sub-carrier defeats the system.
You need a layer of security on the sub-carrier level that is strong enough that it would render the simple obfuscation of the secret file via cyphertexts and pads moot.
Anyway, I don’t pretend to be an expert in this stuff. My only point is that while the idea seems great for a single group of trusted people distributing a single secret file, on a P2P scale it seems to become a much more complex problem.
Note: I’m answering from the perspective of how I personally see the implementation — this may have nothing at all to do with Ed’s original idea. I have been known to not only grab the wrong end of the stick but to grab something that’s not even a stick.
1) Er, sort of.
2) It’s not designed to be “one time” — the theory is that lots of people are downloading lots of stuff from lots of places, no individual piece of which is identifiable as either “interesting” or meaningful. Some of it looks meaningful but may not be (or may mean something different), some of the noise will have meaning and some will be noise, one peer’s noise is another peer’s key, etc.
3) Yes.
4) See 2.
5) It’s more complicated than that; it may not involve just A and B, and B may be getting stuff just to forward it to C, etc. Here’s a small example:
I’m downloading spreadsheets from A, text files from B, random junk (which may or may not be important junk) from C and pictures of granny from D; A is downloading spreadsheets from D, text files from C, random junk from me, and pictures of granny from B; etc. I may not personally want the pictures of granny and am only getting it so I can send it to A, but it’s some handy noise. (Who’s supplying what to whom will change all the time — one minute my noise is coming from C, the next it’s coming from A.)
Now the eavesdropper has several problems:
Which file is the cyphertext, which is the fake key, which is the real key, and which things are what they appear to be, and who is it exactly that is supplying the material they’re interested in?
Short of intercepting all the traffic from everyone all the time they’re going to have to rely on a great deal of luck. Couple this with some sort of encrypting anonymous packet forwarding service (like Tor — http://tor.eff.org/) and it’s the Devil’s own job to tell what the heck is going on.
Sure, it’s inefficient, but the inefficiency is part of the trick — the signal to noise ratio (coupled with the fact that the eavesdropper doesn’t even know which bits that look like signal are actually signal, and the fact that some of the noise may not be noise) makes monitoring more than a simple case of “what’s coming out of A and going to B?”.
The major sticking point is that A, B, C and D also need to know which is what — if this were to be implemented as a P2P system each peer would have to send some requested data plus a lot of other crud without somehow allowing an eavesdropper to decipher the request; it’s doable (perhaps by implementing a Tor-style routing system within the P2P “network”), but it’s not easy.
(This idea came to me as a spin-off of a diff-style patching system I wrote years back; of course in the world at large it wasn’t new, but I was young and foolish! Once it dawned on me that I could send a genuine patch and a not-so genuine one (that may in turn actually have been a genuine patch for something else — is it noise or not?) that could then be combined into something else entirely I got all excited; however, this was 10 years ago and the most common computer connections were about as fast as a piece of string tied between two cans, so any file transfer system that relied on sending a lot of totally unnecessary data wasn’t going to be popular.)
Sorry, rambled on a bit there!
(1) Isn’t this really just a digital one time pad, where one file serves as the pad and the other serves as the cyphertext?
(2) If it is really a one time pad, then it can never be transmitted again after its first use without losing its effectiveness.
(3) The Excel spreadsheet/MP3 idea is a version of a pad system with three files: a cyphertext, fakekey1 and realkey2, where cyphertext + fakekey1 becomes Shakespeare (or an Excel file), and cyphertext + realkey2 becomes the secret (see the sketch after this list).
(4) It’s impractical on a large scale because you’ll need not just pad codes for each file; you’ll need a new pad code for each time the file is going to be transmitted. In a P2P system many people need access to both A & B’s files, and as the repeated transmission of the same coded files between users occurs, the value of the pad quickly goes to zero.
(5) Of course, the best way to circumvent this is just to tap A & B at the source, so whatever transmission coding they use is irrelevant.
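For what it’s worth, point (3) is easy to make concrete. A hedged sketch, with XOR as the pad operation and variable names mirroring the comment’s terminology:

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

secret = b"the real payload"
cover  = b"a dull worksheet"     # padded or trimmed to the secret's length

cyphertext = os.urandom(len(secret))        # uniformly random either way
realkey2   = xor_bytes(cyphertext, secret)  # "decrypts" to the secret
fakekey1   = xor_bytes(cyphertext, cover)   # "decrypts" to the cover story

assert xor_bytes(cyphertext, realkey2) == secret
assert xor_bytes(cyphertext, fakekey1) == cover
```

Since the cyphertext is chosen at random, both keys are themselves uniformly random, and nothing about the three files reveals which key is the “real” one.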
Cypherpunk,
Actually, I think there is a lot of random internet traffic – anything that is appropriately encrypted essentially appears as random transmissions.
To me, the interesting extension of Ed’s ideas is to a group of p2p filesharers. I’ve been toying with an idea for a BitTorrent hosting service like the one that Ed linked to in his linkblog, and it would be greatly enhanced by some true anonymizing.
– Mike
“And what about file sizes? Many file formats are not self-delimiting. Won’t your MP3 have loud noise at the end of it if there is extra random data there? Yet if one of Alice or Bob trims their data to be the exact length of the MP3 file, they are incriminated.”
My implementation used a short header on the “password” to indicate the length of the shorter file, and used some built-in “noise” to generate the slack space. That way “encoding” a short file with a long one produced a long key that was still sufficiently different from either file, decoding the long file with the key produced a short file, and decoding the short file with the key produced the long file.
Of course it’s better to start with files of roughly equal length, which is why dummy data like spreadsheets or WP documents are ideal — just add or remove text until it’s the right size.
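A rough reconstruction of that header-plus-built-in-noise trick might look like the following. The field layout, the 4-byte sizes, and the seeded PRNG noise are all my guesses at one workable design, not the commenter’s actual format (and a serious version would derive the noise from a cryptographic keystream rather than random.Random):

```python
import random

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_password(short: bytes, long: bytes) -> bytes:
    """Combine two unequal-length files into one 'password'."""
    seed = random.randrange(2 ** 32)
    # Built-in noise fills the slack so the XOR covers the long file.
    noise = random.Random(seed).randbytes(len(long) - len(short))
    header = len(short).to_bytes(4, "big") + seed.to_bytes(4, "big")
    return header + xor_bytes(short + noise, long)

def decode(password: bytes, known: bytes) -> bytes:
    n = int.from_bytes(password[:4], "big")
    seed = int.from_bytes(password[4:8], "big")
    body = password[8:]
    if len(known) > n:   # we hold the long file: peel off the short one
        return xor_bytes(body, known)[:n]
    # We hold the short file: regenerate the noise, recover the long one.
    noise = random.Random(seed).randbytes(len(body) - n)
    return xor_bytes(body, known + noise)
```

Decoding the long file with the password yields the short file, and decoding the short file with it yields the long one, matching the behavior described above.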
Why would Alice be publishing a random file? That would look pretty incriminating if this idea caught on. Granted, a few people might legitimately publish random files, such as random.org, but for your idea to work, substantial numbers of people need to be publishing such files.
(Years ago I suggested using random dot stereograms as cover traffic for random files. Maybe something like that could work.)
And what about file sizes? Many file formats are not self-delimiting. Won’t your MP3 have loud noise at the end of it if there is extra random data there? Yet if one of Alice or Bob trims their data to be the exact length of the MP3 file, they are incriminated.
I wrote an implementation of a similar scheme a while back, but applied a slight twist to it in terms of interpretation. A was the file “I’m not supposed to have” (for example one of those evil MP3s), B was any other similarly sized file (some sort of tedious Excel spreadsheet, say), and F (produced by XORing A and B) I called a “password”.
Now all I need to do is delete B. If the RIAA come knocking because they don’t like me making copies of legitimately purchased items, all I need say is “no, it’s just a spreadsheet, and it’s pure coincidence that I encrypted it and it ended up being identical to your material — after all, there are a finite number of files of any given size; allow me to demonstrate by decrypting it”.
One downside with this file splitting is that F tends to be VERY random and not very easily compressed, thus effectively doubling storage requirements. (Some saving can be made by chaining these events and using the F of one file as the B for another, but this can make transmitting an arbitrary file require transmitting many other intermediate files…)
The other downside, of course, is that it’s an incredibly lame defense. 🙂
Bob should check out http://www.cl.cam.ac.uk/users/rja14/eternity/eternity.html
:^)