December 2, 2020

Network Monitoring: Harder Than It Looks

Proposals like the Cal-INDUCE bill often assume that it’s reasonably easy to monitor network traffic to block certain kinds of data from being transmitted. In fact, there are many simple countermeasures that users can (and do, if pressed) use to avoid monitoring.

As a simple example, here’s an interesting (and well known) technical trick. Suppose Alice has a message M that she wants to send to Bob. We’ll treat M as a number (bearing in mind that any digital message can be thought of as a number). Alice chooses a random number R which has the same number of digits as M. She sends the message R to Bob; then she computes X = M-R, and sends the message X to Bob. Obviously, Bob can add the two messages, R + (M-R), and the sum will be M – the message Alice originally wanted to send him.

[Details, for mathematical purists: all arithmetic is done modulo a large prime P; R is chosen randomly in [0, P-1]. When I say a value “looks random” I mean that it is indistinguishable (in the information-theoretic sense) from a random value.]

Now here’s the cool part: both of the messages that Alice sends look completely random. Obviously R looks random, because Alice generated it randomly. But it turns out that X looks random too. To be more precise: either message by itself looks completely random; only by combining the two messages can any information be extracted.
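The two-message trick above is easy to sketch in code. This is a minimal Python illustration (the prime P and the message value are arbitrary choices for the demo; a real implementation would pick P larger than any possible message):

```python
import secrets

# Illustrative modulus: a Mersenne prime, comfortably larger than our demo message.
P = (1 << 127) - 1

def split(m):
    """Split message m into two shares; each share by itself is uniformly random."""
    r = secrets.randbelow(P)   # Alice's random number R
    x = (m - r) % P            # X = M - R (mod P)
    return r, x

def combine(r, x):
    """Bob adds the shares modulo P to recover M."""
    return (r + x) % P

m = 123456789
r, x = split(m)
assert combine(r, x) == m
```

Note that `r` alone carries no information about `m`, and because `x` is `m` shifted by a uniformly random offset, `x` alone looks just as random.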

By this expedient, Alice can foil any network monitor who looks at network messages one at a time. Each individual message looks innocuous, and it is only by storing messages and combining them that a monitor can learn what Alice is really telling Bob. If Alice sends the two messages by different paths, then the monitor has to gather messages from multiple paths, and combine them, to learn what Alice is telling Bob.

It’s easy for Alice to extend this trick, to split her message M into any number of pieces. For example, Alice could split M into five pieces, by generating four random numbers, R1, R2, R3, and R4, and then computing X = M-(R1+R2+R3+R4). Given any four of these five pieces, nothing can be deduced. Only somebody who has all five pieces, and knows to combine them by addition, can extract information. So a monitor has to gather and compare many messages to see what Alice is up to, even though Alice isn’t using encryption.
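The n-piece generalization is just as short. Again, an illustrative Python sketch (function names and the modulus are made up for the example):

```python
import secrets

P = (1 << 127) - 1  # illustrative large prime modulus

def split_n(m, n):
    """Split m into n shares; any n-1 of them reveal nothing about m."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]  # R1 .. R(n-1)
    shares.append((m - sum(shares)) % P)                   # X = M - (R1+...+R(n-1))
    return shares

def combine(shares):
    """Sum all the shares modulo P to recover M."""
    return sum(shares) % P

m = 42
pieces = split_n(m, 5)
assert combine(pieces) == m
```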

There are many more technical tricks like this that are easy for Alice and Bob to adopt, but hard for network monitors to cope with. If the monitors want to engage in an arms race, they’ll lose.

Comments

  1. “If the monitors want to engage in an arms race, they’ll lose.”

    I agree with this to a point. From a technical standpoint, I’m sure you are completely correct.

    However, I’m wondering how hard it would be to detect the mere use of this trick? The data appears random, you say… but how random does other traffic appear at that level? I’m thinking along these lines because, based on precedent, we could very easily end up with laws that forbid using this sort of masking at all.

    The idea then is that it probably won’t matter what’s being sent… they’ll take action against the user based on the method of transmission.

  2. Cypherpunk says:

    Or Alice can just encrypt her content to Bob, right?

    I think the place the legal system would attack such a mechanism would be in the software which handled the extra steps to conceal the content. For any concealment process to come into widespread use on a P2P network, there has to be a standard that people follow, a protocol for the software, and an implementation of the protocol. Software which implemented this concealment would become the point of legal attack, since there would be no “legitimate” purpose in such a concealment layer.

    In the old days of Napster, there was an era where the system was required to filter out content with certain file names provided by the record companies. What sprang up was an informal system of obfuscating the names to get around this. But it was inefficient; there were multiple incompatible obfuscations in use, and the filter was gradually enhanced to look for the common obfuscations. There was no clear winner in this arms race but it certainly made the system less convenient for illicit file sharing than it had once been.

    In the same way, informal concealment mechanisms will reduce usability, and formal ones can be subjected to legal attack. Either way, the usefulness of the system for illicit purposes will be reduced.

  3. “method of transmission”

    What do you even mean by “method of transmission”? That can be circumvented just as easily. Between all of the different protocols, encodings, and transmission methods, detection is just straight impossible.

    Think of using Felten’s method (or, more simply, XORing the plaintext with a cryptographically random bit string of the same length) and posting the two items onto a “web site” HTML-encoded (or UUEncoded, MIME-encoded, or as hex characters written in Unicode (UTF-8, 16, 32?)), available on port 80 and retrievable using HTTP.

    What about encoding the file as a shell/perl/python script or exe that when run writes out the data file? Then posting the source for these various “song generators” into web pages. (Postscript kind of works like that anyways; I don’t believe it would be too hard to write a tool to generate the program.)

    Simply adding one more of these techniques invalidates all current detection programs. Worse, as the number of variations of these encodings increases, the network detectors are stuck decoding all possible interpretations of the bit stream.

    And that’s not even getting into real steganography or encryption. If we were to do that, then there is a really simple thing that would totally eliminate network based sniffing: use SSL connections.
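    The XOR variant mentioned above is only a few lines in most languages. An illustrative Python sketch (the function names are made up for the example):

```python
import secrets

def xor_split(plaintext: bytes):
    """Split plaintext into two byte strings, each individually indistinguishable from random."""
    pad = secrets.token_bytes(len(plaintext))              # random pad, same length
    masked = bytes(a ^ b for a, b in zip(plaintext, pad))  # plaintext XOR pad
    return pad, masked

def xor_combine(pad, masked):
    """XOR the two pieces back together to recover the plaintext."""
    return bytes(a ^ b for a, b in zip(pad, masked))

msg = b"any message at all"
pad, masked = xor_split(msg)
assert xor_combine(pad, masked) == msg
```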

  4. Roland Schulz says:

    Cypherpunk,

    So in the same scenario we would outlaw any encryption software completely, since there would be no “legitimate” purpose in encrypting content, it would only ever be used for copyright violation, and lawful citizens don’t need to fear the government snooping on them? I think you’ll agree this is not the case.

    What Ed Felten describes is basically the same as encrypting with a one time pad, except making the one time pad public, too (therefore it’s not really encryption). A legitimate use would be to store data in remote locations, but without any of the parties alone being able to read the data.
    Notice that you can do this for data you own the rights to, and there is a legitimate purpose for doing it.

    To defeat a theoretical ban on P2P software using this technique to prevent monitors, the user just would need to use an external program to do the splitting, and announce all parts (each being a file on its own) as usual on a normal P2P network. The recipient downloads all the parts, and then uses a separate application to combine them. This will require a little more effort from the P2P users, but if the arms race forces them to it, I’ll bet they’ll be fast to adopt it.

    By the way: if the sharing of the parts was done by different P2P users, would each one be able to plead ignorance of the original contents of the infringing file in court? Since you can always make up a one time pad to turn one file into what you want it to be, one of the files could be an already well spread one, like a recent linux iso image or a porn file, and the seeder just supplies the key to turn this into the most recent 0-day movie?
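    For what it’s worth, that “make up a one time pad” step is trivial. An illustrative Python sketch (the byte strings here are placeholders, assumed equal in length):

```python
def pad_for(cover: bytes, target: bytes) -> bytes:
    """Compute the 'key' that XORs a well-known cover file into the target file."""
    assert len(cover) == len(target)  # sketch assumes equal lengths
    return bytes(a ^ b for a, b in zip(cover, target))

cover = b"linux iso bytes.."   # stand-in for the widely shared innocuous file
target = b"0-day movie bytes"  # stand-in for the infringing file
key = pad_for(cover, target)

# Anyone holding the cover file plus this key recovers the target.
recovered = bytes(a ^ b for a, b in zip(cover, key))
assert recovered == target
```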

  5. Scott Craver says:

    This sort of obfuscation (you could also AES-encrypt the document and prepend the key) must remain legal in the US, unless cryptography itself is outlawed.

    I assert the random string theory of censorship: that in a free society, people are allowed to store and send one another random bit strings.

    In steganography, we often assume that overtly random strings are outlawed; even in this censored environment, any residual entropy in “innocent” messages could allow the transmission of a random string. Such censorship is only possible if the censor can either detect the imperfect modification of that entropy, or can filter it out by noising.

    Scott

  6. Benjamin Reeve says:

    Do those who know what will be in the Grokster briefs due March 1, 2005 know whether a capable discussion of the real problems associated with finding “contribution” (as in “contributory infringement”) and with the practical limits to sanctions and penalties will be in ANY of them? This Old Court has not proved especially talented at understanding technological matters, much less coming to thoughtful rulings on them, but to have so many of these frooey “balancing test” discussions out there, and no real descriptive understanding of networks, information, simulation, etc. can’t help.

    The answer, by the way, to the described encryption methods getting around the surveillance will necessarily be to extend the supervision into the apparatus used by the user. The intended control cannot otherwise be achieved. That is what DRM begins to be about, but the logic can be extended to the point to which you would not say it is “your” computer. And when it is someone else’s computer, that’s right, the someone else can decide what it does or does not do.

    The question, the answer to which is not clear (though it may well be no), is whether people will buy computers that are then not theirs.