Pseudonyms: The Natural State of Online Identity

March 30, 2010 by Ed Felten

I’ve been writing recently about the problems that arise when you try to use cryptography to verify who is at the other end of a network connection. The cryptographic math works, but that doesn’t mean you get the identity part right.

You might think, from this discussion, that crypto by itself does nothing — that cryptographic security can only be bootstrapped from some kind of real-world identity verification. That’s the way it works for website certificates, where a certificate authority has to check your bona fides before it will issue you a certificate.

But this intuition turns out to be wrong. There is one thing that crypto can do perfectly, without any real-world support: providing pseudonyms. Indeed, crypto is so good at supporting pseudonyms that we can practically say that pseudonyms are the natural state of identity online.

To explain why this is true, I need to offer a gentle introduction to a basic crypto operation: digital signatures. Suppose John Doe (“JD”) wants to use digital signatures. First, JD needs to create a private cryptographic key, which he does by generating some random numbers and combining them according to a special geeky recipe. The result is a unique private key that only JD knows. Next, JD uses a certain procedure to determine the public key that corresponds to his private key. He announces the public key to everyone. The math guarantees that (1) JD’s public key is unique and corresponds to JD’s private key, and (2) a person who knows JD’s public key can’t figure out JD’s private key.

Now JD can make digital signatures. If JD wants to “sign” a certain message M, he combines M with JD’s private key in a special way, and the result is JD’s “signature on M”. Now anybody can verify the signature, using JD’s public key. Only JD can make the signature, because only JD knows JD’s private key; but anybody can verify the signature.
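
The post does not name a particular signature scheme, so what follows is only a minimal sketch using one that fits in a few lines of standard-library Python: a Lamport one-time signature built on SHA-256. The function names (`generate_keypair`, `sign`, `verify`) and the sample messages are assumptions made for illustration. A Lamport key pair should sign only a single message, so this shows the shape of the idea rather than a scheme for real use.

```python
import hashlib
import secrets

BITS = 256  # one key slot per bit of the SHA-256 message digest


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Return (private_key, public_key) for a Lamport one-time signature."""
    # Private key: two random 32-byte secrets for every digest bit.
    private_key = [(secrets.token_bytes(32), secrets.token_bytes(32))
                   for _ in range(BITS)]
    # Public key: the hash of each secret. Hashing is one-way, so publishing
    # these values does not reveal the private key.
    public_key = [(_h(s0), _h(s1)) for s0, s1 in private_key]
    return private_key, public_key


def _digest_bits(message: bytes):
    digest = int.from_bytes(_h(message), "big")
    return [(digest >> i) & 1 for i in range(BITS)]


def sign(private_key, message: bytes):
    """Reveal one secret per digest bit; that list is the signature."""
    return [private_key[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(public_key, message: bytes, signature) -> bool:
    """Anyone holding the public key can check the revealed secrets."""
    if len(signature) != BITS:
        return False
    bits = _digest_bits(message)
    return all(_h(sig) == public_key[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, bits)))


jd_private, jd_public = generate_keypair()
message = b"a message M"
signature = sign(jd_private, message)

print(verify(jd_public, message, signature))                 # genuine signature
print(verify(jd_public, b"a different message", signature))  # wrong message

# A signature made with some other private key does not verify under
# JD's public key.
other_private, _ = generate_keypair()
print(verify(jd_public, message, sign(other_private, message)))
```

The first check should print `True` and the other two `False`: only the holder of JD's private key can produce a signature that verifies under JD's public key, while verification itself needs nothing but the public key.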

At no point in this process does JD tell anybody who he is — I called him “John Doe” for a reason. Indeed, JD’s public key is a perfect pseudonym: it conveys nothing about JD’s actual identity, yet it has a distinct “owner” whose presence can be verified. (“You’re really the person who created this public key? Then you should be able to make a signature on the message ‘squeamish ossifrage’ for me….”)

Using this method, anybody can make up a fresh pseudonym whenever they want. If you can generate random numbers and do some math (or have your computer do those things for you), then you can make a fresh pseudonym. You can make as many as you want, without needing to coordinate with anybody. This is all easy to do.
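
As a rough illustration of how little is involved, here is a sketch that mints several pseudonyms locally with nothing but random bytes and a hash function. It assumes a hash-based (Lamport-style) public key compressed to a short hex string; the helper name `new_pseudonym` and the seed-derivation layout are hypothetical choices, not something the post specifies.

```python
import hashlib
import secrets


def _h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def new_pseudonym():
    """Create a fresh pseudonym locally. Returns (private_seed, pseudonym)."""
    # All private material is derived from one random 32-byte seed.
    seed = secrets.token_bytes(32)
    # Derive 512 secrets from the seed, hash each one, then hash the whole
    # list down to a short hex string that serves as the public name.
    hashed_secrets = [
        _h(_h(seed, i.to_bytes(2, "big"), bytes([bit])))
        for i in range(256)
        for bit in (0, 1)
    ]
    pseudonym = _h(*hashed_secrets).hex()
    return seed, pseudonym


# Five pseudonyms, created without contacting any authority.
pseudonyms = [new_pseudonym()[1] for _ in range(5)]
for name in pseudonyms:
    print(name)
```

Each run prints five unrelated 64-character names. Nothing in a name points back to the person who generated it, and no registry had to be consulted to create it.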

These methods, pseudonyms and signatures, are used even in cases where we want to verify somebody’s real-world identity. When you connect to (say) https://mail.google.com, Google’s web server gives you its public key — a pseudonym — along with a digital certificate that attests that that public key — that pseudonym — belongs to Google Inc. Binding public keys — pseudonyms — to real-world identities is tedious and messy, but of course this is often necessary in practice.

Online, identities are hard to manage. Pseudonyms are easy.

Filed Under: Privacy & Security Tagged With: Anonymity, Crypto, Privacy, Security

Comments

Ammon says

April 4, 2010 at 9:00 am

Interesting post on the technology behind creating anonymity — I look at similar issues from a philosopher’s perspective here:

http://doctorideas.blogspot.com/2010/03/anonymity-pseudonymity-and-internets.html
rp says

March 31, 2010 at 8:49 pm

One thing to do in all this, since it’s relatively easy to create pseudonyms, is to create a bunch of them so that JD’s actions in different spheres can’t be cross-referenced as easily. You can prove that some set of posts in a web forum, say, all came from the same digital persona, and that some set of signed pictures on Flickr all came from another digital persona, but you can’t prove that both things originated from the same meatspace person.

Or can you? How far can the de-anonymizing techniques that have been discussed here go toward linking different pseudonyms? Or, conversely, how far up the tower of source-obfuscation techniques would you have to go to maintain the separation of different pseudonyms? Some services (e.g. stores selling physical objects) pretty much require a designator that traces back to a real person, but Tor, pseudonymous domain registration and other tools can hide your home IP and email addresses. On the other hand, content analysis might well link texts or photos coming from a single author, regardless of signatures.
- felten says
  
  April 1, 2010 at 2:10 pm
  
  The public keys / pseudonyms themselves can be generated in a way that makes them unlinkable, so that’s not an issue. Of course, the things that you *say* while speaking pseudonymously will be characteristic of you, so they could in principle be linked to your identity — especially if you say a lot of distinctive things.
  - rp says
    
    April 2, 2010 at 1:11 pm
    
    And, of course, you should be careful not to say anything from the same IP that you use to purchase physical objects. Hmm. Does the potential proliferation of available addresses in IPv6 do anything useful here, or are all the addresses associated with a given leaf-node pipe effectively linked?
intel_chris says

March 31, 2010 at 1:56 pm

Having an all-powerful man-in-the-middle is not as difficult in real life as one would expect; it is called the man-in-the-browser. The basic point is that most people have only one connection to the internet, their PERSONAL computer. If you can put your man-in-the-middle on their personal computer, the user won’t be aware that it is there, because they won’t have (or won’t think to check) an alternate channel to see if their public key is what they think it is. Then the man-in-the-browser truly does control the communication channel and can spoof the conversations to his heart’s content.

Note, for this reason, I recommend when people get suspicious (e.g. phishing) email that they don’t click the link in the email to check to see if it is valid, but instead they use an alternate way of checking the validity of the message (a separate communications channel, e.g. call your bank on your phone). I don’t know if there is a cryptographic term for having two independent channels, it may not even be a cryptographic concept. However, in dealing with the man-in-the-browser attack, it is important.
GaryM says

March 30, 2010 at 2:03 pm

An impersonator can make a second public-private key pair for the same pseudonym. How do you tell which one belongs to the “owner” of the pseudonym and which one is counterfeit?
- David N. says
  
  March 30, 2010 at 3:30 pm
  
  You are pointing out exactly why identities are hard to manage online. There is no easy way to decide who controls a specific public/private key pair, but it is easy to confirm that two different signed messages were created by the same public/private key pair. The article defines a pseudonym not as another name (like Mark Twain for Samuel Clemens), but as the public key itself, and the math provides the guarantee that (in all practical terms) a specific public key matches up with only its matching private key. So ownership of a pseudonym becomes defined as knowledge of the private key for that pseudonym, which can be transferred but not impersonated.
- felten says
  
  March 30, 2010 at 4:36 pm
  
  In the terminology of the main article, a public key IS IDENTICAL TO a pseudonym. So if somebody generates a new public-private key pair, the new key pair will necessarily have a different public key, and hence will necessarily belong to a different pseudonym. So GaryM’s attack doesn’t work here.
  
  These pseudonyms aren’t easy to remember, but they are immune to impersonation.
  - rp says
    
    March 31, 2010 at 8:34 pm
    
    Isn’t it more properly speaking the key pair that is the pseudonym? JD can’t sign something without the private key, but no one else can recognize the signature as JD’s without the public key.
    - felten says
      
      April 1, 2010 at 12:05 pm
      
      I think it’s the public key alone that plays the role of a pseudonym. Certainly you have to generate the public and private keys together, as a pair. But the pseudonym is the name by which the public knows you — which is only the public key. (The private key, obviously, is not known to the public.)
Bill P. Godfrey says

March 30, 2010 at 1:11 pm

Interesting post, thanks.

I recall a discussion on this topic a long time ago when I had a casual interest in anti-spam techniques.

The idea was that someone (JD) maintained a list of bad IP addresses. However, JD realised that some people might not want JD to maintain and publish his list, and would use various means to stop him. (Lawsuits, DoS attacks, personal threats, etc.)

So instead, he would anonymously publish updates to his list as signed Usenet messages. The idea was that even though he was anonymous, he would build up a reputation for quality over time and spammers couldn’t pollute his list by posting fake updates because those fake updates would fail the signature check.

The objection to this scheme was the man-in-the-middle problem. Under this plan, JD’s first message to the world would be an unsigned PGP key over the same fakeable Usenet channel. What if a spammer managed to capture all of JD’s posts, including the initial key, and craft his own fake posts with the spammer’s IP missing?

(In the real world Usenet, this attack would be rather impractical, but that’s beside the point. Just acknowledging the fact.)
- David N. says
  
  March 30, 2010 at 2:23 pm
  
  Hi Bill,
  
  You make a good point that someone with full control over JD’s connection to Usenet (say, his ISP) has the opportunity to capture all of his signed posts and replace them with their own, using a new public/private key pair that they control. The attacker would also have to answer JD’s own requests for any altered post with the original, so that JD doesn’t notice anything wrong. However, this becomes unsustainable as time passes and the number of paths JD can take to reach the published information (moving locations, checking with friends, referring to the public key for another purpose) increases beyond the attacker’s reach.
  
  And this all assumes that the original public key was released over the same channel as the posts, which is quite insecure for exactly this reason and can be avoided by pointing somewhere else (e.g., a public key server) where his public key has already been established and cannot be easily replaced.
- felten says
  
  March 31, 2010 at 7:52 am
  
  I ignored the issue of man-in-the-middle attacks because there is another crypto trick, called Diffie-Hellman Key Exchange, that can prevent them in the scenario I discussed. If you want to connect to JD and to be sure that nobody is in the middle, then you and JD can exchange messages according to the Diffie-Hellman procedure, and then JD can put his signature on a transcript of that message exchange. JD’s signature on the transcript guarantees that a man-in-the-middle (if there is one) did not mess with the messages of the Diffie-Hellman exchange. From that fact, and the properties of the Diffie-Hellman method, we can prove that you and JD can now compute a secret key that only the two of you know. From this point on, a man-in-the-middle is not a problem.
  
  I ignored this issue in the original post for simplicity. But, come to think of it, Diffie-Hellman is cool enough that it might be worth explaining in a subsequent post.
  - John Doe says
    
    April 3, 2010 at 1:05 pm
    
    Nope, calling upon good old Diffie and Hellman ain’t going to help you here, Ed. Without additional authentication they can only prevent passive eavesdroppers from stealing the session key.
    
    A man in the middle who has the power to replace the public key in JD’s initial (and following) key announcements to the world will simply do two D-H key exchanges, one with JD and one with his intended recipient, and neither party will have a clue.
    
    The only way JD can mitigate the risk of his pseudonym being compromised is to publish the public key in as many ways, shapes and forms as possible.
    
    Please excuse me while I climb the nearest rooftop, I have some screaming to do.
- John Millington says
  
  March 31, 2010 at 11:25 am
  
  “What if, a spammer managed to capture all of JD’s posts, including the initial key, and craft his own fake posts with the spammer’s IP missing?”
  
  A spammer capable of doing that is powerful beyond imagining. They own all the telecoms and governments, and they’re probably already hiding under your bed, right now!
  
  I have a great protocol for crossing the street safely (look both ways), and I think it works pretty well, but if you’re pointing out that I can still be hit on the head by a meteorite, then I guess you’re right.
  - Bill P. Godfrey says
    
    March 31, 2010 at 7:03 pm
    
    That’s almost exactly how I recall the discussion continued back then.
    
    Remember, theory and practice are the same. (In theory.)
    
    (Just to clarify – I wasn’t taking part in that original discussion, just observing.)
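
For readers following the Diffie-Hellman exchange in this thread, here is a toy sketch of both the honest exchange felten describes and the two-exchange attack John Doe raises. The group parameters (`P = 2**127 - 1`, `G = 3`), the helper names and the attacker "Mallory" are all assumptions made for illustration; the numbers are far too small and unvetted for real use.

```python
import secrets

# Toy parameters, assumed for illustration only: 2**127 - 1 is prime, but a
# real deployment would use a standardized, much larger group.
P = 2**127 - 1
G = 3


def dh_keypair():
    """Return (secret exponent, public value G**secret mod P)."""
    secret = secrets.randbelow(P - 3) + 2  # uniformly chosen from [2, P - 2]
    return secret, pow(G, secret, P)


def dh_shared(my_secret, their_public):
    """Combine my secret exponent with the other side's public value."""
    return pow(their_public, my_secret, P)


# Honest exchange: you and JD swap public values and derive the same secret.
you_secret, you_public = dh_keypair()
jd_secret, jd_public = dh_keypair()
honest_you = dh_shared(you_secret, jd_public)
honest_jd = dh_shared(jd_secret, you_public)
print(honest_you == honest_jd)

# Unauthenticated exchange with a man in the middle (Mallory), who swaps in
# her own public value in both directions and runs two separate exchanges.
m_secret, m_public = dh_keypair()
you_key = dh_shared(you_secret, m_public)  # you believe JD holds this key
jd_key = dh_shared(jd_secret, m_public)    # JD believes you hold this key
mallory_with_you = dh_shared(m_secret, you_public)
mallory_with_jd = dh_shared(m_secret, jd_public)

print(you_key == mallory_with_you)  # Mallory shares a key with you
print(jd_key == mallory_with_jd)    # and a separate key with JD
```

All three comparisons print `True`. In the attacked run you and JD each share a key with Mallory rather than with each other, and the public values you saw differ from the ones JD saw. Whether JD's signature on the transcript exposes that mismatch depends on whether you already hold JD's genuine public key, which is the point the thread turns on.
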
Nicholas Bohm says

March 30, 2010 at 10:47 am

“Only JD can make the signature, because only JD knows JD’s private key; but anybody can verify the signature.”

But to make the signature, JD has to use a computer, as does the verifier. Almost all computers are untrustworthy in the face of sophisticated malicious software, so there are important limits on how sure verifiers can be that it was really JD that signed the message they see.
felten says

March 30, 2010 at 9:00 am

Let me address a few questions you might have, if you’re a crypto expert. (Non-experts may find this incomprehensible; sorry about that.)

(1) I described a key-pair generation process where the private key is generated by a randomized procedure, then the public key is computed from the private key. Not all keygen algorithms work this way. For example, RSA keygen can generate part of the private key (the modulus) first, then choose the public key arbitrarily, then compute the rest of the private key. This doesn’t affect my argument at all, but it does complicate the explanation, so I ignored it in the main post.

(2) Some public-key cryptosystems let you encode your true identity into your public key. For example, your RSA public exponent can be an encoding of your name. However, a name encoded this way is utterly unverified, so it’s still effectively a pseudonym.

(3) Identity-based encryption (IBE) works differently, I’ll admit, but it doesn’t invalidate my argument. IBE requires some authority who does the necessary due diligence to verify a person’s identity before giving them their private key. So IBE doesn’t change the fact that pseudonyms are easy, and verified identities are hard.
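
For the curious, the ordering in point (1) can be sketched with textbook RSA and deliberately tiny numbers. The primes 61 and 53, the exponent 17 and the message 65 are assumptions chosen so the arithmetic is easy to follow; real RSA uses far larger primes and padded messages.

```python
from math import gcd

# Toy primes, assumed for illustration; real RSA primes are hundreds of
# digits long.
p, q = 61, 53

# Step 1: the modulus comes first.
n = p * q
phi = (p - 1) * (q - 1)

# Step 2: choose the public exponent freely, as long as gcd(e, phi) == 1.
e = 17
assert gcd(e, phi) == 1

# Step 3: compute the rest of the private key from that choice.
d = pow(e, -1, phi)  # modular inverse; needs Python 3.8 or later

# Sign and verify a small "message" (an integer below n).
m = 65
signature = pow(m, d, n)
print(pow(signature, e, n) == m)
```

The check prints `True`. Because the public exponent is chosen before the private exponent is derived, the order of generation differs from the one in the main post, but the result is still a key pair whose public half reveals nothing verifiable about its owner.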
