January 18, 2025

Pseudonyms: The Natural State of Online Identity

I’ve been writing recently about the problems that arise when you try to use cryptography to verify who is at the other end of a network connection. The cryptographic math works, but that doesn’t mean you get the identity part right.

You might think, from this discussion, that crypto by itself does nothing — that cryptographic security can only be bootstrapped from some kind of real-world identity verification. That’s the way it works for website certificates, where a certificate authority has to check your bona fides before it will issue you a certificate.

But this intuition turns out to be wrong. There is one thing that crypto can do perfectly, without any real-world support: provide pseudonyms. Indeed, crypto is so good at supporting pseudonyms that we can practically say that pseudonyms are the natural state of identity online.

To explain why this is true, I need to offer a gentle introduction to a basic crypto operation: digital signatures. Suppose John Doe (“JD”) wants to use digital signatures. First, JD needs to create a private cryptographic key, which he does by generating some random numbers and combining them according to a special geeky recipe. The result is a unique private key that only JD knows. Next, JD uses a certain procedure to determine the public key that corresponds to his private key. He announces the public key to everyone. The math guarantees that (1) JD’s public key is unique and corresponds to JD’s private key, and (2) a person who knows JD’s public key can’t figure out JD’s private key.

Now JD can make digital signatures. If JD wants to “sign” a certain message M, he combines M with JD’s private key in a special way, and the result is JD’s “signature on M”. Now anybody can verify the signature, using JD’s public key. Only JD can make the signature, because only JD knows JD’s private key; but anybody can verify the signature.

At no point in this process does JD tell anybody who he is — I called him “John Doe” for a reason. Indeed, JD’s public key is a perfect pseudonym: it conveys nothing about JD’s actual identity, yet it has a distinct “owner” whose presence can be verified. (“You’re really the person who created this public key? Then you should be able to make a signature on the message ‘squeamish ossifrage’ for me….”)
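To make the recipe concrete, here is a minimal sketch of key generation, signing, and verification. It uses Lamport one-time signatures, which is my choice for illustration because they need nothing beyond a hash function; the discussion above does not depend on any particular scheme, and real systems use schemes such as RSA or Ed25519. The function names are made up for this example, and a Lamport key should sign only one message, so treat this as a toy rather than something to deploy.

```python
import hashlib
import secrets

HASH_BITS = 256  # size of a SHA-256 digest in bits


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    private_key = [(secrets.token_bytes(32), secrets.token_bytes(32))
                   for _ in range(HASH_BITS)]
    public_key = [(_h(a), _h(b)) for a, b in private_key]
    return private_key, public_key


def _message_bits(message: bytes):
    """The 256 bits of the message's SHA-256 digest."""
    digest = int.from_bytes(_h(message), "big")
    return [(digest >> i) & 1 for i in range(HASH_BITS)]


def sign(private_key, message: bytes):
    """Reveal one secret from each pair, chosen by the matching digest bit."""
    return [private_key[i][bit]
            for i, bit in enumerate(_message_bits(message))]


def verify(public_key, message: bytes, signature) -> bool:
    """Check that each revealed secret hashes to the published value."""
    bits = _message_bits(message)
    if len(signature) != len(bits):
        return False
    return all(_h(part) == public_key[i][bit]
               for i, (bit, part) in enumerate(zip(bits, signature)))


# JD proves he owns the pseudonym (the public key) by signing a challenge.
jd_private, jd_public = generate_keypair()
challenge = b"squeamish ossifrage"
signature = sign(jd_private, challenge)
print(verify(jd_public, challenge, signature))           # owner passes
print(verify(jd_public, b"some other text", signature))  # signature does not transfer
```

Nothing in the key pair says who JD is, and calling generate_keypair() again yields an unrelated pseudonym.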

Using this method, anybody can make up a fresh pseudonym whenever they want. If you can generate random numbers and do some math (or have your computer do those things for you), then you can make a fresh pseudonym. You can make as many as you want, without needing to coordinate with anybody. This is all easy to do.

These methods, pseudonyms and signatures, are used even in cases where we want to verify somebody’s real-world identity. When you connect to (say) https://mail.google.com, Google’s web server gives you its public key — a pseudonym — along with a digital certificate that attests that that public key — that pseudonym — belongs to Google Inc. Binding public keys — pseudonyms — to real-world identities is tedious and messy, but of course this is often necessary in practice.

Online, identities are hard to manage. Pseudonyms are easy.

China, the Internet and Google: what I planned to say

In the run-up to and aftermath of Google’s decision yesterday to remove its Chinese search engine from China, I wrote two posts on my personal blog: “Chinese netizens’ open letter to the Chinese government and Google” and “One Google, One World; One China, No Google.”

Today, the Congressional-Executive Commission on China conducted a hearing titled “Google and Internet Control in China: A Nexus Between Human Rights and Trade?” They had originally invited me to testify in a similarly titled hearing, “China, the Internet and Google,” which was postponed and rescheduled twice: the first attempt was foiled by the Great Snowcalypse; the second attempt, scheduled for March 1st, was postponed again at the last minute for some reason that isn’t entirely clear. Meanwhile I had already gone and written my testimony, improved by very helpful input from the CITP community. Unfortunately, when they rescheduled the hearing they said I was no longer invited. They wanted the hearing to have different witnesses from recent related hearings in both the House and Senate. Given that I appeared in both hearings, it seems reasonable that they’d want to hear from some other people.

Given the effort that went into my testimony, however, and since it drills down in a lot more detail on China than my testimony for the other hearings, I think there is some value in my sharing it with the world. Here is the PDF and here it is as a web page. Some highlights:

From the introduction:

China is pioneering a new kind of Internet-age authoritarianism. It is demonstrating how a non-democratic government can stay in power while simultaneously expanding domestic Internet and mobile phone use. In China today there is a lot more give-and-take between government and citizens than in the pre-Internet age, and this helps bolster the regime’s legitimacy with many Chinese Internet users who feel that they have a new channel for public discourse. Yet on the other hand, as this Commission’s 2009 Annual Report clearly outlined, Communist Party control over the bureaucracy and courts has strengthened over the past decade, while the regime’s institutional commitments to protect the universal rights and freedoms of all its citizens have weakened.

Google’s public complaint about Chinese cyber-attacks and censorship occurred against this backdrop. It reflects a recognition that China’s status quo – at least when it comes to censorship, regulation, and manipulation of the Internet – is unlikely to improve any time soon, and may in fact continue to get worse.

Overview of Chinese Internet controls

Chinese government attempts to control online speech began in the late 1990’s with a focus on the filtering or “blocking” of Internet content. Today, the government deploys an expanding repertoire of tactics.

In other words, filtering is just one of many ways that the Chinese government limits and controls speech on the Internet. The full text then gives descriptions and explanations of the other tactics, but in brief they include:

  • deletion or removal of content at the source
  • device and local-level controls
  • domain name controls
  • localized disconnection or restriction
  • self-censorship due to surveillance
  • cyber-attacks
  • government “astro-turfing” and “outreach”
  • targeted police intimidation

I then describe a number of efforts by Chinese netizens to push back against these tactics, which include (see the full text for further explanation):

  • informal anti-censorship support networks
  • distributed web-hosting assistance networks
  • crowdsourced “opposition research”
  • preservation and redistribution of censored content
  • humorous “viral” protests
  • public persuasion efforts

I end with a set of recommendations. Once again, see the full text for explanations, but here is the basic list:

  • anti-censorship tools – including outreach and education in their use
  • anonymity and security tools – to help people better defend against cyber-attacks, spyware, and surveillance
  • platforms and networks for the capture, storage, and redistribution of content that gets deleted from domestic social networking and publishing services
  • support for “opposition research” – remember the Chinese netizens who deconstructed Green Dam?
  • corporate responsibility – see Global Network Initiative, but also appropriate legislation if American and other Western Internet companies fail to accept the idea that they have some obligations as far as free expression and privacy are concerned
  • private right of action – so that Chinese victims can sue U.S. companies in U.S. courts
  • incentives for innovation by the private sector that helps Chinese Internet users access blocked sites as well as protect themselves from attacks and surveillance.

My conclusion:

Many of China’s 384 million Internet users are engaged in passionate debates about their communities’ problems, public policy concerns, and their nation’s future. Unfortunately these public discussions are skewed, blinkered, and manipulated – thanks to political censorship and surveillance. The Chinese people are proud of their nation’s achievements and generally reject critiques by outsiders even if they agree with some of them. A democratic alternative to China’s Internet-age authoritarianism will only be viable if it is conceived and built by the Chinese people from within. In helping Chinese “netizens” conduct an un-manipulated and un-censored discourse about their future, the United States will not be imposing its will on the Chinese people, but rather helping the Chinese people to take ownership over their own future.

CITP is a Google Summer of Code 2010 Mentoring Organization

The Google Summer of Code program provides student stipends for summer work on open source projects. CITP is thrilled to have been chosen as a mentoring organization for 2010, meaning that students will be working on some CITP projects this summer. We think that these projects are very interesting, and potential participants now have the opportunity to propose their ideas for what they’d like to work on. Applications are accepted from March 29 to April 9.

You can browse our list of project ideas, read our overall description, and apply here.

Side-Channel Leaks in Web Applications

Popular online applications may leak your private data to a network eavesdropper, even if you’re using secure web connections, according to a new paper by Shuo Chen, Rui Wang, XiaoFeng Wang, and Kehuan Zhang. (Chen is at Microsoft Research; the others are at Indiana.) It’s a sobering result — yet another illustration of how much information can be leaked by ordinary web technologies. It’s also really clever.

Here’s the background: Secure web connections encrypt traffic so that only your browser and the web server you’re visiting can see the contents of your communication. Although a network eavesdropper can’t understand the requests your browser sends, nor the replies from the server, it has long been known that an eavesdropper can see the size of the request and reply messages, and that these sizes sometimes leak information about which page you’re viewing, if the request size (i.e., the size of the URL) or the reply size (i.e., the size of the HTML page you’re viewing) is distinctive.

The new paper shows that this inference-from-size problem gets much, much worse when pages are using the now-standard AJAX programming methods, in which a web “page” is really a computer program that makes frequent requests to the server for information. With more requests to the server, there are many more opportunities for an eavesdropper to make inferences about what you’re doing — to the point that common applications leak a great deal of private information.

Consider a search engine that autocompletes search queries: when you start to type a query, the search engine gives you a list of suggested queries that start with whatever characters you have typed so far. When you type the first letter of your search query, the search engine page will send that character to the server, and the server will send back a list of suggested completions. Unfortunately, the size of that suggested completion list will depend on which character you typed, so an eavesdropper can use the size of the encrypted response to deduce which letter you typed. When you type the second letter of your query, another request will go to the server, and another encrypted reply will come back, which will again have a distinctive size, allowing the eavesdropper (who already knows the first character you typed) to deduce the second character; and so on. In the end the eavesdropper will know exactly which search query you typed. This attack worked against the Google, Yahoo, and Microsoft Bing search engines.
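The keystroke-by-keystroke inference can be simulated in a few lines. This is a toy model, not the paper’s actual attack: the suggestion database, the JSON reply format, and the function names are all invented for illustration, and I assume that encryption hides the content of each reply but not its length, and that the eavesdropper can query the same server to learn which prefix produces which reply size.

```python
import json
import string

# Hypothetical suggestion database; a real search engine's is vastly larger.
QUERIES = ["tax forms", "tax refund", "taxi near me", "tea kettle",
           "weather today", "web mail", "west wing cast"]


def suggest(prefix: str) -> bytes:
    """Server side: the suggestion list for a prefix, as a JSON body."""
    matches = [q for q in QUERIES if q.startswith(prefix)]
    return json.dumps(matches).encode()


def observed_sizes(query: str):
    """What the eavesdropper sees: one reply size per keystroke.

    Assumes encryption preserves the length of each reply.
    """
    return [len(suggest(query[:i])) for i in range(1, len(query) + 1)]


def recover(sizes, alphabet=string.ascii_lowercase + " "):
    """Eavesdropper side: extend candidate prefixes one character at a
    time, keeping only those whose reply size matches the observation."""
    candidates = [""]
    for size in sizes:
        candidates = [prefix + ch
                      for prefix in candidates
                      for ch in alphabet
                      if len(suggest(prefix + ch)) == size]
    return candidates


print(recover(observed_sizes("tax refund")))  # prints ['tax refund']
```

In this toy, a query that matches nothing in the database produces many colliding candidates, because every empty suggestion list has the same size. The attack works to the extent that reply sizes are distinctive.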

Many web apps that handle sensitive information seem to be susceptible to similar attacks. The researchers studied a major online tax preparation site (which they don’t name) and found that it leaks a fairly accurate estimate of your Adjusted Gross Income (AGI). This happens because the exact set of questions you have to answer, and the exact data tables used in tax preparation, will vary based on your AGI. To give one example, there is a particular interaction relating to a possible student loan interest calculation that only happens if your AGI is between $115,000 and $145,000, so the presence or absence of the distinctively-sized message exchange relating to that calculation tells an eavesdropper whether your AGI is between $115,000 and $145,000. By assembling a set of clues like this, an eavesdropper can get a good fix on your AGI, plus information about your family status, and so on.
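Assembling clues amounts to intersecting ranges. The sketch below shows the bookkeeping under simple assumptions: each clue is an AGI range plus whether the associated message exchange was observed, and amounts are whole dollars. The $115,000 to $145,000 range comes from the example above; the second clue and its $130,000 threshold are made up purely to show how two clues combine.

```python
def apply_clue(ranges, low, high, present):
    """Narrow a list of inclusive (low, high) AGI ranges using one clue.

    present=True  means the exchange was seen, so AGI lies in [low, high].
    present=False means it was not seen, so AGI lies outside [low, high].
    """
    result = []
    for a, b in ranges:
        if present:
            lo, hi = max(a, low), min(b, high)
            if lo <= hi:
                result.append((lo, hi))
        else:
            if a < low:                      # part of [a, b] below the clue range
                result.append((a, min(b, low - 1)))
            if b > high:                     # part of [a, b] above the clue range
                result.append((max(a, high + 1), b))
    return result


estimate = [(0, 1_000_000)]  # assumed starting range, before any clue
# Clue from the text: the student loan interest exchange was observed.
estimate = apply_clue(estimate, 115_000, 145_000, present=True)
# Hypothetical second clue: an exchange tied to AGI up to $130,000 was absent.
estimate = apply_clue(estimate, 0, 130_000, present=False)
print(estimate)  # prints [(130001, 145000)]
```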

For similar reasons, a major online health site leaks information about which medications you are taking, and a major investment site leaks information about your investments.

The paper goes on to consider possible mitigations. The most obvious mitigation is to add padding to messages so that their sizes are not so distinctive — for example, every message might be padded to make its size a multiple of 256 bytes. This turns out to be less effective than you might expect — significant information can still leak even if messages are generously padded — and the padded messages are slower and more expensive to transmit.
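A small sketch shows both halves of that tradeoff. The reply sizes here are invented, and rounding up to a multiple of 256 bytes is just the example padding rule mentioned above, not necessarily what any site or the paper uses.

```python
def padded_size(size: int, block: int = 256) -> int:
    """Round a message size up to the next multiple of block bytes."""
    return -(-size // block) * block


def overhead(sizes, block: int = 256) -> float:
    """Ratio of total padded bytes to total unpadded bytes."""
    return sum(padded_size(s, block) for s in sizes) / sum(sizes)


raw = [57, 43, 300, 310, 900]           # hypothetical reply sizes in bytes
padded = [padded_size(s) for s in raw]  # [256, 256, 512, 512, 1024]

print(len(set(raw)), "distinct sizes before padding")   # 5
print(len(set(padded)), "distinct sizes after padding")  # 3
print(round(overhead(raw), 2), "times the original traffic")
```

Padding merges sizes that were already close together, but replies that differ by more than a block still land in different buckets, so an eavesdropper can still tell them apart while everyone pays for the extra bytes.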

We don’t know which sites the researchers studied, but it seems like a safe bet that most, if not all, of the sites in these product categories have similar problems. It’s important to keep these attacks in perspective — bear in mind that they can only be carried out by someone who can eavesdrop on the network between you and the site you’re visiting.

It’s becoming increasingly clear that securing web-based applications is very difficult, and that the basic tools for developing web apps don’t do much to help. The industry, and researchers, will be struggling with web app security issues for years to come.

Domain Names Can't Defend Themselves

Today, the Kentucky Supreme Court handed down an opinion in the saga of Kentucky vs. 141 Domain Names (described a while back here on this blog). Here’s the opinion.

This case is fascinating. A quick recap: Kentucky attempted a property seizure of 141 domain names allegedly involved in gambling on the theory that the domain names themselves constituted “gambling devices” under Kentucky law and were therefore illegal. The state held a forfeiture hearing where anyone with an interest in the “property” could show up to defend their interest in the property; otherwise, the State would order the registrars to transfer “ownership” of the domain names to Kentucky. No individual claiming that they own one of the domain names showed up. Litigation began when two industry associations (iMEGA and IGC) claimed to represent unnamed persons who owned these domain names (and another lawyer showed up during litigation claiming representation of one specific domain name).

The subsequent litigation gets a bit complicated; suffice it to say that the issue of standing was what got to the KY Supreme Court: could an association that claimed it represented an owner of a domain name affected in this action properly represent this owner in court without identifying that owner and showing that the owner was indeed the owner of an affected domain name?

The Kentucky Supreme Court said no: there needs to be at least one identified individual owner who will suffer harm before the association can stand in that owner’s stead. The court ruled:

Due to the incapacity of domain names to contest their own seizure and the inability of iMEGA and IGC to litigate on behalf of anonymous registrants, the Court of Appeals is reversed and its writ is vacated.

And on the issue of whether a piece of property can represent itself:

“An Internet domain name does not have an interest in itself any more than a piece of land is interested in its own use.”

Anyway, it would seem that the options for next steps include 1) identifying at least one owner that would suffer harm and then moving back up to the Supreme Court (given that the merits had been argued at the Appeals level), or 2) deciding that the anonymity of domain name ownership in this case is more important than the fight over this very weird seizure of domain names.

As a non-lawyer, I wonder if it’s possible to represent an owner as a John Doe with an affidavit of ownership of an affected domain name submitted.

UPDATE (2010-03-19T00:07:07 EDT): Check the comments for why a John Doe strategy won’t work when the interest in anonymity is to avoid personal liability rather than free expression.

A weird bonus for people who have read this far: if I open the PDF of the opinion on my Mac in Preview.app or Skim.app (two PDF readers), the “SPORTSBOOK.COM” entry in the listing of the parties on the first page is hyperlinked. However, I don’t see this in Adobe Acrobat Pro or Reader. It seems the KY Supreme Court is, likely inadvertently, linking to one of the 141 domain names. Of course, Preview.app and Skim.app might be sharing the same library that causes this one URL to be linked… I’m not a good enough PDF sleuth to figure it out.