Tonight the Washington Post published a story about the NSA’s eavesdropping on the unique tracking cookies used by advertisers and analytics companies to identify their users. By capturing these unique identifiers the NSA was able to re-identify users whom it had seen earlier. In short, the NSA could piggyback on commercial tracking to track users itself.
The standard tracking methods used by companies today are vulnerable to this kind of surveillance: tracking causes your browser to transmit unique identifiers that an eavesdropper can use to identify and track you.
The easiest way to protect users against this threat is to refrain from tracking. But if sites are going to track users, this can be done in ways that avoid surveillance. In this post I’ll talk about how to do surveillance-proof tracking.
[First, a micro-tutorial on the most common tracking technology: When you visit a web page (at a “first party” site), the page typically includes content provided by other sites (“third parties”), for example to target ads or collect analytics (statistics) about your use of the Web. Any of these parties can send to your browser a unique identifier, which takes the form of a cookie. Your browser will store the cookie and send it back to the site when the browser connects to the same site later. The site can use the unique identifier to link together your activities over time, and keep a record of your activity. This kind of tracking is commonly done by both first parties and third parties.]
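To make the micro-tutorial concrete, here is a minimal sketch of the tracker's side of cookie tracking, in Python. The cookie name `uid`, the `handle_request` function, and the in-memory `visits` log are all hypothetical; a real tracker would sit behind a web server and use a database. The thing to notice is that the same `uid` value crosses the network on every visit, which is exactly what an eavesdropper can collect.

```python
import secrets
from collections import defaultdict
from http.cookies import SimpleCookie

# Hypothetical tracker-side log: tracking ID -> pages where that browser was seen.
visits = defaultdict(list)

def handle_request(cookie_header, page):
    """Return (tracking_id, Set-Cookie header or None) for one request."""
    jar = SimpleCookie(cookie_header or "")
    if "uid" in jar:
        uid = jar["uid"].value        # returning browser: reuse the ID it echoed back
        set_cookie = None
    else:
        uid = secrets.token_hex(16)   # first visit: mint a random unique ID
        out = SimpleCookie()
        out["uid"] = uid
        set_cookie = out.output()     # a header of the form "Set-Cookie: uid=..."
    visits[uid].append(page)          # link this visit to everything seen before
    return uid, set_cookie

# First visit: no cookie yet, so the tracker issues one.
uid, header = handle_request(None, "/news/article-1")
# Later visit: the browser sends the cookie back, and the two visits are linked.
handle_request(f"uid={uid}", "/recipes/soup")
print(visits[uid])  # ['/news/article-1', '/recipes/soup']
```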
Before talking about methods that work, let’s take a minute to discuss one that doesn’t: simple encryption of the tracking ID. We already know that the user is vulnerable if the site sends a cookie containing a unique ID. What if the site generates a secret key K, encrypts the ID with K, and sends the resulting encrypted value as the cookie? This isn’t a solution, because the encrypted value is itself a unique, stable identifier: the eavesdropper never needs to decrypt it, only to notice that the same value shows up again.
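A short sketch of why this fails. It assumes a server-side key `K` and uses HMAC-SHA256 from Python's standard library as a stand-in for deterministic encryption. HMAC is a keyed hash, not a cipher, but it shares the property that matters here: the same ID under the same key always yields the same output. The user IDs are made up.

```python
import hashlib
import hmac
import secrets

# Server-side secret key K (assumption: it never leaves the server).
K = secrets.token_bytes(32)

def cookie_value(user_id: str) -> str:
    # Stand-in for deterministic encryption of the ID under K.
    # HMAC is not encryption, but like deterministic encryption it maps
    # the same input and key to the same output every time.
    return hmac.new(K, user_id.encode(), hashlib.sha256).hexdigest()

monday = cookie_value("user-1234")
friday = cookie_value("user-1234")
other = cookie_value("user-5678")

# The eavesdropper cannot recover "user-1234" from the value, but it doesn't
# need to: the value itself is a stable label that singles out this user.
same_user_linked = monday == friday    # True
users_distinguished = monday != other  # True
```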
Another method that doesn’t work is to switch from cookie-based tracking to a different form of tracking technology such as browser fingerprinting. This doesn’t work because the tracker is still calculating a unique identifier for each user’s browser, and sending that identifier to the server on an unprotected connection.
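A toy illustration of the same problem with fingerprinting. The attribute set, its values, and the `/collect` endpoint are hypothetical; real fingerprinting scripts gather many more signals, but the shape is the same: a stable hash of the browser's characteristics plays the role the cookie used to play.

```python
import hashlib

def fingerprint(attrs: dict) -> str:
    """Hash a browser's observable attributes into a short, stable ID."""
    # Sort keys so the same attributes always produce the same string.
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Hypothetical attribute values a script might read from the browser.
browser = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0",
    "screen": "1920x1080x24",
    "timezone_offset": "-300",
    "fonts": "DejaVu Sans,Liberation Serif,Noto Sans",
}

# No cookie is stored, yet every visit yields the same ID...
fp_monday = fingerprint(browser)
fp_friday = fingerprint(dict(browser))
# ...and over plain http it appears in the request for anyone on the path to read.
request_line = f"GET /collect?fp={fp_monday} HTTP/1.1"  # hypothetical endpoint
```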
An approach that does work is for the tracking entity to use https, the secure web protocol, for its communication with the user’s computer. This ensures that the unique ID that is transmitted is protected by encryption in a way that doesn’t leak to an eavesdropper any information about which connections are to the same user. Implementing https on a larger site is not as easy as it should be, but it seems to be the price of surveillance-proof tracking.
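A minimal sketch of the cookie side of this, using Python's standard `http.cookies` module. Serving the tracker over https is a matter of server configuration and certificates, which this leaves out. What the code shows is marking the tracking cookie `Secure`, so the browser sends it back only over https and never on a stray plain-http request. The cookie name `uid`, the function name, and the one-year lifetime are assumptions.

```python
import secrets
from http.cookies import SimpleCookie

def tracking_cookie_header(uid: str) -> str:
    """Build a Set-Cookie header for a tracking ID that should only travel over https."""
    jar = SimpleCookie()
    jar["uid"] = uid
    jar["uid"]["secure"] = True                  # browser returns it only over https
    jar["uid"]["httponly"] = True                # page scripts cannot read it
    jar["uid"]["path"] = "/"
    jar["uid"]["max-age"] = 60 * 60 * 24 * 365   # hypothetical one-year lifetime
    return jar.output()

header = tracking_cookie_header(secrets.token_hex(16))
# The header carries the Secure and HttpOnly attributes alongside the ID.
print(header)
```

The `Secure` flag does not replace https; it only makes sure that once the tracker has moved to https, the identifier is not leaked by a leftover plain-http request to the same site.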
[Digression for crypto geeks: Other cryptographic approaches seem inferior. You might be tempted to switch to a randomized symmetric encryption scheme, but this doesn’t help if the ciphertext is stored in a cookie, because the client will echo back that same ciphertext on every request. Or you might want to push code to the client that symmetrically encrypts the ID with a different random IV each time; but then you need to push the key to the client without exposing the key to the eavesdropper. (Remember that the eavesdropper can act as a client.) Once you decide to adopt public-key crypto, you might as well use the https standard that is already supported by the browser.]
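For readers who want the first point of the digression in code: a toy randomized symmetric scheme (a fresh random nonce plus an HMAC-SHA256 keystream, i.e. counter mode built on a PRF) produces a different ciphertext on every encryption, yet once one ciphertext is stored in a cookie, the browser replays it verbatim. This is an illustration only, not a vetted cipher; in practice you would use an authenticated-encryption library. The key `K` and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets

K = secrets.token_bytes(32)  # hypothetical server-side secret key
NONCE_LEN = 16

def _keystream(nonce: bytes, length: int) -> bytes:
    # Counter-mode keystream: HMAC-SHA256(K, nonce || counter) used as a PRF.
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(K, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def encrypt(plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)  # fresh randomness on every call
    ks = _keystream(nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    ks = _keystream(nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))

uid = b"user-1234"
# Randomization works: two encryptions of the same ID look unrelated.
fresh_a, fresh_b = encrypt(uid), encrypt(uid)

# But the cookie is set once, and the browser echoes it on each later visit.
cookie = encrypt(uid).hex()
seen_by_eavesdropper = [cookie for _ in range(3)]  # three later requests
distinct_values = len(set(seen_by_eavesdropper))   # 1: still a stable identifier
```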
Another approach to protecting users is to switch to a method that holds all of the stored information on the client side, that is, in the user’s browser. The idea is that rather than having the server accumulate a record of the user’s activities (or some kind of preference profile based on those activities), you would instead have the user’s browser store the same information for you. This approach is taken by some of the privacy-preserving behavioral advertising systems that have been proposed. If information is accumulated on the user’s own computer, there doesn’t need to be a unique identifier that is sent across the Internet every time the user accesses your site. Instead, you can send encrypted data only at the times you need it. This requires more aggressive re-engineering of an ad or analytics service, but it provides additional benefits to the user in terms of privacy and transparency.
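A rough sketch of the client-side idea, loosely in the spirit of the privacy-preserving ad proposals mentioned above but not taken from any of them. The `ClientProfile` class, the topic labels, and the ad names are hypothetical. The server sends every browser the same candidate list and the choice happens locally, so no per-user identifier needs to cross the network; the occasional encrypted report back to the server is left out.

```python
from collections import Counter

class ClientProfile:
    """Hypothetical browser-side interest store (in a browser, e.g. local storage)."""

    def __init__(self):
        self.interests = Counter()

    def record_visit(self, page_topic: str) -> None:
        # Activity is accumulated locally; nothing is sent to the server here.
        self.interests[page_topic] += 1

    def choose_ad(self, candidate_ads: dict) -> str:
        # Every browser receives the same candidates; selection is done locally.
        if not self.interests:
            return candidate_ads["default"]
        top_topic, _ = self.interests.most_common(1)[0]
        return candidate_ads.get(top_topic, candidate_ads["default"])

# Same candidate list for everyone, so fetching it reveals nothing per-user.
ads = {"sports": "ad-running-shoes", "cooking": "ad-cast-iron-pan", "default": "ad-generic"}

profile = ClientProfile()
for topic in ["sports", "cooking", "sports"]:
    profile.record_visit(topic)
chosen = profile.choose_ad(ads)  # "ad-running-shoes"
```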
In the medium term, the easiest way for trackers to protect their users is to switch to https. Until they do so, it is up to users to protect themselves. I’ll talk about users’ self-help options in the next post.