December 22, 2024

How to stop spies from piggybacking on commercial Web tracking

Tonight the Washington Post published a story about the NSA’s eavesdropping on the unique tracking cookies used by advertisers and analytics companies to identify their users. By capturing these unique identifiers the NSA was able to re-identify users whom it had seen earlier. In short, the NSA could piggyback on commercial tracking to track users itself.

The standard tracking methods used by companies today are vulnerable to this kind of surveillance—tracking causes your browser to transmit unique identifiers that can be used by an eavesdropper to identify and track you.

The easiest way to protect users against this threat is to refrain from tracking. But if sites are going to track users, this can be done in ways that avoid surveillance. In this post I’ll talk about how to do surveillance-proof tracking.

[First, a micro-tutorial on the most common tracking technology: When you visit a web page (at a “first party” site), the page typically includes content provided by other sites (“third parties”), for example to target ads or collect analytics (statistics) about your use of the Web. Any of these parties can send to your browser a unique identifier, which takes the form of a cookie. Your browser will store the cookie and send it back to the site when the browser connects to the same site later. The site can use the unique identifier to link together your activities over time, and keep a record of your activity. This kind of tracking is commonly done by both first parties and third parties.]

Before talking about methods that work, let’s take a minute to discuss one that doesn’t work: simple encryption of the tracking ID. We already know that the user is vulnerable if the site sends a cookie containing a unique ID. What if the site generates a secret key K, encrypts the ID with K, and sends the resulting encrypted value as the cookie? This isn’t a solution, because the browser sends back the same encrypted value on every request, so that value still serves to uniquely identify the user.
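To make the point concrete, here is a minimal sketch in Python. The cipher is a deliberately simple toy (an XOR pad derived from the key), chosen only because it is deterministic; the key, user IDs, and URLs are made up for illustration. The eavesdropper never learns the key or the underlying ID, yet can still group requests by cookie value.

```python
import hashlib
from collections import defaultdict

def toy_encrypt(key: bytes, user_id: str) -> str:
    """Toy deterministic cipher (NOT secure): same key and ID give the same output."""
    pad = hashlib.sha256(key).digest()
    data = user_id.encode()
    return bytes(b ^ pad[i % len(pad)] for i, b in enumerate(data)).hex()

def link_by_cookie(observed):
    """What an eavesdropper can do without the key: group requests by cookie value."""
    groups = defaultdict(list)
    for cookie, url in observed:
        groups[cookie].append(url)
    return dict(groups)

key = b"server-secret-K"                 # hypothetical server key
alice = toy_encrypt(key, "user-1001")    # hypothetical user IDs
bob = toy_encrypt(key, "user-1002")

# Traffic as seen on the wire: (cookie value, page requested)
observed = [
    (alice, "news.example/a"),
    (bob, "shop.example/b"),
    (alice, "mail.example/c"),
]
linked = link_by_cookie(observed)
```

Here `linked[alice]` holds both of the first user's requests, even though the eavesdropper only ever saw ciphertext.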

Another method that doesn’t work is to switch from cookie-based tracking to a different form of tracking technology such as browser fingerprinting. This doesn’t work because the tracker is still calculating a unique identifier for each user’s browser, and sending that identifier to the server on an unprotected connection.
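As a rough illustration of why fingerprinting has the same problem, the sketch below hashes a handful of browser attributes into an identifier. The attribute names and values are assumptions picked for the example, and real fingerprinting scripts use many more signals. The result is stable for a given browser, so sending it over an unprotected connection is no better than sending a cookie.

```python
import hashlib

def fingerprint(attrs: dict) -> str:
    """Hash a canonical rendering of browser attributes into a short identifier."""
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Hypothetical attribute sets for two different browsers
browser_a = {"user_agent": "ExampleBrowser/1.0", "screen": "1920x1080", "timezone": "UTC-5"}
browser_b = {"user_agent": "ExampleBrowser/1.0", "screen": "1366x768", "timezone": "UTC+1"}

fp_first_visit = fingerprint(browser_a)
fp_later_visit = fingerprint(dict(reversed(list(browser_a.items()))))  # same browser, later
fp_other = fingerprint(browser_b)
```

The first two values are identical and the third differs, which is exactly what a tracker wants and exactly what an eavesdropper can reuse.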

An approach that does work is for the tracking entity to use https, the secure web protocol, for its communication with the user’s computer. This ensures that the unique ID that is transmitted is protected by encryption in a way that doesn’t leak to an eavesdropper any information about which connections are to the same user. Implementing https on a larger site is not as easy as it should be, but it seems to be the price of surveillance-proof tracking.
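One practical detail, which goes slightly beyond what is said above: a cookie set over https can still leak if the browser later sends it on a plain http request to the same site. The standard `Secure` cookie attribute tells the browser to send the cookie only over https. A minimal sketch using Python's standard `http.cookies` module follows; the cookie name `tid` is an arbitrary choice for the example.

```python
import secrets
from http.cookies import SimpleCookie

def make_tracking_cookie(name: str = "tid") -> str:
    """Build a Set-Cookie header for a random ID that is only sent over https."""
    cookie = SimpleCookie()
    cookie[name] = secrets.token_hex(16)  # random 128-bit identifier
    cookie[name]["secure"] = True         # browser sends it only on https requests
    cookie[name]["httponly"] = True       # not readable from page scripts
    cookie[name]["path"] = "/"
    return cookie.output(header="Set-Cookie:")

header = make_tracking_cookie()
```

The server would emit `header` in an https response. The identifier is still unique, but it travels only inside the encrypted connection.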

[Digression for crypto geeks: Other cryptographic approaches seem inferior. You might be tempted to switch to a randomized symmetric encryption scheme, but this doesn’t help if the ciphertext is stored in a cookie that will be echoed back by the client. Or you might want to push code to the client that symmetrically encrypts the key with a different random IV each time; but then you need to push the key to the client without exposing the key to the eavesdropper. (Remember that the eavesdropper can act as a client.) Once you decide to adopt public-key crypto, you might as well use the https standard that is already supported by the browser.]
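The first point in the digression can be sketched as follows. The randomized cipher here is again a toy (an XOR pad derived from the key and a fresh nonce), and the `Browser` class is a hypothetical stand-in for a cookie jar. Two encryptions of the same ID differ, but that does not matter: the browser stores one ciphertext and echoes it verbatim.

```python
import hashlib
import secrets

def randomized_encrypt(key: bytes, user_id: str) -> str:
    """Toy randomized cipher (NOT secure): a fresh nonce makes each ciphertext differ."""
    nonce = secrets.token_bytes(16)
    pad = hashlib.sha256(key + nonce).digest()
    data = user_id.encode()
    body = bytes(b ^ pad[i % len(pad)] for i, b in enumerate(data))
    return (nonce + body).hex()

class Browser:
    """Hypothetical cookie jar: stores what the server sets and echoes it back."""
    def __init__(self):
        self.jar = {}
    def receive(self, name: str, value: str) -> None:
        self.jar[name] = value
    def request_cookies(self) -> dict:
        return dict(self.jar)

key = b"server-secret-K"
c1 = randomized_encrypt(key, "user-1001")
c2 = randomized_encrypt(key, "user-1001")   # same ID, different ciphertext

browser = Browser()
browser.receive("tid", c1)                  # server sets the cookie once
seen_on_wire = [browser.request_cookies()["tid"] for _ in range(3)]
```

Every entry in `seen_on_wire` is the same string, so the eavesdropper can link the three requests just as with an unencrypted ID.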

Another approach to protecting users is to switch to a method that holds all of the stored information on the client side, that is, in the user’s browser. The idea is that rather than having the server accumulate a record of the user’s activities (or some kind of preference profile based on those activities), you would instead have the user’s browser store the same information for you. This approach is taken by some of the privacy-preserving behavioral advertising systems that have been proposed. If information is accumulated on the user’s own computer, there doesn’t need to be a unique identifier that is sent across the Internet every time the user accesses your site. Instead, you can send encrypted data only at the times you need it. This requires more aggressive re-engineering of an ad or analytics service, but it provides additional benefits to the user in terms of privacy and transparency.
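A very rough sketch of the client-side idea is shown below. It is not taken from any particular proposed system; the class, topic names, and ad IDs are all invented for illustration. The profile lives in the browser, the server offers the same candidate ads to everyone, and the choice is made locally, so no per-user identifier needs to accompany each request.

```python
from collections import Counter
from typing import Dict, Optional

class LocalProfile:
    """Interest profile kept in the user's browser rather than on the server."""
    def __init__(self) -> None:
        self.interests: Counter = Counter()

    def record_visit(self, topic: str) -> None:
        """Update the profile locally; nothing is sent over the network."""
        self.interests[topic] += 1

    def choose_ad(self, candidates: Dict[str, str]) -> Optional[str]:
        """Pick the candidate ad whose topic the user has visited most often."""
        for topic, _count in self.interests.most_common():
            if topic in candidates:
                return candidates[topic]
        return None

profile = LocalProfile()
for topic in ["cycling", "cooking", "cycling", "travel", "cycling", "cooking"]:
    profile.record_visit(topic)

# The same candidate list is offered to every visitor (topic -> ad ID)
candidates = {"cooking": "ad-17", "gardening": "ad-42", "travel": "ad-08"}
chosen = profile.choose_ad(candidates)
```

In this run the most-visited topic has no matching ad, so the choice falls to the next most-visited topic that does. A real system would also have to address reporting and billing, which this sketch ignores.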

In the medium term, the easiest way for trackers to protect their users is to switch to https. Until they do so, it is up to users to protect themselves. I’ll talk about users’ self-help options in the next post.

Comments

  2. Harry Johnston says

    I’m having trouble with your RSS feed, Sage complains about an XML parse error. Is that at your end or mine?

    • Harry Johnston says

      That’s the posts feed, the comments one is working.

      • It’s the feed.

        In the description for “Princeton CS research on secure communications”, there’s an illegal character just before “The new surveillance environment changes that, with companies racing”.

  3. To what extent do these techniques simply move the attack vector? If I were the NSA and wires were being closed to me (even with the ability to issue false certs) then it would be most logical to move to the machines attached to the wires. We already know that this is being done in limited cases, but what would be the situation if the NSA decided to compromise most servers or most user machines of potential interest?

    If you look at the ongoing rate of password breaches, this seems like the kind of thing that would be fairly plausible, and might actually yield a more-interesting-looking data stream.

  4. Hi
    There is a problem with your last recommendation, “hold all the stored information on the client side”. From Google’s point of view, you cannot and should not trust information coming from the user. That is why one sanitizes inputs to a web form. Google would have to “sanitize” the info on the user side, or implement some method to ensure it has not been tampered with. And what happens when a user is using several browsers (at home, at work, on a mobile device, or two browsers on the same computer to access two different Google accounts)?
    Rather than accumulating a record of the user’s activities, I would prefer to have a neutral module accumulate a record of user interests under the control of the user. It could monitor what I am doing on my behalf and update the interest profile, which would be the only part accessible to advertisers.

  5. Regards for sharing this amazing website.

    • One thing that might help somewhat is to limit the cookies allowed on your computer. You can do this by clicking on Internet Options/Privacy/Advanced. I personally require first-party cookies to ask permission before placing a cookie on my computer, and I disallow all third-party cookies.