March 26, 2017

How cookies can be used for global surveillance

Today we present an updated version of our paper [0] examining how the ubiquitous use of online tracking cookies can allow an adversary conducting network surveillance to target a user or surveil users en masse. In the initial version of the study, summarized below, we examined the technical feasibility of the attack. We have now made the attack model more complete and nuanced, and we have analyzed the effectiveness of several browser privacy tools in preventing the attack. Finally, inspired by Jonathan Mayer and Ed Felten’s The Web is Flat study, we incorporate the geographic topology of the Internet into our measurements of simulated web traffic and our adversary model, providing a more realistic view of how effective this attack is in practice.

A passive network adversary may want to surveil users en masse over a time period in which identifying information about users may change (e.g., IP address, user agent string). In our earlier post, we summarized how the adversary can still succeed: “The diagram illustrates how the eavesdropper can use multiple third-party cookies to link traffic. When a user visits ‘exampleA,’ the response contains the embedded tracker X, with an ID cookie ‘xxx’. The visits to exampleA and to X are tied together by IP address, which typically doesn’t change within a single page visit [1]. Another page visited by the same user might embed tracker Y bearing the pseudonymous cookie ‘yyy’. If the two page visits were made from different IP addresses, an eavesdropper seeing these cookies can’t tell that the same browser made both visits. But if a third page, however, embeds both trackers X and Y, then the eavesdropper will know that IDs ‘xxx’ and ‘yyy’ belong to the same user.”
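The linking step in the quoted example can be sketched as a connected-components computation. The following is a minimal illustration, not the implementation from the paper: the function name link_visits, the visit labels exampleB and exampleC, and the representation of a visit as a set of (tracker, cookie ID) pairs are all assumptions made for this sketch. It merges any two page visits that carry the same ID cookie, using a simple union-find structure.

```python
from collections import defaultdict


def link_visits(visits):
    """Group page visits into clusters that share at least one ID cookie.

    `visits` maps a visit label to the set of (tracker, cookie_id) pairs
    that an eavesdropper observed alongside that visit (hypothetical format).
    """
    parent = {v: v for v in visits}

    def find(v):
        # Follow parent pointers to the cluster representative (path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    first_seen = {}  # cookie -> first visit in which it was observed
    for visit, cookies in visits.items():
        for cookie in cookies:
            if cookie in first_seen:
                # Same cookie seen before: both visits came from one browser.
                parent[find(visit)] = find(first_seen[cookie])
            else:
                first_seen[cookie] = visit

    clusters = defaultdict(set)
    for v in visits:
        clusters[find(v)].add(v)
    return list(clusters.values())


# The three page visits from the quoted example (labels are illustrative).
visits = {
    "exampleA": {("X", "xxx")},
    "exampleB": {("Y", "yyy")},
    "exampleC": {("X", "xxx"), ("Y", "yyy")},
}
clusters = link_visits(visits)
largest = max(clusters, key=len)
linked_fraction = len(largest) / len(visits)
```

Without the third visit, the first two remain in separate clusters; once a page embedding both trackers is observed, all three visits collapse into one cluster. The share of visits in the largest cluster is one rough way to express how much of a user's traffic has been linked.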

Additionally, an adversary may be restricted in the traffic they are able to observe, either by their location on the network or by legal restrictions. For example, a state adversary may only have access to packets that pass through routers in their country. We use a combination of traceroute data, IP geolocation, and round-trip times, as described in Section 4.4 of our paper, to build the view of a geographically restricted adversary.

Under the updated model, we find that the cookie linking attack is very effective against users in several different locations. Using OpenWPM, a web measurement framework, we simulate 25 users browsing from a US location over a three-month timespan. We find that the adversary is able to link 62% of an average user’s page visits together using only HTTP header data. When we repeat the same measurements with a somewhat coarser, but localized, browsing model for simulated users in Europe and Asia, the adversary is able to link slightly less of the overall traffic, but still a significant amount. Additionally, we show how the adversary can link these clusters of page visits to real-world identities, since several sites leak the user’s identity in plaintext.

Next, we look at the NSA specifically, given what we know about the agency’s legal restrictions on surveillance [2]. The choice to examine the NSA is motivated by recent reports on their use of third-party cookies for surveillance and targeting (1, 2). We consider both US users and users browsing in Europe and Asia. Based on the NSA’s “one-end foreign” rule and the conservative assumption that most wiretapping happens within US borders or on undersea cables, we assume that foreign-bound traffic originating in the US and US-bound traffic from other countries can potentially be surveilled. See the paper for a more detailed explanation.
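The visibility assumption above can be illustrated with a small filter. This is a simplified sketch, not the method from the paper: it considers only the countries of the two endpoints and ignores the routing path, and the function name one_end_foreign and the sample requests are hypothetical.

```python
def one_end_foreign(src_country, dst_country, home="US"):
    """Return True if exactly one endpoint of the request is in `home`.

    Simplifying assumption: only the endpoint countries matter; the actual
    network path between them is not modeled here.
    """
    return (src_country == home) != (dst_country == home)


# (user country, server country) pairs; values are illustrative only.
requests = [
    ("DE", "US"),  # European user loading content hosted in the US
    ("DE", "DE"),  # purely foreign traffic
    ("US", "US"),  # purely domestic traffic
    ("US", "IE"),  # US user loading content hosted abroad
]
visible = [r for r in requests if one_end_foreign(*r)]
```

In this sketch only the first and last requests are treated as potentially observable, since each crosses the US border exactly once.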

Under this attack model, the adversary is able to link 13% of traffic from Europe and 20% of traffic from Asia, even when the simulated users browse the popular sites local to their region. This is due to the concentration of third-party services located in the United States. On the other hand, we find that large clusters of traffic don’t emerge when the attack is carried out on a user browsing in the US. That said, nearly a quarter of page visits for the average simulated US user are visible outside the US through third parties and referrers, which could enable surveillance through other means.

Effectiveness of browser privacy tools (lower values are better). Base = baseline, no protection. DNT = Do Not Track. Some 3p = Safari-style third-party cookie blocking. All 3p = Blocking all third-party cookies. HTTPS-E = HTTPS Everywhere.

How well can a user defend themselves against cookie linking? Since the identifying cookies that enable the attack come from embedded third parties, we examine how effective the attack is when a user attempts to block third parties using browser privacy tools. As the chart shows, Ghostery is the most effective tool, but it still allows a quarter of traffic to be surveilled.

Our work underscores the importance of HTTPS deployment on the web. It may not be apparent why third-party embedded content requires a secure connection, but our attack shows how unencrypted third-party content can enable significant vulnerabilities. As we’ve shown, users have the ability to reduce, but not eliminate, their exposure to surveillance through cookies. Websites and browser vendors must therefore share responsibility in protecting users. We’re excited about the proposal to mark HTTP as non-secure and hope that initiatives like Let’s Encrypt will lower the burden of HTTPS deployment and help build a safer web.

[0] We updated the link to the final version of the paper following its publication at WWW 2015. The original link went to this draft version, which contains minor differences.

[1] An exception is when the user routes traffic through Tor. Different requests can take different paths, and the exit node IPs will be different. Thus, use of Tor with application-layer anonymization (e.g., Tor Browser Bundle) defeats our attack.

[2] One-end foreign wireline interceptions inside the United States are generally governed by Section 702 of the FISA Amendments Act. Two-end foreign interceptions inside the United States are generally governed by Executive Order 12333. Interceptions outside the United States are also generally governed by Executive Order 12333.