June 30, 2016

avatar

Do privacy studies help? A Retrospective look at Canvas Fingerprinting

It seems like every month we hear of some new online privacy violation in the news, on topics such as fingerprinting or web tracking. Many of these news stories highlight academic research. What we don’t see is whether these studies and the subsequent news stories have any impact on privacy.

Our 2014 canvas fingerprinting measurement offers an opportunity for me to provide that insight, as we ended up receiving a surprising amount of press coverage after releasing the paper. In this post I’ll examine the reaction to the paper and explore which aspects contributed to its positive impact on privacy. I’ll also explore how we can use this knowledge when designing future studies to maximize their impact.

What we found in 2014

The 2014 measurement paper, The Web Never Forgets, is a collaboration with researchers at KU Leuven. In it, we measured the prevalence of three persistent tracking techniques online: canvas fingerprinting, cookie respawning, and cookie syncing [1]. They are persistent in that are hard to control, hard to detect, and resilient to blocking or removing.

We found that 5% of the top 100,000 sites were utilizing the HTML5 Canvas API as a fingerprinting vector. The overwhelming majority of which, 97%, was caused by the top two providers. The ability to use the HTML5 Canvas as a fingerprinting vector was first introduced in a 2012 paper by Mowery and Shacham. In the time between that 2012 paper and our 2014 measurement, approximately 20 sites and trackers started using canvas to fingerprint their visitors.

Several examples of the text written to the canvas for fingerprinting purposes. Each of these images would be converted to strings and then hashed to create an identifier.

The reaction to our study

Shortly after we released our paper publicly, we saw a significant amount of press coverage, including articles on ProPublica, BBC, Der Spiegel, and more. The amount of coverage our paper received was a surprise for us; we weren’t the creators of the method, and we certainly weren’t the first to report on the fingerprintability of browsers [2]. Just two days later, AddThis stopped using canvas fingerprinting. The second largest provider at the time, Ligatus, also stopped using the technique.

As can be expected, we saw many users take their frustrations to Twitter. There are users who wondered why publishers would fingerprint them:

complained about AddThis:

and expressed their dislike for canvas fingerprinting in general:

We even saw a user question as to why Mozilla does not protect against canvas fingerprinting in Firefox:

However a general technical solution which preserves the API’s usefulness and usability doesn’t exist [3]. Instead the best solutions are either blocking the feature or blocking the trackers which use it.

The developer community responded by releasing canvas blocking extensions for Firefox and Chrome, tools which are used by over 18,000 users in total. AdBlockPlus and Disconnect both commented that the large trackers are already on their block lists, with Disconnect mentioning that the additional, lesser-known parties from our study would be added to their lists.

Why was our study so impactful?

Much of the online privacy problem is actually a transparency problem. By default, users have very little information on the privacy practices of the websites they visit, and of the trackers included on those sites. Without this information users are unable to differentiate between sites which take steps to protect their privacy and sites which don’t. This leads to less of an incentive for site owners to protect the privacy of their users, as online privacy often comes at the expense of additional ad revenue or third-party features.

With our study, we were not only able to remove this information asymmetry [4], but were able to do so in a way that was relatable to users. The visual representation of canvas fingerprinting proved particularly helpful in removing that asymmetry of information; it was very intuitive to see how the shapes drawn to a canvas could produce a unique image. The ProPublica article even included a demo where users could see their fingerprint built in real time.

While writing the paper we made it a point to include not only the trackers responsible for fingerprinting, but to also include the sites on which the fingerprinting was taking place. Instead of reading that tracker A was responsible for fingerprinting, they could understand that it occurs when they visit publishers X, Y and Z. If a user is frustrated by a technique, and is only familiar with the tracker responsible, there isn’t much they can do. By knowing the publishers on which the technique is used, they can voice their frustrations or choose to visit alternative sites. Publishes, which have in interest in keeping users, will then have an incentive to change their practices.

The technique wasn’t only news to users, even some site owners were unaware that it was being used on their sites. ProPublica updated their original story with a message from YouPorn stating, “[the website was] completely unaware that AddThis contained a tracking software…”, and had since removed it. This shows that measurement work can even help remove the information asymmetry between trackers and the sites upon which they track.

How are things now?

In a re-run of the measurements in January 2016 [5], I’ve observed that the number of distinct trackers utilizing canvas fingerprinting has more than doubled since our 2014 measurement. While the overall number of publisher sites on which the tracking occurs is still below that of our previous measurement, the use of the technique has at least partially revived since AddThis and Ligatus stopped the practice.

This made me curious if we see similar trends for other tracking techniques. In our 2014 paper we also studied cookie respawning [6]. This technique was well studied in the past, both in 2009 and 2011, making it a good candidate to analyze the longitudinal effects of measurement.  As is the case with our measurement, these studies also received a bit of press coverage when released.

The 2009 study, which found HTTP cookie respawning on 6 of the top 100 sites, resulted in a $2.4 million settlement. The 2011 follow-up study found that the use of respawning decreased to just 2 sites in the top 100, and likewise resulted in a $500 thousand settlement. In 2014 we observed respawning on 7 of the top 100 sites, however none of these sites or trackers were US-based entities. This suggests that lawsuits can have an impact, but that impact may be limited by the global nature of the web.

What we’ve learned

Providing transparency into privacy violations online has the potential for huge impact. We saw users unhappy with the trackers that use canvas fingerprinting, with the sites that include those trackers, and even with the browsers they use to visit those sites. It is important that studies visit a large number of sites, and list those on which the privacy violation occurs.

The pressure of transparency affects the larger players more than the long tail. A tracker which is present on a large number of sites, or is present on sites which receive more traffic is more likely to be the focus of news articles or subject to lawsuits. Indeed, our 2016 measurements support it: we’ve seen a large increase in the number of parties involved, but the increase is limited to parties with a much smaller presence.

In the absence of a lawsuit, policy change, or technical solution, we see that canvas fingerprinting use is beginning to grow again. Without constant monitoring and transparency, level of privacy violations can easily creep back to where they were. A single, well-connected tracker can re-introduce a tracking technique to a large number of first-parties.

The developer community will help, we just need to provide them with the data they need. Our detection methodology served as the foundation for blocking tools, which intercept the same calls we used for detection. The script lists we included in our paper and on our website were incorporated into current blocklists.

In a follow-up post, I’ll discuss the work we’re doing to make regular, large scale measurements of web tracking a reality. I’ll show how the tools we’ve built make it possible to run automated, million site crawls can run every month, and I’ll introduce some new results we’re planning to release.

 

[1] The paper’s website provides a short description of each of these techniques.

[2] See: the EFF’s Panopticlick, and academic papers Cookieless Monster and FPDetective.

[3] For example, adding noise to canvas readouts has the potential to cause problems for non-tracking use cases and can still be defeated by a determined tracker. The Tor Browser’s solution of prompting the user on certain canvas calls does work, however it requires a user’s understanding that the technique can be used for tracking and provides for a less than optimal user experience.

[4] For a nice discussion of information asymmetry and the web: Privacy and the Market for Lemons, or How Websites Are Like Used Cars

[5] These measurements were run using the canvas instrumentation portion of OpenWPM.

[6] For a detailed description of cookie respawning, I suggest reading through Ashkan Soltani’s blog post on the subject.

Thanks to Arvind Narayanan for his helpful comments.

avatar

How cookies can be used for global surveillance

Today we present an updated version of our paper [0] examining how the ubiquitous use of online tracking cookies can allow an adversary conducting network surveillance to target a user or surveil users en masse.  In the initial version of the study, summarized below, we examined the technical feasibility of the attack. Now we’ve made the attack model more complete and nuanced as well as analyzed the effectiveness of several browser privacy tools in preventing the attack. Finally, inspired by Jonathan Mayer and Ed Felten’s The Web is Flat study, we incorporate the geographic topology of the Internet into our measurements of simulated web traffic and our adversary model, providing a more realistic view of how effective this attack is in practice. [Read more…]

avatar

The hidden perils of cookie syncing

[Steven Englehardt is a first-year Ph.D. student in the computer security group at Princeton. In this post he talks about the implications of a recent study that we published in collaboration with researchers at KU Leuven, Belgium. — Arvind Narayanan]

Online tracking is becoming more sophisticated and thus increasingly difficult to block. Modern browsers expose many surfaces that enable users to be uniquely identified, including Flash cookies and browser fingerprints. In a new paper that will appear at ACM CCS, we present the first large scale study of three advanced tracking mechanisms — canvas fingerprinting, evercookies, and cookie syncing. We developed novel measurement techniques and found that these tracking mechanisms are used on a large number of sites. Our findings on canvas fingerprinting, in particular, have been in the news (Propublica, BBC, EFF).

In this blog post I’ll focus on a different part of our paper that looked at cookie syncing, the process by which two different trackers link the IDs they’ve given to the same user. The most common use of cookie syncing is to enable real-time bidding between several entities in an ad auction. It allows the bidder and the ad network to refer to the user by the same ID so that the bidder can place bids on a particular user in current and future auctions. Cookie syncing raises subtle yet serious privacy concerns, but due to the technical complexity of explaining it, didn’t receive much press coverage. In this post I’ll explain cookie syncing and why it’s worrisome — even more so than canvas fingerprinting.
[Read more…]

avatar

Web measurement for fairness and transparency

[This is the first in a series of posts giving some examples of security-related research in the Princeton computer science department. We’re actively recruiting top-notch students to enter our Ph.D. program, as well as postdocs and visiting scholars. We don’t have enough bandwidth here on the blog to feature everything we do, so we’ll be highlighting a few examples over the next couple of weeks.]

Everything we do on the web is tracked, profiled, and analyzed. But what do companies do with that information? To what extent do they use it in ways that benefit us, versus discriminatory ways? While many concerns have been raised, not much is known quantitatively. That’s why at Princeton we’re building an infrastructure to detect, measure and reverse engineer differential treatment of web users.

[Read more…]