July 25, 2017

AdNauseam, Google, and the Myth of the “Acceptable Ad”

Earlier this month, we (Helen Nissenbaum, Mushon Zer-Aviv, and I) released a new and improved AdNauseam 3.0. For those not familiar, AdNauseam is the adblocker that clicks every ad in an effort to obfuscate tracking profiles and inject doubt into the lucrative economic system that drives advertising-based surveillance. The 3.0 release contains some new features we’ve been excited to discuss with users and critics, but the discussion was quickly derailed when we learned that Google had banned AdNauseam from its store, where it had been available for the past year. We also learned that Google has disallowed users from manually installing or updating AdNauseam on Chrome, effectively locking them out of their own saved data, all without prior notice or warning.

Whether or not you are a fan of AdNauseam’s strategy, it is disconcerting to know that Google can quietly make one’s extensions and data disappear at any moment, without so much as a warning. Today it is a privacy tool that is disabled, but tomorrow it could be your photo album, chat app, or password manager. You don’t just lose the app, you lose your stored data as well: photos, chat transcripts, passwords, etc. For developers, who, incidentally, must pay a fee to post items in the Chrome store, this should cause one to think twice. Not only can your software be banned and removed without warning, with thousands of users left in the lurch, but all comments, ratings, reviews, and statistics are deleted as well.

When we wrote Google to ask the reason for the removal, they responded that AdNauseam had breached the Web Store’s Terms of Service, stating that “An extension should have a single purpose that is clear to users”[1]. However, the sole purpose of AdNauseam seems readily apparent to us—namely to resist the non-consensual surveillance conducted by advertising networks, of which Google is a prime example. Now we can certainly understand why Google would prefer users not to install AdNauseam, as it opposes their core business model, but the Web Store’s Terms of Service do not (at least thus far) require extensions to endorse Google’s business model. Moreover, this is not the justification cited for the software’s removal.

So we are left to speculate as to the underlying cause for the takedown. Our guess is that Google’s real objection is to our newly added support for the EFF’s Do Not Track mechanism[2]. For anyone unfamiliar, this is not the ill-fated DNT of yore, but a new, machine-verifiable (and potentially legally-binding) assertion on the part of websites that commit to not violating the privacy of users who choose to send the DNT header. A new generation of blockers including the EFF’s Privacy Badger, and now AdNauseam, have support for this mechanism built-in, which means that they don’t (by default) block ads and other resources from DNT sites, and, in the case of AdNauseam, don’t simulate clicks on these ads.
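
To make the mechanism concrete, here is a minimal sketch (in TypeScript, roughly the shape a browser extension might use) of the verification step: the EFF mechanism has sites post their policy text at a well-known path, and a tool checks what it retrieves against the set of official policy versions. The hash set below is a placeholder; a real tool would ship the digests of the actual policy texts.

```typescript
// Sketch of EFF-style DNT verification. The well-known path comes from the
// EFF spec; the accepted-hash set is a placeholder for digests of the
// official policy text versions, which a real tool would ship with.
const ACCEPTED_POLICY_HASHES = new Set<string>([
  // sha256 digests of the official dnt-policy.txt versions go here
]);

async function sha256Hex(text: string): Promise<string> {
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(text)
  );
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// True only if the domain publishes a policy matching a known version;
// otherwise a blocker would treat the domain's ads as it normally does.
async function domainCommitsToDNT(domain: string): Promise<boolean> {
  try {
    const res = await fetch(`https://${domain}/.well-known/dnt-policy.txt`);
    if (!res.ok) return false;
    return ACCEPTED_POLICY_HASHES.has(await sha256Hex(await res.text()));
  } catch {
    return false;
  }
}
```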

So why is this so threatening to Google? Perhaps because it could represent a real means for users, advertisers, and content-providers to move away from surveillance-based advertising. If enough sites commit to Do Not Track, there will be significant financial incentive for advertisers to place ads on those sites, and these too will be bound by DNT, as the mechanism also applies to a site’s third-party partners. And this could possibly set off a chain reaction of adoption that would leave Google, which has committed to surveillance as its core business model, out in the cold.

But wait, you may be thinking: why did the EFF develop this new DNT mechanism when there is Adblock Plus’ “Acceptable Ads” program, in which Google and other major ad networks already participate?

That’s because there are crucial differences between the two. For one, “Acceptable Ads” is pay-to-play; large ad networks pay Eyeo, the company behind Adblock Plus, to whitelist their sites. But the more important reason is that the program is all about aesthetics—so-called “annoying” or “intrusive” ads—which the ad industry would like us to believe is the only problem with the current system. An entity like Google is fine with “Acceptable Ads” because they have more than enough resources to pay for whitelisting[3]. Further, they are quite willing to make their ads more aesthetically acceptable to users (after all, an annoyed user is unlikely to click)[4]. What they refuse to change (though we hope we’re wrong about this) is their commitment to surreptitious tracking on a scale never before seen. And this, of course, is what we, the EFF, and a growing number of users find truly “unacceptable” about the current advertising landscape.


[1] In the one subsequent email we received, a Google representative stated that a single extension should not perform both blocking and hiding. This is difficult to accept at face value, as nearly all ad blockers (including uBlock, Adblock Plus, Adblock, Adguard, etc., all of which are allowed in the store) also perform blocking and hiding of ads, trackers, and malware. Update (Feb 17, 2017): it has been a month since we last received any message from Google, despite repeated requests for clarification, and despite the fact that they claim, in a recent Consumerist article, to be “in touch with the developer to help them resubmit their extension to get included back in the store.”

[2] This is indeed speculation. However, as mentioned in [1], the stated reason for Google’s ban of AdNauseam does not hold up to scrutiny.

[3] In September of this year, Eyeo announced that it would partner with a UK-based ad tech startup called ComboTag to launch the “Acceptable Ads Platform”, through which it would also act as an ad exchange, selling placements for “Acceptable Ad” slots. Google, as might be expected, reacted negatively, stating that it would no longer do business with ComboTag. Some assumed that this might signal an end to Google’s participation in “Acceptable Ads” as well. However, this does not appear to be the case. Google still comprises a significant portion of the exception list on which “Acceptable Ads” is based and, as one ad industry observer put it, “Google is likely Adblock Plus’ largest, most lucrative customer.”

[4] Google is also a member of the “Coalition for Better Ads”, an industry-wide effort which, like “Acceptable Ads”, focuses exclusively on issues of aesthetics and user experience, as opposed to surveillance and data profiling.


Do privacy studies help? A retrospective look at canvas fingerprinting

It seems like every month we hear of some new online privacy violation in the news, on topics such as fingerprinting or web tracking. Many of these news stories highlight academic research. What we don’t see is whether these studies and the subsequent news stories have any impact on privacy.

Our 2014 canvas fingerprinting measurement offers an opportunity for me to provide that insight, as we ended up receiving a surprising amount of press coverage after releasing the paper. In this post I’ll examine the reaction to the paper and explore which aspects contributed to its positive impact on privacy. I’ll also explore how we can use this knowledge when designing future studies to maximize their impact.

What we found in 2014

The 2014 measurement paper, The Web Never Forgets, is a collaboration with researchers at KU Leuven. In it, we measured the prevalence of three persistent tracking techniques online: canvas fingerprinting, cookie respawning, and cookie syncing [1]. They are persistent in that they are hard to control, hard to detect, and resilient to blocking or removal.

We found that 5% of the top 100,000 sites were utilizing the HTML5 Canvas API as a fingerprinting vector, and that the overwhelming majority of this activity, 97%, was attributable to the top two providers. The ability to use the HTML5 Canvas as a fingerprinting vector was first introduced in a 2012 paper by Mowery and Shacham. In the time between that 2012 paper and our 2014 measurement, approximately 20 sites and trackers started using canvas to fingerprint their visitors.

[Figure: several examples of the text written to the canvas for fingerprinting purposes. Each of these images would be converted to a string and then hashed to create an identifier.]
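
To make the technique concrete, here is a minimal sketch of a canvas fingerprinting script; the drawn string and styling are illustrative, not taken from any tracker we observed. The key point is that the pixel read-back reflects machine-specific rendering differences (fonts, anti-aliasing, GPU), so hashing it yields a largely stable, distinguishing identifier.

```typescript
// A sketch of the technique: text is drawn to an off-screen canvas, the
// rendered pixels are read back as a string, and that string is hashed
// into an identifier. Rendering differences make the hash differ across
// machines while staying stable on any given one.
async function canvasFingerprint(): Promise<string> {
  const canvas = document.createElement("canvas");
  canvas.width = 300;
  canvas.height = 60;
  const ctx = canvas.getContext("2d")!;
  ctx.textBaseline = "top";
  ctx.font = "16px Arial";
  ctx.fillStyle = "#069";
  ctx.fillText("Cwm fjordbank glyphs vext quiz, \u{1F603}", 2, 2);

  // toDataURL() serializes the rendered pixels; this is the string that
  // gets hashed into the identifier.
  const pixels = new TextEncoder().encode(canvas.toDataURL());
  const digest = await crypto.subtle.digest("SHA-256", pixels);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```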

The reaction to our study

Shortly after we released our paper publicly, we saw a significant amount of press coverage, including articles on ProPublica, BBC, Der Spiegel, and more. The amount of coverage our paper received was a surprise to us; we weren’t the creators of the method, and we certainly weren’t the first to report on the fingerprintability of browsers [2]. Just two days later, AddThis stopped using canvas fingerprinting. The second-largest provider at the time, Ligatus, also stopped using the technique.

As can be expected, we saw many users take their frustrations to Twitter. Some wondered why publishers would fingerprint them, some complained about AddThis, others expressed their dislike for canvas fingerprinting in general, and one user even questioned why Mozilla does not protect against canvas fingerprinting in Firefox.

However, a general technical solution which preserves the API’s usefulness and usability doesn’t exist [3]. Instead, the best solutions are either blocking the feature or blocking the trackers which use it.

The developer community responded by releasing canvas-blocking extensions for Firefox and Chrome, tools which are used by over 18,000 users in total. Adblock Plus and Disconnect both commented that the large trackers were already on their block lists, with Disconnect mentioning that the additional, lesser-known parties from our study would be added to its lists.
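
The basic idea behind such extensions can be sketched in a few lines: a content script wraps the canvas read-back methods so that pages can still draw, but attempts to read the device-dependent pixels are reported and neutralized. This is an illustrative sketch, not the code of any particular extension.

```typescript
// Wrap the read-back method so pages can draw to a canvas but cannot read
// machine-specific pixels. (A real extension would also cover getImageData
// and toBlob, and would surface the warning in its UI.)
const originalToDataURL = HTMLCanvasElement.prototype.toDataURL;

HTMLCanvasElement.prototype.toDataURL = function (
  this: HTMLCanvasElement,
  type?: string,
  quality?: number
): string {
  console.warn(`canvas read-back intercepted on ${location.hostname}`);
  // Return the serialization of an equally-sized blank canvas, so every
  // machine reports an identical, fingerprint-free value.
  const blank = document.createElement("canvas");
  blank.width = this.width;
  blank.height = this.height;
  return originalToDataURL.call(blank, type, quality);
};
```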

Why was our study so impactful?

Much of the online privacy problem is actually a transparency problem. By default, users have very little information on the privacy practices of the websites they visit, and of the trackers included on those sites. Without this information users are unable to differentiate between sites which take steps to protect their privacy and sites which don’t. This leads to less of an incentive for site owners to protect the privacy of their users, as online privacy often comes at the expense of additional ad revenue or third-party features.

With our study, we were not only able to remove this information asymmetry [4], but were able to do so in a way that was relatable to users. The visual representation of canvas fingerprinting proved particularly helpful in removing that asymmetry of information; it was very intuitive to see how the shapes drawn to a canvas could produce a unique image. The ProPublica article even included a demo where users could see their fingerprint built in real time.

While writing the paper, we made it a point to include not only the trackers responsible for fingerprinting, but also the sites on which the fingerprinting was taking place. Instead of reading that tracker A was responsible for fingerprinting, users could understand that it occurs when they visit publishers X, Y, and Z. If a user is frustrated by a technique and is only familiar with the tracker responsible, there isn’t much they can do. By knowing the publishers on which the technique is used, they can voice their frustrations or choose to visit alternative sites. Publishers, which have an interest in keeping users, will then have an incentive to change their practices.

The technique wasn’t only news to users; even some site owners were unaware that it was being used on their sites. ProPublica updated their original story with a message from YouPorn stating that “[the website was] completely unaware that AddThis contained a tracking software…” and had since removed it. This shows that measurement work can even help remove the information asymmetry between trackers and the sites upon which they track.

How are things now?

In a re-run of the measurements in January 2016 [5], I’ve observed that the number of distinct trackers utilizing canvas fingerprinting has more than doubled since our 2014 measurement. While the overall number of publisher sites on which the tracking occurs is still below that of our previous measurement, use of the technique has at least partially rebounded since AddThis and Ligatus stopped the practice.

This made me curious whether we would see similar trends for other tracking techniques. In our 2014 paper we also studied cookie respawning [6]. This technique was well studied in the past, both in 2009 and 2011, making it a good candidate for analyzing the longitudinal effects of measurement. As was the case with our measurement, these studies also received a bit of press coverage when released.

The 2009 study, which found HTTP cookie respawning on 6 of the top 100 sites, resulted in a $2.4 million settlement. The 2011 follow-up study found that the use of respawning had decreased to just 2 sites in the top 100, and likewise resulted in a $500,000 settlement. In 2014 we observed respawning on 7 of the top 100 sites; however, none of these sites or trackers were US-based entities. This suggests that lawsuits can have an impact, but that impact may be limited by the global nature of the web.
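
Footnote [6] has the details, but mechanically the trick is simple enough to sketch. The storage key and lifetime below are made up for illustration, and historically the second copy lived in Flash local shared objects rather than localStorage; the principle is the same.

```typescript
// Sketch of cookie respawning: the identifier is mirrored in two storage
// areas, and whichever copy survives a clearing is used to recreate the
// other. "uid" is an illustrative name, not one from the studies.
function getOrRespawnId(): string {
  const fromCookie = document.cookie.match(/(?:^|; )uid=([^;]+)/)?.[1];
  const fromStorage = localStorage.getItem("uid");
  const id = fromCookie ?? fromStorage ?? crypto.randomUUID();

  // Rewrite both copies, so deleting just one does not remove the ID.
  document.cookie = `uid=${id}; max-age=31536000; path=/`;
  localStorage.setItem("uid", id);
  return id;
}
```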

What we’ve learned

Providing transparency into privacy violations online has the potential for huge impact. We saw users unhappy with the trackers that use canvas fingerprinting, with the sites that include those trackers, and even with the browsers they use to visit those sites. It is important that studies visit a large number of sites, and list those on which the privacy violation occurs.

The pressure of transparency affects the larger players more than the long tail. A tracker which is present on a large number of sites, or on sites which receive more traffic, is more likely to be the focus of news articles or the subject of lawsuits. Indeed, our 2016 measurements support this: we’ve seen a large increase in the number of parties involved, but the increase is limited to parties with a much smaller presence.

In the absence of a lawsuit, policy change, or technical solution, we see that canvas fingerprinting use is beginning to grow again. Without constant monitoring and transparency, the level of privacy violations can easily creep back to where it was. A single, well-connected tracker can re-introduce a tracking technique to a large number of first parties.

The developer community will help; we just need to provide them with the data they need. Our detection methodology served as the foundation for blocking tools, which intercept the same calls we used for detection. The script lists we included in our paper and on our website were incorporated into current blocklists.

In a follow-up post, I’ll discuss the work we’re doing to make regular, large-scale measurements of web tracking a reality. I’ll show how the tools we’ve built make it possible to run automated, million-site crawls every month, and I’ll introduce some new results we’re planning to release.


[1] The paper’s website provides a short description of each of these techniques.

[2] See: the EFF’s Panopticlick, and academic papers Cookieless Monster and FPDetective.

[3] For example, adding noise to canvas readouts has the potential to cause problems for non-tracking use cases and can still be defeated by a determined tracker. The Tor Browser’s solution of prompting the user on certain canvas calls does work; however, it requires the user to understand that the technique can be used for tracking, and it provides a less-than-optimal user experience.

[4] For a nice discussion of information asymmetry and the web, see Privacy and the Market for Lemons, or How Websites Are Like Used Cars.

[5] These measurements were run using the canvas instrumentation portion of OpenWPM.

[6] For a detailed description of cookie respawning, I suggest reading through Ashkan Soltani’s blog post on the subject.

Thanks to Arvind Narayanan for his helpful comments.

How cookies can be used for global surveillance

Today we present an updated version of our paper [0] examining how the ubiquitous use of online tracking cookies can allow an adversary conducting network surveillance to target a user or surveil users en masse. In the initial version of the study, summarized below, we examined the technical feasibility of the attack. Now we’ve made the attack model more complete and nuanced, and we’ve analyzed the effectiveness of several browser privacy tools in preventing the attack. Finally, inspired by Jonathan Mayer and Ed Felten’s The Web is Flat study, we incorporate the geographic topology of the Internet into our measurements of simulated web traffic and our adversary model, providing a more realistic view of how effective this attack is in practice.
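
As a rough illustration of the linking step at the heart of the attack, here is a toy sketch under assumed data structures (not the paper’s actual model): an eavesdropper who sees unencrypted requests can merge the pseudonymous cookies of different trackers whenever they appear on the same page visit, transitively stitching traffic into a single user’s browsing history.

```typescript
// Toy sketch of transitive cookie linking. Two tracker cookies observed on
// the same page visit belong to the same user, so their clusters merge.
type Observation = { pageVisit: number; cookieId: string };

class UnionFind {
  private parent = new Map<string, string>();
  find(x: string): string {
    const p = this.parent.get(x) ?? x;
    if (p === x) {
      this.parent.set(x, x);
      return x;
    }
    const root = this.find(p);
    this.parent.set(x, root); // path compression
    return root;
  }
  union(a: string, b: string): void {
    this.parent.set(this.find(a), this.find(b));
  }
}

function linkCookies(log: Observation[]): Map<string, Set<string>> {
  const uf = new UnionFind();
  const firstOnVisit = new Map<number, string>();
  for (const { pageVisit, cookieId } of log) {
    const first = firstOnVisit.get(pageVisit);
    if (first !== undefined) uf.union(first, cookieId); // co-occurrence
    else firstOnVisit.set(pageVisit, cookieId);
  }
  // Group every cookie ID under its cluster's root: each cluster is one
  // (pseudonymous) user as reconstructed by the eavesdropper.
  const clusters = new Map<string, Set<string>>();
  for (const { cookieId } of log) {
    const root = uf.find(cookieId);
    if (!clusters.has(root)) clusters.set(root, new Set());
    clusters.get(root)!.add(cookieId);
  }
  return clusters;
}
```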