December 14, 2017

No boundaries: Exfiltration of personal data by session-replay scripts

This is the first post in our “No Boundaries” series, in which we reveal how third-party scripts on websites have been extracting personal information in increasingly intrusive ways. [0]
by Steven Englehardt, Gunes Acar, and Arvind Narayanan

Update: we’ve released our data — the list of sites with session-replay scripts, and the sites where we’ve confirmed recording by third parties.

You may know that most websites have third-party analytics scripts that record which pages you visit and the searches you make.  But lately, more and more sites use “session replay” scripts. These scripts record your keystrokes, mouse movements, and scrolling behavior, along with the entire contents of the pages you visit, and send them to third-party servers. Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder.

[Read more…]

I never signed up for this! Privacy implications of email tracking

In this post I discuss a new paper that will appear at PETS 2018, authored by myself, Jeffrey Han, and Arvind Narayanan.

What happens when you open an email and allow it to display embedded images and pixels? You may expect the sender to learn that you’ve read the email, and which device you used to read it. But in a new paper we find that privacy risks of email tracking extend far beyond senders knowing when emails are viewed. Opening an email can trigger requests to tens of third parties, and many of these requests contain your email address. This allows those third parties to track you across the web and connect your online activities to your email address, rather than just to a pseudonymous cookie.

Illustrative example. Consider an email from the deals website LivingSocial (see details of the example email). When the email is opened, client will make requests to 24 third parties across 29 third-party domains.[1] A total of 10 third parties receive an MD5 hash of the user’s email address, including major data brokers Datalogix and Acxiom. Nearly all of the third parties (22 of the 24) set or receive cookies with their requests. In a webmail client the cookies are the same browser cookies used to track users on the web, and indeed many major web trackers (including domains belonging to Google, comScore, Adobe, and AOL) are loaded when the email is opened. While this example email has a large number of trackers relative to the average email in our corpus, the majority of emails (70%) embed at least one tracker.

How it works. Email tracking is possible because modern graphical email clients allow rendering a subset of HTML. JavaScript is invariably stripped, but embedded images and stylesheets are allowed. These are downloaded and rendered by the email client when the user views the email.[2] Crucially, many email clients, and almost all web browsers, in the case of webmail, send third-party cookies with these requests. The email address is leaked by being encoded as a parameter into these third-party URLs.

Diagram showing the process of tracking with email address

When the user opens the email, a tracking pixel from “tracker.com” is loaded. The user’s email address is included as a parameter within the pixel’s URL. The email client here is a web browser, so it automatically sends the tracking cookies for “tracker.com” along with the request. This allows the tracker to create a link between the user’s cookie and her email address. Later, when the user browses a news website, the browser sends the same cookie, and thus the new activity can be connected back to the email address. Email addresses are generally unique and persistent identifiers. So email-based tracking can be used for targeting online ads based on offline activity (say, to shoppers who used a loyalty card linked to an email address) and for linking different devices belonging to the same user.

[Read more…]

When the cookie meets the blockchain

Cryptocurrencies are portrayed as a more anonymous and less traceable method of payment than credit cards. So if you shop online and pay with Bitcoin or another cryptocurrency, how much privacy do you have? In a new paper, we show just how little.

Websites including shopping sites typically have dozens of third-party trackers per site. These third parties track sensitive details of payment flows, such as the items you add to your shopping cart, and their prices, regardless of how you choose to pay. Crucially, we find that many shopping sites leak enough information about your purchase to trackers that they can link it uniquely to the payment transaction on the blockchain. From there, there are well-known ways to further link that transaction to the rest of your Bitcoin wallet addresses. You can protect yourself by using browser extensions such as Adblock Plus and uBlock Origin, and by using Bitcoin anonymity techniques like CoinJoin. These measures help, but we find that linkages are still possible.

 

An illustration of the full scope of our attack. Consider three websites that happen to have the same embedded tracker. Alice makes purchases and pays with Bitcoin on the first two sites, and logs in on the third. Merchant A leaks a QR code of the transaction’s Bitcoin address to the tracker, merchant B leaks a purchase amount, and merchant C leaks Alice’s PII. Such leaks are commonplace today, and usually intentional. The tracker links these three purchases based on Alice’s browser cookie. Further, the tracker obtains enough information to uniquely (or near-uniquely) identify coins on the Bitcoin blockchain that correspond to the two purchases. However, Alice took the precaution of putting her bitcoins through CoinJoin before making purchases. Thus, either transaction individually could not have been traced back to Alice’s wallet, but there is only one wallet that participated in both CoinJoins, and is hence revealed to be Alice’s.

 

Using the privacy measurement tool OpenWPM, we analyzed 130 e-commerce sites that accept Bitcoin payments, and found that 53 of these sites leak transaction details to trackers. Many, but not all, of these leaks are by design, to enable advertising and analytics. Further, 49 sites leak personal identifiers to trackers: names, emails, usernames, and so on. This combination means that trackers can link real-world identities to Bitcoin addresses. To be clear, all of this leaked data is sitting in the logs of dozens of tracking companies, and the linkages can be done retroactively using past purchase data.

On a subset of these sites, we made real purchases using bitcoins that we first “mixed” using the CoinJoin anonymity technique.[1] We found that a tracker that observed two of our purchases — a common occurrence — would be able to identify our Bitcoin wallet 80% of the time. In our paper, we present the full details of our attack as well as a thorough analysis of its effectiveness.

Our findings are a reminder that systems without provable privacy properties may have unexpected information leaks and lurking privacy breaches. When multiple such systems interact, the leaks can be even more subtle. Anonymity in cryptocurrencies seems especially tricky, because it inherits the worst of both data anonymization (sensitive data must be publicly and permanently stored on the blockchain) and anonymous communication (privacy depends on subtle interactions arising from the behavior of users and applications).

[1] In this experiment we used 1–2 rounds of mixing. We provide evidence in the paper that while a higher mixing depth decreases the effectiveness of the attack, it doesn’t defeat it. There’s room for a more careful study of the tradeoffs here.