April 25, 2018

No boundaries: Exfiltration of personal data by session-replay scripts

This is the first post in our “No Boundaries” series, in which we reveal how third-party scripts on websites have been extracting personal information in increasingly intrusive ways. [0]
by Steven Englehardt, Gunes Acar, and Arvind Narayanan

Update: we’ve released our data — the list of sites with session-replay scripts, and the sites where we’ve confirmed recording by third parties.

You may know that most websites have third-party analytics scripts that record which pages you visit and the searches you make.  But lately, more and more sites use “session replay” scripts. These scripts record your keystrokes, mouse movements, and scrolling behavior, along with the entire contents of the pages you visit, and send them to third-party servers. Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder.

[Read more…]

I never signed up for this! Privacy implications of email tracking

In this post I discuss a new paper that will appear at PETS 2018, authored by myself, Jeffrey Han, and Arvind Narayanan.

What happens when you open an email and allow it to display embedded images and pixels? You may expect the sender to learn that you’ve read the email, and which device you used to read it. But in a new paper we find that privacy risks of email tracking extend far beyond senders knowing when emails are viewed. Opening an email can trigger requests to tens of third parties, and many of these requests contain your email address. This allows those third parties to track you across the web and connect your online activities to your email address, rather than just to a pseudonymous cookie.

Illustrative example. Consider an email from the deals website LivingSocial (see details of the example email). When the email is opened, client will make requests to 24 third parties across 29 third-party domains.[1] A total of 10 third parties receive an MD5 hash of the user’s email address, including major data brokers Datalogix and Acxiom. Nearly all of the third parties (22 of the 24) set or receive cookies with their requests. In a webmail client the cookies are the same browser cookies used to track users on the web, and indeed many major web trackers (including domains belonging to Google, comScore, Adobe, and AOL) are loaded when the email is opened. While this example email has a large number of trackers relative to the average email in our corpus, the majority of emails (70%) embed at least one tracker.

How it works. Email tracking is possible because modern graphical email clients allow rendering a subset of HTML. JavaScript is invariably stripped, but embedded images and stylesheets are allowed. These are downloaded and rendered by the email client when the user views the email.[2] Crucially, many email clients, and almost all web browsers, in the case of webmail, send third-party cookies with these requests. The email address is leaked by being encoded as a parameter into these third-party URLs.

Diagram showing the process of tracking with email address

When the user opens the email, a tracking pixel from “tracker.com” is loaded. The user’s email address is included as a parameter within the pixel’s URL. The email client here is a web browser, so it automatically sends the tracking cookies for “tracker.com” along with the request. This allows the tracker to create a link between the user’s cookie and her email address. Later, when the user browses a news website, the browser sends the same cookie, and thus the new activity can be connected back to the email address. Email addresses are generally unique and persistent identifiers. So email-based tracking can be used for targeting online ads based on offline activity (say, to shoppers who used a loyalty card linked to an email address) and for linking different devices belonging to the same user.

[Read more…]