March 19, 2024

A Peek at A/B Testing in the Wild

[Dillon Reisman was previously an undergraduate at Princeton when he worked on a neat study of the surveillance implications of cookies. Now he’s working with the WebTAP project again in a research + engineering role. — Arvind Narayanan]

In 2014, Facebook revealed that they had manipulated users’ news feeds for the sake of a psychology study looking at users’ emotions. We happen to know about this particular experiment because it was the subject of a publicly-released academic paper, but websites do “A/B testing” every day that is completely opaque to the end-user. Of course, A/B testing is often innocuous (say, to find a pleasing color scheme), but the point remains that the user rarely has any way of knowing in what ways their browsing experience is being modified, or why their experience is being changed in particular.

By testing websites over time and in a variety of conditions we could hope to discover how users’ browsing experience is manipulated in not-so-obvious ways. But one third-party service actually makes A/B testing and user tracking human-readable — no reverse-engineering or experimentation necessary! This is the widely-used A/B testing provider Optimizely; Jonathan Mayer had told us it would be an interesting target of study.* Their service is designed to expose in easily-parsable form how its clients segment users and run experiments on them directly in the JavaScript they embed on websites. In other words, if example.com uses Optimizely, the entire logic used by example.com for A/B testing is revealed to every visitor of example.com.

That means that the data collected by our large-scale web crawler OpenWPM contains the details of all the experiments that are being run across the web using Optimizely. In this post I’ll show you some interesting things we found by analyzing this data. We’ve also built a Chrome extension, Pessimizely, that you can download so you too can see a website’s Optimizely experiments. When a website uses Optimizely, the extension will alert you and attempt to highlight any elements on the page that may be subject to an experiment. If you visit nytimes.com, it will also show you alternative news headlines when you hover over a title. I suggest you give it a try!

 

The New York Times website, with headlines that may be subject to an experiment highlighted by Pessimizely.

 

The Optimizely Scripts

Our OpenWPM web crawler collects and stores javascript embedded on every page it visits. This makes it straightforward to make a query for every page that uses Optimizely and grab and analyze the code they get from Optimizely. Once collected, we investigated the scripts through regular expression-matching and manual analysis.


  "4495903114": {
      "code": …
      "name": "100000004129417_1452199599
               [A.] New York to Appoint Civilian to Monitor Police Surveillance -- 
               [B.] Sued Over Spying on Muslims, New York Police Get Oversight",
      "variation_ids": ["4479602534","4479602535"],
      "urls": [{
        "match": "simple",
        "value": "http://www.nytimes.com"
      }],
      "enabled_variation_ids": ["4479602534","4479602535"]
    },

An example of an experiment from nytimes.com that is A/B testing two variations of a headline in a link to an article.

From a crawl of the top 100k sites in January 2016, we found and studied 3,306 different websites that use Optimizely. The Optimizely script for each site contains a data object that defines:

  1. How the website owner wants to divide users into “audiences,” based on any number of parameters like location, cookies, or user-agent.
  2. Experiments that the users might experience, and what audiences should be targeted with what experiments.

The Optimizely script reads from the data object and then executes a javascript payload and sets cookies depending on if the user is in an experimental condition. The site owner populates the data object through Optimizely’s web interface – who on a website’s development team can access that interface and what they can do is a question for the site owner. The developer also helpfully provides names for their user audiences and experiments.

In total, we found around 51,471 experiments on the 3,306 websites in our dataset that use Optimizely. On average each website has approximately 15.2 experiments, and each experiment has about 2.4 possible variations. We have only scratched the surface of some of the interesting things sites use A/B testing for, and here I’ll share a couple of the more interesting examples:

 

News publishers test the headlines users see, with differences that impact the tone of the article

A widespread use of Optimizely among news publishers is “headline testing.” To use an actual recent example from the nytimes.com, a link to an article headlined:

“Turkey’s Prime Minister Quits in Rift With President”

…to a different user might appear as…

“Premier to Quit Amid Turkey’s Authoritarian Turn.”

The second headline suggests a much less neutral take on the news than the first. That sort of difference can paint a user’s perception of the article before they’ve read a single word. We found other examples of similarly politically-sensitive headlines changing, like the following from pjmedia.com:

“Judge Rules Sandy Hook Families Can Proceed with Lawsuit Against Remington”

…could appear to some users as…

“Second Amendment Under Assault by Sandy Hook Judge.”

While editorial concerns might inform how news publishers change headlines, it’s clear that a major motivation behind headline testing is the need to drive clicks. A third variation we found for the Sandy Hook headline above is the much vaguer sounding “Huge Development in Sandy Hook Gun Case.” The Wrap, an entertainment news outlet, experimented with replacing “Disney, Paramount  Had Zero LGBT Characters in Movies Last Year” with the more obviously “click-baity” headline “See Which 2 Major Studios Had Zero LGBT Characters in 2015 Movies.”

We were able to identify 17 different news websites in our crawl that in the past have done some form of headline testing. This is most likely an undercount in our crawl — most of these 17 websites use Optimizely’s integrations with other third-party platforms like Parse.ly and WordPress for their headline testing, making them more easily identified. The New York Times website, for instance, implements its own headline testing code.

Another limitation of what we’ve found so far is that the crawls that we analyzed only visit the homepage of each site. The OpenWPM crawler could be configured, however, to browse links from within a site’s homepage and collect data from those pages. A broader study of the practices of news publishers could use the tool to drill down deeper into news sites and study their headlines over time.

 

Websites identify and categorize users based on money and affluence

Many websites target users based on IP and geolocation. But when IP/geolocation are combined with notions of money the result is surprising. The website of a popular fitness tracker targets users that originate from a list of six hard-coded IP addresses labelled “IP addresses Spending more than $1000.” Two of the IP addresses appear to be larger enterprise customers — a medical research institute a prominent news magazine. Three belong to unidentified Comcast customers. These big-spending IP addresses were targeted in the past with an experiment presented the user a button that either suggested the user “learn more” about a particular product or “buy now.”

Connectify, a large vendor of networking software, uses geolocation on a coarser level — they label visitors from the US, Australia, UK, Canada, Netherlands, Switzerland, Denmark, and New Zealand as coming from “Countries that are Likely to Pay.”

Non-profit websites also experiment with money. charity: water (charitywater.org) and the Human Rights Campaign (hrc.org) both have experiments defined to change the default donation amount a user might see in a pre-filled text box.

 

Web developers use third-party tools for more than just their intended use

A developer following the path of least resistance might use Optimizely to do other parts of their job simply because it is the easiest tool available. Some of the more exceptional “experiments” deployed by websites are simple bug-fixes, described with titles like, “[HOTFIX][Core Commerce] Fix broken sign in link on empty cart,” or “Fix- Footer links errors 404.” Other experiments betray the haphazard nature of web development, with titles like “delete me,” “Please Delete this Experiment too,” or “#Bugfix.”

We might see these unusual uses because Optimizely allows developers to edit and rollout new code with little engineering overhead. With the inclusion of one third-party script, a developer can leverage the Optimizely web interface to do a task that might otherwise take more time or careful testing. This is one example of how third-parties have evolved to become integral to the entire functionality and development of the web, raising security and privacy concerns.

 

The need for transparency

Much of the web is curated by inscrutable algorithms running on servers, and a concerted research effort is needed to shed light on the less-visible practices of websites. Thanks to the Optimizely platform we can at least peek into that secret world.

We believe, however, that transparency should be the default on the web — not the accidental product of one third-party’s engineering decisions. Privacy policies are a start, but they generally only cover a website’s data collection and third-party usage on a coarse level. The New York Times Privacy Policy, for instance, does not even suggest that headline testing is something they might do, despite how it could drastically alter your consumption of the news. If websites had to publish more information about what third-parties they use and how they use them, regulators could use that information to better protect consumers on the web. Considering the potentially harmful effects of how websites might use third-parties, more transparency and oversight is essential.

 

@dillonthehuman


* This was a conversation a year ago, when Jonathan was a grad student at Stanford.