September 9, 2024

CITP’s OpenWPM privacy measurement tool moves to Mozilla

As part of my PhD at Princeton’s Center for Information Technology Policy (CITP), I led the development of OpenWPM, a tool for web privacy measurement, with the help of many contributors. My co-authors and I first released OpenWPM in 2014 with the goal of lowering the technical costs of large-scale web privacy measurement. The tool’s success exceeded our expectations; it has been used by over 30 academic studies since its release, in research areas ranging from computer science to law.

OpenWPM has a new home at Mozilla. After graduating in 2018, I joined Mozilla’s security engineering team to work on strengthening Firefox’s tracking protection. We’re committed to ensuring users are protected from tracking by default. To that end, we’ve migrated OpenWPM to Mozilla, where it will remain open source to ensure researchers have the tools required to discover privacy-infringing practices on the web. We are also using it ourselves to understand the implications of our new anti-tracking features, to discover fingerprinting scripts and add them to our tracking protection lists, as well as to collect data for a number of ongoing privacy research projects.

Over the past six months we’ve started a number of efforts to significantly improve OpenWPM:

1. Cloud-friendly data storage. OpenWPM has long used SQLite to store crawl data. This makes it easy for anyone to install the tool, run a small measurement, and inspect the dataset locally. However, this is very limiting for large-scale measurements. OpenWPM can now save data directly to Amazon S3 in Parquet format, making it possible to launch crawls on a cluster of machines.

2. Support for modern versions of Firefox. We are in the process of migrating all of OpenWPM’s instrumentation to WebExtensions, which is necessary to run measurements with Firefox 57+.

2. Modular instrumentation. OpenWPM’s instrumentation was previously deeply embedded in the crawler, making it difficult to use outside of a crawling context. We’ve now refactored the instrumentation into a separate npm package that can easily be imported by any Firefox WebExtension. In fact, we’ve already used the module to collect data in one of our user studies.

4. A standard set of analysis utilities. To further ease analyses on OpenWPM datasets, we’ve bundled the many small utility functions we’ve developed over the years into a single utilities package available on PyPI.

5. Data collection and release. Since 2015, CITP has collected monthly 1-million-site web measurements using OpenWPM. All of this data is available for download, but once Gunes Acar moves on from CITP in a few months, the CITP measurements will end. At Mozilla, we are exploring options to regularly collect and release new measurements.

All of these efforts are still underway, and we welcome community involvement as we continue to build upon them. You can find us hanging out in #openwpm on irc.mozilla.org.