August 21, 2019

CITP’s OpenWPM privacy measurement tool moves to Mozilla

As part of my PhD at Princeton’s Center for Information Technology Policy (CITP), I led the development of OpenWPM, a tool for web privacy measurement, with the help of many contributors. My co-authors and I first released OpenWPM in 2014 with the goal of lowering the technical costs of large-scale web privacy measurement. The tool’s success exceeded our expectations; since its release it has been used in over 30 academic studies, in research areas ranging from computer science to law.

OpenWPM has a new home at Mozilla. After graduating in 2018, I joined Mozilla’s security engineering team to work on strengthening Firefox’s tracking protection. We’re committed to ensuring users are protected from tracking by default. To that end, we’ve migrated OpenWPM to Mozilla, where it will remain open source to ensure researchers have the tools required to discover privacy-infringing practices on the web. We are also using it ourselves to understand the implications of our new anti-tracking features, to discover fingerprinting scripts and add them to our tracking protection lists, and to collect data for a number of ongoing privacy research projects.

Over the past six months we’ve started a number of efforts to significantly improve OpenWPM:

1. Cloud-friendly data storage. OpenWPM has long used SQLite to store crawl data. This makes it easy for anyone to install the tool, run a small measurement, and inspect the dataset locally. However, a local SQLite database is very limiting for large-scale measurements. OpenWPM can now save data directly to Amazon S3 in Parquet format, making it possible to launch crawls on a cluster of machines.

2. Support for modern versions of Firefox. We are in the process of migrating all of OpenWPM’s instrumentation to WebExtensions, which is necessary to run measurements with Firefox 57+.

3. Modular instrumentation. OpenWPM’s instrumentation was previously deeply embedded in the crawler, making it difficult to use outside of a crawling context. We’ve now refactored the instrumentation into a separate npm package that can easily be imported by any Firefox WebExtension. In fact, we’ve already used the module to collect data in one of our user studies.

4. A standard set of analysis utilities. To further ease analyses on OpenWPM datasets, we’ve bundled the many small utility functions we’ve developed over the years into a single utilities package available on PyPI.

5. Data collection and release. Since 2015, CITP has collected monthly 1-million-site web measurements using OpenWPM. All of this data is available for download, but once Gunes Acar moves on from CITP in a few months, the CITP measurements will end. At Mozilla, we are exploring options to regularly collect and release new measurements.
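To make the storage and utilities items above a little more concrete, here is a minimal Python sketch of the kind of local analysis a SQLite crawl database makes easy: counting which third-party hosts each visited site contacted. It uses only the standard library (`sqlite3`, `urllib.parse`). The `http_requests` table with `url` and `top_level_url` columns is a simplified, assumed schema, and `is_third_party` and `third_parties_per_site` are hypothetical helpers written for illustration, not functions from the OpenWPM utilities package.

```python
from __future__ import annotations

import sqlite3
from urllib.parse import urlparse


def hostname(url: str) -> str:
    """Return the lowercased hostname of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()


def is_third_party(request_url: str, top_level_url: str) -> bool:
    """Hypothetical helper: treat a request as third-party if its hostname is
    neither the page's hostname nor a subdomain of it.

    Simplification: real analyses usually compare eTLD+1 via the Public
    Suffix List instead of raw hostnames."""
    req, top = hostname(request_url), hostname(top_level_url)
    return not (req == top or req.endswith("." + top))


def third_parties_per_site(conn: sqlite3.Connection) -> dict[str, set[str]]:
    """Map each visited page's hostname to the third-party hostnames it contacted.

    Assumes a simplified table http_requests(url, top_level_url)."""
    result: dict[str, set[str]] = {}
    for url, top_level_url in conn.execute(
        "SELECT url, top_level_url FROM http_requests"
    ):
        parties = result.setdefault(hostname(top_level_url), set())
        if is_third_party(url, top_level_url):
            parties.add(hostname(url))
    return result


if __name__ == "__main__":
    # In-memory database with the assumed schema and a few toy rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE http_requests (visit_id INTEGER, url TEXT, top_level_url TEXT)"
    )
    conn.executemany(
        "INSERT INTO http_requests VALUES (?, ?, ?)",
        [
            (1, "https://example.com/index.html", "https://example.com/"),
            (1, "https://cdn.example.com/app.js", "https://example.com/"),
            (1, "https://tracker.test/pixel.gif", "https://example.com/"),
            (2, "https://news.test/story", "https://news.test/"),
            (2, "https://ads.test/banner.js", "https://news.test/"),
        ],
    )
    for site, parties in sorted(third_parties_per_site(conn).items()):
        print(site, sorted(parties))
```

With the toy rows, this prints `example.com ['tracker.test']` and `news.test ['ads.test']`. The raw-hostname comparison is a deliberate simplification: a request to `cdn.example.com` from a page on `www.example.com` would be counted as third-party here, whereas an eTLD+1 comparison would treat it as first-party.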

All of these efforts are still underway, and we welcome community involvement as we continue to build upon them. You can find us hanging out in #openwpm on irc.mozilla.org.

Do Mobile News Alerts Undermine Media’s Role in Democracy? Madelyn Sanfilippo at CITP

Why do different people sometimes get different articles about the same event, even from the same news provider? What might that mean for democracy?

Speaking at CITP today is Dr. Madelyn Rose Sanfilippo, a postdoctoral research associate here at CITP. Madelyn studies the governance of sociotechnical systems, as well as the outcomes, inequality, and consequences within these systems, using empirical, mixed-methods research designs.

Today, Madelyn tells us about a large-scale project with Yafit Lev-Aretz to examine how push notifications and the personalized distribution and consumption of news might influence readers and democracy. The project is funded by the Tow Center for Digital Journalism at Columbia University and the Knight Foundation.

Why Do Push Notifications Matter for Democracy?

Americans’ trust in media has been diverging in recent years, even as society worries about the risks to democracy from echo chambers. Madelyn also tells us about changes in how Americans get their news.

Push notifications are one of those changes: alerts that news organizations send to people’s computers and mobile phones about stories they consider important. And we get a lot of them. In 2017, Tow Center researcher Pete Brown found that people get almost one push notification per minute on their phones, interrupting us with news.

In 2017, 85% of Americans were getting news via their mobile devices. It’s not clear how much of that came from push notifications, but mobile phones tend to come with news apps that have push notifications enabled by default.

When Madelyn and Yafit started to analyze push notifications, they noticed something fascinating: the same publisher often pushes different headlines to different platforms. They also found that news publishers use less objective, more subjective and emotional language in those notifications.

Madelyn and Yafit especially wanted to know whether media outlets covered breaking news differently based on the political affiliation of their readers. Comparing notifications about disasters, gun violence, and terrorism, they found differences in the number of push notifications sent by publishers whose readers had higher versus lower political affiliation. They also found differences in the machine-coded subjectivity and objectivity of how these publishers covered those stories.

Composite subjectivity of different sources (higher is more subjective)
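The talk does not say how the machine-coded subjectivity scores were computed. Purely as an illustration of what machine coding of subjectivity can look like, here is a toy lexicon-based scorer in Python. The word list, the scoring rule (share of word tokens found in the list), and averaging per-headline scores into a per-source composite are all assumptions for this sketch, not the researchers’ method; real tools rely on large curated lexicons or trained classifiers.

```python
from __future__ import annotations

import re

# Tiny, hypothetical lexicon of subjective/emotional words (illustration only).
SUBJECTIVE_WORDS = frozenset({
    "shocking", "horrific", "tragic", "terrifying", "heartbreaking",
    "outrage", "stunning", "devastating", "brutal", "incredible",
})


def subjectivity(text: str) -> float:
    """Share of word tokens in `text` that appear in the lexicon (0.0 to 1.0)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in SUBJECTIVE_WORDS for token in tokens) / len(tokens)


def composite_subjectivity(headlines: list[str]) -> float:
    """Average per-headline subjectivity for one source (higher is more subjective)."""
    if not headlines:
        return 0.0
    return sum(subjectivity(h) for h in headlines) / len(headlines)


if __name__ == "__main__":
    # Made-up headlines, not data from the study.
    source_headlines = [
        "Shooting reported downtown, police say",
        "Horrific shooting leaves city in shock",
    ]
    print(round(composite_subjectivity(source_headlines), 3))
```

For the two made-up headlines, the first scores 0 and the second scores 1/6 (one lexicon word out of six tokens), so the composite is 1/12 and the script prints `0.083`. A simple word list like this misses negation and context, which is one reason published analyses tend to use richer models.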

Do Push Notifications Create Political Filter Bubbles?

Finally, Madelyn and Yafit wanted to know if the personalization of push notifications shaped what people might be aware of. First, Madelyn explains to us that personalization takes multiple forms:

  • Curation: sometimes which articles we see is curated by personalized algorithms (like Google News)
  • Content: sometimes the content itself is personalized, so that two people see very different text even though they’re reading the same article

Together, they found that location-based personalization is common. Madelyn tells us about three different notifications that NBC News sent to people the morning after the Democratic primary. Not only did national audiences get different notifications, but different cities received notes that mentioned Democratic and Republican candidates differently. Beyond the midterms, Madelyn and her colleagues found that sports news is often personalized by location.

Behavioral Personalization

Madelyn tells us that many news publishers also personalize news articles based on information about their readers, including their reading behavior and survey responses. They found that some news publishers personalize messages based on what they consider to be a person’s reading level. They also found evidence that publishers tailor news based on personal information that readers never provided to the publisher.

Governing News Personalization

How can we ensure that news publishers are serving democracy in the decisions that they make and the knowledge they contribute to society? At many publishers, decisions about the structure of news personalization are made by the business side of the organization.

Madelyn tells us about future research she hopes to do. She’s looking at the means available to news readers to manage these notifications as well as policy avenues for governing news personalization.

Madelyn also thanks her funders for supporting this collaboration with Yafit Lev-Aretz: the Knight Foundation and the Tow Center for Digital Journalism.

The Third Workshop on Technology and Consumer Protection

Arvind Narayanan and I are pleased to announce that the Workshop on Technology and Consumer Protection (ConPro ’19) will return for a third year! The workshop will once again be co-located with the IEEE Symposium on Security and Privacy, occurring in May 2019.

ConPro is a forum for a diverse range of computer science research with consumer protection implications. Last year, papers covered topics ranging from online dating fraud to the readability of security guidance. Panelists and invited speakers explored topics from preventing caller-ID spoofing to protecting unique communities.

We see ConPro as a workshop in the classic sense, providing substantive feedback and new ideas. Presentations have sparked suggestions for follow-up work and collaboration opportunities. Attendees represent a wide range of research areas, spurring creative ideas and interesting conversation. For example, comments about crowdworker concerns this year led to discussion of best practices for research making use of those workers.

Although our community has grown, we aim to keep discussion and feedback a central part of the workshop. Our friends in the legal community have had some success with larger events focused on feedback and discussion, such as the Privacy Law Scholars Conference (PLSC). We plan to learn from those examples.

The success of ConPro in past years (amazing research, attendees, discussion, and PCs) makes us excited for next year. The call for papers lists some relevant topics, but any computer science research with consumer protection implications is relevant; just be sure those implications are clear. The submission deadline is January 23, 2019. We hope you’ll submit a paper and join us in San Francisco!