December 9, 2022

An Introduction to My Project: Algorithmic Amplification and Society

This article was originally published on the Knight Institute website at Columbia University.

The distribution of online speech today is almost wholly algorithm-mediated. To talk about speech, then, we have to talk about algorithms. In computer science, the algorithms driving social media are called recommendation systems, and they are the secret sauce behind Facebook and YouTube, with TikTok more recently showing the power of an almost purely algorithm-driven platform.

Relatively few technologists participate in the debates on the societal and legal implications of these algorithms. As a computer scientist, I’m excited about the opportunity to help fill this gap by collaborating with the Knight First Amendment Institute at Columbia University as a visiting senior research scientist — I’m on sabbatical leave from Princeton this academic year. Over the course of the year, I’ll lead a major project at the Knight Institute focusing on algorithmic amplification.

This is a new topic for me, but it is at the intersection of many that I’ve previously worked on. My broad area is the societal impact of AI (I’m wrapping up a book on machine learning and fairness, and writing one about AI’s limits). I’ve done technical work to understand the social implications of recommender systems. And finally, I’ve done extensive research on platform accountability, including privacy, misleading content, and ads.

Much of my writing will be about algorithmic amplification: roughly, the fact that algorithms increase the reach of some speech and suppress that of other speech. The term amplification is caught up in a definitional thicket. It’s tempting to define amplification with respect to some imagined neutral, but there is no neutral, because today’s speech platforms couldn’t exist in a recognizable form without algorithms at their core. Having previously worked on privacy and fairness—two terms that notoriously resist a consensus definition—I don’t see this as a problem. There are many possible definitions of amplification, and the most suitable definition will vary depending on the exact question one wants to answer.

It’s important to talk about amplification and to explore its variations. Much of the debate over online speech, particularly the problem of mis/disinformation, reduces the harms of social media platforms to a false binary—what should be taken down, what should be left up. However, the logic of engagement optimization rewards the production of content that may not be well-aligned with societal benefit, even if it’s not harmful enough to be taken down. This manifests differently in different areas. Instagram and TikTok have changed the nature of travel, with hordes of tourists in some cases trampling on historic or religious sites to make videos in the hopes of going viral. In science, facts or figures in papers can be selectively quoted out of context in service of a particular narrative.

Speech platforms are complex systems whose behavior emerges from feedback loops involving platform design and user behavior. As such, our techniques for studying them are in their infancy, and are further hampered by the limited researcher access that platforms provide. I hope to both advocate for more access and transparency, and to push back against oversimplified narratives that have sometimes emerged from the research literature.

My output, in collaboration with the Knight Institute and others at Columbia, will take three forms: an essay series, a set of videos and interactives to illustrate technical concepts, and a major symposium in spring 2023. An announcement about the symposium is coming shortly.

How the National AI Research Resource can steward the datasets it hosts

Last week I participated in a panel about the National AI Research Resource (NAIRR), a proposed computing and data resource for academic AI researchers. The NAIRR’s goal is to subsidize the spiraling costs of AI research, which have put many types of research out of reach of most academic groups.

My comments on the panel were based on a recent study (NeurIPS ‘21) by Kenny Peng, Arunesh Mathur, and me on the potential harms of AI datasets. We looked at almost 1,000 research papers to analyze how they used datasets, which are the engine of AI.

Let me briefly mention just two of the many things we found, and then I’ll present some ideas for NAIRR based on our findings. First, we found that “derived datasets” are extremely common. For example, there’s a popular facial recognition dataset called Labeled Faces in the Wild, and there are at least 20 new datasets that incorporate the original data and extend it in some way. One of them adds race and gender annotations. This means that a dataset may enable new harms over time. For example, once you have race annotations, you can use it to build a model that tracks the movement of ethnic minorities through surveillance cameras, which some governments seem to be doing.
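To make the notion of a derived dataset concrete, here is a minimal sketch of the mechanics. The file names and columns are hypothetical; the point is simply that a join between the original data and externally produced annotations yields a new dataset with capabilities (and risks) that the original release never had, which is why stewardship needs to follow derivatives.

```python
import pandas as pd

# Hypothetical files, for illustration only.
# original_faces.csv:     image_id, identity           (the original dataset)
# extra_annotations.csv:  image_id, demographic_label  (added later by a third party)
original = pd.read_csv("original_faces.csv")
annotations = pd.read_csv("extra_annotations.csv")

# A "derived dataset" is often just a join like this: same images, new labels.
# The combined data supports analyses that the original release did not anticipate.
derived = original.merge(annotations, on="image_id", how="inner")
derived.to_csv("derived_faces.csv", index=False)
```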

We also found that dataset creators are aware of their datasets’ potential for misuse, so they often attach licenses that restrict use to research and prohibit commercial use. Unfortunately, we found evidence that many companies simply get around this by downloading a model pre-trained on the dataset (in a research context) and using that model in commercial products.

Stepping back, the main takeaway from our paper is that dataset creators can sometimes — but not always — anticipate the ways in which a dataset might be used or misused in harmful ways. So we advocate for what we call dataset stewarding, which is a governance process that lasts throughout the lifecycle of a dataset. Note that some prominent datasets see active use for decades.

I think NAIRR is ideally positioned to be the steward of the datasets that it hosts, and to perform a vital governance role over datasets and, in turn, over AI research. Here are a few specific things NAIRR could do, starting with the most lightweight ones.

1. NAIRR should support a communication channel between a dataset creator and the researchers who use that dataset. For example, if ethical problems — or even scientific problems — are uncovered in a dataset, it should be possible to notify users about it. As trivial as this sounds, it is not always the case today: prominent datasets have been retracted over ethical concerns with no way to notify the people who had downloaded them. (A minimal sketch of such a mechanism appears after this list.)

2. NAIRR should standardize dataset citation practices, for example, by providing Digital Object Identifiers (DOIs) for datasets. We found that citation practices are chaotic, and there is currently no good way to find all the papers that use a dataset to check for misuse.

3. NAIRR could publish standardized dataset licenses. Dataset creators aren’t legal experts, and most of the licenses they write don’t accomplish what they intend, which leaves room for misuse.

4. NAIRR could require some analog of broader impact statements as part of an application for data or compute resources. Writing a broader impact statement could encourage ethical reflection by the authors. (A recent study found evidence that the NeurIPS broader impact requirement did result in authors reflecting on the societal consequences of their technical work.) Such reflection is valuable even if the statements are not actually used for decision making about who is approved. 

5. NAIRR could require some sort of ethical review of proposals. This goes beyond broader impact statements by making successful review a condition of acceptance. One promising model is the Ethics and Society Review instituted at Stanford. Most ethical issues that arise in AI research fall outside the scope of Institutional Review Boards (IRBs), so even a lightweight ethical review process could help prevent obvious-in-hindsight ethical lapses.

6. If researchers want to use a dataset to build and release a derivative dataset or pretrained model, then there should be an additional layer of scrutiny, because these involve essentially republishing the dataset. In our research, we found that this is the start of an ethical slippery slope, because data and models can be recombined in various ways and the intent of the original dataset can be lost.

7. There should be a way for people to report suspected ethics violations to NAIRR. The current model, for lack of anything better, is vigilante justice: journalists, advocates, or researchers sometimes identify ethical issues in datasets, and if the resulting outcry is loud enough, dataset creators feel compelled to retract or modify them.

8. NAIRR could effectively partner with other entities that have emerged as ethical regulators. For example, conference program committees have started to incorporate ethics review. If NAIRR made it easy for peer reviewers to check the policies for any given data or compute resource, that would let them verify that a submitted paper is compliant with those policies.
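To make the first two items concrete, here is a minimal sketch of the kind of registry they imply: each hosted dataset gets a persistent identifier, downloads are recorded, and a notice (such as a retraction) can be routed to everyone affected. The data structures are entirely hypothetical; a real NAIRR system would need authentication, persistent storage, and governance around who may post notices.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    doi: str                                            # persistent identifier (item 2)
    name: str
    downloaders: set[str] = field(default_factory=set)  # contact addresses (item 1)
    notices: list[str] = field(default_factory=list)    # ethical or scientific issues


class DatasetRegistry:
    """Hypothetical registry sketching items 1 and 2 above."""

    def __init__(self) -> None:
        self._records: dict[str, DatasetRecord] = {}

    def register(self, doi: str, name: str) -> None:
        self._records[doi] = DatasetRecord(doi=doi, name=name)

    def record_download(self, doi: str, contact: str) -> None:
        self._records[doi].downloaders.add(contact)

    def post_notice(self, doi: str, notice: str) -> list[str]:
        """Attach a notice (e.g., a retraction) and return who should be told."""
        record = self._records[doi]
        record.notices.append(notice)
        return sorted(record.downloaders)


# Example: a retraction notice reaches everyone who downloaded the dataset.
registry = DatasetRegistry()
registry.register("10.0000/example-doi", "Example Face Dataset")
registry.record_download("10.0000/example-doi", "lab-a@example.edu")
registry.record_download("10.0000/example-doi", "lab-b@example.edu")
print(registry.post_notice("10.0000/example-doi", "Retracted over consent concerns"))
# ['lab-a@example.edu', 'lab-b@example.edu']
```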

There is no single predominant model for ethical review of AI research analogous to the IRB model for biomedical research. It is unlikely that one will emerge in the foreseeable future. Instead, a patchwork is taking shape. The NAIRR is set up to be a central player in AI research in the United States and, as such, bears responsibility for ensuring that the research that it supports is aligned with societal values.

——–

I’m grateful to the NAIRR task force for inviting me and to my fellow panelists and moderators for a stimulating discussion.  I’m also grateful to Sayash Kapoor and Mihir Kshirsagar, with whom I previously submitted a comment on this topic to the relevant federal agencies, and to Solon Barocas for helpful discussions.

A final note: the aims of the NAIRR have themselves been contested and are not self-evidently good. However, my comments (and the panel overall) assumed that the NAIRR will be implemented largely as currently conceived, and focused on harm mitigation.

Faculty search in information technology policy

I’m happy to announce that Princeton University is recruiting a faculty member in information technology policy. The position is open rank — assistant, associate, or full professor — and we welcome applicants from any relevant discipline. The successful candidate will likely be jointly appointed in the School of Public and International Affairs and a disciplinary department, and be part of the CITP community.

We are reviewing applications on a rolling basis. We encourage interested candidates to apply by December. Apply here.

If you have questions about the position, you are welcome to reach out to me at .

Studying the societal impact of recommender systems using simulation

By Eli Lucherini, Matthew Sun, Amy Winecoff, and Arvind Narayanan.

For those interested in the impact of recommender systems on society, we are happy to share several new pieces:

  • a software tool for studying this interface via simulation
  • the accompanying paper
  • a short piece on methodological concerns in simulation research
  • a talk offering a critical take on research on filter bubbles.

We elaborate below.

Simulation is a valuable way to study the societal impact of recommender systems.

Recommender systems in social media platforms such as Facebook and Twitter have been criticized due to the risks they might pose to society, such as amplifying misinformation or creating filter bubbles. But there isn’t yet consensus on the scope of these concerns, the underlying factors, or ways to remedy them. Because these phenomena arise through repeated system interactions over time, methods that assess the system at a single time point provide minimal insight into the mechanisms behind them. In contrast, simulations can model how users, items, and algorithms interact over arbitrarily long timescales. As a result, simulation has proved to be a valuable tool in assessing the impact of recommendation systems on the content users consume and on society.

This is a burgeoning area of research. We identified over a dozen studies that use simulation to study questions such as filter bubbles and misinformation. As an example of a study we admire, Chaney et al. illustrate the detrimental effects of algorithmic confounding, which occurs when a recommendation algorithm is trained on user interaction data that is itself influenced by the prior recommendations of the algorithm. Like all simulation research, this is a statement about a model and not a real platform. But the benefit is that it helps isolate the variables of interest so that relationships between them can be probed deeply in a way that improves our scientific understanding of these systems.
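To illustrate the kind of feedback loop at issue, here is a self-contained toy simulation. It is neither Chaney et al.’s model nor a T-RECS experiment; it is a deliberately simple popularity-based recommender, included only to show the qualitative effect: when the algorithm learns exclusively from interactions that its own past recommendations induced, consumption collapses onto a narrow set of items, compared with an organic-browsing baseline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_steps, k = 500, 200, 50, 10

# "True" utilities: how much each user would enjoy each item.
utility = rng.random((n_users, n_items))


def run(recommender_driven: bool) -> np.ndarray:
    """Simulate n_steps rounds of interaction; return per-item interaction counts."""
    counts = np.ones(n_items)  # interaction history the platform sees (starts near-uniform)
    for _ in range(n_steps):
        if recommender_driven:
            # Show everyone the k items that look most popular in data generated
            # under the system's own previous recommendations.
            slate = np.argsort(-counts)[:k]
            slates = np.tile(slate, (n_users, 1))
        else:
            # Organic baseline: each user browses k items chosen at random.
            slates = rng.integers(0, n_items, size=(n_users, k))
        # Each user interacts with their favorite item among those shown to them.
        best = utility[np.arange(n_users)[:, None], slates].argmax(axis=1)
        chosen = slates[np.arange(n_users), best]
        np.add.at(counts, chosen, 1)
    return counts


for label, flag in [("recommender-driven", True), ("organic browsing", False)]:
    counts = run(flag)
    top_share = np.sort(counts)[-k:].sum() / counts.sum()
    print(f"{label}: top {k} items receive {top_share:.0%} of all interactions")
```

Under the recommender-driven condition, nearly all interactions end up on the initial slate of items; under organic browsing, consumption stays spread out. The model is trivial, but it captures why interaction data collected under a deployed algorithm cannot be treated as if it reflected users’ unmediated preferences.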

T-RECS: A new tool for simulating recommender systems

So far, most simulation studies of algorithmic systems have relied upon ad hoc code implemented from scratch, which is time-consuming, raises the likelihood of bugs, and limits reproducibility. We present T-RECS (Tools for RECommender system Simulation), an open-source simulation tool designed to enable investigations of complex phenomena that emerge from millions of individual actions and interactions in algorithmic systems, including filter bubbles, political polarization, and (mis)information diffusion. In the accompanying paper, we describe its design in detail and present two case studies.

T-RECS is flexible and can simulate just about any system in which “users” interact with “items” mediated by an algorithm. This is broader than just recommender systems: for example, we used T-RECS to reproduce a study on the virality of online content. T-RECS also supports two-sided platforms, i.e., those that include both users and content creators. The system is not limited to social media either: it can also be used to study music recommender systems or e-commerce platforms. With T-RECS, researchers with expertise in social science but limited engineering expertise can still leverage simulation to answer important questions about the societal effects of algorithmic systems.
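For a flavor of what a T-RECS experiment looks like in practice, here is a minimal sketch along the lines of the project’s quickstart. Treat it as illustrative rather than authoritative: the parameter values are arbitrary, and module and method names may differ across releases, so consult the repository documentation for the definitive interface.

```python
from trecs.models import ContentFiltering

# A minimal run with default users and items. Names follow the project quickstart;
# check the T-RECS documentation for the authoritative, current interface.
recsys = ContentFiltering()
recsys.run(timesteps=50)

# Metrics tracked during the simulation (e.g., interaction patterns).
measurements = recsys.get_measurements()
```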

What’s wrong with current recsys simulation research?

In a companion paper to T-RECS, we offer a methodological critique of current recommender systems simulation research. First, we observe that each paper tends to operationalize constructs such as polarization in subtly different ways. These seemingly minor differences can produce vastly different effects, making comparisons between papers infeasible. We acknowledge that this is natural in the early stages of a discipline and is not necessarily a crisis by itself. Unfortunately, we also observe low transparency: papers do not specify their constructs in enough detail to allow others to reproduce and build on them, and practices such as sharing code and data are not yet the norm in this community.

We advocate for the adoption of software tools such as T-RECS to help address both issues. Researchers would be able to draw upon a standard library of models and constructs. Further, they would easily be able to share reproduction materials as notebooks containing code, data, results, and documentation packaged together.

Why do we need simulation, again?

Given that it is tricky to do simulation correctly and even harder to do it in a way that allows us to draw meaningful conclusions that apply to the real world, one may wonder why we need simulation for understanding the societal impacts of recommender systems at all. Why not stick with auditing or observational studies of real platforms? A notable example of such a study is “Exposure to ideologically diverse news and opinion on Facebook” by Bakshy et al. The study found that while Facebook’s users primarily consume ideologically aligned content, the role of Facebook’s news feed algorithm is minimal compared to users’ own choices.

In a recent talk, one of us (Narayanan) discussed the limitations of quantitative studies of real platforms, focusing on the question of filter bubbles. The argument is this: the question of interest is causal in nature, but we can’t answer causal questions because the entire system evolves as one unit over a long period of time. Faced with this inherent limitation, studies such as the Facebook study above inevitably study very narrow versions of the question, focusing on a snapshot in time and ignoring feedback loops and other complications. Thus, while there is nothing wrong with these studies, they tell us little about the questions we really care about, and yet are widely misinterpreted to mean more than they do.

In conclusion, every available method for studying the societal impact of recommender systems has severe limitations. Yet this is an urgent question with enormous consequences; the study of these questions has been called a crisis discipline. We need every tool in the toolbox, even if none is perfect for the job. We need auditing and observational studies; we need qualitative studies; and we need simulation. Through T-RECS and its accompanying papers, we hope to both systematize research in this area and provide foundational infrastructure.

Can the exfiltration of personal data by web trackers be stopped?

by Günes Acar, Steven Englehardt, and Arvind Narayanan.

In a series of posts on this blog in 2017/18, we revealed how web trackers exfiltrate personal information from web pages, browser password managers, form inputs, and the Facebook Login API. Our findings resulted in many fixes and privacy improvements to browsers, websites, third parties, and privacy protection tools. However, the root causes of these privacy failures remain unchanged, because they stem from the design of the web itself. In a paper at the 2020 Privacy Enhancing Technologies Symposium, we recap our findings and propose two potential paths forward.

Root causes of failures

In the two years since our research, no fundamental fixes have been implemented that will stop third parties from exfiltrating personal information. Thus, it remains possible that similar privacy vulnerabilities have been introduced even as the specific ones we identified have mostly been fixed. Here’s why.

The web’s security model allows a website to either fully trust a third party (by including the third-party script in a first-party context) or not trust it at all (by isolating the third-party resource in an iframe). Unfortunately, this model does not capture the range of trust relationships that we see on the web. Since many third parties cannot provide their services (such as analytics) if they are isolated, first parties have no choice but to give them full privileges even though they do not trust them fully.

The model also assumes transitivity of trust. But in reality, a user may trust a website and the website may trust a third party, but the user may not trust the third party. 
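This binary is visible in any page’s HTML: third-party scripts included directly run with the page’s full privileges, while iframed third-party content is isolated by the same-origin policy. The sketch below tallies the two for a given URL using requests and BeautifulSoup; it is a rough illustration that only inspects static HTML (so it misses dynamically injected scripts) and naively treats any differing hostname, including subdomains, as a third party.

```python
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup


def third_party_inclusions(url: str) -> dict[str, list[str]]:
    """Split a page's third-party resources by the trust level the web's model grants them."""
    first_party = urlparse(url).hostname
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

    def is_third_party(src: str) -> bool:
        host = urlparse(src).hostname
        return host is not None and host != first_party  # naive hostname comparison

    return {
        # Included scripts execute in the first-party context: full trust.
        "fully_trusted_scripts": sorted(
            {s["src"] for s in soup.find_all("script", src=True) if is_third_party(s["src"])}
        ),
        # Iframed content is isolated by the same-origin policy: no trust.
        "isolated_iframes": sorted(
            {f["src"] for f in soup.find_all("iframe", src=True) if is_third_party(f["src"])}
        ),
    }


# Example usage:
# print(third_party_inclusions("https://example.com"))
```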

Another root cause is the economic reality of the web that drives publishers to unthinkingly adopt code from third parties. Here’s a representative quote from the marketing materials of FullStory, one of the third parties we found exfiltrating personal information:

Set-up is a thing of beauty. To get started, place one small snippet of code on your site. That’s it. Forever. Seriously.

A key reason for the popularity of these third parties is that the small publishers that rely on them lack the budget to hire technical experts internally to replicate their functionality. Unfortunately, while it’s possible to effortlessly embed a third party, auditing or even understanding the privacy implications requires time and expertise.

This may also help explain the limited adoption of technical solutions such as Caja or ConScript that better capture the partial trust relationship between first and third parties. Unfortunately, such solutions require the publisher to reason carefully about the privileges and capabilities provided to each individual third party. 

Two potential paths forward

In the absence of a fundamental fix, there are two potential approaches to minimize the risk of large-scale exploitation of these vulnerabilities. One approach is to regularly scan the web. After all, our work suggests that this kind of large-scale detection is possible and that vulnerabilities will be fixed when identified.
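As a crude illustration of what a recurring scan can look like, the sketch below fetches a set of pages and flags those that statically embed scripts from a short, purely illustrative list of session-replay providers. Real measurements of the kind described in our papers use an instrumented browser such as OpenWPM to observe actual network traffic and script behavior, which a static HTML check like this cannot capture.

```python
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

# Illustrative, not exhaustive: domains associated with session-replay services.
REPLAY_DOMAINS = {"fullstory.com", "hotjar.com", "mouseflow.com"}


def embedded_replay_providers(url: str) -> set[str]:
    """Return replay-provider domains whose scripts are statically embedded in the page."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    hits = set()
    for script in soup.find_all("script", src=True):
        host = urlparse(script["src"]).hostname or ""
        for domain in REPLAY_DOMAINS:
            if host == domain or host.endswith("." + domain):
                hits.add(domain)
    return hits


def scan(sites: list[str]) -> None:
    """A toy recurring scan: run on a schedule and diff the results over time."""
    for site in sites:
        try:
            hits = embedded_replay_providers(site)
        except requests.RequestException as exc:
            print(f"{site}: fetch failed ({exc})")
            continue
        if hits:
            print(f"{site}: embeds session-replay scripts from {sorted(hits)}")


# Example usage:
# scan(["https://example.com", "https://example.org"])
```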

The catch is that it’s not clear who will do the thankless, time-consuming, and technically complex work of running regular scans, detecting vulnerabilities and leaks, and informing thousands of affected parties. As researchers, our role is to develop the methods and tools to do so, but running them on a regular basis does not constitute publishable research. While we have undertaken significant efforts to maintain our scanning tool OpenWPM as a resource for the community and make scan data available, regular and comprehensive vulnerability detection requires far more resources than we can afford.

In fact, in addition to researchers, many other groups have missions that are aligned with the goal of large-scale web privacy investigations: makers of privacy-focused browsers and mobile apps, blocklist maintainers, advocacy groups, and investigative journalists. Some examples of efforts that we’re aware of include Mozilla’s maintenance of OpenWPM since 2018, Brave’s research on tracking protection, and DuckDuckGo’s dataset about trackers. However, so far, these efforts fall short of addressing the full range of web privacy vulnerabilities, and data exfiltration in particular. Perhaps a coalition of these groups can create the necessary momentum.

The second approach is legal, namely, regulation that incentivizes first parties to take responsibility for the privacy violations on their websites. This is in contrast to the reactive approach above which essentially shifts the cost of privacy to public-interest groups or other external entities.

Indeed, our findings represent potential violations of existing laws, notably the GDPR in the EU and sectoral laws such as HIPAA (healthcare) and FERPA (education) in the United States. However, it is unclear whether the first parties or the third parties would be liable. According to several session replay companies, they are merely data processors and not joint controllers under the GDPR. In addition, since there are a large number of first parties exposing user data and each violation is relatively small in scope, regulators have not paid much attention to these types of privacy failures. In fact, the lack of enforcement risks effectively normalizing such privacy violations. Thus, either stepped-up enforcement of existing laws or new laws that establish stricter rules could shift incentives, resulting in stronger preventive measures and regular vulnerability scanning by web developers themselves.

Either approach will be an uphill battle. Regularly scanning the web requires a source of funding, whereas stronger regulation and enforcement will raise the cost of running websites and will face opposition from third-party service providers reliant on the current lax approach. But fighting uphill battles is nothing new for privacy advocates and researchers.

Update: Here’s the video of Günes Acar’s talk about this research at PETS 2020.