September 26, 2021

Facebook’s Illusory Promise of Transparency

By Orestis Papakyriakopoulos, Ashley Gorham, Eli Lucherini, Mihir Kshirsagar, and Arvind Narayanan.

Facebook’s latest move to obstruct academic research about its platform by disabling NYU’s Ad Observatory is deeply troubling. While Facebook claims to offer researchers access to its FORT Researcher Platform as an alternative, that offer is illusory, as we recently learned firsthand through our ongoing research project studying how social media platforms amplified or moderated the distribution of political ads in the 2020 U.S. elections.

As part of our research, in March 2021, we attempted to gain access to the FORT dataset. We were told by Facebook that we had to sign a “strictly non-negotiable” agreement that was “mandated by Cambridge Analytica and the FTC.” We pushed back on this take-it-or-leave-it approach, noting that there was nothing in the consent decree that mandated such an agreement. Facebook conceded in a subsequent email that they were under no legal mandate and that their approach was simply based on an internal business justification.

We then continued to attempt to negotiate the terms of access with Facebook. A few clauses in the agreement were problematic for us, the most prominent being a pre-publication review. We sought to clarify whether Facebook would assert that information about how its advertising platform was used to target political ads in the 2020 elections was “Confidential Information” that the agreement would allow them to “remove” from our publication. Understandably, we did not want to expend time on research without some assurance that we could publish our work without Facebook’s permission. Indeed, as we subsequently discovered, one project had negotiated to exclude such a clause. But Facebook has, to date, not explained its position on the pre-publication review to us.

Separately, we had a more basic question about what additional data fields were available to researchers through the FORT Platform and whether there were any restrictions on the types of tools we could use to analyze the data. Despite promising to get back to us “shortly,” Facebook has kept us waiting since May, even as we have followed up diligently.

Our experience dealing with Facebook highlights their long-running pattern of misdirection and doublespeak to dodge meaningful scrutiny of their actions. While researchers and investigative journalists have other means of analyzing the platform’s practices (e.g., Citizen Browser and Mozilla Rally), the reality is that Facebook controls the information that the public needs to understand its powerful role in our society. And if Facebook continues to hide behind illusory offers, we need legislation to force them to provide meaningful access.

Studying the societal impact of recommender systems using simulation

By Eli Lucherini, Matthew Sun, Amy Winecoff, and Arvind Narayanan.

For those interested in the impact of recommender systems on society, we are happy to share several new pieces:

  • a software tool for studying this interface via simulation
  • the accompanying paper
  • a short piece on methodological concerns in simulation research
  • a talk offering a critical take on research on filter bubbles

We elaborate below.

Simulation is a valuable way to study the societal impact of recommender systems.

Recommender systems in social media platforms such as Facebook and Twitter have been criticized due to the risks they might pose to society, such as amplifying misinformation or creating filter bubbles. But there isn’t yet consensus on the scope of these concerns, the underlying factors, or ways to remedy them. Because these phenomena arise through repeated system interactions over time, methods that assess the system at a single point in time provide minimal insight into the mechanisms behind them. In contrast, simulations can model how users, items, and algorithms interact over arbitrarily long timescales. As a result, simulation has proved to be a valuable tool in assessing the impact of recommender systems on the content users consume and on society.
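To make this concrete, here is a minimal, purely illustrative simulation loop (a toy model of our own devising, not T-RECS or any real platform’s algorithm): the system repeatedly recommends items from its current estimate of user preferences, observes clicks driven by hidden “true” preferences, and updates its estimate from those clicks.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_steps = 50, 200, 100

# Hidden "true" user preferences and item attributes (unknown to the system).
true_prefs = rng.normal(size=(n_users, 10))
item_attrs = rng.normal(size=(n_items, 10))

# The system's estimate of user preferences, learned from interactions.
est_prefs = rng.normal(scale=0.1, size=(n_users, 10))

for step in range(n_steps):
    # Recommend each user their top-scoring item under the current estimate.
    scores = est_prefs @ item_attrs.T
    recs = scores.argmax(axis=1)

    # Users click with probability driven by their true utility for the item.
    utility = (true_prefs * item_attrs[recs]).sum(axis=1)
    clicked = rng.random(n_users) < 1 / (1 + np.exp(-utility))

    # Nudge estimates toward the attributes of items users actually clicked.
    est_prefs[clicked] += 0.05 * (item_attrs[recs[clicked]] - est_prefs[clicked])
```

Because the loop runs for as many steps as desired, one can measure how quantities like the diversity of recommended items evolve over long horizons, which is exactly what single-snapshot methods cannot capture.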

This is a burgeoning area of research. We identified over a dozen studies that use simulation to study questions such as filter bubbles and misinformation. As an example of a study we admire, Chaney et al. illustrate the detrimental effects of algorithmic confounding, which occurs when a recommendation algorithm is trained on user interaction data that is itself influenced by the prior recommendations of the algorithm. Like all simulation research, this is a statement about a model and not a real platform. But the benefit is that it helps isolate the variables of interest so that relationships between them can be probed deeply in a way that improves our scientific understanding of these systems.
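The feedback loop behind algorithmic confounding can be sketched in a few lines (again a toy model of our own, not Chaney et al.’s actual simulation): because the system only observes outcomes for items it chose to show, early luck gets reinforced and the training data never covers unshown items.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, n_steps = 100, 500

# True item quality, which the system tries to learn from click counts.
quality = rng.random(n_items)
clicks = np.zeros(n_items)
shown = np.zeros(n_items)

for _ in range(n_steps):
    # The algorithm greedily shows the item with the best observed click
    # rate, so its training data is shaped by its own past recommendations.
    rate = (clicks + 1) / (shown + 2)  # smoothed click-rate estimate
    item = rate.argmax()
    shown[item] += 1
    clicks[item] += rng.random() < quality[item]

# Exposure concentrates on a handful of items; most are never observed at all.
print("items ever shown:", int((shown > 0).sum()), "of", n_items)
```

Even in this stripped-down model, the greedy policy locks onto the first item whose observed click rate stays above the prior, illustrating how interaction data confounded by prior recommendations can diverge from true quality.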

T-RECS: A new tool for simulating recommender systems

So far, most simulation studies of algorithmic systems have relied on ad hoc code implemented from scratch, which is time-consuming, raises the likelihood of bugs, and limits reproducibility. We present T-RECS (Tools for RECommender system Simulation), an open-source simulation tool designed to enable investigations of complex phenomena that emerge from millions of individual actions and interactions in algorithmic systems, including filter bubbles, political polarization, and (mis)information diffusion. In the accompanying paper, we describe its design in detail and present two case studies.

T-RECS is flexible and can simulate just about any system in which “users” interact with “items” mediated by an algorithm. This is broader than just recommender systems: for example, we used T-RECS to reproduce a study on the virality of online content. T-RECS also supports two-sided platforms, i.e., those that include both users and content creators. The system is not limited to social media either: it can also be used to study music recommender systems or e-commerce platforms. With T-RECS, researchers with expertise in social science but limited engineering expertise can still leverage simulation to answer important questions about the societal effects of algorithmic systems.

What’s wrong with current recsys simulation research?

In a companion paper to T-RECS, we offer a methodological critique of current recommender systems simulation research. First, we observe that each paper tends to operationalize constructs such as polarization in subtly different ways. These seemingly minor differences can produce vastly different effects, making comparisons between papers infeasible. We acknowledge that this is natural in the early stages of a discipline and is not necessarily a crisis by itself. Unfortunately, we also observe low transparency: papers do not specify their constructs in enough detail for others to reproduce and build on them, and practices such as sharing code and data are not yet the norm in this community.

We advocate for the adoption of software tools such as T-RECS to help address both issues. Researchers would be able to draw on a standard library of models and constructs, and they could easily share reproduction materials as notebooks that package code, data, results, and documentation together.

Why do we need simulation, again?

Given that it is tricky to do simulation correctly, and even harder to do it in a way that allows us to draw meaningful conclusions that apply to the real world, one may wonder why we need simulation for understanding the societal impacts of recommender systems at all. Why not stick with auditing or observational studies of real platforms? A notable example of such a study is “Exposure to ideologically diverse news and opinion on Facebook” by Bakshy et al. The study found that while Facebook’s users primarily consume ideologically aligned content, the role of Facebook’s news feed algorithm is minimal compared to users’ own choices.

In a recent talk, one of us (Narayanan) discussed the limitations of quantitative studies of real platforms, focusing on the question of filter bubbles. The argument is this: the question of interest is causal in nature, but we can’t answer causal questions because the entire system evolves as one unit over a long period of time. Faced with this inherent limitation, studies such as the Facebook study above inevitably study very narrow versions of the question, focusing on a snapshot in time and ignoring feedback loops and other complications. Thus, while there is nothing wrong with these studies, they tell us little about the questions we really care about, and yet are widely misinterpreted to mean more than they do.

In conclusion, every available method for studying the societal impact of recommender systems has severe limitations. Yet this is an urgent question with enormous consequences; the study of these questions has been called a crisis discipline. We need every tool in the toolbox, even if none is perfect for the job. We need auditing and observational studies; we need qualitative studies; and we need simulation. Through T-RECS and its accompanying papers, we hope to both systematize research in this area and provide foundational infrastructure.

Warnings That Work: Combating Misinformation Without Deplatforming

By Ben Kaiser, Jonathan Mayer, and J. Nathan Matias

This post originally appeared on Lawfare.

“They’re killing people.” President Biden lambasted Facebook last week for allowing vaccine misinformation to proliferate on its platform. Facebook issued a sharp rejoinder, highlighting the many steps it has taken to promote accurate public health information and expressing angst about government censorship.

Here’s the problem: Both are right. Five years after Russia’s election meddling, and more than a year into the COVID-19 pandemic, misinformation remains far too rampant on social media. But content removal and account deplatforming are blunt instruments fraught with free speech implications. Both President Biden and Facebook have taken steps to dial down the temperature since last week’s dustup, but the fundamental problem remains: How can platforms effectively combat misinformation with steps short of takedowns? As our forthcoming research demonstrates, providing warnings to users can make a big difference, but not all warnings are created equal.

The theory behind misinformation warnings is that if a social media platform provides an informative notice to a user, that user will then make more informed decisions about what information to read and believe. In the terminology of free speech law and policy, warnings could act as a form of counterspeech for misinformation. Facebook recognized as early as 2017 that warnings could alert users to untrustworthy content, provide relevant facts, and give context that helps users avoid being misinformed. Since then, Twitter, YouTube, and other platforms have adopted warnings as a primary tool for responding to misinformation about COVID-19, elections, and other contested topics.

But as academic researchers who study online misinformation, we unfortunately see little evidence that these types of misinformation warnings are working. Study after study has shown minimal effects for common warning designs. In our own laboratory research, appearing at next month’s USENIX Security Symposium, we found that many study participants didn’t even notice typical warnings—and when they did, they ignored the notices. Platforms sometimes claim the warnings work, but the drips of data they’ve released are unconvincing.

The fundamental problem is that social media platforms rely predominantly on “contextual” warnings, which appear alongside content and provide additional information as context. This is the exact same approach that software vendors initially took 20 years ago with security warnings, and those early warning designs consistently failed to protect users from vulnerabilities, scams, and malware. Researchers eventually realized that not only did contextual warnings fail to keep users safe, but they also formed a barrage of confusing indicators and popups that users learned to ignore or dismiss. Software vendors responded by collaborating closely with academic researchers to refine warnings and converge on measures of success; a decade of effort culminated in modern warnings that are highly effective and protect millions of users from security threats every day.

Social media platforms could have taken a similar approach, with transparent and fast-paced research. If they had, perhaps we would now have effective warnings to curtail the spread of vaccine misinformation. Instead, with few exceptions, platforms have chosen incrementalism over innovation. The latest warnings from Facebook and Twitter, and previews of forthcoming warnings, are remarkably similar in design to warnings Facebook deployed and then discarded four years ago. Like most platform warnings, these designs feature small icons, congenial styling, and discreet placement below offending content.

When contextual security warnings flopped, especially in web browsers, designers looked for alternatives. The most important development has been a new format of warning that interrupts users’ actions and forces them to make a choice about whether to continue. These “interstitial” warnings are now the norm in web browsers and operating systems.

In our forthcoming publication—a collaboration with Jerry Wei, Eli Lucherini, and Kevin Lee—we aimed to understand how contextual and interstitial disinformation warnings affect user beliefs and information-seeking behavior. We adapted methods from security warnings research, designing two studies where participants completed fact-finding tasks and periodically encountered disinformation warnings. We placed warnings on search results, as opposed to social media posts, to provide participants with a concrete goal (finding information) and multiple pathways to achieve that goal (different search results). This let us measure behavioral effects with two metrics: clickthrough, the rate at which participants bypassed the warnings, and the number of alternative visits, where after seeing a warning, a participant checked at least one more source before submitting an answer.
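As an illustration, both behavioral metrics are straightforward to compute from per-session logs; the log format below is hypothetical, invented for this sketch rather than taken from our studies.

```python
# Each session records whether the participant bypassed the warning
# ("clicked through") and how many other sources they visited afterward.
# This log format is hypothetical, for illustration only.
sessions = [
    {"clicked_through": True,  "alternative_visits": 0},
    {"clicked_through": True,  "alternative_visits": 2},
    {"clicked_through": False, "alternative_visits": 1},
    {"clicked_through": False, "alternative_visits": 3},
]

# Clickthrough: the rate at which participants bypassed the warnings.
clickthrough_rate = sum(s["clicked_through"] for s in sessions) / len(sessions)

# Alternative visits: the share of sessions in which the participant checked
# at least one more source before submitting an answer.
alt_visit_rate = sum(s["alternative_visits"] >= 1 for s in sessions) / len(sessions)

print(clickthrough_rate, alt_visit_rate)  # 0.5 0.75
```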

In the first study, we found that laboratory participants rarely noticed contextual disinformation warnings in Google Search results, and even more rarely took the warnings into consideration. When searching for information, participants overwhelmingly clicked on sources despite contextual warnings, and they infrequently visited alternative sources. In post-task interviews, more than two-thirds of participants told us they didn’t even realize they had encountered a warning.

For our second study, we hypothesized that interstitial warnings could be more effective. We recruited hundreds of participants on Mechanical Turk for another round of fact-finding tasks, this time using a simulated search engine to control the search queries and results. Participants could find the facts by clicking on relevant-looking search results, but they would first be interrupted by an interstitial warning, forcing them to choose whether to continue or go back to the search results. 

The results were stunning: Interstitial warnings dramatically changed what users chose to read. Users overwhelmingly noticed the warnings, considered the warnings, and then either declined to read the flagged content or sought out alternative information to verify it. Importantly, users also understood the interstitial warnings. When presented with an explanation in plain language, participants correctly described both why the warning appeared and what risk the warning was highlighting.

Platforms do seem to be—slowly—recognizing the promise of interstitial misinformation warnings. Facebook, Twitter, and Reddit have tested full-page interstitial warnings similar to the security warnings that inspired our work, and the platforms have also deployed other formats of interstitials. The “windowshade” warnings that Instagram pioneered are a particularly thoughtful design. Platforms are plainly searching for misinformation responses that are more effective than contextual warnings but also less problematic than permanent deplatforming. Marjorie Taylor Greene’s vaccine misinformation, for example, recently earned her a brief, 12-hour suspension from Twitter, restrictions on engagement with her tweets, and contextual warnings—an ensemble approach to content moderation.

But platforms remain extremely tentative with interstitial warnings. For the vast majority of mis- and disinformation that platforms identify, they still either apply tepid contextual warnings or resort to harsher moderation tools like deleting content or banning accounts.

Platforms may be concerned that interstitial warnings are too forceful, and that they go beyond counterspeech by nudging users to avoid misinformation. But the point is to have a spectrum of content moderation tools to respond to the spectrum of harmful content. Contextual warnings may be appropriate for lower-risk misinformation, and deplatforming may be the right move for serial disinformers. Interstitial warnings are a middle-ground option that deserves a place in the content moderation toolbox. Remember last year, when Twitter blocked a New York Post story from being shared because it appeared to be sourced from hacked materials? Amid cries of censorship, Twitter relented and simply labeled the content. An interstitial warning would have straddled that gulf, allowing the content on the platform while still making sure users knew the article was questionable.

What platforms should pursue—and the Biden-Harris administration could constructively encourage—is an agenda of aggressive experimentalism to combat misinformation. Much like software vendors a decade ago, platforms should be rapidly trying out new approaches, publishing lessons learned, and collaborating closely with external researchers. Experimentation can also shed light on why certain warning designs work, informing free speech considerations. Misinformation is a public crisis that demands bold action and platform cooperation. In advancing the science of misinformation warnings, the government and platforms should see an opportunity for common ground.

We thank Alan Rozenshtein, Ross Teixeira and Rushi Shah for valuable suggestions on this piece. All views are our own.