September 29, 2022

How the National AI Research Resource can steward the datasets it hosts

Last week I participated in a panel about the National AI Research Resource (NAIRR), a proposed computing and data resource for academic AI researchers. The NAIRR’s goal is to subsidize the spiraling costs of many types of AI research, costs that have put such research out of reach of most academic groups.

My comments on the panel were based on a recent study by researchers Kenny Peng, Arunesh Mathur, and me (NeurIPS ‘21) on the potential harms of AI. We looked at almost 1,000 research papers to analyze how they used datasets, which are the engine of AI.

Let me briefly mention just two of the many things we found, and then I’ll present some ideas for NAIRR based on our findings. First, we found that “derived datasets” are extremely common. For example, there’s a popular facial recognition dataset called Labeled Faces in the Wild, and there are at least 20 new datasets that incorporate the original data and extend it in some way. One of them adds race and gender annotations. This means that a dataset may enable new harms over time. For example, once a dataset has race annotations, it can be used to build a model that tracks the movement of ethnic minorities through surveillance cameras, which some governments seem to be doing.

We also found that dataset creators are aware of the potential for misuse, so they often attach licenses that restrict use to research purposes and prohibit commercial use. Unfortunately, we found evidence that many companies simply get around this by downloading a model pre-trained on the dataset (in a research context) and using that model in commercial products.

Stepping back, the main takeaway from our paper is that dataset creators can sometimes — but not always — anticipate the ways in which a dataset might be used or misused in harmful ways. So we advocate for what we call dataset stewarding, which is a governance process that lasts throughout the lifecycle of a dataset. Note that some prominent datasets see active use for decades.

I think NAIRR is ideally positioned to serve as the steward of the datasets it hosts, performing a vital governance role over those datasets and, in turn, over AI research. Here are a few specific things NAIRR could do, starting with the most lightweight ones.

1. NAIRR should support a communication channel between a dataset creator and the researchers who use that dataset. For example, if ethical problems — or even scientific problems — are uncovered in a dataset, it should be possible to notify its users. As trivial as this sounds, it is not always possible today. Prominent datasets have been retracted over ethical concerns without any way to notify the people who had downloaded them.

2. NAIRR should standardize dataset citation practices, for example, by providing Digital Object Identifiers (DOIs) for datasets. We found that citation practices are chaotic, and there is currently no good way to find all the papers that use a dataset to check for misuse.

3. NAIRR could publish standardized dataset licenses. Dataset creators aren’t legal experts, and most of their licenses don’t accomplish what the creators intend, which leaves room for misuse.

4. NAIRR could require some analog of broader impact statements as part of an application for data or compute resources. Writing a broader impact statement could encourage ethical reflection by the authors. (A recent study found evidence that the NeurIPS broader impact requirement did result in authors reflecting on the societal consequences of their technical work.) Such reflection is valuable even if the statements are not actually used for decision making about who is approved. 

5. NAIRR could require some sort of ethical review of proposals. This goes beyond broader impact statements by making successful review a condition of acceptance. One promising model is the Ethics and Society Review instituted at Stanford. Most ethical issues that arise in AI research fall outside the scope of Institutional Review Boards (IRBs), so even a lightweight ethical review process could help prevent obvious-in-hindsight ethical lapses.

6. If researchers want to use a dataset to build and release a derivative dataset or pretrained model, then there should be an additional layer of scrutiny, because these involve essentially republishing the dataset. In our research, we found that this is the start of an ethical slippery slope, because data and models can be recombined in various ways and the intent of the original dataset can be lost.

7. There should be a way for people to report to NAIRR that some ethics violation is going on. The current model, for lack of anything better, is vigilante justice: journalists, advocates, or researchers sometimes identify ethical issues in datasets, and if the resulting outcry is loud enough, dataset creators feel compelled to retract or modify them. 

8. NAIRR could effectively partner with other entities that have emerged as ethical regulators. For example, conference program committees have started to incorporate ethics review. If NAIRR made it easy for peer reviewers to check the policies for any given data or compute resource, that would let them verify that a submitted paper is compliant with those policies.

There is no single predominant model for ethical review of AI research analogous to the IRB model for biomedical research. It is unlikely that one will emerge in the foreseeable future. Instead, a patchwork is taking shape. The NAIRR is set up to be a central player in AI research in the United States and, as such, bears responsibility for ensuring that the research that it supports is aligned with societal values.

——–

I’m grateful to the NAIRR task force for inviting me and to my fellow panelists and moderators for a stimulating discussion.  I’m also grateful to Sayash Kapoor and Mihir Kshirsagar, with whom I previously submitted a comment on this topic to the relevant federal agencies, and to Solon Barocas for helpful discussions.

A final note: the aims of the NAIRR have themselves been contested and are not self-evidently good. However, my comments (and the panel overall) assumed that the NAIRR will be implemented largely as currently conceived, and focused on harm mitigation.

Holding Purveyors of “Dark Patterns” for Online Travel Bookings Accountable

Last week, my former colleagues at the New York Attorney General’s Office (NYAG) scored a $2.6 million settlement with Fareportal, a large online travel agency that used deceptive practices, known as “dark patterns,” to manipulate consumers into booking travel online.

The investigation exposes how Fareportal, which operates under several brands, including CheapOair and OneTravel, used a series of deceptive design tricks to pressure consumers into buying tickets for flights, hotels, and other travel purchases. In this post, I share the details of the investigation’s findings and use them to highlight why we need further regulatory intervention to prevent similar conduct from becoming entrenched in other online services.

The NYAG investigation picks up on the work of researchers at Princeton’s CITP that exposed the widespread use of dark patterns on shopping websites. Using the framework we developed in a subsequent paper for defining dark patterns, the investigation reveals how the travel agency weaponized common cognitive biases to take advantage of consumers. The company was charged under the Attorney General’s broad authority to prohibit deceptive acts and practices. In addition to paying $2.6 million, the New York City-based company agreed to reform its practices.

Specifically, the investigation documents how Fareportal exploited the scarcity bias by displaying, next to the top two flight search results, a false and misleading message about the number of tickets left for those flights at the advertised price. It manipulated consumers by adding 1 to the number of tickets the consumer had searched for and claiming that only X+1 tickets were left at that price. So, if you searched for one round-trip ticket from Philadelphia to Chicago, the site would say “Only 2 tickets left” at that price, while a consumer searching for two such tickets would see a message stating “Only 3 tickets left” at the advertised price.
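To make the mechanics concrete, here is a minimal sketch of the ticket-count logic as the investigation describes it. The function name and message wording are hypothetical illustrations, not Fareportal’s actual code.

```python
def scarcity_message(tickets_requested: int) -> str:
    """Fabricated scarcity notice: always claims exactly one more ticket
    is available than the number the consumer searched for, regardless
    of actual inventory."""
    tickets_left = tickets_requested + 1  # X + 1, per the NYAG findings
    return f"Only {tickets_left} tickets left at this price!"

print(scarcity_message(1))  # "Only 2 tickets left at this price!"
print(scarcity_message(2))  # "Only 3 tickets left at this price!"
```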

In 2019, Fareportal added a design feature that exploited the bandwagon effect by displaying how many other people were looking at the same deal. The site used a computer-generated random number between 28 and 45 to show the number of other people “looking” at the flight. It paired this with a false countdown timer that displayed an arbitrary number that was unrelated to the availability of tickets. 

Similarly, Fareportal extended its misleading tactics to hotel bookings on its mobile apps. The apps misrepresented the percentage of rooms shown that were “reserved” by using a computer-generated number keyed to how soon the customer’s check-in date was. For example, if the check-in date was 16-30 days away, the message would indicate that between 41% and 70% of the hotel rooms were booked, but if it was less than 7 days away, it showed that between 81% and 99% of the rooms were reserved. But, of course, those percentages were pure fiction. The apps used a similar tactic for displaying the number of people “viewing” hotels in the area. This time, they generated the number from the nightly rate of the fifth hotel returned in the search, taking the difference between the dollar figure and the cents figure. (If the rate was $255.63, consumers were told that 192 people were viewing the hotel listings in the area.)
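The bandwagon and hotel-booking indicators described above reduce to similarly simple fabrications. The sketch below follows the ranges and the dollars-minus-cents rule reported in the investigation; the function names, the handling of undocumented date windows, and other details are assumptions for illustration only.

```python
import random

def fake_flight_viewers() -> int:
    """'People looking at this deal': a random draw from a fixed range,
    unrelated to actual site traffic."""
    return random.randint(28, 45)

def fake_percent_reserved(days_until_checkin: int) -> int:
    """'% of rooms reserved': keyed only to how soon check-in is, not to
    real availability. Only the two windows documented in the settlement
    are modeled here; the fallback range is a placeholder."""
    if days_until_checkin < 7:
        return random.randint(81, 99)
    if 16 <= days_until_checkin <= 30:
        return random.randint(41, 70)
    return random.randint(41, 99)  # undocumented windows: illustrative guess

def fake_hotel_viewers(fifth_result_rate: str) -> int:
    """'People viewing hotels in the area': dollars minus cents of the
    fifth search result's nightly rate, e.g. "$255.63" -> 255 - 63 = 192."""
    dollars, cents = fifth_result_rate.lstrip("$").split(".")
    return int(dollars) - int(cents)

print(fake_flight_viewers())          # e.g. 37
print(fake_percent_reserved(20))      # e.g. 55
print(fake_hotel_viewers("$255.63"))  # 192
```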

Fareportal used these false scarcity indicators across its websites and mobile platforms to pitch products such as travel protection and seat upgrades, inaccurately representing how many other consumers had purchased the product in question.

In addition, the NYAG charged Fareportal with using a pressure tactic that forced consumers to accept or decline the purchase of a travel protection policy to “protect the cost of [their] trip” before completing a purchase. This practice is described in the academic literature as a covert pattern that uses “confirmshaming” and “forced action” to influence choices.

Finally, the NYAG took issue with how Fareportal manipulated price comparisons to suggest it was offering tickets at a discounted price when, in fact, most of the advertised tickets were never offered for sale at the higher comparison price. The NYAG rejected Fareportal’s attempt to use a small pop-up to cure the false impression conveyed by the visual slash-through image suggesting a discount. Similarly, the NYAG called out how Fareportal hid its service fees by disguising them as part of the “Base Price” of the ticket rather than listing them in the separate line item for “Taxes and Fees.” These tactics are described in the academic literature as using “misdirection” and “information hiding” to influence consumers.


The findings from this investigation illustrate why dark patterns are not simply aggressive marketing practices, as some commentators contend, but require regulatory intervention. Such shady practices are difficult for consumers to spot and avoid, and, as we argued, they risk becoming entrenched across travel sites, which have an incentive to adopt similar practices. As a result, Fareportal, unfortunately, will be neither the first nor the last online service to deploy such tactics. But this creates an opportunity for researchers, consumer advocates, and design whistleblowers to step forward and spotlight such practices to protect consumers and help create a more trustworthy internet.

Can Classes on Field Experiments Scale? Lessons from SOC412

Last semester, I taught a Princeton undergrad/grad seminar on the craft, politics, and ethics of behavioral experimentation. The idea was simple: since large-scale human subjects research is now common outside universities, we need to equip students to make sense of that kind of power and think critically about it.

In this post, I share lessons for teaching a class like this and how I’m thinking about next year.

[Figure: Path diagram from SOC412 lecture on the Social Media Color Experiment]


Teaching the Craft, Ethics, and Politics of Field Experiments

How can we manage the politics and ethics of large-scale online behavioral research? When this question came up in April during a forum on Defending Democracy at Princeton, Ed Felten mentioned on stage that I was teaching a Princeton undergrad class on this very topic. No pressure!

Ed was right about the need: people with undergrad computer science degrees routinely conduct large-scale behavioral experiments affecting millions or billions of people. Since large-scale human subjects research is now common, universities need to equip students to make sense of and think critically about that kind of power.


Princeton Dialogues on AI and Ethics: Launching case studies

Summary: We are releasing four case studies on AI and ethics, as part of the Princeton Dialogues on AI and Ethics.

The impacts of rapid developments in artificial intelligence (“AI”) on society—both real and not yet realized—raise deep and pressing questions about our philosophical ideals and institutional arrangements. AI is currently applied in a wide range of fields—such as medical diagnosis, criminal sentencing, online content moderation, and public resource management—but it is only just beginning to realize its potential to influence practically all areas of human life, including geopolitical power balances. As these technologies advance and increasingly come to mediate our everyday lives, it becomes necessary to consider how they may reflect prevailing philosophical perspectives and preferences. We must also assess how the architectural design of AI technologies today might influence human values in the future. This step is essential in order to identify the positive opportunities presented by AI and unleash these technologies’ capabilities in the most socially advantageous way possible while being mindful of potential harms. Critics question the extent to which individual engineers and proprietors of AI should take responsibility for the direction of these developments, or whether centralized policies are needed to steer growth and incentives in the right direction. What even is the right direction? How can it be best achieved?

Princeton’s University Center for Human Values (UCHV) and the Center for Information Technology Policy (CITP) are excited to announce a joint research project, “The Princeton Dialogues on AI and Ethics,” in the emerging field of artificial intelligence (broadly defined) and its interaction with ethics and political theory. The aim of this project is to develop a set of intellectual reasoning tools to guide practitioners and policy makers, both current and future, in developing the ethical frameworks that will ultimately underpin their technical and legislative decisions. More than ever before, individual-level engineering choices are poised to impact the course of our societies and human values. And yet there have been limited opportunities for AI technology actors, academics, and policy makers to come together to discuss these outcomes and their broader social implications in a systematic fashion. This project aims to provide such opportunities for interdisciplinary discussion, as well as in-depth reflection.

We convened two invitation-only workshops in October 2017 and March 2018, in which philosophers, political theorists, and machine learning experts met to assess several real-world case studies that elucidate common ethical dilemmas in the field of AI. The aim of these workshops was to facilitate a collaborative learning experience that enabled participants to dive deeply into the ethical considerations that ought to guide decision-making at the engineering level and to highlight the social shifts these technologies may be effecting. The first outcomes of these deliberations have now been published in the form of case studies. To access these educational materials, please see our dedicated website https://aiethics.princeton.edu. These cases are intended for use across university departments and in corporate training in order to equip the next generation of engineers, managers, lawyers, and policy makers with a common set of reasoning tools for working on AI governance and development.

In March 2018, we also hosted a public conference, titled “AI & Ethics,” where interested academics, policy makers, civil society advocates, and private sector representatives from diverse fields came to Princeton to discuss topics related to the development and governance of AI: “International Dimensions of AI” and “AI and Its Democratic Frontiers”. This conference sought to use the ethics and engineering knowledge foundations developed through the initial case studies to inspire discussion on AI technology’s wider social effects.

This project is part of a wider effort at Princeton University to investigate the intersection between AI technology, politics, and philosophy. There is a particular emphasis on the ways in which the interconnected forces of technology and its governance simultaneously influence and are influenced by the broader social structures in which they are situated. The Princeton Dialogues on AI and Ethics makes use of the university’s exceptional strengths in computer science, public policy, and philosophy. The project also seeks opportunities for cooperation with existing projects in and outside of academia.