July 3, 2022

How the National AI Research Resource can steward the datasets it hosts

Last week I participated on a panel about the National AI Research Resource (NAIRR), a proposed computing and data resource for academic AI researchers. The NAIRR’s goal is to subsidize the spiraling costs that have put many types of AI research out of reach of most academic groups.

My comments on the panel were based on a recent study by researchers Kenny Peng, Arunesh Mathur, and me (NeurIPS ‘21) on the potential harms of AI. We looked at almost 1,000 research papers to analyze how they used datasets, which are the engine of AI.

Let me briefly mention just two of the many things we found, and then I’ll present some ideas for NAIRR based on our findings. First, we found that “derived datasets” are extremely common. For example, there’s a popular facial recognition dataset called Labeled Faces in the Wild, and there are at least 20 new datasets that incorporate the original data and extend it in some way. One of them adds race and gender annotations. This means that a dataset may enable new harms over time. For example, once a dataset has race annotations, it can be used to build a model that tracks the movement of ethnic minorities through surveillance cameras, which some governments seem to be doing.

We also found that dataset creators are aware of the potential for misuse, so they often release datasets under licenses that permit research use but prohibit commercial use. Unfortunately, we found evidence that many companies simply get around this restriction by downloading a model pre-trained on the dataset (in a research context) and using that model in commercial products.

Stepping back, the main takeaway from our paper is that dataset creators can sometimes — but not always — anticipate the ways in which a dataset might be used or misused in harmful ways. So we advocate for what we call dataset stewarding, which is a governance process that lasts throughout the lifecycle of a dataset. Note that some prominent datasets see active use for decades.

I think NAIRR is ideally positioned to be the steward of the datasets it hosts, performing a vital governance role over those datasets and, in turn, over AI research. Here are a few specific things NAIRR could do, starting with the most lightweight ones.

1. NAIRR should support a communication channel between a dataset creator and the researchers who use that dataset. For example, if ethical problems — or even scientific problems — are uncovered in a dataset, it should be possible to notify its users. As trivial as this sounds, it is not always possible today. Prominent datasets have been retracted over ethical concerns with no way to notify the people who had downloaded them.

2. NAIRR should standardize dataset citation practices, for example, by providing Digital Object Identifiers (DOIs) for datasets. We found that citation practices are chaotic, and there is currently no good way to find all the papers that use a dataset to check for misuse.

3. NAIRR could publish standardized dataset licenses. Dataset creators aren’t legal experts, and most of the licenses they write don’t accomplish what they want them to accomplish, which leaves room for misuse.

4. NAIRR could require some analog of broader impact statements as part of an application for data or compute resources. Writing a broader impact statement could encourage ethical reflection by the authors. (A recent study found evidence that the NeurIPS broader impact requirement did result in authors reflecting on the societal consequences of their technical work.) Such reflection is valuable even if the statements are not actually used for decision making about who is approved. 

5. NAIRR could require some sort of ethical review of proposals. This goes beyond broader impact statements by making successful review a condition of acceptance. One promising model is the Ethics and Society Review instituted at Stanford. Most ethical issues that arise in AI research fall outside the scope of Institutional Review Boards (IRBs), so even a lightweight ethical review process could help prevent obvious-in-hindsight ethical lapses.

6. If researchers want to use a dataset to build and release a derivative dataset or pretrained model, then there should be an additional layer of scrutiny, because these involve essentially republishing the dataset. In our research, we found that this is the start of an ethical slippery slope, because data and models can be recombined in various ways and the intent of the original dataset can be lost.

7. There should be a way for people to report suspected ethics violations to NAIRR. The current model, for lack of anything better, is vigilante justice: journalists, advocates, or researchers sometimes identify ethical issues in datasets, and if the resulting outcry is loud enough, dataset creators feel compelled to retract or modify them.

8. NAIRR could effectively partner with other entities that have emerged as ethical regulators. For example, conference program committees have started to incorporate ethics review. If NAIRR made it easy for peer reviewers to check the policies for any given data or compute resource, that would let them verify that a submitted paper is compliant with those policies.

There is no single predominant model for ethical review of AI research analogous to the IRB model for biomedical research. It is unlikely that one will emerge in the foreseeable future. Instead, a patchwork is taking shape. The NAIRR is set up to be a central player in AI research in the United States and, as such, bears responsibility for ensuring that the research that it supports is aligned with societal values.

——–

I’m grateful to the NAIRR task force for inviting me and to my fellow panelists and moderators for a stimulating discussion.  I’m also grateful to Sayash Kapoor and Mihir Kshirsagar, with whom I previously submitted a comment on this topic to the relevant federal agencies, and to Solon Barocas for helpful discussions.

A final note: the aims of the NAIRR have themselves been contested and are not self-evidently good. However, my comments (and the panel overall) assumed that the NAIRR will be implemented largely as currently conceived, and focused on harm mitigation.

Holding Purveyors of “Dark Patterns” for Online Travel Bookings Accountable

Last week, my former colleagues at the New York Attorney General’s Office (NYAG) scored a $2.6 million settlement with Fareportal, a large online travel agency that used deceptive practices, known as “dark patterns,” to manipulate consumers into booking travel online.

The investigation exposes how Fareportal, which operates under several brands, including CheapOair and OneTravel, used a series of deceptive design tricks to pressure consumers into buying flights, hotel rooms, and other travel products. In this post, I share the details of the investigation’s findings and use them to highlight why we need further regulatory intervention to prevent similar conduct from becoming entrenched in other online services.

The NYAG investigation picks up on the work of researchers at Princeton’s CITP that exposed the widespread use of dark patterns on shopping websites. Using the framework we developed in a subsequent paper for defining dark patterns, the investigation reveals how the travel agency weaponized common cognitive biases to take advantage of consumers. The company was charged under the Attorney General’s broad authority to prohibit deceptive acts and practices. In addition to paying $2.6 million, the New York City-based company agreed to reform its practices.

Specifically, the investigation documents how Fareportal exploited the scarcity bias by displaying, next to the top two flight search results, a false and misleading message about the number of tickets left for those flights at the advertised price. The number was fabricated by adding 1 to however many tickets the consumer had searched for: a search for X tickets produced a claim that only X+1 tickets were left at that price. So, if you searched for one round-trip ticket from Philadelphia to Chicago, the site would say “Only 2 tickets left” at that price, while a consumer searching for two such tickets would see a message stating “Only 3 tickets left” at the advertised price.

In 2019, Fareportal added a design feature that exploited the bandwagon effect by displaying how many other people were looking at the same deal. The site used a computer-generated random number between 28 and 45 to show the number of other people “looking” at the flight. It paired this with a false countdown timer that displayed an arbitrary number that was unrelated to the availability of tickets. 
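To make the mechanics concrete, here is a minimal Python sketch of the kind of logic the investigation describes for the flight results. The function names and structure are my own illustration of the findings, not Fareportal’s actual code.

```python
import random

# Illustrative reconstruction of the fabricated scarcity and bandwagon
# messages described in the NYAG findings (names are hypothetical).

def tickets_left_message(tickets_searched: int) -> str:
    # The "tickets left" figure was simply the searched quantity plus one,
    # regardless of actual availability.
    return f"Only {tickets_searched + 1} tickets left at this price!"

def people_looking_message() -> str:
    # The "people looking" figure was a random number between 28 and 45,
    # unrelated to real traffic.
    return f"{random.randint(28, 45)} people are looking at this flight."

print(tickets_left_message(1))   # "Only 2 tickets left at this price!"
print(tickets_left_message(2))   # "Only 3 tickets left at this price!"
print(people_looking_message())  # e.g. "37 people are looking at this flight."
```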

Similarly, Fareportal brought its misleading tactics to hotel bookings on its mobile apps. The apps misrepresented the percentage of displayed rooms that were “reserved” by using a computer-generated number keyed to how far away the check-in date was. For example, if check-in was 16 to 30 days away, the message indicated that between 41% and 70% of the hotel rooms were booked, but if it was less than 7 days away, the apps showed that 81% to 99% of the rooms were reserved. Those percentages were pure fiction. The apps used a similar tactic for displaying the number of people “viewing” hotels in the area. This time, the number was generated from the nightly rate of the fifth hotel returned in the search, by taking the difference between the dollar figure and the cents figure. (If the rate was $255.63, consumers were told that 255 minus 63, or 192, people were viewing hotel listings in the area.)
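Again, a short Python sketch may help; it is my own hedged reconstruction of the hotel-app logic described above (only the two date ranges reported in the findings are included), not actual Fareportal code.

```python
import random

# Illustrative reconstruction of the fabricated hotel-booking indicators
# described in the NYAG findings (names and structure are hypothetical).

def rooms_reserved_percentage(days_until_checkin: int) -> int:
    # The "% of rooms reserved" figure was keyed to how far away check-in
    # was, not to real occupancy. Only the two ranges reported in the
    # published findings are reproduced here.
    if days_until_checkin < 7:
        return random.randint(81, 99)
    if 16 <= days_until_checkin <= 30:
        return random.randint(41, 70)
    raise ValueError("range not described in the published findings")

def people_viewing_hotels(fifth_result_nightly_rate: float) -> int:
    # The "people viewing" figure was derived from the fifth search result's
    # nightly rate: the dollar figure minus the cents figure.
    dollars = int(fifth_result_nightly_rate)
    cents = round((fifth_result_nightly_rate - dollars) * 100)
    return dollars - cents

print(people_viewing_hotels(255.63))  # 255 - 63 = 192
```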

Fareportal used these false scarcity indicators across its websites and mobile platforms to pitch products such as travel protection and seat upgrades, misrepresenting how many other consumers had purchased the product in question.

In addition, the NYAG charged Fareportal with a pressure tactic: forcing consumers to accept or decline a travel protection policy to “protect the cost of [their] trip” before they could complete a purchase. The academic literature describes this as a covert pattern that uses “confirmshaming” and “forced action” to influence choices.

Finally, the NYAG took issue with how Fareportal manipulated price comparisons to suggest it was offering tickets at a discount when, in fact, most of the advertised tickets were never offered for sale at the higher comparison price. The NYAG rejected Fareportal’s attempt to use a small pop-up to cure the false impression created by the visual slash-through of the higher price. Similarly, the NYAG called out how Fareportal hid its service fees by folding them into the “Base Price” of the ticket rather than listing them under the separate line item for “Taxes and Fees.” The academic literature describes these tactics as “misdirection” and “information hiding.”


The findings from this investigation illustrate why dark patterns are not simply aggressive marketing practices, as some commentators contend, but conduct that requires regulatory intervention. Such shady practices are difficult for consumers to spot and avoid, and, as we argued, they risk becoming entrenched across travel sites, which have an incentive to adopt similar practices. Fareportal, unfortunately, is neither the first nor the last online service to deploy such tactics. But this creates an opportunity for researchers, consumer advocates, and design whistleblowers to step forward and spotlight such practices, protecting consumers and helping create a more trustworthy internet.

Can Classes on Field Experiments Scale? Lessons from SOC412

Last semester, I taught a Princeton undergrad/grad seminar on the craft, politics, and ethics of behavioral experimentation. The idea was simple: since large-scale human subjects research is now common outside universities, we need to equip students to make sense of that kind of power and think critically about it.

In this post, I share lessons for teaching a class like this and how I’m thinking about next year.

Path diagram from SOC412 lecture on the Social Media Color Experiment
