October 2, 2022

How the National AI Research Resource can steward the datasets it hosts

Last week I participated in a panel on the National AI Research Resource (NAIRR), a proposed computing and data resource for academic AI researchers. The NAIRR’s goal is to subsidize the spiraling costs of many types of AI research, which have put such research out of reach of most academic groups.

My comments on the panel were based on a recent study (NeurIPS ’21) by Kenny Peng, Arunesh Mathur, and me on the potential harms of AI datasets. We looked at almost 1,000 research papers to analyze how they used datasets, which are the engine of AI.

Let me briefly mention just two of the many things we found, and then I’ll present some ideas for NAIRR based on our findings. First, we found that “derived datasets” are extremely common. For example, there’s a popular facial recognition dataset called Labeled Faces in the Wild, and there are at least 20 new datasets that incorporate the original data and extend it in some way. One of them adds race and gender annotations. This means that a dataset may enable new harms over time. For example, once a dataset has race annotations, it can be used to build a model that tracks the movement of ethnic minorities through surveillance cameras, which some governments seem to be doing.

We also found that dataset creators are aware of their datasets’ potential for misuse, so they often attach licenses that permit research use but prohibit commercial use. Unfortunately, we found evidence that many companies simply get around this by downloading a model pre-trained on such a dataset (in a research context) and using that model in commercial products.

Stepping back, the main takeaway from our paper is that dataset creators can sometimes — but not always — anticipate the ways in which a dataset might be used or misused in harmful ways. So we advocate for what we call dataset stewarding, which is a governance process that lasts throughout the lifecycle of a dataset. Note that some prominent datasets see active use for decades.

I think NAIRR is ideally positioned to be the steward of the datasets that it hosts, and perform a vital governance role over datasets and, in turn, over AI research. Here are a few specific things NAIRR could do, starting with the most lightweight ones.

1. NAIRR should support a communication channel between a dataset creator and the researchers who use that dataset. For example, if ethical problems — or even scientific problems — are uncovered in a dataset, it should be possible to notify users about it. As trivial as this sounds, it is not always possible today. Prominent datasets have been retracted over ethical concerns without a way to notify the people who had downloaded them.

2. NAIRR should standardize dataset citation practices, for example, by providing Digital Object Identifiers (DOIs) for datasets. We found that citation practices are chaotic, and there is currently no good way to find all the papers that use a dataset to check for misuse. (A sketch of what DOI-based dataset lookup could enable appears after this list.)

3. NAIRR could publish standardized dataset licenses. Dataset creators aren’t legal experts, and most existing licenses don’t accomplish what their creators want them to accomplish, which enables misuse.

4. NAIRR could require some analog of broader impact statements as part of an application for data or compute resources. Writing a broader impact statement could encourage ethical reflection by the authors. (A recent study found evidence that the NeurIPS broader impact requirement did result in authors reflecting on the societal consequences of their technical work.) Such reflection is valuable even if the statements are not actually used for decision making about who is approved. 

5. NAIRR could require some sort of ethical review of proposals. This goes beyond broader impact statements by making successful review a condition of acceptance. One promising model is the Ethics and Society Review instituted at Stanford. Most ethical issues that arise in AI research fall outside the scope of Institutional Review Boards (IRBs), so even a lightweight ethical review process could help prevent obvious-in-hindsight ethical lapses.

6. If researchers want to use a dataset to build and release a derivative dataset or pretrained model, then there should be an additional layer of scrutiny, because these involve essentially republishing the dataset. In our research, we found that this is the start of an ethical slippery slope, because data and models can be recombined in various ways and the intent of the original dataset can be lost.

7. There should be a way for people to report to NAIRR that some ethics violation is going on. The current model, for lack of anything better, is vigilante justice: journalists, advocates, or researchers sometimes identify ethical issues in datasets, and if the resulting outcry is loud enough, dataset creators feel compelled to retract or modify them. 

8. NAIRR could effectively partner with other entities that have emerged as ethical regulators. For example, conference program committees have started to incorporate ethics review. If NAIRR made it easy for peer reviewers to check the policies for any given data or compute resource, that would let them verify that a submitted paper is compliant with those policies.
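As a concrete illustration of what DOI-based dataset identification (point 2 above) could enable, here is a minimal Python sketch that resolves a dataset DOI to structured, citable metadata using the DataCite REST API. This is my own illustrative sketch, not part of the NAIRR proposal; the DOI shown is a placeholder, and the exact response fields can vary by registration agency.

    # Minimal sketch: resolve a dataset DOI to citable metadata via the
    # DataCite REST API. The DOI below is a placeholder for illustration.
    import requests

    def fetch_dataset_metadata(doi):
        """Return basic DataCite metadata (title, creators, year, URL) for a DOI."""
        resp = requests.get("https://api.datacite.org/dois/" + doi, timeout=10)
        resp.raise_for_status()
        attrs = resp.json()["data"]["attributes"]
        return {
            "title": attrs["titles"][0]["title"],
            "creators": [c.get("name") for c in attrs.get("creators", [])],
            "year": attrs.get("publicationYear"),
            "url": attrs.get("url"),
        }

    if __name__ == "__main__":
        # Hypothetical dataset DOI, for illustration only.
        print(fetch_dataset_metadata("10.5281/zenodo.1234567"))

With stable identifiers like these, a paper that uses a dataset could cite the DOI rather than an ad hoc URL or paper reference, which in turn would make it feasible to enumerate a dataset’s downstream uses.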

There is no single predominant model for ethical review of AI research analogous to the IRB model for biomedical research. It is unlikely that one will emerge in the foreseeable future. Instead, a patchwork is taking shape. The NAIRR is set up to be a central player in AI research in the United States and, as such, bears responsibility for ensuring that the research that it supports is aligned with societal values.

——–

I’m grateful to the NAIRR task force for inviting me and to my fellow panelists and moderators for a stimulating discussion.  I’m also grateful to Sayash Kapoor and Mihir Kshirsagar, with whom I previously submitted a comment on this topic to the relevant federal agencies, and to Solon Barocas for helpful discussions.

A final note: the aims of the NAIRR have themselves been contested and are not self-evidently good. However, my comments (and the panel overall) assumed that the NAIRR will be implemented largely as currently conceived, and focused on harm mitigation.

CITP Case Study on Regulating Facial Recognition Technology in Canada

Canada, like many jurisdictions in the United States, is grappling with the growing use of facial recognition technology in the private and public sectors. The technology is being deployed at a rapid pace in airports, retail stores, and social media platforms, and by law enforcement – with little government oversight.

To help address this challenge, I organized a tech policy case study on the regulation of facial recognition technology with two Canadian members of parliament, the Honourable Greg Fergus and Matthew Green. Both sit on the House of Commons’ Standing Committee on Access to Information, Privacy and Ethics (ETHI), and I served as a legislative aide to them through the Parliamentary Internship Programme before joining CITP. Our goal for the session was to put policymakers in conversation with subject matter experts.

The core problem is that there is a lack of accountability in the use of facial recognition technology, which exacerbates historical forms of discrimination and puts marginalized communities at risk of a wide range of harms. For instance, a recent story describes the fate of three Black men who were wrongfully arrested after being misidentified by facial recognition software. As the Canadian Civil Liberties Association argues, the police’s use of facial recognition technology, notably that provided by the New York-based company Clearview AI, “points to a larger crisis in police accountability when acquiring and using emerging surveillance tools.”

A number of academics and researchers – such as the DAIR Institute’s Timnit Gebru and the Algorithmic Justice League’s Joy Buolamwini, who documented the misclassification of darker-skinned women in a recent paper – are bringing attention to the discriminatory algorithms associated with facial recognition, which have put racialized people, women, and members of the LGBTIQ community at greater risk of false identification.

Meanwhile, Canadian officials are beginning to tackle the real-world consequences of the use of facial recognition. A year ago, the Office of the Privacy Commissioner found that Clearview AI had scraped billions of images of people from the internet in what “represented mass surveillance and was a clear violation of the privacy rights of Canadians.”

Following that investigation, Clearview AI stopped providing services to the Canadian market, including the Royal Canadian Mounted Police. In light of these findings and the absence of dedicated legislation, the ETHI Committee began studying the uses of facial recognition technology in May 2021, and it has recently resumed this work with a focus on the technology’s use by various levels of government in Canada, law enforcement agencies, and private corporations.

The CITP case study session on March 24 began with a presentation by Angelina Wang, a graduate affiliate of CITP, who provided a technical overview explaining the different functions and harms associated with this technology. Following Wang’s presentation, I provided a regulatory overview of how U.S. lawmakers have addressed facial recognition, noting the different legislative strategies deployed for law enforcement, private, and public sector uses. We then had a substantive, free-flowing discussion with CITP researchers and the policymakers about the challenges and opportunities of different regulatory strategies.

Following CITP’s case study session, Wang and Dr. Elizabeth Anne Watkins, a CITP Fellow, were invited to testify before the ETHI committee in an April 4 hearing. Wang discussed the different tasks facial recognition technology can and cannot perform, how the models are created, why they are susceptible to adversarial attacks, and the ethical implications behind the creation of this technology. Dr. Watkins’ testimony provided an overview of the privacy, security, and safety concerns related to the private industry’s use of facial verification on workers as informed by her research.  The committee is expected to report its findings by the end of May 2022. 

We continue to do research on how Canada might regulate facial recognition technology and will publish those analyses in the coming months.

Calling for Investment in Equitable AI Research in the Nation’s Strategic Plan

By Solon Barocas, Sayash Kapoor, Mihir Kshirsagar, and Arvind Narayanan

In response to the Request for Information on the update of the National Artificial Intelligence Research and Development Strategic Plan (“Strategic Plan”), we submitted comments providing suggestions for how the Strategic Plan’s government funding priorities should focus resources on addressing societal issues such as equity, especially in communities that have traditionally been underserved.

The Strategic Plan highlights the importance of investing in research on developing trust in AI systems, which includes requirements for robustness, fairness, explainability, and security. We argue that the Strategic Plan should go further by explicitly committing to investments in research that examines how AI systems can affect the equitable distribution of resources. Without such a commitment, there is a risk that investments in AI research will marginalize communities that are already disadvantaged, or that, even where there is no direct harm to a community, research support will focus on classes of problems that benefit already advantaged communities rather than on problems facing disadvantaged ones.

We make five recommendations for the Strategic Plan:  

First, we recommend that the Strategic Plan outline a mechanism for broader impact review when funding AI research. The challenge is that the existing mechanisms for ethics review of research projects – Institutional Review Boards (“IRBs”) – do not adequately identify downstream harms stemming from AI applications. For example, on privacy issues, an IRB ethics review would focus on the data collection and management process. This is also reflected in the Strategic Plan’s focus on two notions of privacy: (i) ensuring the privacy of data collected for creating models via strict access controls, and (ii) ensuring the privacy of the data and information used to create models via differential privacy when the models are shared publicly.

But both of these approaches are focused on the privacy of the people whose data has been collected to facilitate the research process, not the people to whom research findings might be applied. 
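To make the distinction concrete, here is a minimal, illustrative sketch (my own, not drawn from our comments) of the Laplace mechanism that underlies much differential privacy work: it adds calibrated noise so that a released statistic reveals little about any individual whose record is in the dataset, but it places no constraint on how the released statistic or model is later applied to people outside the dataset.

    # Illustrative Laplace mechanism for an epsilon-differentially private count.
    # It protects the people whose records are IN the data; it says nothing about
    # how the released result is later used on people outside the data.
    import numpy as np

    def dp_count(flags, epsilon, rng=None):
        """Release a noisy count of records with some property.

        A counting query has sensitivity 1 (adding or removing one person
        changes the count by at most 1), so Laplace noise with scale
        1/epsilon suffices for epsilon-differential privacy.
        """
        rng = rng or np.random.default_rng()
        true_count = sum(flags)
        return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

    # Example: with epsilon = 0.5, the released count is statistically similar
    # whether or not any single individual's record is included.
    print(dp_count([True, False, True, True], epsilon=0.5))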

Take, for example, the potential use of facial recognition to detect ethnic minorities. Even if the researchers who developed such techniques had obtained IRB approval for their research plan, secured the informed consent of participants, applied strict access controls to the data, and ensured that the model was differentially private, the resulting model could still be used without restriction for surveillance of entire populations, especially because institutional mechanisms for ethics review such as IRBs do not consider downstream harms when appraising research projects.

We recommend that the Strategic Plan include as a research priority supporting the development of alternative institutional mechanisms to detect and mitigate the potentially negative downstream effects of AI systems. 

Second, we recommend that the Strategic Plan include provisions for funding research that would help us understand the impact of AI systems on communities, and how AI systems are used in practice. Such research can also provide a framework for informing decisions on which research questions and AI applications are too harmful to pursue and fund. 

We recognize that it may be challenging to determine what kind of impact AI research might have, since it affects a broad range of potential applications. In fact, many AI research findings have dual uses: some applications of these findings may promise exciting benefits, while others seem likely to cause harm. While it is worthwhile to weigh these costs and benefits, decisions about where to invest resources should also depend on distributional considerations: who is likely to bear the costs, and who will enjoy the benefits?

While there have been recent efforts to incorporate ethics review into the publishing processes of the AI research community, adding similar considerations to the Strategic Plan would help to highlight these concerns much earlier in the research process. Evaluating research proposals according to these broader impacts would help to ensure that ethical and societal considerations are incorporated from the beginning of a research project, instead of remaining an afterthought.

Third, our comments highlight the reproducibility crisis in fields adopting machine learning methods and the need for the government to support the creation of computational reproducibility infrastructure, including a reproducibility clearinghouse that maintains benchmark datasets for measuring progress in scientific research that uses AI and ML. We suggest that the Strategic Plan borrow from the NIH’s practice of making government funding conditional on disclosing the research materials, such as code and data, that would be necessary to replicate a study.
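As a small illustration of the kind of practice such reproducibility infrastructure could standardize (this is my own sketch under assumed tooling, not a mechanism proposed in our comments), disclosed research code can pin random seeds and record the software environment alongside reported results, so that a replication attempt has what it needs:

    # Minimal reproducibility sketch: fix seeds and save an environment/result
    # record next to the code and data. The scikit-learn example and file name
    # are illustrative assumptions.
    import json
    import platform
    import random
    import sys

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    SEED = 2022
    random.seed(SEED)
    np.random.seed(SEED)

    # Synthetic data stands in for a real, disclosed dataset.
    X, y = make_classification(n_samples=500, n_features=10, random_state=SEED)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=SEED
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Record everything a replicator would need to compare against.
    record = {
        "seed": SEED,
        "python": sys.version,
        "platform": platform.platform(),
        "numpy_version": np.__version__,
        "test_accuracy": model.score(X_test, y_test),
    }
    with open("run_record.json", "w") as f:
        json.dump(record, f, indent=2)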

Fourth, we focus attention on the industry phenomenon of using a veneer of AI to lend credibility to pseudoscience – what we call AI snake oil. We see evaluating validity as a core component of ethical and responsible AI research and development. The Strategic Plan could support such efforts by prioritizing funding for setting standards, and for making tools available to independent researchers, to validate claims about the effectiveness of AI applications.


Fifth, we document the need to address the phenomenon of “runaway datasets” — the practice of broadly releasing datasets used for AI applications without mechanisms of oversight or accountability for how that information can be used. Such datasets raise serious privacy concerns, and they may be used to support research that runs counter to the intent of the people who contributed to them. The Strategic Plan can play a pivotal role in mitigating these harms by establishing and supporting appropriate data stewardship models, which could include supporting the development of centralized data clearinghouses to regulate access to datasets.

Bridging Tech-Military AI Divides in an Era of Tech Ethics: Sharif Calfee at CITP

In a time when U.S. tech employees are organizing against corporate-military collaborations on AI, how can the ethics and incentives of military, corporate, and academic research be more closely aligned on AI and lethal autonomous weapons?

Speaking today at CITP was Captain Sharif Calfee, a U.S. Naval Officer who serves as a surface warfare officer. He is a graduate of the U.S. Naval Academy and U.S. Naval Postgraduate School and a current MPP student at the Woodrow Wilson School.

Afloat, Sharif most recently served as the commanding officer of USS McCampbell (DDG 85), an Aegis guided missile destroyer. Ashore, he was most recently selected for the Federal Executive Fellowship program and served as the U.S. Navy fellow to the Center for Strategic & Budgetary Assessments (CSBA), a non-partisan national security policy analysis think tank in Washington, D.C.

Sharif spoke at CITP today, sharing some of his own views (not speaking for the U.S. government) about how research and defense can collaborate more closely on AI.

Over the last two years, Sharif has been working on ways for the Navy to accelerate AI and adopt commercial systems to get more unmanned systems into the fleet. Toward this goal, he recently interviewed 160 people at 50 organizations. His talk today is based on that research.

Sharif next tells us about a rift between the U.S. government and companies and academia over AI. This rift, he says, is a symptom of a growing “civil-military divide” in the U.S. In previous generations, big tech companies worked closely with the U.S. military, and a majority of elected representatives in Congress had prior military experience. That’s no longer true: there is now a bifurcation between the experiences of Americans who serve in the military and those who have not. This lack of familiarity, he says, complicates moments when companies and academics discuss the potential of working with and for the U.S. military.

Next, Sharif says that conversations about tech ethics in the technology industry are creating a conflict that makes it difficult for the U.S. military to work with tech companies. He tells us about Project Maven, a project that Google and the Department of Defense worked on together to analyze drone footage using AI. Its purpose was to reduce casualties among civilians who are not battlefield combatants. The project, which wasn’t secret, burst into public awareness after a New York Times article and a letter signed by over three thousand Google employees. Google declined to renew the DOD contract and updated its motto.

U.S. Predator Drone (via Wikimedia Commons)

On the heels of the Project Maven decision, Google also faced criticism for working with the Chinese government to provide services in China in ways that enabled certain kinds of censorship. Suddenly, Google found itself answering questions about why it was collaborating with China on AI but not with the U.S. military.

How do we resolve this impasse in collaboration? Sharif first lists reasons that companies hesitate to work with the U.S. government:

  • The defense acquisition process is hard for small, nimble companies to engage in
  • Defense contracts are too slow, too expensive, too bureaucratic, and not profitable
  • Companies aren’t necessarily interested in the same types of R&D products that the DOD wants
  • National security partnerships with gov’t might affect opportunities in other international markets.
  • The Cold War is “ancient history” for the current generation
  • Global, international corporations don’t want to take sides on conflicts
  • Companies and employees seek to create good. Government R&D may conflict with that ethos

Academics also have reasons not to work for the government:

  • Worried about how their R&D will be utilized
  • Schools or faculty may philosophically disagree with the government
  • Universities are incubators of international talent, and government R&D could be divisive, not inclusive
  • Government R&D is sometimes kept secret, which hurts academic careers

Faced with this, according to Sharif, the U.S. government is sometimes baffled by people’s ideological concerns. Many in the government remember the Cold War and knew people who lived and fought in World War Two. They can sometimes be resentful about a cold shoulder from academics and companies, especially since the military funded the foundational work in computer science and AI.

Sharif tells us that R&D reached an inflection point in the 1990s. During the Cold War, new technologies were developed through defense funding (the internet, GPS, nuclear technology) and then reached industry. Now the reverse happens: technologies like AI are developed by the commercial sector and then reach government. That flow is not very nimble. DOD acquisition systems are designed for projects that take 91 months to complete (like a new airplane), while companies adopt AI technologies in 6-9 months (see this report by the Congressional Research Service).

Conversations about policy and law also constrain the U.S. government from developing and adopting lethal autonomous weapons systems, says Sharif. Even as we grapple with important questions about the ethical risks of AI, Sharif tells us, other governments don’t have the same restrictions. He asks us to imagine what would have happened if nuclear weapons hadn’t been developed first by the U.S.

How can divides between the U.S. government and companies/academia be bridged? Sharif suggests:

  • The U.S. government must substantially increase R&D funding to help regain influence
  • Establish a prestigious one-year DOD/government R&D fellowship program for top-notch STEM grads before they join the commercial sector
  • Expand on the Defense Innovation Unit
  • Elevate the Defense Innovation Board in prominence and expand the project to create conversations that bridge ideological divides, organizing conversations at both senior and middle management levels to accelerate this familiarization
  • Increase DARPA and other collaborations with commercial and academic sectors
  • Establish joint DOD and Commercial Sector exchange programs
  • Expand the number of DOD research fellows and scientists present on university campuses in fellowship programs
  • Continue to reform DOD acquisition processes to streamline for sectors like AI

Sharif has also recommended to the U.S. Navy that they create an Autonomy Project Office to enable the Navy to better leverage R&D. The U.S. Navy has used structures like this for previous technology transformations on nuclear propulsion, the Polaris submarine missiles, naval aviation, and the Aegis combat system.

At the end of the day, says Sharif, what happens in a conflict where the U.S. no longer has the technological overmatch and is instead overmatched by someone else? What are the real-life consequences? That’s what’s at stake in collaborations between researchers, companies, and the U.S. Department of Defense.

Princeton Dialogues of AI and Ethics: Launching case studies

Summary: We are releasing four case studies on AI and ethics, as part of the Princeton Dialogues on AI and Ethics.

The impacts of rapid developments in artificial intelligence (“AI”) on society—both real and not yet realized—raise deep and pressing questions about our philosophical ideals and institutional arrangements. AI is currently applied in a wide range of fields—such as medical diagnosis, criminal sentencing, online content moderation, and public resource management—but it is only just beginning to realize its potential to influence practically all areas of human life, including geopolitical power balances. As these technologies advance and increasingly come to mediate our everyday lives, it becomes necessary to consider how they may reflect prevailing philosophical perspectives and preferences. We must also assess how the architectural design of AI technologies today might influence human values in the future. This step is essential in order to identify the positive opportunities presented by AI and unleash these technologies’ capabilities in the most socially advantageous way possible while being mindful of potential harms. Critics question the extent to which individual engineers and proprietors of AI should take responsibility for the direction of these developments, or whether centralized policies are needed to steer growth and incentives in the right direction. What even is the right direction? How can it be best achieved?

Princeton’s University Center for Human Values (UCHV) and the Center for Information Technology Policy (CITP) are excited to announce a joint research project, “The Princeton Dialogues on AI and Ethics,” in the emerging field of artificial intelligence (broadly defined) and its interaction with ethics and political theory. The aim of this project is to develop a set of intellectual reasoning tools to guide practitioners and policy makers, both current and future, in developing the ethical frameworks that will ultimately underpin their technical and legislative decisions. More than ever before, individual-level engineering choices are poised to impact the course of our societies and human values. And yet there have been limited opportunities for AI technology actors, academics, and policy makers to come together to discuss these outcomes and their broader social implications in a systematic fashion. This project aims to provide such opportunities for interdisciplinary discussion, as well as in-depth reflection.

We convened two invitation-only workshops in October 2017 and March 2018, in which philosophers, political theorists, and machine learning experts met to assess several real-world case studies that elucidate common ethical dilemmas in the field of AI. The aim of these workshops was to facilitate a collaborative learning experience which enabled participants to dive deeply into the ethical considerations that ought to guide decision-making at the engineering level and highlight the social shifts they may be affecting. The first outcomes of these deliberations have now been published in the form of case studies. To access these educational materials, please see our dedicated website https://aiethics.princeton.edu. These cases are intended for use across university departments and in corporate training in order to equip the next generation of engineers, managers, lawyers, and policy makers with a common set of reasoning tools for working on AI governance and development.

In March 2018, we also hosted a public conference, titled “AI & Ethics,” where interested academics, policy makers, civil society advocates, and private sector representatives from diverse fields came to Princeton to discuss topics related to the development and governance of AI, including “International Dimensions of AI” and “AI and Its Democratic Frontiers.” This conference sought to use the ethics and engineering knowledge foundations developed through the initial case studies to inspire discussion on AI technology’s wider social effects.

This project is part of a wider effort at Princeton University to investigate the intersection between AI technology, politics, and philosophy. There is a particular emphasis on the ways in which the interconnected forces of technology and its governance simultaneously influence and are influenced by the broader social structures in which they are situated. The Princeton Dialogues on AI and Ethics makes use of the university’s exceptional strengths in computer science, public policy, and philosophy. The project also seeks opportunities for cooperation with existing projects in and outside of academia.