October 19, 2021

Bridging Tech-Military AI Divides in an Era of Tech Ethics: Sharif Calfee at CITP

In a time when U.S. tech employees are organizing against corporate-military collaborations on AI, how can the ethics and incentives of military, corporate, and academic research be more closely aligned on AI and lethal autonomous weapons?

Speaking today at CITP was Captain Sharif Calfee, a U.S. Naval Officer who serves as a surface warfare officer. He is a graduate of the U.S. Naval Academy and U.S. Naval Postgraduate School and a current MPP student at the Woodrow Wilson School.

Afloat, Sharif most recently served as the commanding officer of USS McCAMPBELL (DDG 85), an Aegis guided missile destroyer. Ashore, Sharif was most recently selected for the Federal Executive Fellowship program and served as the U.S. Navy fellow to the Center for Strategic & Budgetary Assessments (CSBA), a non-partisan national security policy analysis think tank in Washington, D.C.

Sharif spoke to CITP today with some of his own views (not speaking for the U.S. government) about how research and defense can more closely collaborate on AI.

Over the last two years, Sharif has been working on ways for the Navy to accelerate AI and adopt commercial systems to get more unmanned systems into the fleet. Toward this goal, he recently interviewed 160 people at 50 organizations. His talk today is based on that research.

Sharif next tells us about a rift between the U.S. government and companies/academia on AI. This rift is a symptom, he tells us, of a growing “civil-military divide” in the U.S. In previous generations, big tech companies worked closely with the U.S. military, and a majority of elected representatives in Congress had prior military experience. That’s no longer true: there is now a bifurcation between the experiences of Americans who serve in the military and those who haven’t. This lack of familiarity, he says, complicates moments when companies and academics discuss the possibility of working with and for the U.S. military.

Next, Sharif says that conversations about tech ethics in the technology industry are creating a conflict that makes it difficult for the U.S. military to work with tech companies. He tells us about Project Maven, a project that Google and the Department of Defense worked on together to analyze drone footage using AI. Its purpose was to reduce casualties among civilians who are not battlefield combatants. This project, which wasn’t secret, burst into public awareness after a New York Times article and a letter signed by over three thousand Google employees. Google declined to renew the DOD contract and updated its motto.

U.S. Predator Drone (via Wikimedia Commons)

On the heels of the Project Maven decision, Google also faced criticism for working with the Chinese government to provide services in China in ways that enabled certain kinds of censorship. Suddenly, Google found itself answering questions about why it was collaborating with China on AI but not with the U.S. military.

How do we resolve this impasse in collaboration?

  • The defense acquisition process is hard for small, nimble companies to engage in
  • Defense contracts are too slow, too expensive, too bureaucratic, and not profitable
  • Companies aren’t necessarily interested in the same types of R&D products that the DOD wants
  • National security partnerships with gov’t might affect opportunities in other international markets.
  • The Cold War is “ancient history” for the current generation
  • Global, international corporations don’t want to take sides on conflicts
  • Companies and employees seek to create good. Government R&D may conflict with that ethos

Academics also have reasons not to work for the government:

  • Worried about how their R&D will be utilized
  • Schools or faculty may philosophically disagree with the government
  • Universities are incubators of international talent, and government R&D could be divisive, not inclusive
  • Government R&D is sometimes kept secret, which hurts academic careers

Faced with this, according to Sharif, the U.S. government is sometimes baffled by people’s ideological concerns. Many in the government remember the Cold War and knew people who lived and fought in World War Two. They can sometimes be resentful about a cold shoulder from academics and companies, especially since the military funded the foundational work in computer science and AI.

Sharif tells us that R&D reached an inflection point in the 1990s. During the Cold War, new technologies (the internet, GPS, nuclear technology) were developed through defense funding and then reached industry. Now the reverse happens: technologies like AI are developed by the commercial sector and then reach government. That flow is not very nimble. DOD acquisition systems are designed for projects that take 91 months to complete (like a new airplane), while companies adopt AI technologies in 6-9 months (see this report by the Congressional Research Service).

Conversations about policy and law also constrain the U.S. government from developing and adopting lethal autonomous weapons systems, says Sharif. Even as we have important questions about the ethical risks of AI, Sharif tells us that other governments don’t have the same restrictions. He asks us to imagine what would have happened if nuclear weapons hadn’t been developed first by the U.S.

How can divides between the U.S. government and companies/academia be bridged? Sharif suggests:

  • The U.S. government must substantially increase R&D funding to help regain influence
  • Establish a prestigious one-year DOD/government R&D fellowship program for top-notch STEM grads before they join the commercial sector
  • Expand on the Defense Innovation Unit
  • Elevate the Defense Innovation Board in prominence and expand its efforts to create conversations that bridge ideological divides. Organize conversations at both senior and middle-management levels to accelerate this familiarization.
  • Increase DARPA and other collaborations with commercial and academic sectors
  • Establish joint DOD and Commercial Sector exchange programs
  • Expand the number of DOD research fellows and scientists present on university campuses in fellowship programs
  • Continue to reform DOD acquisition processes to streamline for sectors like AI

Sharif has also recommended to the U.S. Navy that they create an Autonomy Project Office to enable the Navy to better leverage R&D. The U.S. Navy has used structures like this for previous technology transformations on nuclear propulsion, the Polaris submarine missiles, naval aviation, and the Aegis combat system.

At the end of the day, says Sharif, what happens in a conflict where the U.S. does not have technological overmatch and is instead overmatched by someone else? What are the real-life consequences? That’s what’s at stake in collaborations between researchers, companies, and the U.S. Department of Defense.

What Are Machine Learning Models Hiding?

Machine learning is eating the world. The abundance of training data has helped ML achieve amazing results for object recognition, natural language processing, predictive analytics, and all manner of other tasks. Much of this training data is very sensitive, including personal photos, search queries, location traces, and health-care records.

In a recent series of papers, we uncovered multiple privacy and integrity problems in today’s ML pipelines, especially (1) online services such as Amazon ML and Google Prediction API that create ML models on demand for non-expert users, and (2) federated learning, aka collaborative learning, that lets multiple users create a joint ML model while keeping their data private (imagine millions of smartphones jointly training a predictive keyboard on users’ typed messages).
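
For readers who haven’t seen the protocol, below is a minimal sketch of one round of federated averaging on a toy linear model. The dataset, the least-squares objective, and all variable names are illustrative assumptions, not any production system; real deployments train neural networks on-device.

```python
# Minimal sketch of one round of federated averaging (FedAvg) on a toy
# linear regression model. Data, objective, and names are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, n_clients = 10, 5
global_w = np.zeros(d)                         # joint model held by the server

# Each client holds its own private data; it never leaves the device.
client_data = [(rng.normal(size=(50, d)), rng.normal(size=50))
               for _ in range(n_clients)]

def local_update(w, X, y, lr=0.01, epochs=5):
    """A few steps of gradient descent on the client's private data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

# One round: clients train locally, the server averages the resulting models.
local_models = [local_update(global_w, X, y) for X, y in client_data]
global_w = np.mean(local_models, axis=0)
print("updated global model (first coords):", global_w[:3])
```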

Our Oakland 2017 paper, which has just received the PET Award for Outstanding Research in Privacy Enhancing Technologies, concretely shows how to perform membership inference, i.e., determine if a certain data record was used to train an ML model.  Membership inference has a long history in privacy research, especially in genetic privacy and generally whenever statistics about individuals are released.  It also has beneficial applications, such as detecting inappropriate uses of personal data.

We focus on classifiers, a popular type of ML model. Apps and online services use classifier models to recognize which objects appear in images, categorize consumers based on their purchase patterns, and perform other similar tasks.  We show that if a classifier is open to public access – via an online API or indirectly via an app or service that uses it internally – an adversary can query it and tell from its output whether a certain record was used during training.  For example, if a classifier based on a patient study is used for predictive health care, membership inference can leak whether or not a certain patient participated in the study. If a (different) classifier categorizes mobile users based on their movement patterns, membership inference can leak which locations were visited by a certain user.
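
To make the idea concrete, here is a minimal sketch of a membership inference attack. It uses a simple confidence-thresholding heuristic rather than the shadow-model construction from our paper; the victim model, the synthetic data, and the threshold are illustrative assumptions.

```python
# Minimal sketch of a confidence-threshold membership inference attack.
# Simplified illustration, not the shadow-model attack from the paper:
# the attacker guesses "member" whenever the model is unusually confident
# about the queried record. All data here is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Victim model: deliberately overfit a small, somewhat noisy training set.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, y_train = X[:200], y[:200]          # members
X_out, y_out = X[1000:1200], y[1000:1200]    # non-members
victim = MLPClassifier(hidden_layer_sizes=(128,), max_iter=2000,
                       random_state=0).fit(X_train, y_train)

def attack(model, records, threshold=0.95):
    """Guess membership from the confidence assigned to the predicted label."""
    probs = model.predict_proba(records)
    return probs.max(axis=1) > threshold      # True = guessed "member"

# Attack quality: fraction of members flagged vs. fraction of non-members flagged.
print("true positive rate: ", attack(victim, X_train).mean())
print("false positive rate:", attack(victim, X_out).mean())
```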

There are several technical reasons why ML models are vulnerable to membership inference, including “overfitting” and “memorization” of the training data, but they are a symptom of a bigger problem. Modern ML models, especially deep neural networks, are massive computation and storage systems with millions of high-precision floating-point parameters. They are typically evaluated solely by their test accuracy, i.e., how well they classify the data that they did not train on.  Yet they can achieve high test accuracy without using all of their capacity.  In addition to asking if a model has learned its task well, we should ask what else has the model learned? What does this “unintended learning” mean for the privacy and integrity of ML models?

Deep networks can learn features that are unrelated – even statistically uncorrelated! – to their assigned task.  For example, here are the features learned by a binary gender classifier trained on the “Labeled Faces in the Wild” dataset.

While the upper layer of this neural network has learned to separate inputs by gender (circles and triangles), the lower layers have also learned to recognize race (red and blue), a property uncorrelated with the task.
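
A toy way to see this effect is to train a small network on one synthetic attribute and then fit a linear “probe” on its hidden layer to predict a second, independent attribute. The sketch below is an illustrative stand-in for the gender/race example above, not the experiment behind the figure; the data, attributes, and names are synthetic assumptions.

```python
# Sketch: a hidden layer trained for one task can still encode an unrelated
# attribute. 'task' and 'sensitive' are independent by construction.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
task = rng.integers(0, 2, n)        # label the model is trained on
sensitive = rng.integers(0, 2, n)   # attribute nobody asked the model to learn

# Inputs carry signal for BOTH attributes, plus noise dimensions.
X = np.column_stack([task + 0.5 * rng.normal(size=n),
                     sensitive + 0.5 * rng.normal(size=n),
                     rng.normal(size=(n, 8))])

X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(
    X, task, sensitive, test_size=0.5, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                      random_state=0).fit(X_tr, y_tr)

def hidden(model, X):
    """First-layer (ReLU) activations of the trained network."""
    return np.maximum(0, X @ model.coefs_[0] + model.intercepts_[0])

# Probe: can a linear classifier read the sensitive attribute off the
# representation that was learned only for the task?
probe = LogisticRegression(max_iter=1000).fit(hidden(model, X_tr), s_tr)
print("task accuracy:              ", model.score(X_te, y_te))
print("probe accuracy on sensitive:", probe.score(hidden(model, X_te), s_te))
```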

Our more recent work on property inference attacks shows that even simple binary classifiers trained for generic tasks – for example, determining if a review is positive or negative or if a face is male or female – internally discover fine-grained features that are much more sensitive. This is especially important in collaborative and federated learning, where the internal parameters of each participant’s model are revealed during training, along with periodic updates to these parameters based on the training data.

We show that a malicious participant in collaborative training can tell if a certain person appears in another participant’s photos, who has written the reviews used by other participants for training, which types of doctors are being reviewed, and other sensitive information. Notably, this leakage of “extra” information about the training data has no visible effect on the model’s test accuracy.
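
As a rough illustration of the mechanics (not the attack from our paper), the sketch below trains a meta-classifier on the gradient updates of a toy logistic regression model: batches that contain a certain sub-population produce recognizably different gradients, and the attacker learns to tell them apart. The data, the model, and the “property” are all synthetic assumptions.

```python
# Sketch of a property inference attack on shared gradient updates. The
# "property" (whether a batch contains records from a certain sub-population)
# is independent of the main-task label. Everything here is illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 20
w = rng.normal(size=d)               # current joint model (logistic regression)

def batch(with_property, size=32):
    """Synthetic training batch; the property shifts one input feature."""
    X = rng.normal(size=(size, d))
    if with_property:
        X[:, 0] += 1.5               # sub-population signature
    y = (X[:, 1] > 0).astype(float)  # main-task label, unrelated to the property
    return X, y

def gradient(w, X, y):
    """Gradient of the logistic loss -- what a participant would share."""
    p = 1 / (1 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

# Attacker: label observed gradients by whether the batch had the property,
# then train a meta-classifier on them.
grads = np.array([gradient(w, *batch(prop)) for prop in [0, 1] * 200])
labels = np.array([0, 1] * 200)
meta = LogisticRegression(max_iter=1000).fit(grads, labels)

# A victim shares a gradient computed on a batch that has the property.
victim_grad = gradient(w, *batch(True))
print("attacker's guess (1 = property present):", meta.predict([victim_grad])[0])
```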

A clever adversary who has access to the ML training software can exploit the unused capacity of ML models for nefarious purposes. In our CCS 2017 paper, we show that a simple modification to the data pre-processing, without changing the training procedure at all, can cause the model to memorize its training data and leak it in response to queries. Consider a binary gender classifier trained in this way.  By submitting special inputs to this classifier and observing whether they are classified as male or female, the adversary can reconstruct the actual images on which the classifier was trained (in the accompanying figure, the top row is the ground truth).
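
The toy sketch below conveys the flavor of such capacity abuse under simplified assumptions: the malicious pre-processing step appends seeded synthetic “carrier” inputs whose labels encode secret bits, and the attacker later reads the bits back out through ordinary prediction queries. This encoding scheme is an illustrative assumption, not the paper’s exact construction, and recovery is not guaranteed for every model configuration.

```python
# Toy illustration of abusing unused model capacity to exfiltrate secrets via
# malicious data pre-processing. Simplified assumption, not the paper's exact
# construction: labels of seeded synthetic inputs carry the secret bits.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

secret_bits = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1])

def carriers(seed=42):
    """Synthetic inputs regenerable from a seed known only to the attacker."""
    return np.random.default_rng(seed).normal(size=(len(secret_bits), 20)) * 5

# Benign training data for the nominal task.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# Malicious pre-processing: append the carriers, labeled with the secret bits.
X_aug = np.vstack([X, carriers()])
y_aug = np.concatenate([y, secret_bits])

# The training procedure itself is unchanged.
model = MLPClassifier(hidden_layer_sizes=(256,), max_iter=3000,
                      random_state=0).fit(X_aug, y_aug)

# Later, the attacker regenerates the carriers and reads the bits back out
# through ordinary prediction queries.
recovered = model.predict(carriers())
print("recovered bits match secret:", np.array_equal(recovered, secret_bits))
```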

Federated learning, where models are crowd-sourced from hundreds or even millions of users, is an even juicier target. In a recent paper, we show that a single malicious participant in federated learning can completely replace the joint model with another one that has the same accuracy but also incorporates backdoor functionality. For example, it can intentionally misclassify images with certain features or suggest adversary-chosen words to complete certain sentences.
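
A back-of-the-envelope sketch of the model-replacement idea follows, under the assumption of plain unweighted federated averaging: the attacker scales its submitted model so that the server’s average lands (approximately) on a backdoored model of the attacker’s choosing. The real attack also has to keep the update inconspicuous and preserve main-task accuracy, which this toy version ignores.

```python
# Sketch of "model replacement" against plain federated averaging: one
# attacker scales its submission so the averaged model becomes a backdoored
# model of its choice. Parameter vectors here are toy numpy arrays.
import numpy as np

rng = np.random.default_rng(0)
d, n_clients = 10, 100
global_model = rng.normal(size=d)

# Honest clients submit small benign updates (their new local model weights).
honest = [global_model + 0.01 * rng.normal(size=d) for _ in range(n_clients - 1)]

# The attacker wants the next global model to equal this backdoored model,
# which (in the real attack) still performs the main task well.
backdoored = global_model + rng.normal(size=d)

# If the server averages all n submitted models, submitting
#   n * backdoored - (n - 1) * (estimate of the other models)
# makes the average come out to approximately the backdoored model.
malicious = n_clients * backdoored - (n_clients - 1) * global_model

new_global = np.mean(honest + [malicious], axis=0)
print("distance to backdoored model:", np.linalg.norm(new_global - backdoored))
```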

When training ML models, it is not enough to ask if the model has learned its task well.  Creators of ML models must ask what else their models have learned. Are they memorizing and leaking their training data? Are they discovering privacy-violating features that have nothing to do with their learning tasks? Are they hiding backdoor functionality? We need least-privilege ML models that learn only what they need for their task – and nothing more.

This post is based on joint research with Eugene Bagdasaryan, Luca Melis, Reza Shokri, Congzheng Song, Emiliano de Cristofaro, Deborah Estrin, Yiqing Hua, Thomas Ristenpart, Marco Stronati, and Andreas Veit.

Thanks to Arvind Narayanan for feedback on a draft of this post.

Princeton Dialogues on AI and Ethics: Launching case studies

Summary: We are releasing four case studies on AI and ethics, as part of the Princeton Dialogues on AI and Ethics.

The impacts of rapid developments in artificial intelligence (“AI”) on society—both real and not yet realized—raise deep and pressing questions about our philosophical ideals and institutional arrangements. AI is currently applied in a wide range of fields—such as medical diagnosis, criminal sentencing, online content moderation, and public resource management—but it is only just beginning to realize its potential to influence practically all areas of human life, including geopolitical power balances. As these technologies advance and increasingly come to mediate our everyday lives, it becomes necessary to consider how they may reflect prevailing philosophical perspectives and preferences. We must also assess how the architectural design of AI technologies today might influence human values in the future. This step is essential in order to identify the positive opportunities presented by AI and unleash these technologies’ capabilities in the most socially advantageous way possible while being mindful of potential harms. Critics question the extent to which individual engineers and proprietors of AI should take responsibility for the direction of these developments, or whether centralized policies are needed to steer growth and incentives in the right direction. What even is the right direction? How can it be best achieved?

Princeton’s University Center for Human Values (UCHV) and the Center for Information Technology Policy (CITP) are excited to announce a joint research project, “The Princeton Dialogues on AI and Ethics,” in the emerging field of artificial intelligence (broadly defined) and its interaction with ethics and political theory. The aim of this project is to develop a set of intellectual reasoning tools to guide practitioners and policy makers, both current and future, in developing the ethical frameworks that will ultimately underpin their technical and legislative decisions. More than ever before, individual-level engineering choices are poised to impact the course of our societies and human values. And yet there have been limited opportunities for AI technology actors, academics, and policy makers to come together to discuss these outcomes and their broader social implications in a systematic fashion. This project aims to provide such opportunities for interdisciplinary discussion, as well as in-depth reflection.

We convened two invitation-only workshops in October 2017 and March 2018, in which philosophers, political theorists, and machine learning experts met to assess several real-world case studies that elucidate common ethical dilemmas in the field of AI. The aim of these workshops was to facilitate a collaborative learning experience which enabled participants to dive deeply into the ethical considerations that ought to guide decision-making at the engineering level and highlight the social shifts they may be affecting. The first outcomes of these deliberations have now been published in the form of case studies. To access these educational materials, please see our dedicated website https://aiethics.princeton.edu. These cases are intended for use across university departments and in corporate training in order to equip the next generation of engineers, managers, lawyers, and policy makers with a common set of reasoning tools for working on AI governance and development.

In March 2018, we also hosted a public conference, titled “AI & Ethics,” where interested academics, policy makers, civil society advocates, and private sector representatives from diverse fields came to Princeton to discuss two topics related to the development and governance of AI: “International Dimensions of AI” and “AI and Its Democratic Frontiers.” This conference sought to use the ethics and engineering knowledge foundations developed through the initial case studies to inspire discussion of AI technology’s wider social effects.

This project is part of a wider effort at Princeton University to investigate the intersection between AI technology, politics, and philosophy. There is a particular emphasis on the ways in which the interconnected forces of technology and its governance simultaneously influence and are influenced by the broader social structures in which they are situated. The Princeton Dialogues on AI and Ethics makes use of the university’s exceptional strengths in computer science, public policy, and philosophy. The project also seeks opportunities for cooperation with existing projects in and outside of academia.