January 28, 2021

What Are Machine Learning Models Hiding?

Machine learning is eating the world. The abundance of training data has helped ML achieve amazing results for object recognition, natural language processing, predictive analytics, and all manner of other tasks. Much of this training data is very sensitive, including personal photos, search queries, location traces, and health-care records.

In a recent series of papers, we uncovered multiple privacy and integrity problems in today’s ML pipelines, especially (1) online services such as Amazon ML and Google Prediction API that create ML models on demand for non-expert users, and (2) federated learning, aka collaborative learning, that lets multiple users create a joint ML model while keeping their data private (imagine millions of smartphones jointly training a predictive keyboard on users’ typed messages).

Our Oakland 2017 paper, which has just received the PET Award for Outstanding Research in Privacy Enhancing Technologies, concretely shows how to perform membership inference, i.e., determine if a certain data record was used to train an ML model.  Membership inference has a long history in privacy research, especially in genetic privacy and generally whenever statistics about individuals are released.  It also has beneficial applications, such as detecting inappropriate uses of personal data.

We focus on classifiers, a popular type of ML model. Apps and online services use classifier models to recognize which objects appear in images, categorize consumers based on their purchase patterns, and perform other similar tasks. We show that if a classifier is open to public access – via an online API or indirectly via an app or service that uses it internally – an adversary can query it and tell from its output if a certain record was used during training. For example, if a classifier based on a patient study is used for predictive health care, membership inference can leak whether or not a certain patient participated in the study. If a (different) classifier categorizes mobile users based on their movement patterns, membership inference can leak which locations were visited by a certain user.
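To make the idea concrete, here is a minimal sketch of a confidence-threshold membership inference test against a deliberately overfit toy model. It is a simplified baseline for illustration, not the attack from our paper: the synthetic data, the memorizing kernel classifier, the 0.9 threshold, and all function names are assumptions invented for this example.

```python
import numpy as np


def make_records(n, rng):
    """Draw n labeled records from two heavily overlapping classes (toy data)."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 8))
    X[:, 0] += np.where(y == 1, 0.5, -0.5)  # weak signal in a single feature
    return X, y


def fit_memorizing_model(X_train, y_train, temperature=0.05):
    """Return a query function for a kernel classifier that memorizes its training set."""
    def query(x):
        logits = -np.sum((X_train - x) ** 2, axis=1) / temperature
        logits -= logits.max()  # shift for numerical stability
        weights = np.exp(logits)
        scores = np.array([weights[y_train == c].sum() for c in (0, 1)])
        return scores / scores.sum()  # class probabilities, as an API might return
    return query


def infer_membership(query, x, label, threshold=0.9):
    """Guess 'member' when the model is very confident in the record's true label."""
    return bool(query(x)[label] >= threshold)


def attack_accuracy(seed=0, n=100):
    """Fraction of correct member / non-member guesses on a balanced set of records."""
    rng = np.random.default_rng(seed)
    X_in, y_in = make_records(n, rng)    # records used for training
    X_out, y_out = make_records(n, rng)  # records the model never saw
    query = fit_memorizing_model(X_in, y_in)
    hits = sum(infer_membership(query, x, y) for x, y in zip(X_in, y_in))
    hits += sum(not infer_membership(query, x, y) for x, y in zip(X_out, y_out))
    return hits / (2 * n)
```

Calling `attack_accuracy()` reports how often the guess is right; anything noticeably above 0.5 means the model's outputs leak membership. The toy model is extreme on purpose: it is almost always confident on records it memorized, and only sometimes confident on records it has never seen.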

There are several technical reasons why ML models are vulnerable to membership inference, including “overfitting” and “memorization” of the training data, but they are a symptom of a bigger problem. Modern ML models, especially deep neural networks, are massive computation and storage systems with millions of high-precision floating-point parameters. They are typically evaluated solely by their test accuracy, i.e., how well they classify the data that they did not train on. Yet they can achieve high test accuracy without using all of their capacity. In addition to asking if a model has learned its task well, we should ask: what else has the model learned? What does this “unintended learning” mean for the privacy and integrity of ML models?

Deep networks can learn features that are unrelated – even statistically uncorrelated! – to their assigned task.  For example, here are the features learned by a binary gender classifier trained on the “Labeled Faces in the Wild” dataset.

While the upper layer of this neural network has learned to separate inputs by gender (circles and triangles), the lower layers have also learned to recognize race (red and blue), a property uncorrelated with the task.

Our more recent work on property inference attacks shows that even simple binary classifiers trained for generic tasks – for example, determining if a review is positive or negative or if a face is male or female – internally discover fine-grained features that are much more sensitive. This is especially important in collaborative and federated learning, where the internal parameters of each participant’s model are revealed during training, along with periodic updates to these parameters based on the training data.

We show that a malicious participant in collaborative training can tell if a certain person appears in another participant’s photos, who has written the reviews used by other participants for training, which types of doctors are being reviewed, and other sensitive information. Notably, this leakage of “extra” information about the training data has no visible effect on the model’s test accuracy.

A clever adversary who has access to the ML training software can exploit the unused capacity of ML models for nefarious purposes. In our CCS 2017 paper, we show that a simple modification to the data pre-processing, without changing the training procedure at all, can cause the model to memorize its training data and leak it in response to queries. Consider a binary gender classifier trained in this way.  By submitting special inputs to this classifier and observing whether they are classified as male or female, the adversary can reconstruct the actual images on which the classifier was trained (the top row is the ground truth):

Federated learning, where models are crowd-sourced from hundreds or even millions of users, is an even juicier target. In a recent paper, we show that a single malicious participant in federated learning can completely replace the joint model with another one that has the same accuracy but also incorporates backdoor functionality. For example, it can intentionally misclassify images with certain features or suggest adversary-chosen words to complete certain sentences.
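The post does not spell out the mechanism, but the arithmetic behind one way of replacing a joint model can be sketched in a few lines. The sketch assumes plain federated averaging, where the new joint model is the old one plus the scaled average of the participants' changes, and it assumes the benign participants' changes are small (for example, because training has nearly converged). The function names and the `eta` parameter are hypothetical.

```python
import numpy as np


def federated_average(global_model, local_models, eta=1.0):
    """One round of (assumed) federated averaging over the submitted local models."""
    n = len(local_models)
    return global_model + (eta / n) * sum(m - global_model for m in local_models)


def scaled_malicious_model(global_model, target_model, n, eta=1.0):
    """Model the attacker submits so that averaging lands near target_model.

    The attacker scales its change by n / eta to cancel the averaging factor.
    """
    return global_model + (n / eta) * (target_model - global_model)


def simulate_round(seed=0, n_benign=9, dim=5):
    """Return (new joint model, attacker's target) after one poisoned round."""
    rng = np.random.default_rng(seed)
    global_model = rng.normal(size=dim)
    target = rng.normal(size=dim)  # stands in for a backdoored model
    # Benign participants barely move the model (assumption: near convergence).
    benign = [global_model + 1e-3 * rng.normal(size=dim) for _ in range(n_benign)]
    attacker = scaled_malicious_model(global_model, target, n=n_benign + 1)
    return federated_average(global_model, benign + [attacker]), target
```

Under these assumptions the averaged result sits almost exactly on the attacker's target: the benign changes are divided by the number of participants, while the attacker's change was pre-multiplied by it.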

When training ML models, it is not enough to ask if the model has learned its task well.  Creators of ML models must ask what else their models have learned. Are they memorizing and leaking their training data? Are they discovering privacy-violating features that have nothing to do with their learning tasks? Are they hiding backdoor functionality? We need least-privilege ML models that learn only what they need for their task – and nothing more.

This post is based on joint research with Eugene Bagdasaryan, Luca Melis, Reza Shokri, Congzheng Song, Emiliano de Cristofaro, Deborah Estrin, Yiqing Hua, Thomas Ristenpart, Marco Stronati, and Andreas Veit.

Thanks to Arvind Narayanan for feedback on a draft of this post.

Princeton Dialogues on AI and Ethics: Launching case studies

Summary: We are releasing four case studies on AI and ethics, as part of the Princeton Dialogues on AI and Ethics.

The impacts of rapid developments in artificial intelligence (“AI”) on society—both real and not yet realized—raise deep and pressing questions about our philosophical ideals and institutional arrangements. AI is currently applied in a wide range of fields—such as medical diagnosis, criminal sentencing, online content moderation, and public resource management—but it is only just beginning to realize its potential to influence practically all areas of human life, including geopolitical power balances. As these technologies advance and increasingly come to mediate our everyday lives, it becomes necessary to consider how they may reflect prevailing philosophical perspectives and preferences. We must also assess how the architectural design of AI technologies today might influence human values in the future. This step is essential in order to identify the positive opportunities presented by AI and unleash these technologies’ capabilities in the most socially advantageous way possible while being mindful of potential harms. Critics question the extent to which individual engineers and proprietors of AI should take responsibility for the direction of these developments, or whether centralized policies are needed to steer growth and incentives in the right direction. What even is the right direction? How can it be best achieved?

Princeton’s University Center for Human Values (UCHV) and the Center for Information Technology Policy (CITP) are excited to announce a joint research project, “The Princeton Dialogues on AI and Ethics,” in the emerging field of artificial intelligence (broadly defined) and its interaction with ethics and political theory. The aim of this project is to develop a set of intellectual reasoning tools to guide practitioners and policy makers, both current and future, in developing the ethical frameworks that will ultimately underpin their technical and legislative decisions. More than ever before, individual-level engineering choices are poised to impact the course of our societies and human values. And yet there have been limited opportunities for AI technology actors, academics, and policy makers to come together to discuss these outcomes and their broader social implications in a systematic fashion. This project aims to provide such opportunities for interdisciplinary discussion, as well as in-depth reflection.

We convened two invitation-only workshops in October 2017 and March 2018, in which philosophers, political theorists, and machine learning experts met to assess several real-world case studies that elucidate common ethical dilemmas in the field of AI. The aim of these workshops was to facilitate a collaborative learning experience which enabled participants to dive deeply into the ethical considerations that ought to guide decision-making at the engineering level and highlight the social shifts they may be affecting. The first outcomes of these deliberations have now been published in the form of case studies. To access these educational materials, please see our dedicated website https://aiethics.princeton.edu. These cases are intended for use across university departments and in corporate training in order to equip the next generation of engineers, managers, lawyers, and policy makers with a common set of reasoning tools for working on AI governance and development.

In March 2018, we also hosted a public conference, titled “AI & Ethics,” where interested academics, policy makers, civil society advocates, and private sector representatives from diverse fields came to Princeton to discuss topics related to the development and governance of AI: “International Dimensions of AI” and “AI and Its Democratic Frontiers”. This conference sought to use the ethics and engineering knowledge foundations developed through the initial case studies to inspire discussion on AI technology’s wider social effects.

This project is part of a wider effort at Princeton University to investigate the intersection between AI technology, politics, and philosophy. There is a particular emphasis on the ways in which the interconnected forces of technology and its governance simultaneously influence and are influenced by the broader social structures in which they are situated. The Princeton Dialogues on AI and Ethics makes use of the university’s exceptional strengths in computer science, public policy, and philosophy. The project also seeks opportunities for cooperation with existing projects in and outside of academia.

Artificial Intelligence and the Future of Online Content Moderation

Yesterday in Berlin, I attended a workshop on the use of artificial intelligence in governing communication online, hosted by the Humboldt Institute for Internet and Society.

Context

In the United States and Europe, many platforms that host user content, such as Facebook, YouTube, and Twitter, have enjoyed safe harbor protections for the content they host, under laws such as Section 230 of the Communications Decency Act (CDA), the Digital Millennium Copyright Act (DMCA), and in Europe, Articles 12–15 of the eCommerce Directive. Some of these laws, such as the DMCA, provide immunity to platforms for copyright damages if the platforms remove content based on knowledge that it is unlawful. Section 230 of the CDA provides broad immunity to platforms, with the express goals of promoting economic development and free expression. Daphne Keller has a good summary of the legal landscape on intermediary liability.

Platforms are now facing increasing pressure to detect and remove illegal (and, in some cases, legal-but-objectionable) content. In the United States, for example, bills in the House and Senate would remove safe harbor protection for platforms that do not remove illegal content related to sex trafficking. The European Union is also considering laws that would limit the immunity of platforms that do not remove illegal content, which in the EU includes four categories: child sex abuse, incitement to terrorism, certain types of hate speech, and intellectual property or copyright infringement.

The mounting pressure on platforms to moderate online content coincides with increasing attention to algorithms that can automate the process of content moderation (“AI”) for the detection and ultimate removal of illegal (or unwanted) content.

The focus of yesterday’s workshop was to explore the role of AI in moderating online content, and the implications of AI for how content moderation is governed.

Setting the Tone: Challenges for Automated Filtering

Malavika Jayaram from Digital Asia Hub and I delivered the two opening “impulse statements” for the day. Malavika talked about some of the inherent limitations of AI for automated detection (with a reference to the infamous “Not Hot Dog” app) and pointed out some of the ways in which platforms are being pressured to use automated content moderation tools.

I spoke about our long line of research on applying machine learning to detect a wide range of unwanted traffic, ranging from spam to botnets to bulletproof scam hosting sites. I then talked about how the dialog has in some ways used the technical community’s past success in spam filtering to suggest that automated filtering of other types of content should be as easy as flipping a switch. Since spam detection was something we knew how to do, then surely the platforms could also ferret out everything from copyright violations to hate speech, right?

Evan Engstrom and I previously wrote about the difficulty of applying automated filtering algorithms to copyrighted content in practice. In short, even with a database that matches fingerprints of audio and video content against fingerprints of known copyrighted content, the methods are imperfect. When framing the problem in terms of incitement to violence or hate speech, automated detection becomes even more challenging, due to “corner cases” such as parody, fair use, irony, and so forth. A recent article from James Grimmelmann summarizes some of these challenges.
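A minimal sketch shows why matching against a database of known content is a narrow capability. It uses an exact SHA-256 hash as a stand-in for a fingerprint, which is an assumption for illustration: deployed audio and video fingerprinting systems use perceptual fingerprints that tolerate some modification, but they still only recognize content that is already in the database. The class and function names are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Exact-match fingerprint: a cryptographic hash of the raw bytes."""
    return hashlib.sha256(content).hexdigest()


class FingerprintDatabase:
    """Toy database of fingerprints of known offending content."""

    def __init__(self):
        self._known = set()

    def add(self, content: bytes) -> None:
        self._known.add(fingerprint(content))

    def matches(self, content: bytes) -> bool:
        return fingerprint(content) in self._known


db = FingerprintDatabase()
db.add(b"bytes of a known copyrighted clip")

exact_copy = b"bytes of a known copyrighted clip"
altered_copy = b"bytes of a known copyrighted clip."  # one extra byte

found_exact = db.matches(exact_copy)      # the identical upload is caught
found_altered = db.matches(altered_copy)  # the trivially altered upload is missed
```

Neither an exact hash nor a perceptual fingerprint says anything about content that has never been seen before, and neither can tell an infringing copy from parody or fair use of the same material.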

What I Learned

Over the course of the day, I learned many things about automated filtering that I hadn’t previously thought about.

  • Regulators and platforms are under tremendous pressure to act, based on the assumption that the technical problems are easy. Part of this pressure comes from a perception that detection of unwanted content is a solved problem. This myth is sometimes perpetuated by the designers of the original content fingerprinting technologies, some of which are now in widespread use. But there’s a big difference between testing fingerprints of content against a database of known offending content and building detection algorithms that can classify the semantics of content that has never been seen before. An area where technologists can contribute to this dialog is in studying and demonstrating the capabilities and limitations of automated filtering, both in terms of scale and accuracy. Technologists might study existing automated filtering techniques or design new ones entirely.
  • Takedown requests are a frequent instrument for censorship. I learned about the prevalence of “snitching”, whereby one user may request that a platform take down objectionable content by flagging the content or otherwise complaining about it. In such instances, oppressed groups (e.g., Rohingya Muslims) can be disproportionately targeted by large campaigns of takedown requests. (It was not known whether such campaigns to flag content have been automated on a large scale, but my intuition is that they likely are.) In such cases, the platforms err on the side of removing content, and the process for “remedy” (i.e., restoring the content) can be slow and tedious. This process creates a lever for censorship and suppression of speech. The trends are troubling: according to a recent article, a year ago, Facebook removed 50 percent of content that Israel requested be removed; now that figure is 95 percent. Jillian York runs a site where users can report these types of takedowns, but these reports and statistics are all self-reported. A useful project might be to automate the measurement of takedowns for some portion of the ecosystem or group of users.
  • The larger platforms share content hashes of unwanted content, but the database and process are opaque. About nine months ago, Twitter, Facebook, YouTube, and Microsoft formed the Global Internet Forum to Counter Terrorism. Essentially, the project relies on something called the Shared Industry Hash Database. It’s very challenging to find anything about this database online aside from a few blog posts from the companies, although it does seem in some way associated with Tech Against Terrorism. The secretive nature of the shared hash database and the process itself has a couple of implications. First, the database is difficult to audit: if content is wrongly placed in the database, removing it would appear next to impossible. Second, only the member companies can check content against the database, essentially preventing smaller companies (e.g., startups) from benefitting from the information. Such limits in knowledge could ultimately prove to be a significant disadvantage if the platforms are ultimately held liable for the content that they are hosting. As I discovered throughout the day, the opaque nature of commercial content moderation proves to be a recurring theme, which I’ll return to later.
  • Different countries have very different definitions of unlawful content. The patchwork of laws governing speech on the Internet makes regulation complicated, as different countries have different laws and restrictions on speech. For example, “incitement to violence” or “hate speech” might mean a different thing in Germany (where Nazi propaganda is illegal) than it does in Spain (where it is illegal to insult the king) or France (which recently vowed to ferret out racist content on social media). When applying this observation to automated detection of illegal content, things become complicated. It becomes impossible to train a single classifier that can be applied generally; essentially, each jurisdiction needs its own classifier.
  • Norms and speech evolve over time, often rapidly. Several attendees observed that most of the automated filtering techniques today boil down to flagging content based on keywords. Such a model can be incredibly difficult to maintain, particularly when it comes to detecting certain types of content such as hate speech. For one, norms and language evolve; a word that is innocuous or unremarkable today could take on an entirely new meaning tomorrow. Complicating matters further, sometimes people try to regain control in an online discussion by co-opting a slur; therefore, a model that bases classification on the presence of certain keywords can produce unexpected false positives, especially in the absence of context.
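The keyword-based flagging described in the last item can be sketched in a few lines, which also makes its failure modes easy to see. The blocklist entry `badword` is a placeholder and the function name is hypothetical; this is not any platform's actual filter.

```python
import re


def flag_by_keywords(text, blocklist):
    """Return the blocklisted words that appear in the text, ignoring case."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return tokens & set(blocklist)


blocklist = {"badword"}

# Someone using the term abusively: flagged, as intended.
abusive = flag_by_keywords("You are a badword", blocklist)

# Someone reporting abuse or reclaiming the term: also flagged (false positive).
reporting = flag_by_keywords("He called me a BADWORD and I reported it", blocklist)

# A trivial respelling: not flagged (false negative).
evasive = flag_by_keywords("You are a b4dword", blocklist)
```

The filter sees only tokens. Everything that distinguishes the first message from the second (speaker, intent, context) is invisible to it, and the list itself goes stale as language shifts.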

Takeaways

Aside from the information I learned above, I also took away a few themes about the state of online content moderation:

  • There will likely always be a human in the loop. We must figure out what role the human should play. Detection algorithms are only as good as their input data. If the data is biased, if norms and language evolve, or if data is mislabeled (an even more likely occurrence, since a label like “hate speech” could differ by country), then the outputs will be incorrect. Additionally, algorithms can only detect proxies for semantics and meaning (e.g., an ISIS flag, a large swath of bare skin) but have much more difficulty assessing context, fair use, parody, and other nuance. In short, on a technical front, we have our work cut out for us. It was widely held that humans will always need to be in the loop, and that AI should merely be an assistive technology, for triage, scale, and improving human effectiveness and efficiency when making decisions about moderation. Figuring out the appropriate division of labor between machines and humans is a challenging technical, social, and legal problem.
  • Governance and auditing are currently challenging because decision-making is secretive. The online platforms currently control all aspects of content moderation and governance. They have the data; nobody else has it. They know the classification algorithms they use and the features they use as input to those algorithms; nobody else knows them. They also are the only ones who have insight into the ultimate decision-making process. This situation is different from other unwanted traffic detection problems that the computer science research community has worked on, where it was relatively easy to get a trace of email spam or denial of service traffic, either by generating it or by working with an ISP. In this situation, everything is under lock and key. This lack of public access to data and information makes it difficult to audit the process that platforms are currently using, and it also raises important questions about governance:
    • Should the platforms be the ultimate arbiter in takedown and moderation?
    • Is that an acceptable situation, even if we don’t know the rules that they are using to make those decisions?
    • Who trains the algorithms, and with what data?
    • Who gets access to the models and algorithms? How does disclosure work?
    • How does a user learn that his or her content was taken down, as well as why it was taken down?
    • What are the steps to remedy an incorrect, unlawful, or unjust takedown request?
    • How can we trust the platforms to make the right decisions when in some cases it is in their financial interests to suppress speech? History has suggested that trusting the platforms to do the right thing in these situations can lead to restrictions on speech.

Many of the above questions are regulatory. Yet, technologists can play a role for some aspects of these questions. For example, measurement tools might detect and evaluate removal and takedowns of content for some well-scoped forum or topic. A useful starting point for the design of such a measurement system could be a platform such as Politwoops, which monitors tweets that politicians have deleted.
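As a very rough sketch of what the core of such a measurement tool might look like, the code below compares two snapshots of the post IDs visible in some well-scoped forum and reports what disappeared in between. How the snapshots are collected is left out entirely, and the function names are hypothetical.

```python
def detect_removals(previous_ids, current_ids):
    """Return IDs visible in the earlier snapshot but missing from the later one."""
    return sorted(set(previous_ids) - set(current_ids))


def removal_rate(previous_ids, current_ids):
    """Fraction of previously visible posts that are no longer visible."""
    previous = set(previous_ids)
    if not previous:
        return 0.0
    return len(previous - set(current_ids)) / len(previous)


# Two hypothetical snapshots of the same forum, taken a day apart.
monday = ["post-1", "post-2", "post-3", "post-4"]
tuesday = ["post-1", "post-3", "post-5"]

removed = detect_removals(monday, tuesday)
rate = removal_rate(monday, tuesday)
```

A missing post is only a candidate takedown: the author may have deleted it. Separating platform removals from voluntary deletions would need additional signals that this sketch does not attempt to model.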

Summary

The workshop was enlightening. I came as a technologist wanting to learn more about how computer science might be applied to the social and legal problems concerning content moderation; I came away with a few ideas, fueled by exciting discussion. The attendees were a healthy mix of computer scientists, regulators, practitioners, legal scholars, and human rights activists. I’ve worked on censorship of Internet protocols for many years, but in some sense measuring censorship can feel a little bit like looking for one’s keys under the lamppost. My sense is that the real power over speech is now held by the platforms, and as a community we need new mechanisms (technical, legal, economic, and social) to hold them to account.