December 11, 2018

What Are Machine Learning Models Hiding?

Machine learning is eating the world. The abundance of training data has helped ML achieve amazing results for object recognition, natural language processing, predictive analytics, and all manner of other tasks. Much of this training data is very sensitive, including personal photos, search queries, location traces, and health-care records.

In a recent series of papers, we uncovered multiple privacy and integrity problems in today’s ML pipelines, especially (1) online services such as Amazon ML and Google Prediction API that create ML models on demand for non-expert users, and (2) federated learning, aka collaborative learning, that lets multiple users create a joint ML model while keeping their data private (imagine millions of smartphones jointly training a predictive keyboard on users’ typed messages).

Our Oakland 2017 paper, which has just received the PET Award for Outstanding Research in Privacy Enhancing Technologies, concretely shows how to perform membership inference, i.e., determine if a certain data record was used to train an ML model.  Membership inference has a long history in privacy research, especially in genetic privacy and generally whenever statistics about individuals are released.  It also has beneficial applications, such as detecting inappropriate uses of personal data.

We focus on classifiers, a popular type of ML models. Apps and online services use classifier models to recognize which objects appear in images, categorize consumers based on their purchase patterns, and other similar tasks.  We show that if a classifier is open to public access – via an online API or indirectly via an app or service that uses it internally – an adversary can query it and tell from its output if a certain record was used during training.  For example, if a classifier based on a patient study is used for predictive health care, membership inference can leak whether or not a certain patient participated in the study. If a (different) classifier categorizes mobile users based on their movement patterns, membership inference can leak which locations were visited by a certain user.

There are several technical reasons why ML models are vulnerable to membership inference, including “overfitting” and “memorization” of the training data, but they are a symptom of a bigger problem. Modern ML models, especially deep neural networks, are massive computation and storage systems with millions of high-precision floating-point parameters. They are typically evaluated solely by their test accuracy, i.e., how well they classify the data that they did not train on.  Yet they can achieve high test accuracy without using all of their capacity.  In addition to asking if a model has learned its task well, we should ask what else has the model learned? What does this “unintended learning” mean for the privacy and integrity of ML models?

Deep networks can learn features that are unrelated – even statistically uncorrelated! – to their assigned task.  For example, here are the features learned by a binary gender classifier trained on the “Labeled Faces in the Wild” dataset.

While the upper layer of this neural network has learned to separate inputs by gender (circles and triangles), the lower layers have also learned to recognize race (red and blue), a property uncorrelated with the task.

Our more recent work on property inference attacks shows that even simple binary classifiers trained for generic tasks – for example, determining if a review is positive or negative or if a face is male or female – internally discover fine-grained features that are much more sensitive. This is especially important in collaborative and federated learning, where the internal parameters of each participant’s model are revealed during training, along with periodic updates to these parameters based on the training data.

We show that a malicious participant in collaborative training can tell if a certain person appears in another participant’s photos, who has written the reviews used by other participants for training, which types of doctors are being reviewed, and other sensitive information. Notably, this leakage of “extra” information about the training data has no visible effect on the model’s test accuracy.

A clever adversary who has access to the ML training software can exploit the unused capacity of ML models for nefarious purposes. In our CCS 2017 paper, we show that a simple modification to the data pre-processing, without changing the training procedure at all, can cause the model to memorize its training data and leak it in response to queries. Consider a binary gender classifier trained in this way.  By submitting special inputs to this classifier and observing whether they are classified as male or female, the adversary can reconstruct the actual images on which the classifier was trained (the top row is the ground truth):

Federated learning, where models are crowd-sourced from hundreds or even millions of users, is an even juicier target. In a recent paper, we show that a single malicious participant in federated learning can completely replace the joint model with another one that has the same accuracy but also incorporates backdoor functionality. For example, it can intentionally misclassify images with certain features or suggest adversary-chosen words to complete certain sentences.

When training ML models, it is not enough to ask if the model has learned its task well.  Creators of ML models must ask what else their models have learned. Are they memorizing and leaking their training data? Are they discovering privacy-violating features that have nothing to do with their learning tasks? Are they hiding backdoor functionality? We need least-privilege ML models that learn only what they need for their task – and nothing more.

This post is based on joint research with Eugene Bagdasaryan, Luca Melis, Reza Shokri, Congzheng Song, Emiliano de Cristofaro, Deborah Estrin, Yiqing Hua, Thomas Ristenpart, Marco Stronati, and Andreas Veit.

Thanks to Arvind Narayanan for feedback on a draft of this post.

Getting serious about research ethics: AI and machine learning

[This blog post is a continuation of our series about research ethics in computer science.]

The widespread deployment of artificial intelligence and specifically machine learning algorithms causes concern for some fundamental values in society, such as employment, privacy, and discrimination. While these algorithms promise to optimize social and economic processes, research in this area has exposed some major deficiencies in the social consequences of their operation. Some consequences may be invisible or intangible, such as erecting computational barriers to social mobility through a variety of unintended biases, while others may be directly life threatening. At the CITP’s recent conference on computer science ethics, Joanna Bryson, Barbara Engelhardt, and Matt Salganik discussed how their research led them to work on machine learning ethics.

Joanna Bryson has made a career researching artificial intelligence, machine learning, and understanding their consequences on society. She has found that people tend to identify with the perceived consciousness of artificially intelligent artifacts, such as robots, which then complicates meaningful conversations about the ethics of their development and use. By equating artificially intelligent systems to humans or animals, people deduce its moral status and can ignore their engineered nature.

While the cognitive power of AI systems can be impressive, Bryson argues they do not equate to humans and should not be regulated as such. On the one hand, she demonstrates the power of an AI system to replicate societal biases in a recent paper (co-authored with CITP’s Aylin Caliskan and Arvind Narayanan) by letting systems trained on a corpus of text from the World Wide Web learn the implicit biases around the gender of certain professions. On the other hand, she argues that machines cannot ‘suffer’ in the same way as humans do, which is one of the main deterrents for humans in current legal systems. Bryson proposes we understand both AI and ethics as human-made artifacts. It is therefore appropriate to rely ethics – rather than science – to determine the moral status of artificially intelligent systems.

Barbara Engelhardt’s work focuses on machine learning in computational biology, specifically genomics and medicine. Her main area of concern is the reliance on recommendation systems, such as we encounter on Amazon and Netflix, to make decisions in other domains such as healthcare, financial planning, and career decisions. These machine learning systems rely on data as well as social networks to make inferences.

Engelhardt describes examples where using patient records to inform medical decisions can lead to erroneous recommendation systems for diagnosis as well as harmful medical interventions. For example, the symptoms of heart disease differ substantially between men and women, and so do their appropriate treatments. Most data collected about this condition was from men, leaving a blind spot for the diagnosis of heart disease in women. Bias, in this case, is useful and should be maintained for correct medical interventions. In another example, however, data was collected from a variety of hospitals in somewhat segregated poor and wealthy areas. The data appear to show that cancers in children from hispanic and caucasian races develop differently. However, inferences based on this data fail to take into account the biasing effect of economic status in determining at which stage of symptoms different families decide seek medical help. In turn, this determines the stage of development at which the oncological data is collected. The recommendation system with this type of bias confuses race with economic barriers to medical help, which will lead to harmful diagnosis and treatments.

Matt Salganik proposes that the machine learning community draws some lessons from ethics procedures in social science. Machine learning is a powerful tool the can be used responsibly or inappropriately. He proposes that it can be the task of ethics to guide researchers, engineers, and developers to think carefully about the consequences of their artificially intelligent inventions. To this end, Salganik proposes a hope-based and principle-based approach to research ethics in machine learning. This is opposed to a fear-based and rule-based approach in social science, or the more ad hoc ethics culture that we encounter in data and computer science. For example, machine learning ethics should include pre-research review through forms that are reviewed by third parties to avoid groupthink and encourage researchers’ reflexivity. Given the fast pace of development, though, the field should avoid a compliance mentality typically found at institutional review boards of univeristies. Any rules to be complied with are unlikely to stand the test of time in the fast-moving world of machine learning, which would result in burdensome and uninformed ethics scrutiny. Salganik develops these themes in his new book Bit By Bit: Social Research in the Digital Age, which has an entire chapter about ethics.”

See a video of the panel here.

Language necessarily contains human biases, and so will machines trained on language corpora

I have a new draft paper with Aylin Caliskan-Islam and Joanna Bryson titled Semantics derived automatically from language corpora necessarily contain human biases. We show empirically that natural language necessarily contains human biases, and the paradigm of training machine learning on language corpora means that AI will inevitably imbibe these biases as well.

Specifically, we look at “word embeddings”, a state-of-the-art language representation used in machine learning. Each word is mapped to a point in a 300-dimensional vector space so that semantically similar words map to nearby points.

We show that a wide variety of results from psychology on human bias can be replicated using nothing but these word embeddings. We primarily look at the Implicit Association Test (IAT), a widely used and accepted test of implicit bias. The IAT asks subjects to pair concepts together (e.g., white/black-sounding names with pleasant or unpleasant words) and measures reaction times as an indicator of bias. In place of reaction times, we use the semantic closeness between pairs of words.

In short, we were able to replicate every single result that we tested, with high effect sizes and low p-values.

These include innocuous, universal associations (flowers are associated with pleasantness and insects with unpleasantness), racial prejudice (European-American names are associated with pleasantness and African-American names with unpleasantness), and a variety of gender stereotypes (for example, career words are associated with male names and family words with female names).

But we go further. We show that information about the real world is recoverable from word embeddings to a striking degree. The figure below shows that for 50 occupation words (doctor, engineer, …), we can accurately predict the percentage of U.S. workers in that occupation who are women using nothing but the semantic closeness of the occupation word to feminine words!

These results simultaneously show that the biases in question are embedded in human language, and that word embeddings are picking up the biases.

Our finding of pervasive, human-like bias in AI may be surprising, but we consider it inevitable. We mean “bias” in a morally neutral sense. Some biases are prejudices, which society deems unacceptable. Others are facts about the real world (such as gender gaps in occupations), even if they reflect historical injustices that we wish to mitigate. Yet others are perfectly innocuous.

Algorithms don’t have a good way of telling these apart. If AI learns language sufficiently well, it will also learn cultural associations that are offensive, objectionable, or harmful. At a high level, bias is meaning. “Debiasing” these machine models, while intriguing and technically interesting, necessarily harms meaning.

Instead, we suggest that mitigating prejudice should be a separate component of an AI system. Rather than altering AI’s representation of language, we should alter how or whether it acts on that knowledge, just as humans are able to learn not to act on our implicit biases. This requires a long-term research program that includes ethicists and domain experts, rather than formulating ethics as just another technical constraint in a learning system.

Finally, our results have implications for human prejudice. Given how deeply bias is embedded in language, to what extent does the influence of language explain prejudiced behavior? And could transmission of language explain transmission of prejudices? These explanations are simplistic, but that is precisely our point: in the future, we should treat these as “null hypotheses’’ to be eliminated before we turn to more complex accounts of bias in humans.