May 22, 2017

Language necessarily contains human biases, and so will machines trained on language corpora

I have a new draft paper with Aylin Caliskan-Islam and Joanna Bryson titled Semantics derived automatically from language corpora necessarily contain human biases. We show empirically that natural language necessarily contains human biases, and the paradigm of training machine learning on language corpora means that AI will inevitably imbibe these biases as well.

Specifically, we look at “word embeddings”, a state-of-the-art language representation used in machine learning. Each word is mapped to a point in a 300-dimensional vector space so that semantically similar words map to nearby points.
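Concretely, "semantic closeness" is usually measured by cosine similarity between word vectors. Here is a toy sketch of the computation; the 4-dimensional vectors below are invented for illustration (real embeddings have ~300 dimensions and are trained on billions of words):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors: near 1.0 = very similar."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Tiny made-up "embeddings" purely to show the mechanics.
vectors = {
    "doctor": np.array([0.9, 0.1, 0.3, 0.0]),
    "nurse":  np.array([0.8, 0.2, 0.4, 0.1]),
    "banana": np.array([0.0, 0.9, 0.0, 0.8]),
}

print(cosine_similarity(vectors["doctor"], vectors["nurse"]))   # high: related words
print(cosine_similarity(vectors["doctor"], vectors["banana"]))  # low: unrelated words
```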

We show that a wide variety of results from psychology on human bias can be replicated using nothing but these word embeddings. We primarily look at the Implicit Association Test (IAT), a widely used and accepted test of implicit bias. The IAT asks subjects to pair concepts together (e.g., white/black-sounding names with pleasant or unpleasant words) and measures reaction times as an indicator of bias. In place of reaction times, we use the semantic closeness between pairs of words.
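The substitution works roughly like this (a simplified sketch, not the paper's exact test statistic): instead of comparing reaction times, we compare a word's average cosine similarity to one attribute set against its average similarity to the other. All vectors below are invented 3-dimensional stand-ins:

```python
import numpy as np

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, attrs_a, attrs_b):
    """Differential association of word vector w with two attribute word sets:
    positive = closer on average to set A, negative = closer to set B."""
    return (np.mean([cos(w, a) for a in attrs_a]) -
            np.mean([cos(w, b) for b in attrs_b]))

# Made-up vectors purely to illustrate the mechanics.
flower     = np.array([0.9, 0.1, 0.0])
pleasant   = [np.array([0.8, 0.2, 0.1]), np.array([0.7, 0.0, 0.2])]
unpleasant = [np.array([0.0, 0.9, 0.3]), np.array([0.1, 0.2, 0.9])]

print(association(flower, pleasant, unpleasant))  # positive: "flower" leans pleasant
```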

In short, we were able to replicate every single result that we tested, with high effect sizes and low p-values.

These include innocuous, universal associations (flowers are associated with pleasantness and insects with unpleasantness), racial prejudice (European-American names are associated with pleasantness and African-American names with unpleasantness), and a variety of gender stereotypes (for example, career words are associated with male names and family words with female names).

But we go further. We show that information about the real world is recoverable from word embeddings to a striking degree. The figure below shows that for 50 occupation words (doctor, engineer, …), we can accurately predict the percentage of U.S. workers in that occupation who are women using nothing but the semantic closeness of the occupation word to feminine words!
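In spirit, that prediction is just a linear regression of a real-world statistic on an embedding-derived association score. A minimal sketch with invented numbers (not the paper's 50-occupation data):

```python
import numpy as np

# Hypothetical data: for each occupation, an embedding-derived "femininity"
# association score (x) and the actual percentage of women in it (y).
x = np.array([-0.30, -0.10, 0.05, 0.20, 0.40])   # association scores (invented)
y = np.array([ 12.0,  30.0, 48.0, 65.0, 88.0])   # % women (invented)

# Fit y = slope * x + intercept by least squares, and measure the fit.
slope, intercept = np.polyfit(x, y, 1)
r = np.corrcoef(x, y)[0, 1]   # Pearson correlation

print(f"slope={slope:.1f}, intercept={intercept:.1f}, r={r:.2f}")
```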

These results simultaneously show that the biases in question are embedded in human language, and that word embeddings are picking up the biases.

Our finding of pervasive, human-like bias in AI may be surprising, but we consider it inevitable. We mean “bias” in a morally neutral sense. Some biases are prejudices, which society deems unacceptable. Others are facts about the real world (such as gender gaps in occupations), even if they reflect historical injustices that we wish to mitigate. Yet others are perfectly innocuous.

Algorithms don’t have a good way of telling these apart. If AI learns language sufficiently well, it will also learn cultural associations that are offensive, objectionable, or harmful. At a high level, bias is meaning. “Debiasing” these machine models, while intriguing and technically interesting, necessarily harms meaning.

Instead, we suggest that mitigating prejudice should be a separate component of an AI system. Rather than altering AI’s representation of language, we should alter how or whether it acts on that knowledge, just as humans are able to learn not to act on our implicit biases. This requires a long-term research program that includes ethicists and domain experts, rather than formulating ethics as just another technical constraint in a learning system.

Finally, our results have implications for human prejudice. Given how deeply bias is embedded in language, to what extent does the influence of language explain prejudiced behavior? And could transmission of language explain transmission of prejudices? These explanations are simplistic, but that is precisely our point: in the future, we should treat these as “null hypotheses” to be eliminated before we turn to more complex accounts of bias in humans.


  1. Hal Daume III says:

    This is very cool — I really like the idea of running IATs on learned models!

    There were a few things that I think I’m not understanding correctly about the framing/interpretation, though. (The results make total sense to me.) These are:

    1. I don’t really understand what “meaning” means. I was expecting to see a definition in Section 2 (section 1 says “we begin by explaining meaning”) but I couldn’t find something that looked like a def. It seems that under the definition you’re using that “male” is part of the meaning of “doctor” and “female” is part of the meaning of “nurse”? I’m a bit hard pressed to see gender aspects as part of the definitions of those words. There’s certainly a correlation (which is the cool figure you show in this blog post), but this is (to me) not part of the word’s meaning. I’m not saying it’s not possible to have a notion of “meaning” under which these *are* part of the meaning, just that it’s not the one I usually think about, and it would be helpful to have it spelled out.

    2. Related, there’s a comment toward the end of the article and in the blog post that “bias is meaning.” I cannot quite figure out what this means, especially given (1). The best I can get is some sort of circular definition of “meaning” in (1) that tautologically means that meaning is bias, but then it’s hard to understand the “We show that ‘bias is meaning.’” if it’s just part of the definition.

    3. I don’t understand this statement: “We demonstrate here for the first time what some have long suspected (Quine, 1960)—that semantics, the meaning of words, necessarily reflects regularities latent in our culture.” Again, I guess this goes back to the “I don’t know what ‘meaning’ means” question from (1). I think I’m partially thrown by the Quine reference because this seems relatively different from what I understand Word & Object to be talking about (which admittedly I haven’t ever read in full, so it’s very possible I’m just flat out wrong here).

    (There are also a few things that read weirdly to me, but aren’t as germane to the main point: a. I don’t know what universality of computation has to do with ML in this context; b. the assertion that “[word embedding algorithms] are exposed to language much like any human would be” seems dubious… the web is very different from, say, speech. There are a few other things but they’re less interesting from a scientific perspective.)

  2. conjugateprior says:

    On 3, I think the two elements that prompt the Quine ref are the idea, from ‘Word and Object’, that “different persons growing up in the same language are like different bushes trimmed and trained to take the shape of identical elephants”, and from ‘Two Dogmas of Empiricism’, that we cannot actually draw the line between syntactic regularities that are ‘just’ the structure of the language (‘analytic’) and semantic regularities that are about things in the world, including the attitudes people take to them (‘synthetic’).

  3. Joanna Bryson says:

    Maybe it’s going too far to say that all bias is meaning. But the idea here is that the meaning of a term is exactly the expectations you have for its use. That’s it. This is kind of radical if you are used to embodied ideas of meaning, where a term is more or less tethered to objects in the real world it relates to. In a way, that is true of some terms that do reference categories of objects or actions, but the tether is still expectation.

  4. Ugly Duckling says:

    One can’t have induction without bias.

  5. Harald Korneliussen says:

    This is a great paper! Finally someone who goes deeper instead of just pointing out correlations and acting as if their meaning were obvious.

    It has long bothered me that e.g. a text discussing (or ranting about!) the association between men and programming would probably just strengthen that association if it were included in a corpus used to train a word embedding.

  6. Alice Thwaite says:

    Great article! Thanks for doing the research and sharing.

  7. pranav kumar says:

    Maybe AI needs a supplementary dose of “I identify with x and I find this video offensive” YouTube comments to correct for the bias.

  8. Paul Harland says:

    Great stuff – hope you can continue to explore the idea of language awareness and ethical action. Regarding terms being tethered to objects, I feel language/communication fragments (including non-word stuff like tone and gesture) are embodied in the sense that they refer to the situation in which they’re used and previous instances where the listener has heard the fragment used in a way that appears related. There might be clues in the situation or in the exchange of utterances that frame a particular set of meanings. But then there’s only so much of the stream of info you can keep in memory, so there are trade-offs that may develop into a grammar. Also, you’re trying to follow the intent, of course. Computer poetry can sometimes look convincing in the associations between words but lack development of any ideas. This is just me speculating on something I find interesting, but presumably the present approach won’t capture all biases, or may pick up on some ironic uses, for instance. So it would be great if you could keep looking at how bias gets into our communications.
    Nice blog post – only flicked through the paper so there might well be more there!

  9. Tom Welsh says:

    “Specifically, we look at “word embeddings”, a state-of-the-art language representation used in machine learning. Each word is mapped to a point in a 300-dimensional vector space so that semantically similar words map to nearby points”.

    I can’t begin to understand what this means in practical terms. However I am suspicious that it describes a process that is entirely circular. The key question is “how are words ‘mapped’ ‘so that semantically similar words map to nearby points’?” If you have created this vector space yourselves, then any prejudice that is found in it comes from your own minds.

    Further, I forcefully reject the proposition that bias is somehow implicit in language. If I say something like, “Hydrogen is the commonest element in the universe, according to the scientific consensus as of 2016” – or even “Force is equal to mass times acceleration”, where is the bias?

  10. > Tom Welsh: how are words ‘mapped’ ‘so that semantically similar words map to nearby points’? If you have created this vector space yourselves, then any prejudice that is found in it comes from your own minds.

    The vectors are constructed by an algorithm that tries to ensure that words that occur in similar contexts (have similar collocations in a large Web corpus) map to nearby points. This algorithm doesn’t really have a notion of semantic similarity baked into it; the fact that semantically similar words tend to occur in similar contexts is an empirical one (“You shall know a word by the company it keeps” –J.R. Firth).

    There isn’t much room for the researchers to inject their own preconceptions into the vectors, which sometimes leads people to claim that the algorithm is “unbiased”. Instead, the bias comes from the corpus: people on the Internet use language in biased ways, and the paper in this post shows that the resulting vectors faithfully reflect those biases.
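    A toy version of this count-based idea, in the spirit of Latent Semantic Analysis (the corpus, words, and dimensions below are invented for illustration, and real systems use much larger corpora and weighting schemes such as PMI):

    ```python
    import numpy as np
    from itertools import combinations

    # Toy corpus: each "sentence" is treated as a bag of co-occurring words.
    corpus = [
        "the cat chased the mouse",
        "the dog chased the cat",
        "the cat ate the fish",
        "the dog ate the bone",
    ]

    vocab = sorted({w for s in corpus for w in s.split()})
    idx = {w: i for i, w in enumerate(vocab)}

    # Count how often each pair of words co-occurs in the same sentence.
    counts = np.zeros((len(vocab), len(vocab)))
    for s in corpus:
        for a, b in combinations(s.split(), 2):
            counts[idx[a], idx[b]] += 1
            counts[idx[b], idx[a]] += 1

    # Reduce dimensionality with SVD; rows of U * S are the word vectors.
    U, S, _ = np.linalg.svd(counts)
    vecs = U[:, :2] * S[:2]   # keep 2 dimensions

    def cos(u, v):
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    # "cat" and "dog" appear in similar contexts, so their vectors end up close.
    print(cos(vecs[idx["cat"]], vecs[idx["dog"]]))
    ```

    No human ever tells the algorithm that cats and dogs are similar; the similarity falls out of the co-occurrence statistics alone.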

    • Tom Welsh says:

      Many thanks for the clarification! So it seems that the origin of your data lies in “a large Web corpus”. That doesn’t really tell me much, but I can plausibly infer that the mapping algorithm takes as input verbal material written by a large number of human authors. So the general idea must be that if words often occur near to one another (on the page, as it were) they are somehow associated in the mind of the writer. In a way, I suppose it is trying to read the minds of the writers.

      It’s an interesting idea, and I must learn more about it.

    • Joanna Bryson says:

      Will – Exactly. Tom – I encourage you to read the Wikipedia page on Latent Semantic Analysis, our full paper, and/or Quine, depending on your level of interest.

      I actually have another academic paper about these concepts (but without the data) which may make this part clearer to you. I wrote it initially as a PhD student so it is a little sloppier and funnier, but again hopefully it will communicate more about this idea of semantics. As Will said, there’s really no question about whether it works – besides the data presented in the paper (we hope to release the code soon, and ultimately all the replication materials and software will be public), you couldn’t do web searches if this stuff didn’t work.

      • Tom Welsh says:

        Thanks, Joanna. I may have misinterpreted TFA, being unfamiliar with the field but interested in language and meaning from a more philosophical background.

  11. Has there been a robust analysis of whether Implicit Association Test (IAT) is a comprehensive enough tool for measuring implicit bias?

    It follows the Osgood semantic-differential approach of two polar binaries, which limits how subjects can express their biases.

    What would happen if instead of Pleasant / Unpleasant, the options were Pleasant / Strange / Unpleasant?

    How would that affect the Pearson correlations for word-pair embeddings?


  12. I’m glad that you’re treating your explanations as “null” hypotheses. This is a starting point for further research, not an end result in its own right.