July 25, 2017

Language necessarily contains human biases, and so will machines trained on language corpora

I have a new draft paper with Aylin Caliskan-Islam and Joanna Bryson titled “Semantics derived automatically from language corpora necessarily contain human biases.” We show empirically that natural language necessarily contains human biases, and the paradigm of training machine learning on language corpora means that AI will inevitably imbibe these biases as well.

Specifically, we look at “word embeddings”, a state-of-the-art language representation used in machine learning. Each word is mapped to a point in a 300-dimensional vector space so that semantically similar words map to nearby points.
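To make “nearby points” concrete: closeness between word vectors is commonly measured by cosine similarity, the cosine of the angle between two vectors. The sketch below assumes that measure. The function names `cosine_similarity` and `nearest_words` are illustrative and do not come from any particular library, and the three-dimensional vectors are made up (real embeddings have around 300 dimensions).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_words(query, embeddings, k=3):
    """Return the k words whose vectors have the highest cosine similarity to `query`."""
    q = embeddings[query]
    scores = {
        word: cosine_similarity(q, vec)
        for word, vec in embeddings.items()
        if word != query
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy vectors; real embeddings have ~300 dimensions.
toy = {
    "doctor": [0.9, 0.1, 0.2],
    "nurse": [0.8, 0.2, 0.3],
    "tulip": [0.1, 0.9, 0.1],
    "rose": [0.2, 0.8, 0.2],
}
print(nearest_words("tulip", toy, k=1))  # ['rose']
```

Cosine similarity ignores vector length and looks only at direction, which is why it is a popular choice for comparing embeddings.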

We show that a wide variety of results from psychology on human bias can be replicated using nothing but these word embeddings. We primarily look at the Implicit Association Test (IAT), a widely used and accepted test of implicit bias. The IAT asks subjects to pair concepts together (e.g., white/black-sounding names with pleasant or unpleasant words) and measures reaction times as an indicator of bias. In place of reaction times, we use the semantic closeness between pairs of words.
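Here is a minimal sketch of how an IAT-like score might be computed from semantic closeness alone. It assumes cosine similarity as the closeness measure and a Cohen's-d-style effect size. Each target word gets a score for how much closer it is to attribute set A (say, pleasant words) than to attribute set B (unpleasant words), and then the two target sets X and Y (say, flowers and insects) are compared in standard-deviation units. The function names, the use of the sample standard deviation, and the two-dimensional toy vectors are assumptions for illustration, not the paper's code.

```python
import math
from statistics import mean, stdev

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def association(w, A, B):
    """How much closer word vector w is, on average, to attribute set A than to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)

def effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y, in standard-deviation units.

    Assumption: the sample standard deviation over all target words in X and Y.
    """
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)

# Hypothetical 2-D toy vectors standing in for real word embeddings.
flowers = [(1.0, 0.1), (0.9, 0.2)]       # X
insects = [(0.1, 1.0), (0.2, 0.9)]       # Y
pleasant = [(1.0, 0.0), (0.95, 0.05)]    # A
unpleasant = [(0.0, 1.0), (0.05, 0.95)]  # B

print(effect_size(flowers, insects, pleasant, unpleasant))  # positive: flowers lean "pleasant"
```

Swapping the roles of X and Y flips the sign of the score, mirroring how swapping the pairings in an IAT reverses the direction of the measured bias.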

In short, we were able to replicate every single result that we tested, with high effect sizes and low p-values.

These include innocuous, universal associations (flowers are associated with pleasantness and insects with unpleasantness), racial prejudice (European-American names are associated with pleasantness and African-American names with unpleasantness), and a variety of gender stereotypes (for example, career words are associated with male names and family words with female names).

But we go further. We show that information about the real world is recoverable from word embeddings to a striking degree. The figure below shows that for 50 occupation words (doctor, engineer, …), we can accurately predict the percentage of U.S. workers in that occupation who are women using nothing but the semantic closeness of the occupation word to feminine words!
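At its core, this kind of prediction is a regression of one number on another. For each occupation, x is some measure of the occupation word's closeness to feminine words, and y is the percentage of women in that occupation. The sketch below assumes a simple least-squares line, with Pearson correlation as the measure of how well closeness tracks the real percentages. The paper's exact model may differ, the function names are hypothetical, and the four data points are made up so that they fall exactly on a line.

```python
import math
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ≈ slope * xs + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson_r(xs, ys):
    """Pearson correlation: 1.0 means the points lie exactly on a rising line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustration: closeness of four hypothetical occupation words to
# feminine words (x) versus the percentage of women in each occupation (y).
closeness = [-0.1, 0.0, 0.1, 0.2]
percent_women = [20.0, 40.0, 60.0, 80.0]

slope, intercept = fit_line(closeness, percent_women)
print(slope, intercept)
print(pearson_r(closeness, percent_women))
# Predicted share of women for a new occupation word with closeness 0.05:
print(slope * 0.05 + intercept)
```

On real data the points scatter around the line, and the correlation coefficient summarizes how tightly they cluster.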

These results simultaneously show that the biases in question are embedded in human language, and that word embeddings are picking up the biases.

Our finding of pervasive, human-like bias in AI may be surprising, but we consider it inevitable. We mean “bias” in a morally neutral sense. Some biases are prejudices, which society deems unacceptable. Others are facts about the real world (such as gender gaps in occupations), even if they reflect historical injustices that we wish to mitigate. Yet others are perfectly innocuous.

Algorithms don’t have a good way of telling these apart. If AI learns language sufficiently well, it will also learn cultural associations that are offensive, objectionable, or harmful. At a high level, bias is meaning. “Debiasing” these machine models, while intriguing and technically interesting, necessarily harms meaning.

Instead, we suggest that mitigating prejudice should be a separate component of an AI system. Rather than altering AI’s representation of language, we should alter how or whether it acts on that knowledge, just as we humans can learn not to act on our implicit biases. This requires a long-term research program that includes ethicists and domain experts, rather than formulating ethics as just another technical constraint in a learning system.

Finally, our results have implications for human prejudice. Given how deeply bias is embedded in language, to what extent does the influence of language explain prejudiced behavior? And could transmission of language explain transmission of prejudices? These explanations are simplistic, but that is precisely our point: in the future, we should treat these as “null hypotheses” to be eliminated before we turn to more complex accounts of bias in humans.

Robots don't threaten, but may be useful threats

Hi, I’m Joanna Bryson, and I’m just starting as a fellow at CITP, on sabbatical from the University of Bath.  I’ve been blogging about natural and artificial intelligence since 2007, increasingly with attention to public policy.  I’ve been writing about AI ethics since 1998.  This is my first blog post for Freedom to Tinker.

Will robots take our jobs?  Will they kill us in war?  The answer to these questions depends not (just) on technological advances (for example in the area of my own expertise, AI), but on how we as a society choose to define what it means to be a moral agent.  This may sound esoteric, and indeed the term moral agent comes from philosophy.  An agent is something that changes its environment (so chemical agents cause reactions).  A moral agent is something society holds responsible for the changes it effects.

Should society hold robots responsible for taking jobs or killing people?  My argument is “no”.  The fact that humans have full authorship over robots’ capacities, including their goals and motivations, means that transferring responsibility to them would require abandoning, ignoring or just obscuring the obligations of the humans and human institutions that create the robots.  Language like “killer robots” can mislead a tax-paying public, already easily led by science fiction and runaway agency detection, into believing that robots are sentient competitors.  Ironically, this belief protects the people and organisations who are actually the moral actors.

So robots don’t kill or replace people; people use robots to kill or replace each other.  Does that mean there’s no problem with robots?  Of course not. Asking whether robots (or any other tools) should be subject to policy and regulation is a very sensible question.

In my first paper about robot ethics (you probably want to read the 2011 update for IJCAI, Just an Artifact: Why Machines are Perceived as Moral Agents), Phil Kime and I argued that as we gain greater experience of robots, we will stop reasoning about them so naïvely, and stop ascribing moral agency (and patiency) to them.  Whether or not we were right is an empirical question that I think would be worth exploring; I increasingly doubt that we were.  Emotional engagement with something that seems humanoid may be inevitable.  This is why one of the five Principles of Robotics (a UK policy document I coauthored, sponsored by the British engineering and humanities research councils) says “Robots are manufactured artefacts. They should not be designed in a deceptive way to exploit vulnerable users; instead their machine nature should be transparent.” Or in ordinary language, “Robots are artifacts; they should not be designed to exploit vulnerable users by evoking an emotional response or dependency. It should always be possible to tell a robot from a human.”

Nevertheless, I hope that by continuing to educate the public, we can at least help people make sensible conscious decisions about allocating their resources (such as time or attention) between real humans and machines.  This is why I object to language like “killer robots.”  And this is part of the reason why my research group works on increasing the transparency of artificial intelligence.

However, maybe the emotional response we have to the apparently human-like threat of robots will also serve some useful purposes.  I did sign the “killer robot” letter, because although I dislike the headlines associated with it, the actual letter (titled “Autonomous Weapons: an Open Letter from AI & Robotics Researchers”) makes clear the nature of the threat of taking humans out of the loop on real-time kill decisions.  Similarly, I am currently interested in understanding the extent to which information technology, including AI, is responsible for the levelling off of wages since 1978.  I am still reading and learning about this; I think it’s quite possible that the problem is not information technology per se, but rather culture, politics and policy more generally.  However, 1978 was a long time ago.  If more pictures of the Terminator get more people attending to questions of income inequality and the future of labour, maybe that’s not a bad thing.