January 11, 2025

What does it mean to ask for an “explainable” algorithm?

One of the standard critiques of using algorithms for decision-making about people, and especially for consequential decisions about access to housing, credit, education, and so on, is that the algorithms don’t provide an “explanation” for their results or the results aren’t “interpretable.”  This is a serious issue, but discussions of it are often frustrating. The reason, I think, is that different people mean different things when they ask for an explanation of an algorithm’s results.

Before unpacking the different flavors of explainability, let’s stop for a moment to consider that the alternative to algorithmic decisionmaking is human decisionmaking. And let’s consider the drawbacks of relying on the human brain, a mechanism that is notoriously complex, difficult to understand, and prone to bias. Surely an algorithm is more knowable than a brain. After all, with an algorithm it is possible to examine exactly which inputs factored into a decision, and every detailed step of how these inputs were used to get to a final result. Brains are inherently less transparent, and no less biased. So why might we still complain that the algorithm does not provide an explanation?

We should also dispense with cases where the algorithm is just inaccurate–where a well-informed analyst can understand the algorithm but will see it as producing answers that are wrong. That is a problem, but it is not a problem of explainability.

So what are people asking for when they say they want an explanation? I can think of at least four types of explainability problems.

The first type of explainability problem is a claim of confidentiality. Somebody knows relevant information about how a decision was made, but they choose to withhold it because they claim it is a trade secret, or that disclosing it would undermine security somehow, or that they simply prefer not to reveal it. This is not a problem with the algorithm; it’s an institutional/legal problem.

The second type of explainability problem is complexity. Here everything about the algorithm is known, but somebody feels that the algorithm is so complex that they cannot understand it. It will always be possible to answer what-if questions, such as how the algorithm’s result would have been different had the person been one year older, or had an extra $1000 of annual income, or had one fewer prior misdemeanor conviction, or whatever. So complexity can only be a barrier to big-picture understanding, not to understanding which factors might have changed a particular person’s outcome.
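To make the what-if idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `Applicant` fields, the `decide` rule with its weights and threshold, and the `what_if` helper. It is not any real scoring system. The point is only that `what_if` treats the decision procedure as a black box, so the same probing would work however complicated `decide` happened to be.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Applicant:
    # Hypothetical inputs, chosen to mirror the what-if questions in the text.
    age: int
    annual_income: int
    prior_misdemeanors: int


def decide(a: Applicant) -> bool:
    """Hypothetical stand-in for an arbitrarily complex decision procedure.

    The weights and threshold are invented for this sketch.
    """
    score = 500 * a.age + a.annual_income - 15_000 * a.prior_misdemeanors
    return score >= 60_000


def what_if(model, applicant, **changes):
    """Return (actual decision, decision if the given fields were changed).

    The model is only called, never inspected.
    """
    return model(applicant), model(replace(applicant, **changes))


person = Applicant(age=30, annual_income=59_500, prior_misdemeanors=1)

questions = {
    "one year older": {"age": person.age + 1},
    "extra $1000 of income": {"annual_income": person.annual_income + 1_000},
    "one fewer misdemeanor": {"prior_misdemeanors": person.prior_misdemeanors - 1},
}

for label, changes in questions.items():
    actual, counterfactual = what_if(decide, person, **changes)
    print(f"{label}: actual={actual}, counterfactual={counterfactual}")
```

In this toy setup the applicant is denied, and each of the three changes taken alone would have flipped the outcome. That answers the individual's question about what would have made a difference, while saying nothing about the big-picture structure of `decide`.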

The third type of explainability problem is unreasonableness. Here the workings of the algorithm are clear, and are justified by statistical evidence, but the result doesn’t seem to make sense. For example, imagine that an algorithm for making credit decisions considers the color of a person’s socks, and this is supported by unimpeachable scientific studies showing that sock color correlates with defaulting on credit, even when controlling for other factors. So the decision to factor in sock color may be justified on a rational basis, but many would find it unreasonable, even if it is not discriminatory in any way. Perhaps this is not a complaint about the algorithm but a complaint about the world–the algorithm is using a fact about the world, but nobody understands why the world is that way. What is difficult to explain in this case is not the algorithm, but the world that it is predicting.

The fourth type of explainability problem is injustice. Here the workings of the algorithm are understood but we think they are unfair, unjust, or morally wrong. In this case, when we say we have not received an explanation, what we really mean is that we have not received an adequate justification for the algorithm’s design.  The problem is not that nobody has explained how the algorithm works or how it arrived at the result it did. Instead, the problem is that it seems impossible to explain how the algorithm is consistent with law or ethics.

It seems useful, when discussing the explanation problem for algorithms, to distinguish these four cases–and any others that people might come up with–so that we can zero in on what the problem is. In the long run, all of these types of complaints are addressable–so that perhaps explainability is not a unique problem for algorithms but rather a set of commonsense principles that any system, algorithmic or not, must attend to.

Comments

  1. Tom Dietterich says

    When the algorithm that is being explained is the output (a decision tree, a neural network) of a machine learning process, an important role of explanation is to reveal the regularities that the machine learning algorithm has discovered in the data. We want to assess those regularities to decide whether they reflect some causal relationship (e.g., low income and high existing debt payments might predict higher default rates for a new loan) or some potentially spurious correlation (e.g., green socks, sex, race). Non-causal regularities could be instances of explanation types 2, 3, and 4. In making high-stakes decisions, we want our algorithms to base those decisions on the fundamental causes rather than surface correlations. We want this because it is just, because it is impossible to elude (e.g., by changing socks), and because it is more likely to be robust to changes in the context (which is part of what I interpret “tz” as discussing).

    The designers of machine learning applications often use input variables of convenience (race, sex, age, sock color) because they are easy to measure rather than using the true causal variables (if those are known). This is fundamentally lazy, and we need to teach ML engineers and data scientists to think carefully about which variables to include.

    But it isn’t always just laziness, because in many cases, the true causal variables are difficult to observe directly. In such cases, one can apply probabilistic modeling strategies to put the causal variables into the model as latent variables and then attempt to estimate them from the other variables that are easier to observe. There are many technical issues with this (e.g., are the variables identifiable? Are their effective semantics the same as the intended semantics?), but if this works, then an explanation could be something like “You were denied the loan because we believe you are carrying lots of other debt. We don’t know exactly how much debt, but we do see $1000 per month in loan payments.”

    I don’t think Ed’s four categories really capture this sort of explanation, so I agree with other commenters that we need to refine these categories.

  2. I think you are changing the issue with the “complexity”. In my opinion, the questions aren’t of the form “what if the person was 1 year older?” etc. The question that needs to be answered for the algorithm to be interpretable is “how did you come up with this particular decision?”, expecting a logical set of steps based on facts.

  3. It is important to separate business choices protected by law (not sharing the algorithm formula as IP) from limits on explanation imposed by law (an algorithm developed by NSA is a national secret). I am betting most decisions not to explain an algorithm are business decisions, not ones required by law.

  4. Hugh Claffey says

    I don’t agree with your view that human decisionmaking means we have to explore the brain. Human decision making is illustrated by written documents. Individuals use brains to write and review human decision making, but text is the way it is manifest to other humans (admittedly in horribly complex, jargon-filled ways).

    Also, is there a category between inaccuracy and unreasonableness? I’m thinking about unacknowledged bias. Something more complex than ‘the examples used to populate the ML database were all Western undergrads’ but just short of ‘our examples were all male, because we believe only males can do (whatever)’. Let’s say we exclude convicted felons from a particular database population (for a justifiable reason). Would the algorithm then be able to accommodate miscarriages of justice (or mistakes) in whatever outcomes it would be programmed for?

  5. jmorahan says

    There’s a second aspect to the confidentiality problem though. Suppose there’s a correlation between sock colour and defaulting on credit. Specifically, people who wear green socks are found to be more reliable in paying back their debts. Banks start to use this in their decision making. Customers demand an explanation, and the banks tell them: you weren’t wearing green socks. This becomes widely known. Now everybody will wear green socks to the bank, and the indicator becomes useless.

    For a real-world example of this effect, consider the SEO industry.

    So, arguably it *is* at least in part a problem with the algorithm itself. An explainable algorithm must use criteria that do not lose their value as indicators when they are explained.

  6. Kush R. Varshney says

    Consider submitting to and attending the Second Annual ICML Workshop on Human Interpretability in Machine Learning (WHI 2017). https://sites.google.com/view/whi2017/home

  7. Kevin Werbach says

    Is injustice really an explanation problem for algorithms? Seems like much of the time it can be explained quite well either by bias in the training dataset or redundant encoding. In the first case, the problem is failure to sufficiently interrogate the data, e.g. the case of the company where women perform worse because the company mistreats them, and the hiring algorithm picks up the outcomes as a reason not to hire them. In the second case, if more black people live in poor areas, or have friends with poor credit, then we face an accuracy/fairness tradeoff that explanation can’t fix. The explanation is unsatisfying for the problem we’re concerned about.

    • Ed Felten says

      It’s a good question. I agree with you that asking why a particular outcome occurred is not the same as asking whether that outcome is just or fair or reasonable.

      My point in the post is just that sometimes when people say that someone is owed an explanation, what they really mean is that that person is entitled to hear a reason why the outcome of a process is justified. For example, this is what lawyers seem to mean when they say that a person convicted of a crime has a right to an explanation of the sentence they receive. It’s useful to recognize that in these settings what is wanted is not an explanation of what caused the result, but rather a rationale for why the result is reasonable, based on agreed-upon principles.

      Explanation and justification are often conflated in the law. Ask a judge for an explanation of their decision, and what you’ll get is a justification. With an algorithm, we can more cleanly separate the question of how a result was arrived at from the question of whether that result is justified.

      • “Ask a judge for an explanation of their decision, and what you’ll get is a justification.”

        Yes and no, perhaps? A judge will provide two accounts, one substantive in response to the question, “What does the law prescribe in the face of certain facts about a matter in dispute?,” and the other procedural in response to the question, “What does the law prescribe given a posture of a case at a certain stage?” (In both accounts, “prescribe” could also mean “permit.”) The former looks something like: When we know person X performed conduct Y under circumstance Z the law will disapprove of and punish that conduct. (This begs second order questions: Why does the law disapprove of XYZ? This is the same as asking why the algorithms correlate credit defaults with red socks.) The latter, procedural determination looks like this: At this stage in a dispute, X legally can (or cannot) demur, compel disclosure, cross-examine X’s opponent, etc.

        Explanation and justification *are* often conflated in the law, because fairness (in theory) is supposed to motivate legal outcomes. Hence, a judge’s explanation is inherently also a justification. If I read this post and comments correctly, the assumption is that fairness is not expected to motivate outcomes of algorithms. Fairness is (in theory) a goal of the people who develop and deploy the algorithms.

        • Ed Felten says

          I mean something slightly different. You can ask for a procedural justification–what does the law prescribe in these circumstances? Or you can ask for a justification from first principles–what outcome follows from fundamental tenets of fairness and justice? To me, these both qualify as justifications, although of different sorts.

          But what if the judge’s decision was affected, in a causal sense, by the fact that the judge was in a bad mood because their spouse recently crashed the car, or because the weather was bad, or because the judge’s bum knee was painful? Evidence suggests that factors like these can affect decisions–no surprise, because judges are human. A good judge who is affected by such factors will give a justification based on law or justice, and the judge will sincerely believe in that explanation–even if it is not complete.

          The requirement that decisions come with justifications is helpful even if the justification is not always complete, because the requirement helps judges keep their mental focus on the factors that are supposed to matter, and away from those that are supposed to be irrelevant.

          But one of the effects of switching to a machine decision, where these things cannot be fully opaque, is to highlight the distinction between causal explanation of a decision and how the decision is justified.

          • …or what the judge had for breakfast that morning, the classic case inspiring Legal Realism! Arguably, one corrective for this issue is quasi-procedural: in certain circumstances the law prohibits abuse of judicial discretion, and no justification will do. But that’s a high standard and obviously the legal algorithm is not airtight. A judge can usually state a plausible legal justification without admitting that soggy Cheerios prompted the decision.

            Another case for an explainability problem: irrelevance. This is similar to injustice, but it operates in situations lacking moral valence. It’s also similar to unreasonableness, although one can imagine irrelevant decisions that nevertheless seem reasonable. Take relevancy rankings of searches, which purport to “decide” which few among massive numbers of documents are most relevant to a search query. Here, given a transparent and not too complex algorithm, we can always explain the outcome. Can we justify it? Not without admitting that the algorithm is inadequate, I think. The justification will address extrinsic factors, such as proxies for relevance (e.g., keywords in titles), cost/benefit, searcher sophistication, etc. Unlike the red socks scenario, in which a seemingly absurd result turns out to be grounded in good science, a justification stands no chance of persuading a searcher that the top results are in fact the most relevant. Granted, most wearers of red socks denied credit will not take satisfaction in the science, but it could happen. No searcher apprised of the algorithm would accept its authority respecting results 1-5 when the searcher decides that result 20 is most relevant.

  8. First, every financial algorithm has been backtested, but can’t predict where any stock market index will be two weeks from now.

    Second, neural networks don’t have algorithms, but they are being used to detect credit card fraud. And how does Google AI Go decide its moves?

    Third, the whole thing is trying to replace one kind of knowledge with another. Wissen vs. kennen. I forget the other split in other languages, but words like savoir (fr), gnosis (Greek). There is knowing someONE vs. knowing someTHING.

    I live in a small town, the people KNOW me, at the bar and church, or just walking about. They know better than any algorithm what I’m likely to do. But that isn’t efficient or scalable, no we need algorithmic FICO scores – not just for finance, but to set bail and sentencing.

    Yet this is the fundamental error. We are not seeing a person (You are number 6 – I’m not a number I’m a human being – hahahahah – from the old Prisoner series with Patrick McGoohan). We are seeing an aggregation of data, often breaking the 5th amendment (testifying against one’s self – what color are your socks), and the fourth, and anything resembling due process. We have gone from convincing reasonable persons or a judge to some totally circumstantial score.

    And that score is on a very narrow basis. You either defaulted or didn’t. It doesn’t care that you were in an accident and missed payments because you were in the hospital or that you blew it and partied in Las Vegas.

    But algorithms, including neural networks, are perfectly fair, no matter how unjust (Joker: That’s the thing about Chaos, it’s fair). Women tend to interrupt their career to have children, they don’t do dirty, dangerous, or difficult jobs – how many work in mines or oilfields, and on road crews, who’s holding the signs. Plug the data into an algorithm and maybe they are overpaid. The algorithm may perfectly predict the actual (statistical) productivity difference, but what then?

    Persons are individuals. Algorithms are racist, sexist, homophobic, etc. because they are bare statistics and don’t look beyond correlations. Even if you try to only do causation, you still end up with the blind algorithms missing actual causation which is harder.

    We normally worry about intent – is whatever happened, negative or positive, intentional or accidental? Are you rich because you worked hard or won the lottery? Did you fight not to default or intentionally use bankruptcy as a soft method of defrauding? Algorithms can’t tell character. Or worse if they do, the wrong people might be found wanting. What if sexual license also makes one less likely to repay debts?

    And correlations do work. Do you really have time to know (as in person) who you are renting to, or do you just want to know if they will pay rent on time and won’t trash the place, and what if the algorithmic background check – with insured guarantee – rates them as 90% likely to be good or bad? And the know-person might be even worse – what if everyone you know to be reliable comes from a white traditional family, and the unreliable are single mothers and/or minorities?

    The problem with Martin Luther King Jr.’s “Content of our Character” is that it correlates to the “Color of our skin”, and at least today, not in a good way. And we want to switch standards when we don’t like the outcome. Ah fairness vs. justice, chaos vs. order again.

    This needs philosophy, epistemology (how do we know what we know, or if it is truthful), and some decision as to the nature of man.

  9. > “So complexity can only be a barrier to big-picture understanding, not to understanding which factors might have changed a particular person’s outcome”

    I feel that this is a substantial, significant problem. Being able to test an algorithm with hypothetical inputs doesn’t address genuine concerns about that algorithm’s complexity. “Big-picture understanding” is important!

    Current data processing and AI techniques will only make this worse. Both large statistical models and neural net algorithms are used in decision-making tools, but these are largely opaque to big-picture understanding. And yes, they can be tested with specific hypothetical inputs, but these systems can easily be implemented in ways that result in a chaotic result. That is, small changes in the inputs result in large, unpredictable changes in the output. With such algorithms, knowing e.g. the output for a person with an extra $500, $1000, and $1500 of income doesn’t tell you what the output would be for a person with an extra $501, $1001, or $1501 of income.

    I think it will be useful to study the ways that algorithms may or may not be understandable. But I have to say, frankly, I’m not convinced that these four categories adequately sum up the range of possible concerns, or that, even if they do, they adequately address our desire to understand algorithms (i.e. these categories largely come across as dismissive of a person’s concerns about understanding the algorithm, but IMHO not in a rigorous way that is convincing).

  10. Peter Leppik says

    Even if you have a machine that, in aggregate, makes better decisions than people, the machine is still likely to fail in ways that people won’t (and vice-versa).

    In this context, explainability is useful as a tool to help understand the limits of your algorithm. In the real world, problem spaces tend to be messy. Understanding why an algorithm made the decision it made can let you see when the algorithm might have strayed into an edge-case it can’t handle well.

    This might be a subset of your “unreasonableness” category, but to me it seems like a whole new category more like “validation.” There are a lot of fields (especially where there are safety issues) where it’s critically important to know when not to trust the computer.

    • Ed Felten says

      Validation is certainly important. We want to know that an algorithm’s results will be accurate, reliably enough that we are willing to rely on it over any alternatives.

      But accuracy doesn’t necessarily require understanding. In some settings the error rate of an algorithm can be estimated statistically with sufficient accuracy to have (enough) confidence in its future accuracy. Or the algorithm can be designed to be self-monitoring, in the sense that it will shut itself down and/or switch to a simpler algorithm if its error rate starts to creep up. In cases like these, we might have confidence in its accuracy even without a detailed understanding of how it works.

      How useful statistical testing or analysis can be will depend on the circumstances of a particular application.

      • Peter Leppik says

        What would you do in a situation where self-monitoring and statistical analysis are insufficient?

        For example, if there’s a risk of a catastrophic failure (a jetliner crashes, a nuclear reactor goes critical, etc.), you need to have some way of knowing that the algorithm has strayed out of its envelope. I see explainability as a useful tool–though not the only tool–for helping to define the boundaries of where you can trust the machine.

  11. Mike Weiksner says

    Yes, lots of prior work to draw on. ML is like advanced correlation, so it’s just correlation vs causation. Think: red socks are fashionable, fashionable people are prone to default. But fashion changes!

  12. Why not just ask for the source code (at least as an opening gambit)?

    From the perspective of the IA (information asymmetry) arbitrageurs, the request for explainability will read as zero respect for trade secrets, anyway. The posture of the #OpenData movement has to be “as a matter of fact you don’t have a moral claim on trade secrecy.”

    If they won’t reveal their business model, reverse engineer it. And for the love of knowledge, publish your findings! Tom Slee’s research on AirBnB is an important step in that direction.