January 21, 2025

Building a Bridge with Concrete… Examples

Thanks to Annette Zimmermann and Arvind Narayanan for their helpful feedback on this post.

Algorithmic bias is currently generating a lot of lively public and scholarly debate, especially amongst computer scientists and philosophers. But do these two groups really speak the same language—and if not, how can they start to do so?

I noticed at least two different ways of thinking about algorithmic bias during a recent research workshop on the ethics of algorithmic decision-making at Princeton University’s Center for Human Values, organized by political philosopher Dr. Annette Zimmermann. Philosophers are thinking about algorithmic bias in terms of things like the inherent value of explanation, the fairness and accountability rights afforded to humans, and whether groups that have been systematically affected by unfair systems should bear the burden for integration when transitioning to a uniform system. Computer scientists, by contrast, are thinking about algorithmic bias in terms of things like running a gradient backwards to visualize a heat map, projecting features into various subspaces devoid of protected attributes, and tuning hyperparameters to better satisfy a new loss function. Of course, these are broad generalizations about the two fields, and there are plenty of researchers doing excellent work at the intersection. For the most part, though, it seems that while philosophers are debating which sets of ethical axioms ought to underpin algorithmic decision-making systems, computer scientists are already deploying these systems in the real world.

In formulating loss functions, consequentialists might prioritize maximizing accurate outcomes for the largest possible number of people, even if that is at the cost of fair treatment, whereas deontologists might prioritize treating everyone fairly, even if that is at the cost of optimality. But there isn’t a definitive “most moral” answer, and if something like equalizing false positive rates were the key to fairness, we would not be having the alarming headlines of algorithmic bias that we have today.

Inundated with various conflicting definitions of fairness, scientists are often optimizing for the metrics they believe to be best and moving on. For example, one might reasonably think that the way to ensure fairness of an algorithm between different racial groups could be to enforce predictive parity (a positive prediction is equally likely to be correct in each group), or to equalize false positive and false negative rates, or just to treat similar individuals similarly. However, it is mathematically impossible to satisfy seemingly reasonable fairness criteria like these simultaneously in most real-world settings, for instance whenever the groups have different base rates and the predictor is imperfect. It is unclear how to choose amongst the criteria, and even more unclear how one would go about translating complex ideas that may require consideration, such as systematic oppression, into a world of optimizers and gradients.
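To make the tension concrete, here is a minimal sketch in Python. The counts are made up for illustration, and the helper names (`make_group`, `group_rates`) are my own rather than part of any library. It shows one toy case, not a proof: two groups with different base rates receive identical false positive and false negative rates, and predictive parity fails anyway.

```python
from dataclasses import dataclass


@dataclass
class GroupRates:
    ppv: float  # share of positive predictions that are correct
    fpr: float  # false positive rate
    fnr: float  # false negative rate


def make_group(tp, fp, fn, tn):
    """Build label/prediction lists from confusion-matrix counts (toy data)."""
    y_true = [1] * tp + [0] * fp + [1] * fn + [0] * tn
    y_pred = [1] * tp + [1] * fp + [0] * fn + [0] * tn
    return y_true, y_pred


def group_rates(y_true, y_pred):
    """Compute PPV, FPR and FNR; assumes every denominator is non-zero."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tn = pairs.count((0, 0))
    return GroupRates(ppv=tp / (tp + fp), fpr=fp / (fp + tn), fnr=fn / (fn + tp))


# Made-up counts: group A has a 50% base rate, group B a 25% base rate.
rates_a = group_rates(*make_group(tp=4, fp=1, fn=1, tn=4))
rates_b = group_rates(*make_group(tp=4, fp=3, fn=1, tn=12))

for name, r in (("A", rates_a), ("B", rates_b)):
    print(f"group {name}: PPV={r.ppv:.2f} FPR={r.fpr:.2f} FNR={r.fnr:.2f}")
```

Both groups come out with a false positive rate and a false negative rate of 0.20, yet a positive prediction is correct 80% of the time for group A and only about 57% of the time for group B. Equalizing one criterion here leaves another unequal.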

Since concrete mappings between a mathematical loss function and moral concepts are likely impossible to dictate, and philosophers are unlikely to settle on an ultimate theory of fairness, perhaps for now we can adopt a strategy that is, at least, not impossible to implement: a purposefully created, context- and application-specific validation/test set. The motivation behind this is that even if philosophers and ethicists cannot decisively articulate a set of general, static fairness desiderata, perhaps they can make more domain-specific, dynamic judgements: for instance, whether one should prefer a system that gives person A, with a given set of attributes and features, a loan or not. They can then make the same judgement for person B, person C, and so on. Of course there will not be unanimous agreement, but there may at least be a general consensus that one outcome is preferable to the other. One could then create a whole set of such examples. Concepts like the idea that similar people should be treated similarly in a given decision scenario (the ‘like cases maxim’ in legal philosophy) could be encoded into this test set by having groups of people that differ only in a protected attribute be given the same result. Even concepts like equal accuracy rates across protected groups could be encoded by having the test set contain equal numbers of people from each group, rather than numbers proportional to the real-world majority/minority representations. However, the test set is not a rigorous way to enforce these fairness constraints, and it shouldn’t be either: the reason such a test set would exist is that the right fairness criteria are not actually known. If they were, they would simply be formulated explicitly in the loss function.
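As a rough sketch of what such a test set could look like in code, consider the following Python example. Everything in it is hypothetical: the `Case` fields, the two toy loan models, and the "preferred" outcomes, which stand in for judgements a panel of ethicists might supply. The set holds equal numbers of cases from each group, and the like-cases check swaps only the protected attribute to see whether the decision changes.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Case:
    income: int      # hypothetical feature
    debt: int        # hypothetical feature
    group: str       # protected attribute
    preferred: bool  # outcome the (imagined) expert panel judged preferable


Model = Callable[[Case], bool]


def agreement_rate(model: Model, cases: Sequence[Case]) -> float:
    """Fraction of curated cases where the model matches the preferred outcome."""
    return sum(model(c) == c.preferred for c in cases) / len(cases)


def like_cases_violations(model: Model, cases: Sequence[Case],
                          groups: Sequence[str]) -> List[Case]:
    """Cases whose decision changes when only the protected attribute changes."""
    return [c for c in cases
            if len({model(replace(c, group=g)) for g in groups}) > 1]


def blind_model(c: Case) -> bool:
    # Toy rule that ignores the protected attribute.
    return c.income - c.debt >= 20


def biased_model(c: Case) -> bool:
    # Toy rule with a stricter threshold for group "B".
    return c.income - c.debt >= (20 if c.group == "A" else 30)


# Equal numbers of cases per group, in pairs that differ only in `group`.
test_set = [
    Case(50, 10, "A", True), Case(50, 10, "B", True),
    Case(40, 15, "A", True), Case(40, 15, "B", True),
    Case(30, 25, "A", False), Case(30, 25, "B", False),
]

for name, model in (("blind", blind_model), ("biased", biased_model)):
    bad = like_cases_violations(model, test_set, groups=["A", "B"])
    print(name, round(agreement_rate(model, test_set), 2), len(bad))
```

On this toy set, the group-blind rule agrees with every preferred outcome and has no like-cases violations, while the biased rule disagrees on one case and is flagged on both members of the pair near its threshold. A real test set would of course need far more cases, and the hard part is the judgement behind each label, not the bookkeeping.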

At this juncture, ethicists and computer scientists could usefully engage in complementary work: ethicists could identify difficult edge cases that challenge what we think about moral questions and incorporate these into the test set, and computer scientists could work on optimizing accuracy rates on a given validation set. There are a few crucial differences, however, from similar collaborative approaches in other domains, such as when doctors are called on to provide expert labels on medical data so models can be trained to detect things like eye diseases. There is now the new notion that the distribution of the test set, in addition to just the labels, is going to be specifically decided upon by domain experts. Further, this collaboration would last beyond just the labeling of the data. Failure cases should be critically investigated earlier in the machine learning pipeline, in an iterative and reflective way, to ensure things like overfitting are not happening. Whether performing well on the hidden test set requires learning fairer representations in the feature space or thresholding different groups differently, scientists will build context-specific models that encompass certain moral values defined by ethicists, who are grounding the test set in examples of realizations of such values.

But does this proposal mean adopting a potentially dangerous, ethically objectionable “the ends justify the means” logic? Not necessarily. With algorithm developers working in conjunction with ethicists to ensure the means are not unsavory, this could be a way to bridge the divide between abstract notions of fairness and concrete ways of implementing systems.

This may not be an ideal long-term way to deal with the problem of algorithmic fairness: it is difficult to generalize between applications, and where creating an expert-curated test set is too expensive or not scalable, it may be less attractive than satisfying one of the many mathematical definitions of fairness. Still, it could be one possible way to incorporate philosophical notions of fairness into the development of algorithms. Because technologists are not going to hold off on deploying machine learning systems until they reach a state of fairness everyone agrees on, finding a way of incorporating philosophical views about central moral values like fairness and justice into algorithmic systems right now is an urgent problem.

Supervised machine learning has traditionally been focused on predicting based on historical and existing data, but maybe we can structure our data in a way that is a model not of the society we actually live in, but of the one we hope to live in. Translating complex philosophical values into representative examples is not an easy task, but it is one that ethicists have been doing a version of for centuries in order to investigate moral concepts—and perhaps it can also be the way to convey some sense of our morals to machines.