October 24, 2018

Ethics Education in Data Science

Data scientists in academia and industry are increasingly recognizing the importance of integrating ethics into data science curricula. Recently, a group of faculty and students gathered at New York University before the annual FAT* conference to discuss the promises and challenges of teaching data science ethics, and to learn from one another’s experiences in the classroom. This blog post is the first of two summarizing the discussions at this workshop.

There is general agreement that data science ethics should be taught, but less consensus about what its goals should be or how they should be pursued. Because the field is so nascent, there is substantial room for innovative thinking about what data science ethics ought to mean. One possible goal is to create “future citizens” of data science who are invested in the welfare of their communities and the world, and who understand the social and political role data science plays there. But there are other models, too: for example, an alternative goal is to equip aspiring data scientists with technical tools and organizational processes for doing data science work that aligns with social values (like privacy and fairness). The group worked to identify some of the biggest challenges in this field and, where possible, ways to address them.

One approach to data science ethics education is to include a standalone ethics course in a program’s curriculum. Another is to embed discussions of ethics into existing courses in a more integrated way. Both options have advantages and disadvantages. Standalone ethics courses may attract a wider variety of students from different disciplines than technical classes alone, which creates potential for rich discussions. They also give professors room to cover basic normative theories before turning to specific examples, without having to skip those theories or assume that students covered them in other course modules. Independent ethics courses do not necessarily require cooperation among multiple professors or departments, which makes them easier to organize. However, many worry that teaching ethics separately from technical topics may marginalize ethics and lead students to perceive it as unimportant. Further, standalone courses can be either elective or mandatory. Elective courses may attract a self-selecting group of students, leaving out others who could benefit from exposure to the material; mandatory ethics classes may be seen as displacing technical training that students want and need. Embedding ethics within existing CS courses may avoid some of these problems, and can also elevate the discourse around ethical dilemmas by ensuring that students are well-versed in the specific technical aspects of the problems they discuss.

Beyond course structure, ethics courses can be challenging for data science faculty to teach effectively. Many students accustomed to technical course material struggle with the kinds of learning and engagement that ethics courses require, since these courses are often reading-heavy. And the “answers” in ethics courses are almost never clear-cut. The lack of clear answers or easily constructed rubrics can complicate grading, since both students and faculty in computer science may be used to more objective grading criteria. This problem is certainly not insurmountable, however: humanities departments have dealt with it for centuries, and dialogue with them may suggest solutions. Giving students frequent short assignments rather than occasional long ones may make grading easier, and also encourages students to think about ethical issues on a more regular basis.

Institutional hurdles can also hinder a university’s ability to address questions of ethics in data science satisfactorily. A dearth of technical faculty may make it difficult to offer a standalone ethics course, and may push a university toward incorporating ethics into existing CS courses instead. Even this, however, requires that professors have the time and knowledge to do so, which is not always the case.

The next blog post will enumerate topics discussed and assignments used in courses that discuss ethics in data science.

Thanks to Karen Levy and Kathy Pham for their edits on a draft of this post.

Getting serious about research ethics: Security and Internet Measurement

[This blog post is a continuation of our series about research ethics in computer science that we started last week]

Research projects in the information security and Internet measurement sub-disciplines typically interact with third-party systems or devices to collect large amounts of data. Scholars in these fields aim to collect data about technical phenomena. Because the Internet is so widely used, however, their experiments can interfere with people’s use of devices and reveal all sorts of private information about them, such as their browsing behaviour. As awareness of these unintended impacts on Internet users has grown, these communities have spent considerable time debating their ethical standards at conferences, at dedicated workshops, and in journal publications. Their efforts have culminated in guidelines on topics such as vulnerability disclosure and privacy, which aim to protect unsuspecting Internet users and other humans implicated in technical research.

 

Prof. Nick Feamster, Prof. Prateek Mittal, moderator Prof. Elana Zeide, and I discussed some important considerations for research ethics in a panel dedicated to these sub-disciplines at the recent CITP conference on research ethics in computer science communities. We started by explaining that gathering empirical data is crucial for assessing the state of values such as privacy and trust in communication systems. However, because methodological choices in computer science often have ethical impacts, researchers need to be empowered to reflect meaningfully on their experimental setups.

 

Prof. Feamster discussed several cases in which he had sought advice from ethical oversight bodies but received unsatisfying guidance. For example, when his team conducted Internet censorship measurements (pdf), they were aware that they were initiating requests and creating data flows from devices owned by unsuspecting Internet users. These new information flows were created in realms where adversaries, such as government censors, were also operating. This may pose a risk to the owners of the devices implicated in the experiments and data collection. The ethics board, however, concluded that such measurements did not meet the strict definition of “human subjects research”, and therefore did not require formal review. Prof. Feamster suggested that computer scientists reconsider how their technologies, and the new data flows they initiate, can be misused by adversaries, and take that into account in ethical review procedures.

 

Prof. Mittal argued that ethical tensions and dilemmas in technical Internet research can be seen as interesting research problems in their own right. For example, to reason about privacy and trust in the anonymous Tor network, researchers need to understand to what extent adversaries can exploit vulnerabilities to observe the Internet traffic of individual users. The obvious, relatively easy, and ethically dubious measurement would be to attack existing Tor nodes and attempt to collect the real-time traffic of identifiable users. Prof. Mittal instead described how his team critically weighed alternative design choices, which led them to create a new node within Princeton’s university network and then attack it. This more lab-based approach eliminated risks to unsuspecting Internet users while still allowing the same inferences to be drawn.

 

I concluded the panel by suggesting that ethics review boards at universities, academic conferences, and scholarly journals engage actively with computer scientists, so that valuable data can be collected whilst respecting human values. Currently, a panel of non-experts in either computer science or research ethics is given a single moment to judge the full methodology of a research proposal or the resulting paper. When a thumbs-down is issued, researchers have little or no opportunity to remedy their ethical shortcomings. I argued that a better approach would be an iterative process with in-person meetings and more in-depth consideration of design alternatives, as demonstrated in a recent paper about Advertising as a Platform for Internet measurements (pdf). This is the approach advocated in the Networked Systems Ethics Guidelines. Cross-disciplinary conversation, rather than one-time decisions, allows for mutual understanding between the gatekeepers of ethical standards and the designers of useful computer science research.

 

See the video of the panel here.

Design Ethics for Gender-Based Violence and Safety Technologies

Authored (and organized) by Kate Sim and Ben Zevenbergen.

Digital technologies are increasingly proposed as innovative solutions to the problems and threats faced by vulnerable groups such as children, women, and LGBTQ people. However, there is a structural lack of consideration for gender and power relations in the design of Internet technologies, as previously discussed by scholars in media and communication studies (Barocas & Nissenbaum, 2009; boyd, 2001; Thakor, 2015) and technology studies (Balsamo, 2011; MacKenzie & Wajcman, 1999). The intersection between gender-based violence and technology, in particular, deserves greater attention. To this end, scholars from Princeton’s Center for Information Technology Policy and the Oxford Internet Institute organized a workshop at Princeton in the spring of 2017 to explore the design ethics of gender-based violence and safety technologies.

The workshop brought together a wide range of advocates working on intimate partner violence and sex work, as well as engineers, designers, developers, and academics working on IT ethics. The objectives of the day were threefold: (1) to better understand the lack of gender considerations in technology design, (2) to formulate critical questions for functional requirement discussions between advocates and developers of gender-based violence applications, and (3) to establish a set of criteria by which new applications can be assessed from a gender perspective.

Below, we share three conceptual takeaways from the workshop, followed by practical primers for developers interested in creating technologies for those affected by gender-based violence.

 

Survivors, sex workers, and young people are intentional technology users

Growing public awareness of the prevalence of gender-based violence, both online and offline, often frames survivors of gender-based violence, activists, and young people as vulnerable and helpless. Contrary to this representation, those affected by gender-based violence are intentional technology users who choose to adopt or abandon tools as they see fit. For example, sexual assault victims strategically disclose their stories on specific social media platforms to mobilize collective action. Sex workers adopt locative technologies to make safety plans. Young people use secure search tools to find information about sexual health resources near them. To fully understand how and why some technologies appear to serve these communities better than others, developers need to pay greater attention to the depth of their lived experience with technology.

 

Context matters

Technologies designed with good intentions do not automatically achieve their stated objectives. Functions we take for granted as neutral, such as a ‘Find my iPhone’ feature, can have unintended consequences. In contexts of gender-based violence, both abusers and survivors appropriate these tools. For example, survivors and sex workers can use such a feature to share their whereabouts with friends in times of need. Abusers, on the other hand, can use the same locative functions to stalk their victims. For technologies to begin supporting those affected by gender-based violence, it is crucial to consider the context in which a technology is used, the user’s relationship to their environment, and their needs and interests.

 

Vulnerable communities perceive unique affordances

Drawing on ecological psychology, technology scholars use the concept of affordance to describe this tension between design and use: a user’s perception of what can and cannot be done with a device shapes how they use it. Designers may create a technology with a specific use in mind, but users will appropriate, resist, and improvise with its features as they see fit. For example, rape victims have used hashtags such as #SurvivorPrivilege to create in-groups on Twitter for supportive discussions, without intending them to go viral.

 

ACTION ITEMS

  1. Predict unintended outcomes

Relatedly, thinking of devices as having affordances helps us anticipate how technologies can lead to unintended outcomes. Facebook’s ‘authentic name’ policy may have been instituted to promote safety for victims of relationship violence. The social and political contexts in which this policy operates, however, disproportionately affect the safety of human rights activists, drag queens, sex workers, and others, including survivors of partner violence.

 

  2. Question the default

Technology developers are in a position to design the default settings of their technology. Since users typically leave such settings unchanged, developers must take into account their effect on target end users. For example, the default notification setting for text messages displays the full message content on the home screen. A smartphone user may experience texting as a private activity, but this default lets other people who are physically present read the messages too. Opting out of the default requires some technical knowledge from the user. In abusive relationships, the abuser can therefore easily access the victim’s text messages through this default setting. So, in designing smartphone applications for survivors, developers should question default privacy settings.

 

  3. Inclusivity is not generalizability

Generalizability appears to be widely equated with inclusivity. An alarm button that claims to serve general safety purposes may take a one-size-fits-all approach by automatically connecting the user to law enforcement. In cases of sexual assault, especially those involving people of color, sex workers, or LGBTQ people, survivors are likely to avoid such features precisely because of their connection to law enforcement. This means that those who are most vulnerable are inadvertently excluded from the feature. Alternatively, an alarm feature that centers these communities may direct the user to local resources. Thus, a feature that is generalizable may overlook the target groups it aims to support; a more targeted feature may have less reach but meet its objective. Just as communities’ needs are context-based, inclusivity, too, is contextualized. Developers should recognize that the broader mission of inclusivity can in fact be fulfilled by addressing a specific need, even if this reduces the scope of end-users.

 

  4. Consider co-designing

How, then, can we develop targeted technologies? Workshop participants suggested co-design (also known as user-participatory design) as a process through which marginalized communities can take a leading role in developing new technologies. Instead of treating communities as passive recipients of technological tools, co-design positions both target communities and technologists as active agents who share skills and knowledge to develop innovative technological interventions.

 

  5. Involve funders and donors

Breakout group discussions pointed out how developers’ organizational and funding structures play a key role in shaping the kind of technologies they create. Suggested strategies included (1) educating donors about the specific social issue being addressed, (2) carefully considering whether funding sources meet developers’ objectives, and (3) ensuring diversity in the development team.

 

  6. Do no harm with your research

In conducting user research, academics and technologists aim to better understand marginalized groups’ technology use, because these groups are typically at the forefront of adopting and appropriating digital tools. While it is important to expand our understanding of vulnerable communities’ everyday experience with technology, research on this topic can be used by authorities to further marginalize and target these communities. Take, for example, how some tech startups align with law enforcement in ways that negatively affect sex workers. To ensure that research about communities actually contributes to supporting them, academics and developers must be vigilant about conducting ethical research that protects its subjects.

 

  7. Should this app exist?

The most important question to address at the beginning of a technology design process should be: Should there even be an app for this? The idea that technologies can solve social problems as long as technologists just “nerd harder” continues to guide the development and funding of new technologies. But many social problems are not necessarily data problems that can be solved by an efficient design padded with enhanced privacy features. One necessary early intervention is simply to ask whether technology truly has a place in the particular context and, if so, whether it addresses a specific need.

Our workshop began with big questions about the intersections of gender-based violence and technology, and concluded with a simple but piercing question: Who designs what for whom? Implicated here are the complex workings of gender, sexuality, and power embedded in the lifetime of newly emerging devices from design to use. Apps and platforms can certainly have their place when confronting social problems, but the flow of data and the revealed information must be carefully tailored to the target context. If you want to be involved with these future projects, please contact or .

The workshop was funded by Princeton’s Center for Information Technology Policy, Princeton’s University Center for Human Values, the Ford Foundation, the Mozilla Foundation, and Princeton’s Council on Science and Technology.