October 6, 2022

Ethics Education in Data Science

Data scientists in academia and industry are increasingly recognizing the importance of integrating ethics into data science curricula. Recently, a group of faculty and students gathered at New York University before the annual FAT* conference to discuss the promises and challenges of teaching data science ethics, and to learn from one another’s experiences in the classroom. This blog post is the first of two which will summarize the discussions had at this workshop.

There is general agreement that data science ethics should be taught, but less consensus about what its goals should be or how they should be pursued. Because the field is so nascent, there is substantial room for innovative thinking about what data science ethics ought to mean. In some respects, its goal may be the creation of “future citizens” of data science who are invested in the welfare of their communities and the world, and understand the social and political role of data science therein. But there are other models, too: for example, an alternative goal is to equip aspiring data scientists with technical tools and organizational processes for doing data science work that aligns with social values (like privacy and fairness). The group worked to identify some of the biggest challenges in this field, and when possible, some ways to address these tensions.

One approach to data science ethics education is including a standalone ethics course in the program’s curriculum. Another option is embedding discussions of ethics into existent courses in a more integrated way. There are advantages and disadvantages to both options. Standalone ethics courses may attract a wider variety of students from different disciplines than technical classes alone, which provides potential for rich discussions. They allow professors to cover basic normative theories before diving into specific examples without having to skip the basic theories or worry that students covered them in other course modules. Independent courses about ethics do not necessarily require cooperation from multiple professors or departments, making them easier to organize. However, many worry that teaching ethics separately from technical topics may marginalize ethics and make students perceive it as unimportant. Further, standalone courses can either be elective or mandatory. If elective, they may attract a self-selecting group of students, potentially leaving out other students who could benefit from exposure to the material; mandatory ethics classes may be seen as displacing other technical training students want and need. Embedding ethics within existent CS courses may avoid some of these problems and can also elevate the discourse around ethical dilemmas by ensuring that students are well-versed in the specific technical aspects of the problems they discuss.

Beyond course structure, ethics courses can be challenging for data science faculty to teach effectively. Many students used to more technical course material are challenged by the types of learning and engagement required in ethics courses, which are often reading-heavy. And the “answers” in ethics courses are almost never clear-cut. The lack of clear answers or easily constructed rubrics can complicate grading, since both students and faculty in computer science may be used to grading based on more objective criteria. However, this problem is certainly not insurmountable – humanities departments have dealt with this for centuries, and dialogue with them may illuminate some solutions to this problem. Asking students to complete frequent but short assignments rather than occasional long ones may make grading easier, and also encourages students to think about ethical issues on a more regular basis.

Institutional hurdles can hinder a university’s ability to satisfactorily address questions of ethics in data science. A dearth of technical faculty may make it difficult to offer a standalone course on ethics. A smaller faculty may push a university towards incorporating ethics into existent CS courses rather than creating a new class. Even this, however, requires that professors have the time and knowledge to do so, which is not always the case.

The next blog post will enumerate topics discussed and assignments used in courses that discuss ethics in data science.

Thanks to Karen Levy and Kathy Pham for their edits on a draft of this post.

Sign up now for the Bitcoin and cryptocurrency technologies online course

At Princeton I taught a course on Bitcoin and cryptocurrency technologies during the semester that just ended. Joe Bonneau unofficially co-taught it with me. Based on student feedback and what we accomplished in the course, it was extremely successful. Next week I’ll post videos of all the final project presentations.

The course was based on a series of video lectures. We’re now offering these lectures free to the public, online, together with homeworks, programming assignments, and a textbook. We’ve heard from computer science students at various institutions as well as the Bitcoin community about the need for structured educational materials, and we’re excited to fill this need.

We’re using Piazza as our platform. Here’s the course page. To sign up, please fill out this (very short) form.

The first several book chapters are already available. The course starts February 16, and we’ll start making the videos available closer to that date (you’ll need to sign up to watch the videos Edit: we’ve changed this policy; the lectures are also publicly available). Each week there will be a Google hangout with that week’s lecturer. We’ll also answer questions on Piazza.

[Read more…]

Educating Leaders who Tackle the Challenges of their Time; Lessons from the Past: Book Review: First Class, The Legacy of Dunbar, America's First Black Public High School

One of last year’s CITP lectures that is still fresh in my mind is Brad Smith’s talk on “Immigration, Education, and the Future of Computer Science in America.” In his presentation on developing a process for educating the next generation of computer scientists in U.S. high schools and colleges, Mr. Smith noted that in the state of New Jersey, where 8.8 million people live, only 874 students took the computer science AP exam, and of those, only 17 were African-American. In Alison Stewart’s excellent new book “First Class: The Legacy of Dunbar, America’s First Black Public High School,” Ms. Smith tells the story of one of the best and most important American high schools of the 20th century. In the first half of the 20th century, Dunbar High School, a public school located in Washington, DC, produced numerous leaders in medicine, science, education, law, politics and the military, including several from my family. With the end of segregation, the conditions that resulted in Dunbar’s creation ceased to exist. The question remains, however, as to how in diverse public education systems to develop leaders in the fields that are critical to the country’s economic and social progress.
[Read more…]

Computer science education done right: A rookie’s view from the front lines at Princeton

In many organizations that are leaders in their field, new inductees often report being awed when they start to comprehend how sophisticated the system is compared to what they’d assumed. Engineers joining Google, for example, seem to express that feeling about the company’s internal technical architecture. Princeton’s system for teaching large undergraduate CS classes has had precisely that effect on me.

I’m “teaching” COS 226 (Data Structures and Algorithms) together with Josh Hug this semester. I put that word in quotes because lecturing turns out to be a rather small, albeit highly visible part of the elaborate instructional system for these classes that’s been put in place and refined over many years. It involves nine different educational modes that students interact with and a six different types of instructional staff(!), each with a different set of roles. Let me break it down in terms of instructional staff responsibilities, which correspond roughly to learning modes.
[Read more…]

Open Access to Scholarly Publications at Princeton

In its September 2011 meeting, the Faculty of Princeton University voted unanimously for a policy of open access to scholarly publications:

“The members of the Faculty of Princeton University strive to make their publications openly accessible to the public. To that end, each Faculty member hereby grants to The Trustees of Princeton University a nonexclusive, irrevocable, worldwide license to exercise any and all copyrights in his or her scholarly articles published in any medium, whether now known or later invented, provided the articles are not sold by the University for a profit, and to authorize others to do the same. This grant applies to all scholarly articles that any person authors or co-authors while appointed as a member of the Faculty, except for any such articles authored or co-authored before the adoption of this policy or subject to a conflicting agreement formed before the adoption of this policy. Upon the express direction of a Faculty member, the Provost or the Provost’s designate will waive or suspend application of this license for a particular article authored or co-authored by that Faculty member.

“The University hereby authorizes each member of the faculty to exercise any and all copyrights in his or her scholarly articles that are subject to the terms and conditions of the grant set forth above. This authorization is irrevocable, non-assignable, and may be amended by written agreement in the interest of further protecting and promoting the spirit of open access.”

Basically, this means that when professors publish their academic work in the form of articles in journals or conferences, they should not sign a publication contract that prevents the authors from also putting a copy of their paper on their own web page or in their university’s public-access repository.

Most publishers in Computer Science (ACM, IEEE, Springer, Cambridge, Usenix, etc.) already have standard contracts that are compatible with open access. Open access doesn’t prevent these publishers from having a pay wall, it allows other means of finding the same information. Many publishers in the natural sciences and the social sciences also have policies compatible with open access.

But some publishers in the sciences, in engineering, and in the humanities have more restrictive policies. Action like this by Princeton’s faculty (and by the faculties at more than a dozen other universities in 2009-10) will help push those publishers into the 21st century.

The complete report of the Committee on Open Access is available here.