May 21, 2018

How Data Science and Open Science are Transforming Research Ethics: Edward Freeland at CITP

How are data science and the open science movement transforming how researchers manage research ethics? And how are these changes influencing public trust in social research?


I’m here at the Center for IT Policy to hear a talk by Edward P. Freeland. Edward is the associate director of the Princeton University Survey Research Center and a lecturer at the Woodrow Wilson School of Public and International Affairs. Edward has been a member of Princeton’s Institutional Review Board since 2005 and currently serves as chair.

Edward starts out by telling us about his family’s annual Christmas card. Every year, his family loses track of a few people, and he ends up having to track someone down. For several years, they sent the card meant for Ed’s wife’s cousin Billy to an address in Hartford, CT, but it turned out that the address belonged not to cousin Billy but to a retired neurosurgeon. To resolve this problem this year, Edward and his wife entered more information about their family members into an app. Along the way, he learned just how much information about people is available on the internet. While technology makes it easier to keep track of family members, some of that data might be more than people want to be known.

How does this relate to research ethics? Edward tells us about the principles that currently shape research ethics in the United States. These principles come from the 1978 Belmont Report, which was prompted in part by the Tuskegee Syphilis Study, a horrifying medical study that ran for forty years. In the US, universities now have to conduct research guided by respect for persons, beneficence, and justice.

In practice, what do university ethics boards (IRBs) care about? Edward and his colleagues compiled the issues that ethics boards consider into a single slide.

When it comes to privacy, what do university ethics boards care about? Federal regulations focus on any disclosure of the human subjects’ responses outside of the research and the risk that it would expose people to. In practice, the ethics board expects researchers to adopt procedural safeguards around who can access data and how it’s protected.

In the past, studies would basically conclude after the researchers published the research. But the practice of research has been changing. Advocates of open science have worked to reduce fraud, prevent burying of unexpected results, enhance funder/taxpayer impact, strengthen the integrity of scientific work, work through crowdsourcing or citizen science, and collaborate in new ways. Edward tells us about the Open Science Collaboration, which tried in 2015 to replicate a hundred studies from across psychology and often failed to do so. Now others are asking similar questions in other fields, including cancer research.

In just a few years, the Center for Open Science has supported many researchers and journals to pre-register and publish the details of their research. Other organizations are also developing similar initiatives.

Many in the open science movement suggest that researchers archive and share data, even after submitting a manuscript. Some people use a data sharing agreement to protect data used by others. Others prepare data files from their research for public use. But publishing data introduces privacy risks for participants in research. While HIPAA, a US law, covers medical data, there aren’t authoritative norms or guidelines for sharing other research data.

Many people turn to anonymization as a way to protect the information of people who participate in research. But does it really work? The landscape of data re-identification is changing from year to year, but the consensus is that anonymization doesn’t tend to work. As Matt Salganik points out in his book Bit By Bit, we should assume that all data are potentially identifiable and potentially sensitive. Where might we need to be concerned about potential problems?

  • People are sometimes recruited to join survey panels where they answer many questions over the years. Because this data is high-dimensional, it may be very easy to re-identify people.
  • Distributed anonymous workforces like Amazon Mechanical Turk also represent a privacy risk. The ID codes aren’t anonymous: you can google people’s IDs and find their comments on various Amazon products.
  • Re-identification attacks, which draw together data from many sources to find someone, are becoming more common.
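
To make the point about high-dimensional panel data concrete, here is a minimal sketch in Python. It is not something Edward showed: the records, field names, and the `unique_fraction` helper are all invented for illustration. It measures what share of a toy panel is singled out by a given combination of answers, and that share can only stay the same or grow as more fields are combined.

```python
from collections import Counter

# Hypothetical toy survey panel. Every record and field name is invented
# for illustration; none of this comes from a real study.
PANEL = [
    {"zip": "08540", "birth_year": 1980, "gender": "F", "household_size": 3},
    {"zip": "08540", "birth_year": 1980, "gender": "F", "household_size": 4},
    {"zip": "08540", "birth_year": 1975, "gender": "M", "household_size": 2},
    {"zip": "08540", "birth_year": 1975, "gender": "M", "household_size": 2},
    {"zip": "08544", "birth_year": 1980, "gender": "F", "household_size": 3},
    {"zip": "08544", "birth_year": 1990, "gender": "M", "household_size": 1},
]


def unique_fraction(records, fields):
    """Return the share of records whose values on `fields` are unique.

    A record that is unique on these fields could be singled out by anyone
    who learns the same facts about a person from another source.
    """
    keys = [tuple(record[field] for field in fields) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)


if __name__ == "__main__":
    all_fields = ["zip", "birth_year", "gender", "household_size"]
    # Add one field at a time and report how many respondents stand alone.
    for n in range(1, len(all_fields) + 1):
        fields = all_fields[:n]
        print(fields, round(unique_fraction(PANEL, fields), 2))
```

A real panel has hundreds of answers per person rather than four, which is why removing names alone offers little protection.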

Public Confidence in Science

How we treat people’s data affects public confidence in science: not only how people interpret what we learn, but also how likely people are to participate in research. Edward tells us that survey response rates have been dropping, even when surveys are conducted by the government. American society has always had a fringe movement of people who resisted government data collection. If those people gain access to the levers of power, they may be able to influence whether the government collects data that could inform the public on important issues.

Edward tells us that very few people expect their data to be kept private and secure, according to research by Pew. When combined with declining trust in institutions, concerns about privacy may be one reason that fewer people are responding to surveys.

At the same time, many people are organizing to try to resist surveying by the US government. Some political and activist groups have been filming their interactions with survey collectors, harassing them, and claiming that researchers or the government have secret agendas. As researchers try to uphold public trust by doing trustworthy, beneficial research, we need to be aware of the social and political forces that influence how people think about research.

Why Everyone in Tech Should Visit the American Museum of Tort Law

This Monday, Nikki Bourassa and I organized a van from Harvard’s Berkman Klein Center for Internet and Society to visit the American Museum of Tort Law, which I have decided to call the American Museum of Exploding Cars and Toys that Kill You.

While at the museum, I came to see another way that research can inform democratic processes for public safety: through its role in court cases.

I think everyone in tech should visit this museum, especially if you’re designing something that becomes part of people’s lives. The stories are organized to help you learn about US law while also thinking about what it means to be responsible for the risks that a product introduces to society. You can read the full post on Medium.

Automating Inequality: Virginia Eubanks Book Launch at Data & Society

What does it mean for public sector actors to implement algorithms to make public services more efficient? How are these systems experienced by the families and people who face the consequences?

Speaking at the Data & Society Institute today is Virginia Eubanks, author of the new book Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. Virginia Eubanks is an Associate Professor of Political Science at the University at Albany, SUNY. Virginia is currently a founding member of the Our Data Bodies Project and a Fellow at New America. For two decades, Eubanks has worked in community technology and economic justice movements. I first met Virginia as a PhD student at the MIT Center for Civic Media, where her book Digital Dead End helped me think about the challenges of genuine empowerment through technology, and I’ve been eagerly awaiting this latest book.

Today at the Data & Society Institute, Virginia was interviewed by Alondra Nelson, president of the Social Science Research Council, and Julia Angwin, an investigative journalist at ProPublica (watch the video here).

We live in a new regime of data analytics, Virginia reminds us: a lot of really great work is thinking deeply about data-based discrimination and the role that data plays in challenging the inequities of our lives or making them worse. To study this transformation, Virginia grounds her work in the history and political contexts of these systems. They didn’t fall from the sky or land on us with a blank slate. Virginia also starts by talking with people who are the targets of these systems: primarily poor and working class families across the color line, people who are often left out of the conversation. Over the last year, this has involved talking with over a hundred people across the US.

Origins of the Digital Poorhouse

How did we get from the county poorhouse of 1819 to today’s digital poorhouse? Virginia tells us about one moment in this history: the rise of this digital poorhouse. Previously, she had expected that digitization had started in the 1990s, but she realized that digital welfare records actually started in the 1960s and 1970s with the National Welfare Rights Movement. This movement, which had origins in the civil rights movement, succeeded in establishing that poor Americans should enjoy the full array of constitutional rights.

In the 60s and 70s, the welfare rights movement changed policies like “man-in-house” laws, “suitable home” rules, residency restrictions, and “employable mother” rules, says Virginia. These laws expected mothers to work and barred them from public services. For the first time in history, she argues, the movement extended the rights of middle class people to poor and working class people, including unmarried moms and women of color.

Even as the Welfare Rights Movement became successful, the backlash against this movement coincided with a recession. Because it had become legally impossible to discriminate against people, administrators were caught between a rock and a hard place. They solved this problem, says Eubanks, by commissioning a massive set of digital technologies in the late 60s and early 70s. Almost immediately, we see a drop in the ability of people to access entitlements they were due. This happened well in advance of Reagan’s “welfare queen” speech and subsequent legislation to reduce access to welfare.

This effort to manage and reduce access to public services continues today, says Virginia. The state of Indiana entered into a $1.16 billion contract with IBM and ACS to automate welfare eligibility processes. This program used computers and call centers to manage a queue of tasks, replacing the previous case-based, family-based system focused on supporting people. The new system broke relationships between case workers and people seeking access to public assistance. It’s the most straightforward disaster: a million applications were denied in the first three years, a fifty percent increase in denials. The state broke the contract because the system went so badly, and the resulting court case is ongoing.

How Computerized Public Services Put People at Risk

How did people experience this system? Virginia tells us about Omega Young, who missed a chance to apply for Medicaid because she was in the hospital being treated for cancer. Because she was in the hospital, this mother struggled to meet the system’s requirements. She called the help center to let them know she was hospitalized, and her food stamps and medical assistance were cut off for “failure to cooperate.” Most people lost their assistance for this reason, often because they missed an appointment or missed a signature on a hundred-page form. The day after Omega died, she won a case.

Next, Virginia talks about the Allegheny Family Screening Tool, a model used to predict which children might be victims of abuse and neglect in the future (Virginia has written about it here; the New York Times also recently published an account). In 1999, the city of Pittsburgh commissioned a data warehouse that collects data from sources ranging from the police department to public services. In 2012, the office released a request for proposals funded by foundations, asking people to propose ways to mine the data to support public services. The grant went to a team of researchers who use statistical regression models to predict which children are going to face neglect, using 130 indicators to predict maltreatment.

In the Pittsburgh area, when a call comes into the hotline for child abuse and neglect, the intake workers will interview the caller and make a decision based on two factors: the risk of the allegation (is it child abuse or neglect?) and how safe they feel that child is. Once they make those two decisions, they run the Allegheny Family Screening Tool, which offers a thermometer from 1 to 20 predicting the level of risk. Based on those factors, the intake manager makes a decision about whether to screen that family through the county’s equivalent of child protective services.

Virginia shares voices from people who were targets of this system. She talks about Angel and Patrick, whom she met at a family center. They didn’t stand out initially because their experiences are so average, like those of many working class white people. They’ve struggled with poor health, community violence, and predatory online education. Although they’re dedicated parents, they’ve racked up a history with the state. One of them couldn’t afford to pay for an antibiotic for their daughter. Several times, an anonymous tipster called investigators, who investigated them and cleared them. But each of those cases was recorded. The family is now terrified that the algorithm will label them as a risk to their children: they live in fear that someone will see their daughter outside, pick her up, and say that she can’t live with her parents anymore.

Some of this system’s limitations are ones we would expect as statisticians. The data is limited, it only includes public records, and it doesn’t track whether someone received help from private services. Yet the designers of this system have carried out all of the best practices. The design of this tool was participatory, the researchers have been transparent about everything except the weights of the predictive variables, and the algorithm is publicly owned and controlled through democratic processes.

Virginia closes by asking us: how do we respond to systems that were designed using good practices that nonetheless represent a dangerous risk to working people, systems that police, profile, and punish the poor?

Conversation with Alondra Nelson and Julia Angwin

Julia opens up by mentioning an argument that she often has with her husband, who does work on international development. He often talks about new kinds of surveillance that could improve the lives of the poor. To serve the poor well, you need to know the data that they need. For example, he used aerial photography to figure out where all the schools in Nigeria were, which they didn’t exactly know before and which might genuinely support the poor. But at the same time, it’s surveillance.

Accessing public services is incredibly difficult, says Virginia, and if we can lower the barrier, that’s an incredibly positive thing. Right now, anyone who wants public assistance needs to visit many, many offices and fill out many forms. Many public assistance budgets have a line called “diversion,” which is money spent by the state to reduce the number of people who access what is theirs by law. While streamlining these systems can be beneficial, people sometimes need to reduce their visibility to these systems in order to survive. When public services integrate, you become hyper-visible, which creates great harm for people, says Virginia. Surveillance systems can involve people in a cycle that can criminalize them very quickly (JNM note: for things like missing an appointment or ticking the wrong box).

Evidence is great, says Virginia, and it can help us find out what works. But evidence can be used to persecute. We should always be thinking about both of those things.

Alondra remarks that Virginia’s book offers a powerful example of how to ask the important questions about algorithms. Wonkish people tend to look more and more closely at the algorithms, when we could also just step back and look at how these systems affect people’s lives.

Julia asks about the idea of “the deserving poor,” where so much technology has been designed to try to make decisions about who is deserving and who isn’t. How can we find a way, she asks, to talk about problems that have collective harms even when we can’t find the perfect case of injustice? Editors and storytellers often want the “perfect victim” in order to make the story relatable. How do we escape this trap?

Virginia responds that people often expect that the welfare system has been designed to ensure that people get the benefits they deserve under the law. In reality, we keep on re-inventing systems that try to decide whether someone is responsible for their own poverty and to avoid supporting them. Two thirds of Americans, says Virginia, will access some kind of means-tested public service in our lifetimes, but many of us fail to admit that we’ve needed these services. And since we don’t admit that we’ve accessed these systems, we never get around to organizing to ensure that we are served effectively.

To create change, says Virginia, we need people to understand how these systems affect us all, organize social movements, and also to re-imagine how we expect technologies to support the poor. She hopes that the book will help people recognize that we have a shared struggle whoever we are.

Julia Angwin asks about movements like the Poor People’s Campaign, the health clinics provided by the Black Panthers, and the work done by the Panthers on sickle cell anemia (see Alondra’s book Body and Soul). Alondra responds that if you’re poor, you’re stigmatized. If you’re black, you’re stigmatized. The category of “deserving poor” does not exist for those who accept the definition. Social movements often offer us meaningful examples, says Alondra, because they look to the future. To make that work, communities need cohorts of experts who support communities and movements to shape their own futures.

Virginia talks about the idea of organizing communities to review, critique, and optimize how they interact with the forms to maximize access to rights and public services within the law. Virginia mentions the Our Data Bodies project, which talks with people in marginalized neighborhoods about the collection, storage, and sharing of their data by government. The purpose of the project is to help people understand what they’re facing, confirm some of their fears, and manage their “data self defense.” People have brilliant strategies for survival, self-defense, and community defense, says Virginia. The project will be sharing initial results in March.