February 19, 2018


How Data Science and Open Science are Transforming Research Ethics: Edward Freeland at CITP

How are data science and the open science movement transforming how researchers manage research ethics? And how are these changes influencing public trust in social research?

I’m here at the Center for IT Policy to hear a talk by Edward P. Freeland. Edward is the associate director of the Princeton University Survey Research Center and a lecturer at the Woodrow Wilson School of Public and International Affairs. Edward has been a member of Princeton’s Institutional Review Board since 2005 and currently serves as chair.

Edward starts out by telling us about his family’s annual Christmas card. Every year, his family loses track of a few people, and he ends up having to track someone down. For several years, they sent the card intended for Ed’s wife’s cousin Billy to an address in Hartford, CT, which turned out to belong not to cousin Billy but to a retired neurosurgeon. To resolve this problem this year, Edward and his wife entered more information about their family members into an app. Along the way, he learned just how much information about people is available on the internet. While technology makes it possible to keep track of family members more easily, some of that data might be more than people want to be known.

How does this relate to research ethics? Edward tells us about the principles that currently shape research ethics in the United States. These principles come from the 1978 Belmont Report, which was prompted in part by the Tuskegee Syphilis Study, a horrifying medical study that ran for forty years. In the US, universities are now required to conduct research in accordance with three principles: respect for persons, beneficence, and justice.

In practice, what do university ethics boards (IRBs) care about? Edward and his colleagues compiled the issues that ethics boards care about into a single slide:

When it comes to privacy, what do university ethics boards care about? Federal regulations focus on any disclosure of human subjects’ responses outside of the research and the risk that such disclosure would expose people to. In practice, ethics boards expect researchers to adopt procedural safeguards around who can access data and how it’s protected.

In the past, studies would basically conclude once the researchers published the research. But the practice of research has been changing. Advocates of open science have worked to reduce fraud, prevent the burying of unexpected results, enhance funder and taxpayer impact, strengthen the integrity of scientific work, pursue crowdsourcing and citizen science, and collaborate in new ways. Edward tells us about the Open Science Collaboration, which in 2015 tried to replicate a hundred studies from across psychology and often failed to do so. Others are now asking similar questions in other fields, including cancer research.

In just a few years, the Center for Open Science has supported many researchers and journals to pre-register and publish the details of their research. Other organizations are also developing similar initiatives, such as clinicaltrials.gov.

Many in the open science movement suggest that researchers archive and share data, even after submitting a manuscript. Some people use data sharing agreements to protect data used by others. Others prepare data files from their research for public use. But publishing data introduces privacy risks for research participants. While HIPAA covers medical data in the US, there aren’t authoritative norms or guidelines for sharing other kinds of research data.

Many people turn to anonymization as a way to protect the information of people who participate in research. But does it really work? The landscape of data re-identification is changing from year to year, but the consensus is that anonymization doesn’t tend to work. As Matt Salganik points out in his book Bit By Bit, we should assume that all data are potentially identifiable and potentially sensitive. Where might we need to be concerned about potential problems?

  • People are sometimes recruited to join survey panels where they answer many questions over the years. Because this data is high-dimensional, it may be very easy to re-identify people.
  • Distributed anonymous workforces like Amazon Mechanical Turk also represent a privacy risk. The worker ID codes aren’t anonymous: you can google people’s IDs and find their comments on various Amazon products.
  • Re-identification attacks, which draw together data from many sources to identify someone, are becoming more common.
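To make the last point concrete, here is a minimal, hypothetical sketch of how a linkage-style re-identification attack works: an “anonymized” dataset is joined against a public record on quasi-identifiers such as ZIP code, birth year, and gender. All names, values, and field choices below are fabricated for illustration only.

```python
# Hypothetical linkage attack: an "anonymized" survey dataset is matched
# against a public record on shared quasi-identifiers. All data is invented.

anonymized_survey = [
    {"zip": "08540", "birth_year": 1971, "gender": "F", "answer": "sensitive response A"},
    {"zip": "08540", "birth_year": 1985, "gender": "M", "answer": "sensitive response B"},
]

public_records = [
    {"name": "Jane Doe", "zip": "08540", "birth_year": 1971, "gender": "F"},
    {"name": "John Roe", "zip": "08540", "birth_year": 1985, "gender": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")

def reidentify(survey, records):
    """Link survey rows to named public records via quasi-identifiers."""
    matches = []
    for row in survey:
        key = tuple(row[q] for q in QUASI_IDENTIFIERS)
        candidates = [r for r in records
                      if tuple(r[q] for q in QUASI_IDENTIFIERS) == key]
        if len(candidates) == 1:  # a unique match re-identifies the person
            matches.append((candidates[0]["name"], row["answer"]))
    return matches

print(reidentify(anonymized_survey, public_records))
```

Even though the survey contains no names, each row matches exactly one named record, attaching a sensitive answer to a real identity. This is the same basic mechanism behind well-known re-identifications of “anonymized” datasets, and it is why removing direct identifiers alone is not considered sufficient protection.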

Public Confidence in Science

How we treat people’s data affects public confidence in science: not only how people interpret what we learn, but also their likelihood of participating in research. Edward tells us that survey response rates have been dropping, even for surveys conducted by the government. American society has always had a fringe movement of people who resist government data collection. If those people gain access to the levers of power, they may be able to reduce the government’s willingness to collect data that could inform the public on important issues.

Edward tells us that very few people expect their data to be kept private and secure, according to research by Pew. When combined with declining trust in institutions, concerns about privacy may be one reason that fewer people are responding to surveys.

At the same time, many people are organizing to resist surveying by the US government. Some political and activist groups have been filming their interactions with survey collectors, harassing them, and claiming that researchers or the government have secret agendas. As researchers try to uphold public trust by doing trustworthy, beneficial research, we need to be aware of the social and political forces that influence how people think about research.

Why Everyone in Tech Should Visit the American Museum of Tort Law

This Monday, Nikki Bourassa and I organized a van from Harvard’s Berkman Klein Center for Internet and Society to visit the American Museum of Tort Law, which I have decided to call the American Museum of Exploding Cars and Toys that Kill You.

While at the museum, I came to see another way that research can inform democratic processes for public safety: through its role in court cases.

I think everyone in tech should visit this museum, especially if you’re designing something that becomes part of people’s lives. The stories are organized to help you learn about US law while also thinking about what it means to be responsible for the risks that a product introduces to society. You can read the full post on Medium.

Workshop on Technical Applications of Contextual Integrity

The theory of contextual integrity (CI) has inspired work across the legal, privacy, computer science, and HCI research communities. Recognizing common interests and common challenges, the time seemed ripe for a meeting to discuss what we have learned from projects using CI and how to move forward in leveraging CI to enhance privacy-preserving systems and policies. On December 11, 2017, the Center for Information Technology Policy hosted an inaugural workshop on Technical Applications of Contextual Integrity. The workshop gathered over twenty researchers from Princeton University, New York University, Cornell Tech, the University of Maryland, Data & Society, and AI Now to present their ongoing and completed projects, discuss and share ideas, and explore successes and challenges in using the CI framework. The meeting, which included faculty, postdocs, and graduate students, was kicked off with a welcome and introduction by Ed Felten, CITP Director.

The agenda comprised two main parts. In the first half of the workshop, representatives of the various projects gave short presentations on the status of their work, described challenges they had encountered, and shared lessons learned in the process. The second half included a planning session for a full-day event to take place in the spring, which would allow for a bigger discussion and exchange of ideas.

The workshop presentations touched on a wide variety of topics, including: ways of operationalizing CI, discovering the contextual norms behind children’s online activities, capturing users’ expectations of smart toys and smart-home devices, demonstrating how CI can be used to analyze regulation, applying CI to establish research ethics guidelines, and conceptualizing privacy within commons governance arrangements.

More specifically:

Yan Shvartzshnaider discussed the Verifiable and ACtionable Contextual Integrity Norms Engine (VACCINE), a framework for building adaptable and modular Data Leakage Prevention (DLP) systems.

Darakshan Mir discussed a community-based participatory framework for discovering contextual informational norms in small and vulnerable communities.

Sebastian Benthall shared key takeaways from a survey of the existing computer science literature that uses Contextual Integrity.

Paula Kift discussed how the theory of contextual integrity can be used to analyze the recently passed Cybersecurity Information Sharing Act (CISA), revealing some fundamental gaps in the way it conceptualizes privacy.

Ben Zevenbergen talked about his work applying the theory of contextual integrity to help establish guidelines for research ethics.

Madelyn Sanfilippo discussed conceptualizing privacy within commons governance arrangements using the Governing Knowledge Commons (GKC) framework.

Priya Kumar presented recent work using Contextual Integrity to identify gaps in children’s online privacy knowledge.

Sarah Varghese and Noah Apthorpe discussed their work on discovering privacy norms for IoT devices using Contextual Integrity.

The roundtable discussion covered a wide range of open questions, such as the limitations of CI as a theory, possible extensions, integration with other frameworks, conflicting interpretations of the CI parameters, possible research directions, and ideas for collaboration.

This was a first attempt to gauge interest from the wider research community in a CI-focused event, and we were overwhelmed by the incredible response! The participants expressed strong interest in the bigger event in Spring 2018 and put forward a number of suggestions for its format. One idea is to organize the bigger workshop as a joint event with an established conference; another suggestion was to run it as a hands-on workshop that brings together industry and academia. We are really excited about an event that will bring together a broad sample of CI-related research, both academically and geographically, allowing for a much wider discussion.

The ultimate goal of this and future initiatives is to foster communication among the various communities of researchers and practitioners using the theory of CI as a framework for reasoning about privacy and a language for sharing ideas.

In the meantime, please check out the http://privaci.info website, which will serve as a central repository for news and up-to-date related work for the community. We will be updating it in the coming months.

We look forward to your feedback and suggestions. If you’re interested in hearing about the Spring workshop or presenting your work, or if you want to help or have any suggestions, please get in touch!

Twitter: @privaci_way

Email: