February 21, 2018

Archives for January 2018

Workshop on Technical Applications of Contextual Integrity

The theory of contextual integrity (CI) has inspired work across the legal, privacy, computer science, and HCI research communities. Given these common interests and challenges, the time seemed ripe for a meeting to discuss what we have learned from projects using CI and how to leverage CI to enhance privacy-preserving systems and policies. On 11 December 2017, the Center for Information Technology Policy hosted an inaugural workshop on Technical Applications of Contextual Integrity. The workshop gathered over twenty researchers from Princeton University, New York University, Cornell Tech, University of Maryland, Data & Society, and AI Now to present their ongoing and completed projects, discuss and share ideas, and explore successes and challenges when using the CI framework. The meeting, which included faculty, postdocs, and graduate students, was kicked off with a welcome and introduction by Ed Felten, CITP Director.

The agenda comprised two main parts. In the first half of the workshop, representatives of various projects gave short presentations on the status of their work, described the challenges they encountered, and shared lessons learned in the process. The second half included a planning session for a full-day event to take place in the Spring to allow for a bigger discussion and exchange of ideas.

The workshop presentations touched on a wide variety of topics, including: ways of operationalizing CI, discovering contextual norms behind children’s online activities, capturing users’ expectations of smart toys and smart-home devices, demonstrating how CI can be used to analyze regulatory acts, applying CI to establish research ethics guidelines, and conceptualizing privacy within a commons governance arrangement.

More specifically:

Yan Shvartzshnaider discussed Verifiable and ACtionable Contextual Integrity Norms Engine (VACCINE), a framework for building adaptable and modular Data Leakage Prevention (DLP) systems.

Darakshan Mir discussed a community-based participatory framework for discovering contextual informational norms in small and vulnerable communities.

Sebastian Benthall shared the key takeaways from a survey of existing computer science literature that uses Contextual Integrity.

Paula Kift discussed how the theory of Contextual Integrity can be used to analyze the recently passed Cybersecurity Information Sharing Act (CISA) to reveal some fundamental gaps in the way it conceptualizes privacy.

Ben Zevenbergen talked about his work on applying the theory of contextual integrity to help establish guidelines for Research Ethics.

Madelyn Sanfilippo discussed conceptualizing privacy within a commons governance arrangement using the Governing Knowledge Commons (GKC) framework.

Priya Kumar presented recent work on using Contextual Integrity to identify gaps in children’s online privacy knowledge.

Sarah Varghese and Noah Apthorpe discussed their work on discovering privacy norms in IoT devices using Contextual Integrity.

The roundtable discussion covered a wide range of open questions, such as the limitations of CI as a theory, possible extensions, integration into other frameworks, conflicting interpretations of the CI parameters, possible research directions, and interesting collaboration ideas.

This was a first attempt to see how much interest there is from the wider research community in a CI-focused event. We were overwhelmed by the incredible response! The participants expressed huge interest in the bigger event in Spring 2018 and put forward a number of suggestions for the format of the workshop. The initial idea is to organize the bigger workshop as a joint event with an established conference; another suggestion was to hold it as part of a hands-on workshop that brings together industry and academia. We are really excited about an event that will bring together a large sample of CI-related research, diverse both academically and geographically, which will allow a much broader discussion.

The ultimate goal of this and other future initiatives is to foster communication between the various communities of researchers and practitioners using the theory of CI as a framework to reason about privacy and a language for sharing of ideas.

In the meantime, please check out the http://privaci.info website, which will serve as a central repository for news and up-to-date related work for the community. We will be updating it in the coming months.

We look forward to your feedback and suggestions. If you’re interested in hearing about the Spring workshop or presenting your work, want to help, or have any suggestions, please get in touch!

Twitter: @privaci_way


Automating Inequality: Virginia Eubanks Book Launch at Data & Society

What does it mean for public sector actors to implement algorithms to make public services more efficient? How are these systems experienced by the families and people who face the consequences?

Speaking at the Data & Society Institute today is Virginia Eubanks, author of the new book Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. Virginia Eubanks is an Associate Professor of Political Science at the University at Albany, SUNY. Virginia is currently a founding member of the Our Data Bodies Project and a Fellow at New America. For two decades, Eubanks has worked in community technology and economic justice movements. I first met Virginia when I was a PhD student at the MIT Center for Civic Media, where her book Digital Dead End helped me think about the challenges of genuine empowerment through technology, and I’ve been eagerly awaiting this latest book.

Today at the Data & Society Institute, Virginia was interviewed by Alondra Nelson, president of the Social Science Research Council, and Julia Angwin, an investigative journalist at ProPublica (watch the video here).

We live in a new regime of data analytics, Virginia reminds us: a lot of really great work is thinking deeply about data-based discrimination and the role that data plays in challenging the inequities of our lives or making them worse. To study this transformation, Virginia grounds her work in the history and political contexts of these systems. These systems didn’t fall from the sky or land on us with a blank slate. Virginia also starts by talking with people who are the targets of these systems: primarily poor and working class families across the color line, people who are often left out of the conversation. Over the last year, this has involved talking with over a hundred people across the US.

Origins of the Digital Poorhouse

How did we get from the county poorhouse of 1819 to today’s digital poorhouse? Virginia tells us about one moment in this history: the rise of this digital poorhouse. Previously, she had expected that digitization had started in the 1990s, but she realized that digital welfare records actually started in the 1960s and 1970s with the National Welfare Rights Movement. This highly successful movement, which had origins in the civil rights movement, succeeded in establishing that poor Americans should enjoy the full array of constitutional rights.

In the 60s and 70s, the welfare rights movement changed policies like “man-in-house” laws, “suitable home rules,” residency restrictions, and “employable mother” rules, says Virginia. These laws expected mothers to work and barred them from public services. For the first time in history, she argues, the movement expanded the rights of middle class people to poor and working class people, including unmarried moms and women of color.

Even as the Welfare Rights Movement became successful, the backlash against this movement coincided with a recession. Because it had become legally impossible to discriminate against people, administrators were caught between a rock and a hard place. They solved this problem, says Eubanks, by commissioning a massive set of digital technologies in the late 60s and early 70s. Almost immediately, we see a drop in the ability of people to access the entitlements they were due. This happened well in advance of Reagan’s “welfare queen” speech and subsequent legislation to reduce access to welfare.

This effort to manage and reduce access to public services continues today, says Virginia. The state of Indiana entered into a $1.16 billion contract with IBM and ACS to automate welfare eligibility processes. This program used computers and call centers to manage a queue of tasks, replacing the previous case-based, family-based system focused on supporting people. This system broke relationships between case workers and people seeking access to public assistance. It’s the most straightforward disaster: a million applications were denied in the first three years, a fifty percent increase in denials. The state broke the contract because the system went so badly, and the case is ongoing.

How Computerized Public Services Put People at Risk

How did people experience this system? Virginia tells us about Omega Young, who missed a chance to apply for Medicaid because she was in the hospital being treated for cancer. Because she was in the hospital, this mother struggled to meet the system’s requirements. She called the help center to let them know she was hospitalized, yet her food stamps and medical assistance were cut off for “failure to cooperate.” Most people lost their assistance for this reason, often because they missed an appointment or missed a signature on a hundred-page form. The day after Omega died, she won a case.

Next, Virginia talks about the Allegheny Family Screening Tool, a model used to predict which children might become victims of abuse and neglect in the future (Virginia has written about it here; the New York Times also recently published an account). In 1999, the city of Pittsburgh commissioned a data warehouse that collects data from sources ranging from the police department to public services. In 2012, the office released a request for proposals funded by foundations, asking people to propose ways to mine the data to support public services. The grant went to a team of researchers who use statistical regression models to predict which children are going to face neglect, using 130 indicators to predict treatment.

In the Pittsburgh area, when a call comes into the hotline for child abuse and neglect, the intake workers will interview the caller and make a decision based on two factors: the risk of the allegation (is it child abuse or neglect?) and how safe they feel that child is. Once they make those two decisions, they run the Allegheny Family Screening Tool, which offers a thermometer from 1 to 20 predicting the level of risk. Based on those factors, the intake manager makes a decision about whether to screen that family through the county’s equivalent of child protective services.

Virginia shares voices from people who were targets of this system. She talks about Angel and Patrick, whom she met at a family center. They didn’t stand out initially because their experiences are so average, like those of many working class white people. They’ve struggled with poor health, community violence, and predatory online education. Although they’re dedicated parents, they’ve racked up a history with the state. One of them failed to meet an antibiotic payment for their daughter. Several times, an anonymous tipster called investigators, who investigated them and cleared them. But each of those cases was recorded. The family is now terrified that the algorithm will label them as a risk to their children: they live in fear that someone will see their daughter outside, pick her up, and say that she can’t live with her parents anymore.

Some of this system’s limitations are ones we would expect as statisticians. The data is limited, it only includes public records, and it doesn’t track whether someone received help from private services. Yet the designers of this system have carried out all of the best practices. The design of this tool was participatory, the researchers have been transparent about everything except the weights of the predictive variables, and the algorithm is publicly owned and controlled through democratic processes.

Virginia closes by asking us: how do we respond to systems that were designed using good practices that nonetheless represent a dangerous risk to working people, systems that police, profile, and punish the poor?

Conversation with Alondra Nelson and Julia Angwin

Julia opens up by mentioning an argument that she often has with her husband, who does work on international development. He often talks about new kinds of surveillance that could improve the lives of the poor. To serve the poor well, you need to know the data that they need. For example, he used aerial photography to figure out where all the schools in Nigeria were, something they didn’t exactly know before and which might genuinely support the poor. But at the same time, it’s surveillance.

Accessing public services is incredibly difficult, says Virginia, and if we can lower the barrier, that’s an incredibly positive thing. Right now, anyone who wants public assistance needs to visit many, many offices and fill out many forms. Many public assistance budgets have a line called “diversion,” which is money spent by the state to reduce the number of people who access what is theirs by law. While streamlining these systems can be beneficial, people sometimes need to reduce their visibility to these systems in order to survive. When public services integrate, you become hyper-visible, which creates great harm for people, says Virginia. Surveillance systems can involve people in a cycle that can criminalize them very quickly (JNM note: for things like missing an appointment or ticking the wrong box).

Evidence is great, says Virginia, and it can help us find out what works. But evidence can be used to persecute. We should always be thinking about both of those things.

Alondra remarks that Virginia’s book offers a powerful example of how to ask the important questions about algorithms. Wonkish people tend to look more and more closely at the algorithms, when we could also just step back and look at how these systems affect people’s lives.

Julia asks about the idea of “the deserving poor,” where so much technology has been designed to try to make decisions about who is deserving and who isn’t. How can we find a way, she asks, to talk about problems that have collective harms even when we can’t find the perfect case of injustice? Editors and storytellers often want the “perfect victim” in order to make the story relatable. How do we escape this trap?

Virginia responds that people often expect that the welfare system has been designed to ensure that people get the benefits they deserve under the law. In reality, we keep on re-inventing systems that try to decide whether someone’s responsible for their poverty and avoid supporting them. Two thirds of Americans, says Virginia, will access some kind of means-tested public service in our lifetimes, but many of us fail to admit that we’ve needed these services. And since we don’t admit that we’ve accessed these systems, we never get around to organizing to ensure that we are served effectively.

To create change, says Virginia, we need people to understand how these systems affect us all, organize social movements, and also to re-imagine how we expect technologies to support the poor. She hopes that the book will help people recognize that we have a shared struggle whoever we are.

Julia Angwin asks about movements like the Poor People’s Campaign, the health clinics provided by the Black Panthers, and the work done by the Panthers on sickle cell anemia (see Alondra’s book Body and Soul). Alondra responds that if you’re poor, you’re stigmatized. If you’re black, you’re stigmatized. The category of “deserving poor” does not exist for those who accept the definition. Social movements often offer us meaningful examples, says Alondra, because they look to the future. To make that work, communities need cohorts of experts who support communities and movements to shape their own futures.

Virginia talks about the idea of organizing communities to review, critique, and optimize how they interact with the forms to maximize access to rights and public services within the law. Virginia mentions the Our Data Bodies project, which talks to marginalized neighborhoods about the collection, storage, and sharing of their data by government. The purpose of the movement is to help people understand what they’re facing, confirm some of their fears, and manage their “data self defense.” People have brilliant strategies for survival, self-defense, and community defense, says Virginia. The project will be sharing initial results in March.

Website operators are in the dark about privacy violations by third-party scripts

by Steven Englehardt, Gunes Acar, and Arvind Narayanan.

Recently we revealed that “session replay” scripts on websites record everything you do, like someone looking over your shoulder, and send it to third-party servers. This en-masse data exfiltration inevitably scoops up sensitive, personal information — in real time, as you type it. We released the data behind our findings, including a list of 8,000 sites on which we observed session-replay scripts recording user data.

As one case study of these 8,000 sites, we found health conditions and prescription data being exfiltrated from walgreens.com. These are considered Protected Health Information under HIPAA. The number of affected sites is immense; contacting all of them and quantifying the severity of the privacy problems is beyond our means. We encourage you to check out our data release and hold your favorite websites accountable.

Student data exfiltration on Gradescope

As one example, a pair of researchers at UC San Diego read our study and then noticed that Gradescope, a website they used for grading assignments, embeds FullStory, one of the session replay scripts we analyzed. We investigated, and sure enough, we found that student names and emails, student grades, and instructor comments on students were being sent to FullStory’s servers. This is considered Student Data under FERPA (US educational privacy law). Ironically, Princeton’s own Information Security course was also affected. We notified Gradescope of our findings, and they removed FullStory from their website within a few hours.