January 21, 2018


Automating Inequality: Virginia Eubanks Book Launch at Data & Society

What does it mean for public sector actors to implement algorithms to make public services more efficient? How are these systems experienced by the families and people who face the consequences?

Speaking at Data & Society today is Virginia Eubanks, author of the new book Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. Virginia Eubanks is an Associate Professor of Political Science at the University at Albany, SUNY. She is a founding member of the Our Data Bodies Project and a Fellow at New America. For two decades, Eubanks has worked in community technology and economic justice movements. I first met Virginia when I was a PhD student at the MIT Center for Civic Media, where her book Digital Dead End helped me think about the challenges of genuine empowerment through technology, and I’ve been eagerly awaiting this latest book.

Today at the Data & Society Institute, Virginia was interviewed by Alondra Nelson, president of the Social Science Research Council, and Julia Angwin, an investigative journalist at ProPublica (watch the video here).

We live in a new regime of data analytics, Virginia reminds us: a lot of really great work is thinking deeply about data-based discrimination and the role that data plays in challenging the inequities of our lives or making them worse. To study this transformation, Virginia grounds her work in the history and political context of these systems. They didn’t fall from the sky or land on us with a blank slate. Virginia also starts by talking with the people who are the targets of these systems: primarily poor and working-class families across the color line, people who are often left out of the conversation. Over the last year, this has involved talking with over a hundred people across the US.

Origins of the Digital Poorhouse

How did we get from the county poorhouse of 1819 to today’s digital poorhouse? Virginia tells us about one moment in this history: the rise of the digital poorhouse. She had previously expected that digitization began in the 1990s, but she found that digital welfare records actually date to the 1960s and 1970s and the era of the National Welfare Rights Movement. That movement, which grew out of the civil rights movement, succeeded in establishing that poor Americans should enjoy the full array of constitutional rights.

In the 60s and 70s, the welfare rights movement changed policies like “man-in-the-house” laws, “suitable home” rules, residency restrictions, and “employable mother” rules, says Virginia. These rules expected mothers to work and barred them from public services. For the first time in history, she argues, the movement extended rights that middle-class people already enjoyed to poor and working-class people, including unmarried moms and women of color.

Even as the Welfare Rights Movement succeeded, the backlash against it coincided with a recession. Because it had become legally impossible to simply discriminate against applicants, administrators were caught between a rock and a hard place. They solved this problem, says Eubanks, by commissioning a massive set of digital technologies in the late 60s and early 70s. Almost immediately, we see a drop in people’s ability to access the entitlements they were due. This happened well in advance of Reagan’s “welfare queen” speech and the subsequent legislation that reduced access to welfare.

This effort to manage and reduce access to public services continues today, says Virginia. The state of Indiana entered into a $1.16 billion contract with IBM and ACS to automate its welfare eligibility processes. The program used computers and call centers to manage a queue of tasks, replacing a casework system centered on families and focused on supporting people. The new system broke the relationships between caseworkers and people seeking access to public assistance. It’s the most straightforward disaster: a million applications were denied in the first three years, an increase in denials of more than fifty percent over the previous three years. The state broke the contract because the system failed so badly, and the litigation is ongoing.

How Computerized Public Services Put People at Risk

How did people experience this system? Virginia tells us about Omega Young, who missed a chance to apply for Medicaid because she was in the hospital being treated for cancer. Because she was hospitalized, this mother struggled to meet the system’s requirements. She called the help center to let them know she was in the hospital, and her food stamps and medical assistance were cut off for “failure to cooperate.” Most people who lost assistance lost it for this reason, often because they missed an appointment or a signature on a hundred-page form. The day after Omega Young died, she won her case.

Next, Virginia talks about the Allegheny Family Screening Tool, a model used to predict which children might become victims of abuse and neglect in the future (Virginia has written about it here; the New York Times also recently published an account). In 1999, Allegheny County, which includes Pittsburgh, commissioned a data warehouse that collects data from sources ranging from the police department to public services. In 2012, the county office released a request for proposals, funded by foundations, asking people to propose ways to mine the data to support public services. The grant went to a team of researchers who use statistical regression models, drawing on around 130 indicators, to predict which children are likely to face maltreatment.

In the Pittsburgh area, when a call comes in to the child abuse and neglect hotline, intake workers interview the caller and make a judgment based on two factors: the nature of the allegation (is it child abuse or neglect?) and how safe they believe the child is. Once they make those two decisions, they run the Allegheny Family Screening Tool, which produces a thermometer-style score from 1 to 20 predicting the level of risk. Based on all of these factors, the intake manager decides whether to screen that family in for investigation by the county’s equivalent of child protective services.

Virginia shares voices from people who were targets of this system. She talks about Angel and Patrick, whom she met at a family center. They didn’t stand out at first because their experiences are so ordinary for working-class white people: they’ve struggled with poor health, community violence, and predatory online education. Although they’re dedicated parents, they’ve racked up a history with the state. At one point, they missed a payment for their daughter’s antibiotics. Several times, anonymous tipsters called in reports; investigators looked into the family and cleared them each time, but each of those cases was recorded. The family is now terrified that the algorithm will label them a risk to their children: they live in fear that someone will see their daughter outside, pick her up, and say that she can’t live with her parents anymore.

Some of this system’s problems are ones we would expect as statisticians. The data is limited: it only includes public records, and it doesn’t track whether someone received help from private services. Yet the designers of this system have followed all of the best practices. The design of the tool was participatory, the researchers have been transparent about everything except the weights of the predictive variables, and the algorithm is publicly owned and controlled through democratic processes.

Virginia closes by asking us: how do we respond to systems that were designed according to good practices but nonetheless pose a serious risk to poor and working people, systems that profile, police, and punish the poor?

Conversation with Alondra Nelson and Julia Angwin

Julia opens by mentioning an argument she often has with her husband, who works on international development. He often talks about new kinds of surveillance that could improve the lives of the poor: to serve the poor well, you need data about them and their needs. For example, he used aerial photography to figure out where all the schools in Nigeria were, something no one exactly knew before and which might genuinely support the poor. But at the same time, it’s surveillance.

Accessing public services is incredibly difficult, says Virginia, and if we can lower the barrier, that’s an incredibly positive thing. Right now, anyone who wants public assistance has to navigate many, many offices and forms. Many public assistance budgets have a line called “diversion,” which is money spent by the state to reduce the number of people who access what is theirs by law. While streamlining these systems can be beneficial, people sometimes need to reduce their visibility to these systems in order to survive. When public services integrate their data, you become hyper-visible, which creates great harm for people, says Virginia. Surveillance systems can pull people into a cycle that criminalizes them very quickly (JNM note: for things like missing an appointment or ticking the wrong box).

Evidence is great, says Virginia, and it can help us find out what works. But evidence can be used to persecute. We should always be thinking about both of those things.

Alondra remarks that Virginia’s book offers a powerful example of how to ask the important questions about algorithms. Wonkish people tend to look more and more closely at the algorithms, when we could also just step back and look at how these systems affect people’s lives.

Julia asks about the idea of “the deserving poor”: so much technology has been designed to try to decide who is deserving and who isn’t. How can we find a way, she asks, to talk about problems that cause collective harms even when we can’t find the perfect case of injustice? Editors and storytellers often want the “perfect victim” in order to make the story relatable. How do we escape this trap?

Virginia responds that people often expect that the welfare system has been designed to ensure that people get the benefits they deserve under the law. In reality, we keep re-inventing systems that try to decide whether someone is responsible for their own poverty, so as to avoid supporting them. Two-thirds of Americans, says Virginia, will access some kind of means-tested public service at some point in their lives, but many of us fail to admit that we’ve needed these services. And since we don’t admit that we’ve accessed these systems, we never get around to organizing to ensure that they serve us effectively.

To create change, says Virginia, we need people to understand how these systems affect us all, organize social movements, and also to re-imagine how we expect technologies to support the poor. She hopes that the book will help people recognize that we have a shared struggle whoever we are.

Julia Angwin asks about movements like the Poor People’s Campaign, the health clinics provided by the Black Panthers, and the work done by the Panthers on sickle cell anemia (see Alondra’s book Body and Soul). Alondra responds that if you’re poor, you’re stigmatized. If you’re black, you’re stigmatized. The category of “deserving poor” does not exist for those who accept the definition. Social movements often offer us meaningful examples, says Alondra, because they look to the future. To make that work, communities need cohorts of experts who support communities and movements to shape their own futures.

Virginia talks about the idea of organizing communities to review, critique, and optimize how they interact with these forms, in order to maximize access to rights and public services within the law. She mentions the Our Data Bodies project, which talks with people in marginalized neighborhoods about the collection, storage, and sharing of their data by government. The project’s purpose is to help people understand what they’re facing, confirm some of their fears, and help them practice “data self-defense.” People have brilliant strategies for survival, self-defense, and community defense, says Virginia. The project will share initial results in March.

Website operators are in the dark about privacy violations by third-party scripts

by Steven Englehardt, Gunes Acar, and Arvind Narayanan.

Recently we revealed that “session replay” scripts on websites record everything you do, like someone looking over your shoulder, and send it to third-party servers. This en-masse data exfiltration inevitably scoops up sensitive, personal information — in real time, as you type it. We released the data behind our findings, including a list of 8,000 sites on which we observed session-replay scripts recording user data.

As one case study of these 8,000 sites, we found health conditions and prescription data being exfiltrated from walgreens.com. These are considered Protected Health Information under HIPAA. The number of affected sites is immense; contacting all of them and quantifying the severity of the privacy problems is beyond our means. We encourage you to check out our data release and hold your favorite websites accountable.
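If you want to check a site you care about against that list programmatically, a few lines of code are enough. Here is a minimal Python sketch; the file name below is a placeholder and the one-domain-per-line format is an assumption, so adjust both to the actual data release.

```python
# Minimal sketch: check whether a domain (or any of its parent domains)
# appears in the released list of sites observed with session-replay scripts.
# The file name and one-domain-per-line format are assumptions; adjust them
# to the actual data release.
import sys

def load_domains(path):
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

def is_listed(domain, listed):
    parts = domain.lower().strip(".").split(".")
    # Check "shop.example.com", "example.com", etc. against the list.
    candidates = {".".join(parts[i:]) for i in range(len(parts) - 1)}
    return bool(candidates & listed)

if __name__ == "__main__":
    listed = load_domains("session_replay_sites.txt")  # placeholder file name
    print(is_listed(sys.argv[1], listed))
```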

Student data exfiltration on Gradescope

As one example, a pair of researchers at UC San Diego read our study and then noticed that Gradescope, a website they used for grading assignments, embeds FullStory, one of the session replay scripts we analyzed. We investigated, and sure enough, we found that student names and emails, student grades, and instructor comments on students were being sent to FullStory’s servers. This is considered Student Data under FERPA (US educational privacy law). Ironically, Princeton’s own Information Security course was also affected. We notified Gradescope of our findings, and they removed FullStory from their website within a few hours.

You might wonder how the companies’ privacy policies square with our finding. As best as we can tell, Gradescope’s Terms of Service actually permit this data exfiltration [1], which is a telling comment about the ineffectiveness of Terms of Service as a way of regulating privacy.

FullStory’s Terms are a different matter, and include a clause stating: “Customer agrees that it will not provide any Sensitive Data to FullStory.” We argued previously that this repudiation of responsibility by session-replay scripts puts website operators in an impossible position, because preventing data leaks might require re-engineering the site substantially, negating the core value proposition of these services, which is drag-and-drop deployment. Interestingly, Gradescope’s CEO told us that they were not aware of this requirement in FullStory’s Terms, that the clause had not existed when they first signed up for FullStory, and that they (Gradescope) had not been notified when the Terms changed. [2]

Web publishers kept in the dark

Of the four websites we highlighted in our previous post and this one (Bonobos, Walgreens, Lenovo, and Gradescope), three have removed the third-party scripts in question (all except Lenovo). As far as we can tell, no publisher (website operator) was aware of the exfiltration of sensitive data on their own sites until our study. Further, as mentioned above, Gradescope was unaware of key provisions in FullStory’s Terms of Service. This is a pattern we’ve noticed over and over again in our six years of doing web privacy research.

Worse, in many cases the publisher has no direct relationship with the offending third-party script. In Part 2 of our study we examined two third-party scripts which exploit a vulnerability in browsers’ built-in password managers to exfiltrate user identities. One web developer was unable to determine how the script was loaded and asked us for help. We pointed out that their site loaded an ad network (media-clic.com), which in turn loaded “themoneytizer.com”, which finally loaded the offending script from Audience Insights. These chains of redirects are ubiquitous on the web, and might involve half a dozen third parties. On some websites the majority of third parties have no direct relationship with the publisher.
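One way a publisher can get a first look at these chains on their own site is to load a page in an instrumented browser and log every request that goes to a host other than their own. Below is a rough sketch using Playwright, which is not the crawler we used for the study, just one convenient option; the URL is a placeholder for your own site.

```python
# Rough sketch: print the third-party hosts contacted while a page loads.
# Requires `pip install playwright` and `playwright install chromium`.
from urllib.parse import urlparse
from playwright.sync_api import sync_playwright

def third_party_hosts(url, wait_ms=10_000):
    first_party = urlparse(url).hostname
    hosts = set()

    def on_request(request):
        host = urlparse(request.url).hostname
        # Treat anything not served from the page's own host as third party;
        # a real audit would group by registrable domain instead.
        if host and host != first_party:
            hosts.add(host)

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("request", on_request)
        page.goto(url)
        page.wait_for_timeout(wait_ms)  # let delayed scripts and redirect chains fire
        browser.close()
    return sorted(hosts)

if __name__ == "__main__":
    for host in third_party_hosts("https://example.com"):  # replace with your own site
        print(host)
```

Even a crude listing like this tends to surprise site operators, because it includes hosts loaded several hops down the chain that never appear in the site’s own source code.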

Most of the advertising and analytics industry is premised on keeping not just users but also website operators in the dark about privacy violations. Indeed, the effort required by website operators to fully audit third parties would negate much of the benefit of offloading tasks to them. The ad tech industry creates a tremendous negative externality in terms of the privacy cost to users.

Can we turn the tables?

The silver lining is that if we can explain to web developers what third parties are doing on their sites, and empower them to take control, that might be one of the most effective ways to improve web privacy. But any such endeavor should keep in mind that web publishers everywhere are on tight budgets and may not have much privacy expertise.

To make things concrete, here’s a proposal for how to achieve this kind of impact:

  • Create a 1-pager summarizing the bare minimum that website operators need to know about web security, privacy, and third parties, with pointers to more information.
  • Create a tailored privacy report for each website based on data that is already publicly available through various sources including our own data releases.
  • Build open-source tools for website operators to scan their own sites [3]; a minimal sketch of the idea follows this list. Ideally, the tool should make recommendations for privacy-protecting changes based on the known behavior of third parties.
  • Reach out to website operators to provide information and help make changes. This step doesn’t scale, but is crucial.
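As a starting point for the scanning tool, the scan could be as simple as fetching a page, listing the hosts of the script tags it embeds, and flagging any that match known session-replay providers. The sketch below uses only the Python standard library and a deliberately tiny, illustrative list of replay hosts; a real tool would draw on a maintained database of third-party behavior rather than this hard-coded set.

```python
# Minimal sketch of a "scan your own site" tool: list <script src=...> hosts
# on a page and flag known session-replay providers. REPLAY_HINTS is
# illustrative only, not an exhaustive database of replay services.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

REPLAY_HINTS = {"fullstory.com", "hotjar.com", "smartlook.com"}  # illustrative

class ScriptCollector(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                host = urlparse(urljoin(self.base_url, src)).hostname
                if host:
                    self.hosts.add(host)

def scan(url):
    html = urlopen(url).read().decode("utf-8", errors="replace")
    collector = ScriptCollector(url)
    collector.feed(html)
    for host in sorted(collector.hosts):
        flagged = any(host == hint or host.endswith("." + hint) for hint in REPLAY_HINTS)
        print(("FLAG  " if flagged else "      ") + host)

if __name__ == "__main__":
    scan("https://example.com")  # replace with your own site
```

Because this only parses the static HTML, it misses scripts injected at runtime; pairing it with a headless-browser crawl like the sketch in the previous section would catch those.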

If you’re interested in working with us on this, we’d love to hear from you!

Endnotes

We are grateful to UCSD researchers Dimitar Bounov and Sorin Lerner for bringing the vulnerabilities on Gradescope.com to our attention.

[1] Gradescope’s terms of use state: “By submitting Student Data to Gradescope, you consent to allow Gradescope to provide access to Student Data to its employees and to certain third party service providers which have a legitimate need to access such information in connection with their responsibilities in providing the Service.”

[2] The Wayback Machine does not archive FullStory’s Terms page far enough back in time for us to independently verify Gradescope’s statement, nor does FullStory appear in ToSBack, the EFF’s terms-of-service tracker.

[3] Privacyscore.org is one example of a nascent attempt at such a tool.

Roundup: My First Semester as a Post-Doc at Princeton

As Princeton thaws from under last week’s snow hurricane, I’m taking a moment to reflect on my first four months in the place I now call home.

This roundup post shares highlights from my first semester as a post-doc in Psychology, CITP, and Sociology.

Here in Princeton, I’m surviving winter in the best way I know how 🙂

So far, I have had an amazing experience:

  • The Paluck Lab (Psychology) and the Center for IT Policy, my main anchor points at Princeton, have been welcoming and supportive. When colleagues from both departments showed up at my Ig Nobel Prize viewing party in my first month, I knew I had found a good home <grin>
  • The Paluck Lab have become a wonderful research family, and they even did the LEGO duck challenge together with me!
    • Weekly lab meetings with the Paluck Lab have been a master-class in thinking about the relationship between research design and theory in the social sciences. I am so grateful to observe and participate in these conversations, since so much about research is unspoken, tacit knowledge.
    • With the help of my new colleagues, I’ve started to learn how to write papers for general science journals. I’ve also learned more about publishing in the field of psychology.
  • At CITP, I’ve learned much about thinking simultaneously as a regulator and computer scientist.
  • I’ve loved the conversations at the Kahneman-Treisman Center for Behavioral Policy, where I am now an affiliated postdoc
  • I’m looking forward to meeting more of my colleagues in Sociology this spring, now that I’ll be physically based in Princeton more consistently

Travel and Speaking

View of the French Alps from Lausanne

I’m so glad that I can scale down my travel this spring, phew!

A flock of birds takes flight in Antigua, Guatemala

Writing and Research

Princeton Life

Rockefeller College Cloisters, Princeton. On evenings when I have dinner here, I walk through these cloisters on my way home.