February 19, 2018

Website operators are in the dark about privacy violations by third-party scripts

by Steven Englehardt, Gunes Acar, and Arvind Narayanan.

Recently we revealed that “session replay” scripts on websites record everything you do, like someone looking over your shoulder, and send it to third-party servers. This en-masse data exfiltration inevitably scoops up sensitive, personal information — in real time, as you type it. We released the data behind our findings, including a list of 8,000 sites on which we observed session-replay scripts recording user data.

As one case study of these 8,000 sites, we found health conditions and prescription data being exfiltrated from walgreens.com. These are considered Protected Health Information under HIPAA. The number of affected sites is immense; contacting all of them and quantifying the severity of the privacy problems is beyond our means. We encourage you to check out our data release and hold your favorite websites accountable.

Student data exfiltration on Gradescope

As one example, a pair of researchers at UC San Diego read our study and then noticed that Gradescope, a website they used for grading assignments, embeds FullStory, one of the session replay scripts we analyzed. We investigated, and sure enough, we found that student names and emails, student grades, and instructor comments on students were being sent to FullStory’s servers. These are considered Student Data under FERPA (US educational privacy law). Ironically, Princeton’s own Information Security course was also affected. We notified Gradescope of our findings, and they removed FullStory from their website within a few hours.

You might wonder how the companies’ privacy policies square with our finding. As best as we can tell, Gradescope’s Terms of Service actually permit this data exfiltration [1], which is a telling comment about the ineffectiveness of Terms of Service as a way of regulating privacy.

FullStory’s Terms are a different matter, and include a clause stating: “Customer agrees that it will not provide any Sensitive Data to FullStory.” We argued previously that this repudiation of responsibility by session-replay scripts puts website operators in an impossible position: preventing data leaks might require substantially re-engineering the site, which would negate the core value proposition of these services, namely drag-and-drop deployment. Interestingly, Gradescope’s CEO told us that they were not aware of this requirement in FullStory’s Terms, that the clause had not existed when they first signed up for FullStory, and that they (Gradescope) had not been notified when the Terms changed. [2]

Web publishers kept in the dark

Of the four websites we highlighted in our previous post and this one (Bonobos, Walgreens, Lenovo, and Gradescope), three have removed the third-party scripts in question (all except Lenovo). As far as we can tell, no publisher (website operator) was aware of the exfiltration of sensitive data on their own sites until our study. Further, as mentioned above, Gradescope was unaware of key provisions in FullStory’s Terms of Service. This is a pattern we’ve noticed over and over again in our six years of doing web privacy research.

Worse, in many cases the publisher has no direct relationship with the offending third-party script. In Part 2 of our study we examined two third-party scripts which exploit a vulnerability in browsers’ built-in password managers to exfiltrate user identities. One web developer was unable to determine how the script was loaded and asked us for help. We pointed out that their site loaded an ad network (media-clic.com), which in turn loaded “themoneytizer.com”, which finally loaded the offending script from Audience Insights. These chains of redirects are ubiquitous on the web, and might involve half a dozen third parties. On some websites the majority of third parties have no direct relationship with the publisher.
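Website operators who want to trace such inclusion chains themselves can start with request interception in an automated browser. Below is a minimal sketch using Playwright (our tool choice for illustration; any browser-automation library that exposes per-request metadata would do). The first-party domain is a placeholder, and the frame URL only approximates the true initiator; fully reconstructing a chain of inclusions requires DevTools-level network instrumentation.

```python
# Minimal sketch: log every third-party request a page load triggers.
# Assumes Playwright (pip install playwright; playwright install chromium).
# "example.com" is a placeholder for the site you operate.
from urllib.parse import urlparse
from playwright.sync_api import sync_playwright

FIRST_PARTY = "example.com"

def is_first_party(host):
    return host == FIRST_PARTY or host.endswith("." + FIRST_PARTY)

def on_request(request):
    host = urlparse(request.url).hostname or ""
    if not is_first_party(host):
        # The URL of the frame that issued the request is a first clue:
        # a third-party frame URL here suggests one third party loaded another.
        print(f"{host:40s} issued by {request.frame.url}")

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.on("request", on_request)
    page.goto(f"https://{FIRST_PARTY}", wait_until="networkidle")
    browser.close()
```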

Most of the advertising and analytics industry is premised on keeping not just users but also website operators in the dark about privacy violations. Indeed, the effort required by website operators to fully audit third parties would negate much of the benefit of offloading tasks to them. The ad tech industry creates a tremendous negative externality in terms of the privacy cost to users.

Can we turn the tables?

The silver lining is that if we can explain to web developers what third parties are doing on their sites, and empower them to take control, that might be one of the most effective ways to improve web privacy. But any such endeavor should keep in mind that web publishers everywhere are on tight budgets and may not have much privacy expertise.

To make things concrete, here’s a proposal for how to achieve this kind of impact:

  • Create a 1-pager summarizing the bare minimum that website operators need to know about web security, privacy, and third parties, with pointers to more information.
  • Create a tailored privacy report for each website based on data that is already publicly available through various sources including our own data releases.
  • Build open-source tools for website operators to scan their own sites [3]; a minimal sketch appears after this list. Ideally, the tool should make recommendations for privacy-protecting changes based on the known behavior of third parties.
  • Reach out to website operators to provide information and help make changes. This step doesn’t scale, but is crucial.
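To make the third item concrete, here is a deliberately minimal sketch of such a scan: it fetches a page, extracts the hosts of external scripts, and flags any that appear on a tracker blocklist. The blocklist file name and format are hypothetical placeholders (one domain per line, which could be derived from a public list such as EasyPrivacy), and a static fetch misses dynamically injected scripts, precisely the inclusion chains described above, so a real tool would drive a headless browser instead.

```python
# Minimal sketch of a self-scan: flag external <script src=...> hosts that
# appear on a tracker blocklist. Assumes the requests library;
# "tracker_domains.txt" (one domain per line) is a hypothetical local file.
import re
from urllib.parse import urljoin, urlparse
import requests

def script_hosts(page_url):
    html = requests.get(page_url, timeout=10).text
    srcs = re.findall(r'<script[^>]+src=["\']([^"\']+)["\']', html, re.I)
    return {urlparse(urljoin(page_url, s)).hostname for s in srcs}

def scan(page_url, blocklist_path="tracker_domains.txt"):
    with open(blocklist_path) as f:
        trackers = {line.strip() for line in f if line.strip()}
    site = urlparse(page_url).hostname
    for host in sorted(h for h in script_hosts(page_url) if h and h != site):
        label = "KNOWN TRACKER" if host in trackers else "third party"
        print(f"{label:14s} {host}")

scan("https://example.com")  # placeholder URL
```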

If you’re interested in working with us on this, we’d love to hear from you!

Endnotes

We are grateful to UCSD researchers Dimitar Bounov and Sorin Lerner for bringing the vulnerabilities on Gradescope.com to our attention.

[1] Gradescope’s terms of use state: “By submitting Student Data to Gradescope, you consent to allow Gradescope to provide access to Student Data to its employees and to certain third party service providers which have a legitimate need to access such information in connection with their responsibilities in providing the Service.”

[2] The Wayback Machine does not archive FullStory’s Terms page far enough back in time for us to independently verify Gradescope’s statement, nor does FullStory appear in ToSBack, the EFF’s terms-of-service tracker.

[3] Privacyscore.org is one example of a nascent attempt at such a tool.

How the Contextual Integrity Framework Helps Explain Children’s Understanding of Privacy and Security Online

This post discusses a new paper that will be presented at the 2018 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW). I wrote this paper with co-authors Shalmali Naik, Utkarsha Devkar, Marshini Chetty, Tammy Clegg, and Jessica Vitak.

Watching YouTube during breakfast. Playing Animal Jam after school. Asking Google about snakes. Checking points on Class Dojo. Posting a lip-synching video on Musical.ly. These online activities are interspersed throughout the daily lives of today’s children. They also involve logging into an account, disclosing information, or exchanging messages with others—actions that can raise privacy and security concerns.

How do elementary school-age children conceptualize privacy and security online? What strategies do they and their parents use to help address such concerns? In interviews with 18 families, we found that children ages 5-11 understand some aspects of how privacy and security apply to online activities. And while children look to their parents for support, parents feel that privacy and security are largely a concern for the future, when their children are older, have their own smartphones, and spend more time on activities like social media. (For a summary of the paper, see this Princeton HCI post.)

Privacy scholar Helen Nissenbaum’s contextual integrity framework was developed to help identify what privacy concerns emerge through the use of new technology and what types of solutions can address those concerns. We found that the framework is also useful to explain what children know (and don’t know) about privacy online and what types of educational materials can enhance that knowledge.

What is contextual integrity? The contextual integrity framework considers privacy from the perspective of how information flows. People expect information to flow in a certain way in a given situation. When it does not, privacy concerns may arise. For example, the norms of a parent-teacher conference dictate that a teacher can reveal information about the parent’s child to the parent, but not about other children. Four parameters influence these norms:

  • Context: This relates to the backdrop against which a given situation occurs. A parent-teacher conference occurs within an educational context.
  • Attributes: This refers to the types of information involved in a particular context. The parent-teacher conference involves information about a child’s academic performance and behavioral patterns, but not necessarily the child’s medical history.
  • Actors: This concerns the parties involved in a given situation. In a parent-teacher conference, the teacher (sender) discloses information about the student (subject) to the parent (recipient).
  • Transmission Principles: This involves constraints that affect the flow of information. For example, information shared during a parent-teacher conference is unidirectional (i.e. teachers don’t share information about their own children with parents) and confidential (i.e. social norms and legal restrictions prevent teachers from sharing such information with the entire school).
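For technically minded readers, the sketch below is our own illustration (not a formalism from the paper) of how these four parameters can be modeled as a data structure and checked against a single hand-written norm for the parent-teacher conference example:

```python
# Minimal sketch: an information flow described by the four contextual-
# integrity parameters, checked against one illustrative norm. All names
# and the norm itself are our own examples, not drawn from the paper.
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    context: str                 # e.g. "education"
    attribute: str               # type of information, e.g. "grades"
    sender: str
    subject: str                 # whom the information is about
    recipient: str
    transmission_principle: str  # e.g. "confidential"

def respects_norm(flow: Flow) -> bool:
    # Norm: in an education context, a teacher may confidentially share a
    # child's grades with that child's own parent.
    return (flow.context == "education"
            and flow.attribute == "grades"
            and flow.sender == "teacher"
            and flow.recipient == f"parent of {flow.subject}"
            and flow.transmission_principle == "confidential")

ok = Flow("education", "grades", "teacher", "Alice", "parent of Alice", "confidential")
bad = Flow("education", "grades", "teacher", "Bob", "parent of Alice", "confidential")
print(respects_norm(ok))   # True
print(respects_norm(bad))  # False: information about another child
```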

How does the contextual integrity framework help us understand what children know about privacy and security online? In our interviews, we found that children largely understood how attributes and actors could affect privacy and security online. They knew that certain types of information, such as a password, deserved more protection than others. They also recognized that it was more appropriate to share information with known parties, such as parents and teachers, rather than strangers or unknown people online.

But children under age 10 struggled to grasp how interacting online could violate transmission principles by, for example, enabling unintended actors to see information. Only one child recognized that someone could take information shared in a chat message and repost it elsewhere, potentially spreading it far beyond its intended audience. Children also struggled to understand how the context of a situation could inform decisions about how to appropriately share information. They largely used the heuristic of “Could I get in trouble for this?” to guide behavior.

How do children and parents navigate privacy and security online? While a few children understood that restricting access to information or providing false information online could help them protect their privacy, most relied on their parents for support in navigating potentially concerning situations. Parents primarily used passive strategies to manage their children’s technology use. They maintained a general awareness of what their children were doing, mainly by telling children to use devices only when parents were around. They minimized the chances that their children would download additional apps or spend money by withholding the passwords for app stores.

Most parents felt their children were too young to face privacy or security risks online. But elementary school-age children already engage in a variety of activities online, and our results show they can absorb lessons related to privacy and security. Children’s willingness to rely on parents suggests that parents have an opportunity to usher their children’s knowledge to the next level. And parents may have an easier time doing so before their children reach adolescence and lose interest in listening to parents.

How can the contextual integrity framework inform children’s learning about privacy and security online? The contextual integrity framework can guide the development of relevant materials that parents and others can use to scaffold their children’s learning. For example, the development of a child-friendly ad blocker could help show children that other actors, such as companies and trackers, can “see” what people do online. Videos or games that explain, in an age-appropriate manner, how the Internet works can help children understand how the Internet can challenge transmission principles such as confidentiality. Integrating privacy- and security-related lessons into apps and websites that children already use can help refine their understanding of how contexts and norms shape decisions to disclose information. For example, the website for the public broadcasting channel PBS Kids instructs children to avoid using personal information, such as their last name or address, in a username.

As the boundaries between offline and online life continue to blur, privacy and security knowledge remains critical for people of all ages. Theoretical frameworks like contextual integrity help us understand how to evaluate and enhance that knowledge.

For more information, read the full paper.