December 15, 2017

No boundaries: Exfiltration of personal data by session-replay scripts

This is the first post in our “No Boundaries” series, in which we reveal how third-party scripts on websites have been extracting personal information in increasingly intrusive ways.
by Steven Englehardt, Gunes Acar, and Arvind Narayanan

Update: we’ve released our data — the list of sites with session-replay scripts, and the sites where we’ve confirmed recording by third parties.

You may know that most websites have third-party analytics scripts that record which pages you visit and the searches you make. But lately, more and more sites use “session replay” scripts. These scripts record your keystrokes, mouse movements, and scrolling behavior, along with the entire contents of the pages you visit, and send them to third-party servers. Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone were looking over your shoulder.
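To make the mechanism concrete, here is a minimal sketch of how such a script might buffer events before shipping them off. This is an illustrative reconstruction, not any vendor's actual code; the class name and the collection endpoint are hypothetical.

```javascript
// Illustrative sketch of a session-replay recorder. The name
// ReplayRecorder and the endpoint URL are hypothetical, not taken
// from any real analytics vendor.
class ReplayRecorder {
  constructor(endpoint) {
    this.endpoint = endpoint; // hypothetical third-party collection URL
    this.events = [];         // buffered events awaiting upload
  }

  // Store one event as a compact, timestamped entry.
  record(type, detail) {
    this.events.push({ type, detail, t: Date.now() });
  }

  // Attach listeners for the kinds of input these scripts capture.
  attach(target) {
    target.addEventListener('keydown', e => this.record('key', e.key));
    target.addEventListener('mousemove', e =>
      this.record('move', [e.clientX, e.clientY]));
    target.addEventListener('scroll', () =>
      this.record('scroll', window.scrollY));
  }

  // Serialize the buffer and POST it to the third-party server.
  flush() {
    const payload = JSON.stringify(this.events);
    this.events = [];
    return fetch(this.endpoint, { method: 'POST', body: payload });
  }
}
```

Note that nothing in this flow is aggregated or anonymized before upload: the raw keystrokes and cursor positions leave the page as-is, which is precisely what distinguishes replay scripts from conventional analytics.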

[Read more…]

Getting serious about research ethics: Security and Internet Measurement

[This blog post is a continuation of our series about research ethics in computer science that we started last week]

Research projects in the information security and Internet measurement sub-disciplines typically interact with third-party systems or devices to collect large amounts of data. Scholars in these fields are interested in collecting data about technical phenomena. Because of the widespread use of the Internet, their experiments can interfere with people’s use of devices and reveal all sorts of private information, such as their browsing behaviour. As awareness of this unintended impact on Internet users grew, these communities have spent considerable time debating their ethical standards at conferences, in dedicated workshops, and in journal publications. Their efforts have culminated in guidelines on topics such as vulnerability disclosure and privacy, with the aim of protecting unsuspecting Internet users and the humans implicated in technical research.

Prof. Nick Feamster, Prof. Prateek Mittal, moderator Prof. Elana Zeide, and I discussed some important considerations for research ethics in a panel dedicated to these sub-disciplines at the recent CITP conference on research ethics in computer science communities. We started by explaining that gathering empirical data is crucial for inferring the state of values such as privacy and trust in communication systems. However, because methodological choices in computer science often have ethical impacts, researchers need to be empowered to reflect meaningfully on their experimental setup.

Prof. Feamster discussed several cases in which he had sought advice from ethical oversight bodies but was left with unsatisfying guidance. For example, when his team conducted Internet censorship measurements (pdf), they were aware that they were initiating requests and creating data flows from devices owned by unsuspecting Internet users. These new information flows were created in realms where adversaries, such as government censors, were also operating. This may pose a risk to the owners of the devices implicated in the experimentation and data collection. The ethics board, however, concluded that such measurements did not meet the strict definition of “human subjects research”, which excluded the need for formal review. Prof. Feamster suggested that computer scientists reassess how they think about their technologies and newly initiated data flows that can be misused by adversaries, and take this into account in ethical review procedures.

Ethical tensions and dilemmas in technical Internet research can themselves be interesting research problems for scholars, argued Prof. Mittal. For example, to reason about privacy and trust in the anonymous Tor network, researchers need to understand to what extent adversaries can exploit vulnerabilities and thereby observe the Internet traffic of individual users. The obvious, relatively easy, and ethically dubious measurement would be to attack existing Tor nodes and attempt to collect the real-time traffic of identifiable users. Instead, Prof. Mittal gave an insight into his own critical engagement with alternative design choices, which led his team to create a new node within Princeton’s university network and attack that node instead. This more lab-based approach eliminated risks for unsuspecting Internet users while allowing the same inferences to be drawn.

I concluded the panel by suggesting that ethics review boards at universities, academic conferences, and scholarly journals engage actively with computer scientists so that valuable data can be collected whilst respecting human values. Currently, a panel of non-experts in either computer science or research ethics is given a single moment to judge the full methodology of a research proposal or the resulting paper. When a thumbs-down is issued, researchers have little or no opportunity to remedy their ethical shortcomings. I argued that a better approach would be an iterative process with in-person meetings and more in-depth consideration of design alternatives, as demonstrated in a recent paper about Advertising as a Platform for Internet measurements (pdf). This is the approach advocated in the Networked Systems Ethics Guidelines. Cross-disciplinary conversation, rather than one-time decisions, allows for a mutual understanding between the gatekeepers of ethical standards and the designers of useful computer science research.

See the video of the panel here.

Help us improve the usability of Tor and onion services!

Update 2017-09-11: We have collected several hundred responses, so we are now closing the survey to begin data analysis. Thanks for your help!

We are looking for volunteers for a study to improve the usability of Tor and onion services, but first some background: The Tor network is primarily known for client anonymity, that is, users can download Tor Browser and browse the Internet anonymously. A slightly lesser-known feature of Tor is server anonymity. It is possible to set up web servers—and other TCP-based services—whose IP address is hidden by the Tor network. We call these “hidden” servers onion services. Several well-known web sites such as Facebook, DuckDuckGo, and ProPublica have started to offer onion services in addition to their normal web sites.
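For readers curious about the server side, setting up a basic onion service amounts to two lines in Tor's configuration file (torrc). The directory path and local port below are illustrative; Tor generates the .onion hostname and keys in the configured directory.

```
# Minimal torrc sketch for an onion service: Tor keeps the service's
# keys and hostname in this directory (path is illustrative) ...
HiddenServiceDir /var/lib/tor/my_onion_service/
# ... and forwards port 80 of the .onion address to a web server
# listening locally on 127.0.0.1:8080.
HiddenServicePort 80 127.0.0.1:8080
```

After restarting Tor, the generated address appears in the `hostname` file inside the service directory.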

Onion services differ from normal web services in several aspects; for example in their unusual domain format (an example is expyuzz4wqqyqhjn.onion, The Tor Project’s onion site) and in the way users connect to them—onion services are only accessible over Tor. In this research project, we are trying to understand how users deal with these differences by administering a survey to Tor users. A sound understanding of how users interact with onion services will allow privacy engineers to both improve onion service usability and better protect Tor users from surveillance, censorship, and other attacks.
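To illustrate just how unusual these domains are, here is a small sketch of a format check for onion addresses. It is not a full validator: it only tests the length and base32 alphabet of legacy version-2 addresses (16 characters, like the example above) and the newer version-3 addresses (56 characters).

```javascript
// Sketch of a syntactic check for .onion addresses. Onion addresses
// use the base32 alphabet (lowercase a-z and digits 2-7); v2 addresses
// are 16 characters long, v3 addresses are 56. This checks format
// only, not whether the service actually exists.
function looksLikeOnionAddress(host) {
  return /^[a-z2-7]{16}\.onion$/.test(host) ||  // version 2 (legacy)
         /^[a-z2-7]{56}\.onion$/.test(host);    // version 3
}
```

Because these strings are derived from cryptographic keys rather than chosen by humans, they are essentially impossible to memorize or visually verify, which is one of the usability differences our survey probes.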

You can help our research by filling out our survey (the survey is closed as of 2017-09-11). To learn more about our work, visit our project page, and don’t hesitate to get in touch with us if you have any questions.