Archives for 2018

August 20, 2018

Internet of Things in Context: Discovering Privacy Norms with Scalable Surveys

by Noah Apthorpe, Yan Shvartzshnaider, Arunesh Mathur, Nick Feamster

Privacy concerns surrounding disruptive technologies such as the Internet of Things (and, in particular, connected smart home devices) have been prevalent in public discourse, with privacy violations from these devices occurring frequently. As these new technologies challenge existing societal norms, determining the bounds of “acceptable” information handling practices requires rigorous study of user privacy expectations and normative opinions towards information transfer.

To better understand user attitudes and societal norms concerning data collection, we have developed a scalable survey method for empirically studying privacy in context.  This survey method uses (1) a formal theory of privacy called contextual integrity and (2) combinatorial testing at scale to discover privacy norms. In our work, we have applied the method to better understand norms concerning data collection in smart homes. The general method, however, can be adapted to arbitrary contexts with varying actors, information types, and communication conditions, paving the way for future studies informing the design of emerging technologies. The technique can provide meaningful insights about privacy norms for manufacturers, regulators, researchers and other stakeholders.  Our paper describing this research appears in the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies.

Scalable CI Survey Method

Contextual integrity. The survey method applies the theory of contextual integrity (CI), which frames privacy in terms of the appropriateness of information flows in defined contexts. CI offers a framework to describe flows of information (attributes) about a subject from a sender to a receiver, under specific conditions (transmission principles).  Changing any of these parameters of an information flow could result in a violation of privacy.  For example, a flow of information about your web searches from your browser to Google may be appropriate, while the same information flowing from your browser to your ISP might be inappropriate.

Combinatorial construction of CI information flows. The survey method discovers privacy norms by asking users about the acceptability of a large number of information flows that we automatically construct using the CI framework. Because the CI framework effectively defines an information flow as a tuple (attributes, subject, sender, receiver, and transmission principle), we can automate the process of constructing information flows by defining a range of parameter values for each tuple and generating a large number of flows from combinations of parameter values.
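
As a rough illustration of this combinatorial construction, a few lines of Python can enumerate candidate flows from parameter lists (the values below are placeholders, not the exact lists used in our survey):

    from itertools import product
    from typing import NamedTuple

    # A contextual integrity information flow as a five-parameter tuple.
    class Flow(NamedTuple):
        sender: str
        attribute: str
        subject: str
        recipient: str
        transmission_principle: str

    # Placeholder parameter values, for illustration only.
    senders = ["a sleep monitor", "a smart thermostat"]
    attributes = ["its owner's location", "its owner's exercise routine"]
    recipients = ["its manufacturer", "its owner's ISP"]
    principles = ["if its owner has given consent", "if it is used for advertising"]

    # Every combination of parameter values yields one candidate flow to survey.
    flows = [
        Flow(s, a, "its owner", r, tp)
        for s, a, r, tp in product(senders, attributes, recipients, principles)
    ]

    for f in flows[:2]:
        print(f"{f.sender} sends {f.attribute} to {f.recipient} {f.transmission_principle}")
    print(len(flows), "flows generated")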

Applying the Survey Method to Discover Smart Home Privacy Norms

We applied the survey method to 3,840 IoT-specific information flows involving a range of device types (e.g., thermostats, sleep monitors), information types (e.g., location, usage patterns), recipients (e.g., device manufacturers, ISPs) and transmission principles (e.g., for advertising, with consent). 1,731 Amazon Mechanical Turk workers rated the acceptability of these information flows on a 5-point scale from “completely unacceptable” to “completely acceptable”.

Trends in acceptability ratings across information flows indicate which context parameters are particularly relevant to privacy norms. For example, the following heatmap shows the average acceptability ratings of all information flows with pairwise combinations of recipients and transmission principles.

Average acceptability scores of information flows with given recipient/transmission principle pairs. For example, the top left box shows the average acceptability score of all information flows with the recipient “its owner’s immediate family” and the transmission principle “if its owner has given consent.” Higher (more blue) scores indicate that flows with the corresponding parameters are more acceptable, while lower (more red) scores indicate that the flows are less acceptable. Flows with the null transmission principle are controls with no specific condition on their occurrence. Empty locations correspond to less intuitive information flows that were excluded from the survey. Parameters are sorted by descending average acceptability score for all information flows containing that parameter.
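
Each cell of such a heatmap is simply a grouped mean over the raw acceptability ratings. A minimal sketch of that computation, with hypothetical column names and made-up example rows, might look like:

    import pandas as pd

    # Hypothetical schema: one row per (worker, flow) response on the 1-5 acceptability scale.
    responses = pd.DataFrame([
        {"recipient": "its owner's immediate family",
         "transmission_principle": "if its owner has given consent", "rating": 5},
        {"recipient": "its owner's ISP",
         "transmission_principle": "if it is used for advertising", "rating": 1},
        {"recipient": "its owner's ISP",
         "transmission_principle": "if it is used for advertising", "rating": 2},
    ])

    # Each heatmap cell is the mean rating over all responses that share a
    # recipient / transmission-principle pair.
    heatmap = responses.pivot_table(
        index="transmission_principle",
        columns="recipient",
        values="rating",
        aggfunc="mean",
    )
    print(heatmap)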

These results provide several insights about IoT privacy, including the following:

  • Advertising and Indefinite Data Storage Generally Violate Privacy Norms. Respondents viewed information flows from IoT devices for advertising or for indefinite storage as especially unacceptable. Unfortunately, advertising and indefinite storage remain standard practice for many IoT devices and cloud services.
  • Transitive Flows May Violate Privacy Norms. Consider a device that sends its owner’s location to a smartphone, which then sends the location on to the manufacturer’s cloud server. This sets up two information flows: (1) from the device to the smartphone and (2) from the smartphone to the manufacturer’s cloud server. Although flow #1 may conform to user privacy norms, flow #2 may violate them. Manufacturers of devices that connect to IoT hubs (often made by different companies), rather than directly to cloud services, should therefore avoid having those devices send potentially sensitive information with greater frequency or precision than necessary.

Our paper expands on these findings, including more details on the survey method, additional results, analyses, and recommendations for manufacturers, researchers, and regulators.

We believe that the survey method we have developed is broadly applicable to studying societal privacy norms at scale and can thus better inform privacy-conscious design across a range of domains and technologies.

Teaching the Craft, Ethics, and Politics of Field Experiments

How can we manage the politics and ethics of large-scale online behavioral research? When this question came up in April during a forum on Defending Democracy at Princeton, Ed Felten mentioned on stage that I was teaching a Princeton undergrad class on this very topic. No pressure!

Ed was right about the need: people with undergrad computer science degrees routinely conduct large-scale behavioral experiments affecting millions or billions of people. Since large-scale human subjects research is now common, universities need to equip students to make sense of and think critically about that kind of power.

Against privacy defeatism: why browsers can still stop fingerprinting

In this post I’ll discuss how a landmark piece of privacy research was widely misinterpreted, how this misinterpretation deterred the development of privacy technologies rather than spurring it, how a recent paper set the record straight, and what we can learn from all this.

The research in question is about browser fingerprinting. Because of differences in operating systems, browser versions, fonts, plugins, and at least a dozen other factors, different users’ web browsers tend to look different. This can be exploited by websites and third-party trackers to create so-called fingerprints. These fingerprints are much more effective than cookies for tracking users across websites: they leave no trace on the device and cannot easily be reset by the user.
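
Conceptually, a fingerprinting script just reads a set of such attributes and combines them into a single identifier. A deliberately simplified sketch (the attribute names and values are hypothetical, and real scripts collect far more) looks like this:

    import hashlib

    # Hypothetical attribute values a script might read from the browser environment.
    attributes = {
        "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/61.0",
        "screen": "1920x1080x24",
        "timezone_offset_minutes": "-300",
        "fonts": "Arial,Courier New,DejaVu Sans",
        "plugins": "PDF Viewer,Shockwave Flash",
    }

    # Serialize the attributes in a fixed order and hash them; two browsers share a
    # fingerprint only if every attribute value matches exactly.
    canonical = "|".join(f"{name}={value}" for name, value in sorted(attributes.items()))
    fingerprint = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    print(fingerprint)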

The question is simply this: how effective is browser fingerprinting? That is, how unique is the typical user’s device fingerprint? The answer has big implications for online privacy. But studying this question scientifically is hard: while there are many tracking companies that have enormous databases of fingerprints, they don’t share them with researchers.

The first large-scale experiment on fingerprinting, called Panopticlick, was done by the Electronic Frontier Foundation starting in 2009. Hundreds of thousands of volunteers visited panopticlick.eff.org and agreed to have their browser fingerprinted for research. What the EFF found was remarkable at the time: 83% of participants had a fingerprint that was unique in the sample. Among those with Flash or Java enabled, fingerprints were even more likely to be unique: 94%. A project by researchers at INRIA in France with an even larger sample found broadly similar results. Meanwhile, researchers, including us, found that an ever larger number of browser features — Canvas, Battery, Audio, and WebRTC — were being abused by tracking companies for fingerprinting.

The conclusion was clear: fingerprinting is devastatingly effective. It would be futile for web browsers to try to limit fingerprintability by exposing less information to scripts: there were too many leaks to plug; too many fingerprinting vectors. The implications were profound. Browser vendors concluded that they wouldn’t be able to stop third-party tracking, and so privacy protection was left up to extensions. [1] These extensions didn’t aim to limit fingerprintability either. Instead, most of them worked in a convoluted way: by manually compiling block lists of thousands of third-party tracking scripts, constantly playing catch up as new players entered the tracking game.
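
As a rough illustration of what these blocklist-based extensions do, here is a toy sketch (the domains and helper function are hypothetical; real lists contain tens of thousands of rules and need constant maintenance):

    from urllib.parse import urlparse

    # Hypothetical hand-curated list of tracker domains (real lists hold tens of
    # thousands of entries and must be updated as new trackers appear).
    BLOCKLIST = {"tracker.example", "ads.example", "analytics.example"}

    def should_block(request_url: str, page_host: str) -> bool:
        """Block a request if it is third-party and its host is on the list."""
        host = urlparse(request_url).hostname or ""
        is_third_party = host != page_host and not host.endswith("." + page_host)
        on_list = any(host == d or host.endswith("." + d) for d in BLOCKLIST)
        return is_third_party and on_list

    print(should_block("https://ads.example/pixel.js", "news.example"))     # True
    print(should_block("https://cdn.news.example/app.js", "news.example"))  # False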

But here’s the twist: a team at INRIA (including some of the same researchers responsible for the earlier study) managed to partner with a major French website and test the website’s visitors for fingerprintability. The findings were published a few months ago, and this time the results were quite different: only a third of users had unique fingerprints (compared to 83% and 94% earlier), despite the researchers’ use of a comprehensive set of 17 fingerprinting attributes. For mobile users the number was even lower: less than a fifth. There were two reasons for the difference: the new study had a larger sample, and self-selection of participants appears to have introduced a bias into the earlier studies. There’s more: since the web is evolving away from plugins such as Flash and Java, we should expect fingerprintability to drop even further. A close look at the paper’s findings suggests that even simple interventions by browsers to limit the highest-entropy attributes would greatly improve the ability of users to hide in the crowd.
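
To see why limiting high-entropy attributes helps users hide in the crowd, consider a toy calculation (the sample and attribute names below are made up; the actual study measured 17 attributes on real site visitors):

    import math
    from collections import Counter

    def entropy_bits(values):
        """Shannon entropy (in bits) of one attribute's observed value distribution."""
        counts = Counter(values)
        n = len(values)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def unique_fraction(fingerprints):
        """Fraction of users whose full fingerprint appears exactly once in the sample."""
        counts = Counter(fingerprints)
        return sum(1 for fp in fingerprints if counts[fp] == 1) / len(fingerprints)

    # Made-up sample: one row per user, columns are (browser, resolution, font list).
    sample = [
        ("Firefox 61", "1920x1080", "fonts-A"),
        ("Firefox 61", "1920x1080", "fonts-B"),
        ("Chrome 68", "1366x768", "fonts-C"),
        ("Chrome 68", "1366x768", "fonts-C"),
    ]

    for name, values in zip(["browser", "resolution", "fonts"], zip(*sample)):
        print(name, round(entropy_bits(values), 2), "bits")

    print("unique with all attributes:", unique_fraction(sample))
    # Dropping the highest-entropy attribute (fonts, in this toy sample) merges users
    # into larger anonymity sets and lowers the fraction of unique fingerprints.
    print("unique without fonts:", unique_fraction([row[:2] for row in sample]))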

Apple recently announced that Safari would try to limit fingerprinting, and it’s likely that the recent paper influenced this decision. Notably, a minority of web privacy experts never subscribed to the view that fingerprinting protection is futile, and the W3C, the main web standards body, has long provided guidance for developers of new standards on how to minimize fingerprintability. It’s still not too late. But if we’d known in 2009 what we know today, browsers would have had a big head start in developing and deploying fingerprinting defenses.

Why did the misinterpretation happen in the first place? One easy lesson is that statistics is hard, and non-representative samples can thoroughly skew research conclusions. But there’s another pill that’s harder to swallow: the recent study was able to test users in the wild only because the researchers didn’t ask or notify the users. [2] With Internet experiments, there is a tension between traditional informed consent and validity of findings, and we need new ethical norms to resolve this.

Another lesson is that privacy defenses don’t need to be perfect. Many researchers and engineers think about privacy in all-or-nothing terms: a single mistake can be devastating, and if a defense won’t be perfect, we shouldn’t deploy it at all. That might make sense for some applications such as the Tor browser, but for everyday users of mainstream browsers, the threat model is death by a thousand cuts, and privacy defenses succeed by interfering with the operation of the surveillance economy.

Finally, the fingerprinting-defense-is-futile argument is an example of privacy defeatism. Faced with an onslaught of bad news about privacy, we tend to acquire a form of learned helplessness, and reach the simplistic conclusion that privacy is dying and there’s nothing we can do about it. But this position is not supported by historical evidence: instead, we find that there is a constant re-negotiation of the privacy equilibrium, and while there are always privacy-infringing developments, they are offset from time to time by legal, technological, and social defenses.

Browser fingerprinting remains on the front lines of the privacy battle today. The GDPR is making things harder for fingerprinters. It’s time for browser vendors to also get serious about cracking down on this sneaky practice.

Thanks to Günes Acar and Steve Englehardt for comments on a draft.

[1] One notable exception is the Tor browser, but its protections come at a serious cost in performance and in broken features on websites. Another is Brave, which has a self-selected user base presumably willing to accept some breakage in exchange for privacy.

[2] The researchers limited their experiment to users who had previously consented to the site’s generic cookie notice; they did not specifically inform users about their study.