January 16, 2017

Web Tracking and User Privacy Workshop: Test Cases for Privacy on the Web

This guest post is from Nick Doty, of the W3C and the UC Berkeley School of Information. A companion to my summary of the position papers submitted for last month’s W3C Do-Not-Track Workshop, hosted by CITP, Nick’s post goes deeper into the substance of, and interaction during, the workshop.

The level of interest and participation in last month’s Workshop on Web Tracking and User Privacy (about a hundred attendees spanning multiple countries, dozens of companies and a wide variety of backgrounds) confirms the broad interest in Do Not Track. This relatively straightforward technical approach with a catchy name has led, in the US, to proposed legislation at both the state and federal levels, to specific mention by the Federal Trade Commission (it was nice to have Ed Felten back from DC at the workshop, representing his new employer), and to comparatively rapid deployment of competing proposals by browser vendors. Still, one might be surprised that so many players are devoting so much engineering effort to a relatively narrow goal: building technical means that allow users to avoid being tracked across the Web for the purpose of compiling behavioral profiles for targeted advertising.

In fact, Do Not Track (in all its variations and competing proposals) is the latest test case for how new online technologies will address privacy issues. What mix of minimization techniques (where one might classify Microsoft’s Tracking Protection block lists) versus preference expression and use limitation (like a Do Not Track header) will best protect privacy and allow for innovation? Can parties agree on a machine-readable expression of privacy preferences (as has been heavily debated in P3P, GeoPriv and other standards work), and if so, how will terms be defined and compliance monitored and enforced? Many attendees were at the workshop not just to address this particular privacy problem — ubiquitous invisible tracking of Web requests to build behavioral profiles — but to grab a seat at the table where the future of how privacy is handled on the Web may be decided. The W3C, for its part, expects to start an Interest Group to monitor privacy on the Web and spin out specific work as new privacy issues inevitably arise, in addition to considering a Working Group to address this particular topic (more below). The Internet Engineering Task Force (IETF) is exploring a Privacy Directorate to provide guidance on privacy considerations across specs.

At a higher level, this debate presents a test case for the process of building consensus and developing standards around technologies like tracking protection or Do Not Track that have inspired controversy. What body (or rather, combination of bodies) can legitimately define preference expressions that must operate at multiple levels in the Web stack, not to mention serve the diverse needs of individuals and entities across the globe? Can the same organization that defines the technical design also negotiate semantic agreement between very diverse groups on the meaning of “tracking”? Is this an appropriate role for technical standards bodies to assume? To what extent can technical groups work with policymakers to build solutions that can be enforced by self-regulatory or governmental players?

Discussion at the recent workshop confirmed many of these complexities: though the agenda was organized to roughly separate user experience, technical granularity, enforcement and standardization, overlap was common and inevitable. Proposals for an “ack” or response header brought up questions of whether the opportunity to disclaim following the preference would prevent legal enforcement; whether not having such a response would leave users confused about when they had opted back in; and how granular such header responses should be. In defining first vs. third party tracking, user expectations, current Web business models and even the same-origin security policy could point the group in different directions.

We did see some moments of consensus. There was general agreement that while user interface issues were key to privacy, trying to standardize those elements was probably counterproductive, whereas providing guidance could help significantly. Regarding the scope of “tracking”, the group was roughly evenly divided on what they would most prefer: a broad definition (any logging), a narrow definition (profiling for online behavioral advertising, or OBA, only) or something in between (where tracking is more than OBA but excludes things like analytics or fraud protection, as in the proposal from the Center for Democracy and Technology). But in a “hum” to see which proposals workshop attendees opposed (“non-starters”), no one objected to starting with a CDT-style middle ground: a rather shocking level of agreement to end two days chock full of debate.

For tech policy nerds, then, this intimate workshop about a couple of narrow technical proposals was heady stuff. And the points of agreement suggest that real interoperable progress on tracking protection — the kind that will help the average end user’s privacy — is on the way. For the W3C, this will certainly be a topic of discussion at the ongoing meeting in Bilbao, and we’re beginning detailed conversations about the scope and milestones for a Working Group to undertake technical standards work.

Thanks again to Princeton/CITP for hosting the event, and to Thomas and Lorrie for organizing it: bringing together this diverse group of people on short notice was a real challenge, and it paid off for all of us. If you’d like to see more primary materials: minutes from the workshop (including presentations and discussions) are available, as are the position papers and slides. And the W3C will post a workshop report with a more detailed summary very soon.

Summary of W3C DNT Workshop Submissions

Last week, we hosted the W3C “Web Tracking and User Privacy” Workshop here at CITP (sponsored by Adobe, Yahoo!, Google, Mozilla and Microsoft). If you were not able to join us for this event, I hope to summarize some of the discussion embodied in the roughly 60 position papers submitted.

The workshop attracted a wide range of participants; the agenda included advocates, academics, government, start-ups and established industry players from various sectors. Despite the broad name of the workshop, the discussion centered around “Do Not Track” (DNT) technologies and policy, essentially ways of ensuring that people have control, to some degree, over web profiling and tracking.

Unfortunately, I’ll have to assume that you are familiar with the various proposals before going much further, as the workshop position papers are necessarily short and assume familiarity. (If you are new to this area, the CDT’s Alissa Cooper has a brief blog post from this past March, “Digging in on ‘Do Not Track’”, that mentions many of the discussion points. Technically, much of the discussion involved the mechanisms of the Mayer, Narayanan and Stamm IETF Internet-Draft from March and the Microsoft W3C member submission from February.)


Technical Implementation: First, some quick background and updates. A number of papers point out that a registry analogous to Do-Not-Call (presumably one where netizens would sign up not to be tracked) would not work for online tracking, so we should be careful not to model the technology and policy too closely on Do-Not-Call. With that recognized, the current technical proposals center on the Microsoft W3C submission and the Mayer et al. IETF submission, which include some mix of a DNT HTTP header, a DNT DOM flag and Tracking Protection Lists (TPLs). While the IETF submission focuses exclusively on the DNT HTTP header, the W3C submission includes all three of these technologies. Browsers are moving quickly here: Mozilla’s Firefox 4 includes the DNT header, Microsoft’s IE9 includes all three capabilities, Google’s Chrome now allows extensions to send the DNT header through the WebRequest API, and Apple has announced that the next version of its Safari browser will support the DNT header.
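Unlike a registry, the DNT header is just one more request header that the browser attaches to every request it sends, to first and third parties alike. Here is a minimal Python sketch of a client doing that with the standard library, assuming the `DNT: 1` opt-out form used by the header proposals; the function name, URL and User-Agent string are placeholders, not part of any proposal.

```python
import urllib.request


def build_request(url: str, do_not_track: bool) -> urllib.request.Request:
    """Build (but do not send) a GET request, optionally carrying the DNT header.

    Assumes the "DNT: 1" opt-out form; a browser would attach it to every
    request it makes, including those for embedded third-party resources.
    """
    headers = {"User-Agent": "dnt-sketch/0.1"}  # placeholder UA string
    if do_not_track:
        headers["DNT"] = "1"
    return urllib.request.Request(url, headers=headers)


req = build_request("https://example.com/", do_not_track=True)
# urllib normalizes header names with str.capitalize(), so "DNT" is stored as "Dnt".
print(req.get_header("Dnt"))  # prints 1
```

Note that the header only expresses a preference; nothing on the client side can verify what the server does with it, which is why the enforcement questions below matter.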

Some of the papers critique certain aspects of the three implementation options, while others suggest different mechanisms entirely. CITP’s Harlan Yu includes an interesting discussion of problems with DOM flag granularity and of access-control problems when third-party code included in a first-party site runs as if it were first-party code. Toubiana and Nissenbaum discuss problems with the persistence of DNT exceptions (where a user opts back in) when a resource changes content or ownership, and go on to suggest profile-based opting back in by “topic” or grouping of websites. Avaya’s submission has a fascinating discussion of the problems of implementing DNT in enterprise environments, where tracking-like mechanisms are used across disparate enterprise web services to make sure people are doing their jobs; Avaya proposes a clever solution in which the browser first checks whether it can reach a resource available only inside the enterprise (virtual) network, and if so ignores DNT preferences for enterprise software tracking mechanisms. A slew of submissions from Aquin et al., Azigo and PDECC favor a culture of “self-tracking”, allowing and teaching people to know more about the digital traces they leave and giving them (or their agents) control over the use and release of their personal information. CASRO-ESOMAR and Apple have interesting discussions of gaming TPLs: CASRO-ESOMAR points out that a company could require a user to accept a TPL that blocks traffic from its competitors, and Apple describes spam-like DNS cycling as an example of an “arms race” response against TPLs.
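A TPL is a user-side list of rules, which is what makes both its inspectability and the CASRO-ESOMAR gaming worry easy to see. The sketch below uses a simplified, hypothetical rule format (`-d domain` blocks a domain and its subdomains, `+d domain` allows one) loosely modeled on the idea of IE9's lists; the format details and the rule that an allow entry beats a block are assumptions for illustration, not Microsoft's specification.

```python
from urllib.parse import urlsplit


def parse_list(text: str) -> tuple[set[str], set[str]]:
    """Parse a hypothetical list format: '-d domain' blocks, '+d domain' allows."""
    blocked, allowed = set(), set()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("-d "):
            blocked.add(line[3:].strip().lower())
        elif line.startswith("+d "):
            allowed.add(line[3:].strip().lower())
        # Any other line (headers, comments) is ignored in this sketch.
    return blocked, allowed


def matches(host: str, domains: set[str]) -> bool:
    """True if host equals, or is a subdomain of, any listed domain."""
    return any(host == d or host.endswith("." + d) for d in domains)


def is_blocked(url: str, lists: list[str]) -> bool:
    """Combine all installed lists; assumed precedence: any allow beats any block."""
    host = (urlsplit(url).hostname or "").lower()
    blocked, allowed = set(), set()
    for text in lists:
        b, a = parse_list(text)
        blocked |= b
        allowed |= a
    if matches(host, allowed):
        return False
    return matches(host, blocked)


privacy_list = "-d tracker.example\n"
# A list pushed by a (hypothetical) company that also blocks a rival's content:
rival_list = "-d tracker.example\n-d rival-news.example\n"
print(is_blocked("https://ads.tracker.example/pixel.gif", [privacy_list]))  # True
print(is_blocked("https://rival-news.example/story", [rival_list]))  # True
```

Because the lists are plain text, a user (or a reviewer) can read exactly what will be blocked, but nothing in the mechanism itself distinguishes a tracker from a competitor.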

Definitions: Many of the papers addressed definitions definitions definitions… mostly what “tracking” means and what terms like “third-party” should mean. Many industry submissions, such as those from PayPal, Adobe, SIIA and Google, urge caution so that good types of “tracking”, such as analytics and forensics, are not swept up along with the bad, and further argue that clear definitions of the terms involved in DNT are crucial to avoid disrupting user expectations, innovation and the online ecosystem. PayPal points out, as have others, that domain names are not good indicators of third-party status (e.g., metrics.apple.com is the Adobe Omniture service for apple.com, and fb.com is equivalent to facebook.com). Ashkan Soltani’s submission distinguishes a “do not use” conception of DNT from a “do not collect” conception and argues for a solution that “does not identify” (DNI), requiring the removal of any unique identifiers associated with the data. Soltani points out that this has useful measurement and enforcement properties: if a user sees a unique ID in the DNI case, the site is doing it wrong.
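PayPal's point is easy to demonstrate. The sketch below implements the obvious domain-based test for "third party", treating the last two DNS labels of a host as its site (a deliberate simplification; browsers generally consult the Public Suffix List instead), and shows it getting both of PayPal's examples wrong. The function names are illustrative only.

```python
from urllib.parse import urlsplit


def naive_site(host: str) -> str:
    """Crude 'registrable domain': the last two DNS labels.

    Deliberately naive: it also breaks on suffixes such as .co.uk.
    """
    return ".".join(host.lower().split(".")[-2:])


def naive_is_third_party(page_url: str, resource_url: str) -> bool:
    """Call a resource third party iff its naive site differs from the page's."""
    page = urlsplit(page_url).hostname or ""
    resource = urlsplit(resource_url).hostname or ""
    return naive_site(page) != naive_site(resource)


# metrics.apple.com is run by Adobe (Omniture) for apple.com,
# yet the domain test calls it first party:
print(naive_is_third_party("https://www.apple.com/", "https://metrics.apple.com/"))  # False
# fb.com and facebook.com belong to the same company,
# yet the domain test calls it third party:
print(naive_is_third_party("https://www.facebook.com/", "https://fb.com/"))  # True
```

Fixing the second case with the Public Suffix List would not help either: the mismatch is about who operates a service, which DNS names alone do not reveal.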

Enforcement: Some raised the issue of enforcement; Mozilla, for example, wants to make sure that there are reasonable enforcement mechanisms to deal with entities that ignore DNT mechanisms. More broadly, the submissions divide between those calling for self-regulation, such as Comcast and SIIA, and those advocating explicit regulation. The opinion polling research groups, CASRO-ESOMAR, call explicitly for regulation no matter what DNT mechanism is ultimately adopted, such that DNT header requests are clearly enforced or TPLs are regulated tightly enough not to over-block legitimate research activities. Abine wants a cooperative market mechanism that results in a “healthy market system that is responsive to consumer outcome metrics” and that incentivizes advertising companies to work with privacy solution providers to increase consumer awareness and transparency around online tracking. Many of the industry players worried about definitions are also worried about regulatory over-prescription; e.g., Datran Media is concerned that overly prescriptive regulation might stifle innovation in new business models. Hoofnagle et al. are evaluating the effectiveness of self-regulation and find that the self-regulation programs currently in existence are greatly tilted in favor of industry and do not adequately embody consumer conceptions of privacy and tracking.

Research: A number of submissions addressed research that is ongoing or still needed to gauge various aspects of the DNT puzzle. The submissions from McDonald and Wang et al. describe user studies of, respectively, what consumers expect from DNT (spoiler: they expect no collection of their data) and the usability and effectiveness of current opt-out tools. Both lines of work argue for usable mechanisms that communicate how developers implement or envision DNT and how users can best express their preferences via these tools. NIST’s submission argues for empirical studies to set objective and usable standards for tracking protection and describes a current study of single sign-on (SSO) implementations. Thaw et al. discuss a proposal for incentivizing developers to communicate and design the various levels of rich data they need to perform certain kinds of ad targeting, and then use a multi-armed bandit model to illustrate game-theoretic ad targeting that can be tuned based on how much data they are allowed to collect. Finally, CASRO-ESOMAR makes a plea for exempting legitimate research purposes from DNT, so that opinion polling and academic research can avoid bias.
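Thaw et al.'s model is not reproduced here. As a generic illustration of the bandit idea only, the sketch below runs a standard epsilon-greedy learner in which each arm is an ad-targeting strategy that needs some level of user data, and a DNT-style limit removes any arm that needs more data than allowed. The arm names, data levels and click-through rates are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Arm:
    name: str
    data_level: int    # hypothetical: 0 = contextual only, 2 = full behavioral profile
    click_rate: float  # hypothetical true click probability, unknown to the learner


def run_epsilon_greedy(arms: list[Arm], max_data_level: int,
                       rounds: int = 2000, epsilon: float = 0.1,
                       seed: int = 0) -> dict[str, int]:
    """Epsilon-greedy over only the arms permitted under max_data_level.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    eligible = [a for a in arms if a.data_level <= max_data_level]
    if not eligible:
        raise ValueError("no targeting strategy is allowed at this data level")
    pulls = {a.name: 0 for a in arms}
    value = {a.name: 0.0 for a in arms}  # running mean reward per arm
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.choice(eligible)  # explore a random permitted arm
        else:
            arm = max(eligible, key=lambda a: value[a.name])  # exploit best estimate
        reward = 1.0 if rng.random() < arm.click_rate else 0.0  # simulated click
        pulls[arm.name] += 1
        # Incremental mean update: v += (r - v) / n
        value[arm.name] += (reward - value[arm.name]) / pulls[arm.name]
    return pulls


arms = [
    Arm("contextual", 0, 0.05),
    Arm("coarse-interest", 1, 0.20),
    Arm("behavioral", 2, 0.60),
]
print(run_epsilon_greedy(arms, max_data_level=2))
print(run_epsilon_greedy(arms, max_data_level=1))  # "behavioral" is never pulled
```

Lowering `max_data_level` shrinks the set of strategies the learner may try, which is one simple way to picture targeting "tuned" by how much data is allowed.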

Transparency: A particularly fascinating thread of commentary, to me, was the extent to which submissions touched on, or focused entirely on, transparency in tracking. Grossklags argues that DNT efforts will spark increased transparency, but he’s not sure that will overcome some common consumer privacy barriers seen in research. Seltzer discusses the intimate relationship between transparency and privacy and concludes that a DNT header is not very transparent (in operation, not use), while TPLs are more transparent in that they are a user-side mechanism that users can inspect, change and verify for correct operation. Google argues that there is a need for transparency in “what data is collected and how it is used”, leaving out the ability for users to affect or control these things. In contrast, BlueKai also advocates for transparency in the sense of both access to a user’s profile and user “control” over the data it collects, but it doesn’t, and probably cannot, extend this transparency to an understanding of how BlueKai’s clients use this data. Datran Media describes its PreferenceCentral tool, which allows users to opt out of brands they don’t want targeting them (instead of ad networks, with which people are not familiar); Datran argues this is granular enough to avoid the “creepy” feeling that users get from behavioral ads while still allowing high-value targeted advertising. Evidon analogizes to physical-world shopping transactions and concludes, smartly, “Anytime data that was not explicitly provided is explicitly used, there is a reflexive notion of privacy violation.” and “A permanently affixed ‘Not Me’ sign is not a representation of an engaged, meaningful choice.”

W3C vs. IETF: Mozilla’s seems to be the only submission that wrestles a bit with the “which standards body?” question: W3C, IETF or some mix of both? It points out that the DNT header is a broader issue than web browsing and so should properly be tackled by the IETF, where HTTP resides, while the W3C effort could focus on TPLs, with a subcommittee for the DNT DOM element.

Finally, here are a bunch of submissions that don’t fit into the above categories that caught my eye:

  • Soghoian argues that the quantity and quality of information needed for security, law enforcement and fraud prevention is usually so great that exempting those purposes risks creating the exception that swallows the rule. Soghoian further recommends a total kibosh on certain nefarious technologies such as browser fingerprinting.

  • Lowenthal makes the very good point that browser vendors need to get more serious about managing security and privacy vulnerabilities, as that kind of risk is best dealt with at the choke point of the browsers users choose rather than across the myriad possible web entities. This would allow browsers to compete on how privacy-preserving they can be.

  • Mayer argues for a “generative” approach to a privacy choice signaling technology, highlighting that language preferences (via short codes) and browsing platform (via user-agent strings) are already sent with web requests, and websites are free to respond as they see fit. A DNT signaling mechanism like this would allow great flexibility in how a web service responds to a DNT request, for example serving a DNT version of the site or resource, prompting the user for their preferences, or asking for payment before serving.

  • Yahoo points out that DNT will take a while to reach the majority of browsers in use. It suggests a hybrid approach: the DAA CLEAR ad notice for backwards compatibility with browsers that don’t support DNT mechanisms, and the DNT header for an opt-out that is persistent and enforceable.
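Mayer's "generative" point is that a DNT signal, like a language preference, leaves the response up to the site, and Yahoo's hybrid adds a fallback for browsers that don't send DNT. Below is a minimal server-side sketch of both ideas. It assumes the `DNT: 1` header form; the legacy opt-out cookie name, the policy names and the response strings are hypothetical, and letting an explicit DNT value override the cookie is a design assumption.

```python
from enum import Enum
from http.cookies import SimpleCookie


class Policy(Enum):
    # Hypothetical ways a site might respond to an opt-out signal.
    SERVE_DNT_VERSION = "serve a version of the page with no tracking"
    PROMPT_USER = "ask the user whether to make an exception"
    ASK_FOR_PAYMENT = "offer the page for a fee instead of tracking"


OPT_OUT_COOKIE = "optout"  # hypothetical legacy opt-out cookie name


def user_opted_out(headers: dict[str, str]) -> bool:
    """Use the DNT header if present; otherwise fall back to an opt-out cookie."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    if "dnt" in normalized:
        return normalized["dnt"] == "1"  # an explicit DNT value wins (assumption)
    cookie = SimpleCookie()
    cookie.load(normalized.get("cookie", ""))
    return OPT_OUT_COOKIE in cookie and cookie[OPT_OUT_COOKIE].value == "1"


def respond(headers: dict[str, str], policy: Policy) -> str:
    """Choose a response; the site, not the protocol, decides what an opt-out means."""
    if not user_opted_out(headers):
        return "normal page (tracking permitted)"
    return policy.value


print(respond({"DNT": "1"}, Policy.SERVE_DNT_VERSION))
print(respond({"Cookie": "optout=1; theme=dark"}, Policy.PROMPT_USER))
print(respond({"User-Agent": "old-browser"}, Policy.ASK_FOR_PAYMENT))
```

The signal only tells the site what the user wants; which `Policy` the site picks is exactly the flexibility (and the open definitional question) the bullets above describe.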

Whew; I likely left out a lot of good stuff across the remaining submissions, but I hope that readers get an idea of some of the issues in play and can consult the submissions they find particularly interesting as this develops. We hope to have someone pen a “part 2” to this entry describing the discussion during the workshop and what the next steps in DNT will be.

Internet Voting in Union Elections?

The U.S. Department of Labor (DOL) recently asked for public comment on a fascinating issue: what kind of guidelines should they give unions that want to use “electronic voting” to elect their officers? (Curiously, they defined electronic voting broadly to include computerized (DRE) voting systems, vote-by-phone systems and internet voting systems.)

As a technology policy researcher with the NSF ACCURATE e-voting center, I figured we should have good advice for DOL.

(If you need a quick primer on security issues in e-voting, GMU’s Jerry Brito has just posted an episode of his Surprisingly Free podcast where he and I work through a number of basic issues in e-voting and security. I’d suggest you check out Jerry’s podcast regularly as he gets great guests (like a podcast with CITP’s own Tim Lee) and really digs deep into the issues while keeping it at an understandable level.)

The DOL issued a Request for Information (PDF) that asked a series of questions, beginning with the very basic, “Should we issue e-voting guidelines at all?” The questions go on to ask about the necessity of voter-verified paper audit trails (VVPATs), observability, meaningful recounts, ballot secrecy, preventing flawed and/or malicious software, logging, insider threats, voter intimidation, phishing, spoofing, denial-of-service and recovering from malfunctions.

Whew. The DOL clearly wanted a “brain dump” from computer security and the voting technology communities!

It turns out that labor elections and government elections aren’t as different as I originally thought. The controlling statute for union elections (the LMRDA) and the case law* that has developed over the years require strict ballot secrecy (such that any technology that could link voters to their ballots is not allowed), both during voting and in any post-election process. The one major difference is that there isn’t a body of election law and regulation on top of which unions and the DOL can run their elections; for example, election laws frequently prohibit campaigning or photography within a certain distance of an official polling place, whereas such restrictions would be hard to impose in union elections.

After a considerable amount of wrangling and writing, ACCURATE submitted a comment (PDF). The essential points we make are pretty straightforward: 1) don’t allow internet voting from unsupervised, uncontrolled computing devices for any election that requires high integrity; and 2) only elections that use voter-verified paper records (VVPRs), subject to an audit process that uses those records to check the reported election outcome, can avoid the various types of threats that DOL is concerned with. The idea is simple: VVPRs are independent of the software and hardware of the voting system, so it doesn’t matter how bad those components are as long as there is a robust parallel process that can check the result. Of course, VVPRs are no panacea: they must be carefully stored, secured and transported, and ACCURATE’s HCI researchers have shown that it’s very hard to get voters to consistently check them for accuracy. However, those problems are much more tractable than, say, removing all the malware and spyware from hundreds of thousands of voter PCs and mobile devices.
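The logic of the VVPR argument fits in a few lines: hand-count the paper records for a randomly chosen sample of batches and compare them with the electronic tallies, so that a corrupted tally is caught regardless of how the software failed. This is a simplified illustration under assumed data structures (batch IDs mapping to per-candidate counts), not the audit procedure in ACCURATE's comment; real post-election audits, such as risk-limiting audits, choose the sample size statistically from the reported margin.

```python
import random


def audit_batches(electronic: dict[str, dict[str, int]],
                  paper: dict[str, dict[str, int]],
                  sample_size: int,
                  seed: int) -> list[str]:
    """Compare hand counts of voter-verified paper records with the electronic
    tallies for a random sample of batches.

    Returns the sampled batch IDs whose counts disagree; any mismatch should
    trigger escalation (e.g., a wider or full hand count).
    """
    # In practice the seed should come from a public, verifiable source (e.g., dice).
    rng = random.Random(seed)
    batch_ids = sorted(electronic)
    sample = rng.sample(batch_ids, k=min(sample_size, len(batch_ids)))
    return sorted(b for b in sample if electronic[b] != paper.get(b))


# Hypothetical tallies: batch-2's electronic count was altered, while the
# paper records still reflect the votes as cast.
electronic = {
    "batch-1": {"Smith": 40, "Jones": 35},
    "batch-2": {"Smith": 52, "Jones": 48},
    "batch-3": {"Smith": 30, "Jones": 33},
}
paper = {
    "batch-1": {"Smith": 40, "Jones": 35},
    "batch-2": {"Smith": 44, "Jones": 56},
    "batch-3": {"Smith": 30, "Jones": 33},
}
print(audit_batches(electronic, paper, sample_size=3, seed=2011))  # ['batch-2']
```

Nothing in the check depends on trusting the voting system's software; it depends only on the paper being preserved intact, which is why custody of the records matters so much.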

I must say I was a bit surprised to see the other sets of comments submitted, mostly by voting system vendors and union organizations, but also the Electronic Privacy Information Center (EPIC). ACCURATE and EPIC seem to be lone voices in this process “porting” what we’ve learned about the difficulties of running secure civic elections to the labor sphere. Many of the unions talked about how they must have forms of electronic, phone and internet voting as their constituencies are spread far and wide, can’t make it to polling places and are concerned with environmental impacts of paper and more traditional voting methods. Of course, we would counter that accommodations can be made for most of these concerns and still not fundamentally undermine the integrity of union elections.

Both unions and vendors used an unfortunate rhetorical tactic when talking about the security properties of these systems: “We’ve run x hundreds of elections using this kind of technology and have never had a problem/no one has ever complained about fraud.” Unfortunately, that’s not how security works. As with adversarial processes like financial audits, you can’t base predictions of future security on past results. That is, the SEC doesn’t tell companies that, since their past 10 years of financials have been in order, they can take a few years off. No, security requires careful design, affirmative effort and active auditing to assure that a system does not violate the properties it claims.

There’s a lot more in our comment, and I’d be more than happy to respond to comments if you have questions.

* Check out the “Court Cases” section of the Federal Register notice linked to above.