July 20, 2017

Web Tracking and User Privacy Workshop: Test Cases for Privacy on the Web

This guest post is from Nick Doty, of the W3C and UC Berkeley School of Information. As a companion post to my summary of the position papers submitted for last month’s W3C Do-Not-Track Workshop, hosted by CITP, Nick goes deeper into the substance and interaction during the workshop.

The level of interest and participation in last month’s Workshop on Web Tracking and User Privacy — about a hundred attendees spanning multiple countries, dozens of companies, a wide variety of backgrounds — confirms the broad interest in Do Not Track. The relatively straightforward technical approach with a catchy name has led to, in the US, proposed legislation at both the state and federal level and specific mention by the Federal Trade Commission (it was nice to have Ed Felten back from DC representing his new employer at the workshop), and comparatively rapid deployment of competing proposals by browser vendors. Still, one might be surprised that so many players are devoting such engineering resources to a relatively narrow goal: building technical means that allow users to avoid tracking across the Web for the purpose of compiling behavioral profiles for targeted advertising.

In fact, Do Not Track (in all its variations and competing proposals) is the latest test case for how new online technologies will address privacy issues. What mix of minimization techniques (where one might classify Microsoft’s Tracking Protection block lists) versus preference expression and use limitation (like a Do Not Track header) will best protect privacy and allow for innovation? Can parties agree on a machine-readable expression of privacy preferences (as has been heavily debated in P3P, GeoPriv and other standards work), and if so, how will terms be defined and compliance monitored and enforced? Many attendees were at the workshop not just to address this particular privacy problem — ubiquitous invisible tracking of Web requests to build behavioral profiles — but to grab a seat at the table where the future of how privacy is handled on the Web may be decided. The W3C, for its part, expects to start an Interest Group to monitor privacy on the Web and spin out specific work as new privacy issues inevitably arise, in addition to considering a Working Group to address this particular topic (more below). The Internet Engineering Task Force (IETF) is exploring a Privacy Directorate to provide guidance on privacy considerations across specs.

At a higher level, this debate presents a test case for the process of building consensus and developing standards around technologies like tracking protection or Do Not Track that have inspired controversy. What body (or rather, combination of bodies) can legitimately define preference expressions that must operate at multiple levels in the Web stack, not to mention serve the diverse needs of individuals and entities across the globe? Can the same organization that defines the technical design also negotiate semantic agreement between very diverse groups on the meaning of “tracking”? Is this an appropriate role for technical standards bodies to assume? To what extent can technical groups work with policymakers to build solutions that can be enforced by self-regulatory or governmental players?

Discussion at the recent workshop confirmed many of these complexities: though the agenda was organized to roughly separate user experience, technical granularity, enforcement and standardization, overlap was common and inevitable. Proposals for an “ack” or response header brought up questions of whether the opportunity to disclaim following the preference would prevent legal enforcement; whether not having such a response would leave users confused about when they had opted back in; and how granular such header responses should be. In defining first vs. third party tracking, user expectations, current Web business models and even the same-origin security policy could point the group in different directions.

We did see some moments of consensus. There was general agreement that while user interface issues were key to privacy, trying to standardize those elements was probably counterproductive but providing guidance could help significantly. Regarding the scope of “tracking”, the group was roughly evenly divided on what they would most prefer: a broad definition (any logging), a narrow definition (online behavioral advertising profiling only) or something in between (where tracking is more than OBA but excludes things like analytics or fraud protection, as in the proposal from the Center for Democracy and Technology). But in a “hum” to see which proposals workshop attendees opposed (“non-starters”) no one objected to starting with a CDT-style middle ground — a rather shocking level of agreement to end two days chock full of debate.

For tech policy nerds, then, this intimate workshop about a couple of narrow technical proposals was heady stuff. And the points of agreement suggest that real interoperable progress on tracking protection — the kind that will help the average end user’s privacy — is on the way. For the W3C, this will certainly be a topic of discussion at the ongoing meeting in Bilbao, and we’re beginning detailed conversations about the scope and milestones for a Working Group to undertake technical standards work.

Thanks again to Princeton/CITP for hosting the event, and to Thomas and Lorrie for organizing it: bringing together this diverse group of people on short notice was a real challenge, and it paid off for all of us. If you’d like to see more primary materials: minutes from the workshop (including presentations and discussions) are available, as are the position papers and slides. And the W3C will post a workshop report with a more detailed summary very soon.

Building a better CA infrastructure

As several Tor project authors, Ben Adida and many others have written, our certificate authority infrastructure has the flaw that any one CA, anywhere on the planet, can issue a certificate for any web site, anywhere else on the planet. This was tolerable when the only game in town was VeriSign, but now that’s just untenable. So what solutions are available?

First, some non-solutions: Extended validation certs do nothing useful. Will users be properly trained to look for the extra changes in browser behavior as to scream when they’re absent via a normal cert? Fat chance. Similarly, certificate revocation lists buy you nothing if you can’t actually download them (a notable issue if you’re stuck behind the firewall of somebody who wants to attack you).

A straightforward idea is to track the certs you see over time and generate a prominent warning if you see something anomalous. This is available as a fully-functioning Firefox extension, Certificate Patrol. This should be built into every browser.

In addition to your first-hand personal observations, why not leverage other resources on the network to make their own observations? For example, while Google is crawling the web, it can easily save SSL/TLS certificates when it sees them, and browsers could use a real-time API much like Google SafeBrowsing. A research group at CMU has already built something like this, which they call a network notary. In essence, you can have multiple network services, running from different vantage points in the network, all telling you whether the cryptographic credentials you got match what others are seeing. Of course, if you’re stuck behind an attacker’s firewall, the attacker will similarly filter out all these sites.

UPDATE: Google is now doing almost exactly what I suggested.

There are a variety of other proposals out there, notably trying to leverage DNSSEC to enhance or supplant the need for SSL/TLS certificates. Since DNSSEC provides more control over your DNS records, it also provides more control over who can issue SSL/TLS certificates for your web site. If and when DNSSEC becomes universally supported, this would be a bit harder for attacker firewalls to filter without breaking everything, so I certainly hope this takes off.

Let’s say that future browsers properly use all of these tricks and can unquestionably determine for you, with perfect accuracy, when you’re getting a bogus connection. Your browser will display an impressive error dialog and refuses to load the web site. Is that sufficient? This will certainly break all the hotel WiFi systems that want to redirect you to an internal site where they can charge you to access the network. (Arguably, this sort of functionality belongs elsewhere in the software stack, such as through IEEE 802.21, notably used to connect AT&T iPhones to the WiFi service at Starbucks.) Beyond that, though, should the browser just steadfastly refuse to allow the connection? I’ve been to at least one organization whose internal WiFi network insists that it proxy all of your https sessions and, in fact, issues fabricated certificates that you’re expected to configure your browser to trust. We need to support that sort of thing when it’s required, but again, it would perhaps best be supported by some kind of side-channel protocol extension, not by doing a deliberate MITM attack on the crypto protocol.

Corner cases aside, what if you’re truly in a hostile environment and your browser has genuinely detected a network adversary? Should the browser refuse the connection, or should there be some other option? And if so, what would that be? Should the browser perhaps allow the connection (with much gnashing of teeth and throbbing red borders on the window)? Should previous cookies and saved state be hidden away? Should web sites like Gmail and Facebook allow users to have two separate passwords, one for “genuine” login and a separate one for “Yes, I’m in a hostile location, but I need to send and receive email in a limited but still useful fashion?”

[Editor’s note: you may also be interested in the many prior posts on this topic by Freedom to Tinker contributors: 1, 2, 3, 4, 5, 6, 7, 8 — as well as the “Emerging Threats to Online Trust: The Role of Public Policy and Browser Certificates” event that CITP hosted in DC last year with policymakers, industry, and activists.]

A Free Internet, If We Can Keep It

“We stand for a single internet where all of humanity has equal access to knowledge and ideas. And we recognize that the world’s information infrastructure will become what we and others make of it. ”

These two sentences, from Secretary of State Clinton’s groundbreaking speech on Internet freedom, sum up beautifully the challenge facing our Internet policy. An open Internet can advance our values and support our interests; but we will only get there if we make some difficult choices now.

One of these choices relates to anonymity. Will it be easy to speak anonymously on the Internet, or not? This was the subject of the first question in the post-speech Q&A:

QUESTION: You talked about anonymity on line and how we have to prevent that. But you also talk about censorship by governments. And I’m struck by – having a veil of anonymity in certain situations is actually quite beneficial. So are you looking to strike a balance between that and this emphasis on censorship?

SECRETARY CLINTON: Absolutely. I mean, this is one of the challenges we face. On the one hand, anonymity protects the exploitation of children. And on the other hand, anonymity protects the free expression of opposition to repressive governments. Anonymity allows the theft of intellectual property, but anonymity also permits people to come together in settings that gives them some basis for free expression without identifying themselves.

None of this will be easy. I think that’s a fair statement. I think, as I said, we all have varying needs and rights and responsibilities. But I think these overriding principles should be our guiding light. We should err on the side of openness and do everything possible to create that, recognizing, as with any rule or any statement of principle, there are going to be exceptions.

So how we go after this, I think, is now what we’re requesting many of you who are experts in this area to lend your help to us in doing. We need the guidance of technology experts. In my experience, most of them are younger than 40, but not all are younger than 40. And we need the companies that do this, and we need the dissident voices who have actually lived on the front lines so that we can try to work through the best way to make that balance you referred to.

Secretary Clinton’s answer is trying to balance competing interests, which is what good politicians do. If we want A, and we want B, and A is in tension with B, can we have some A and some B together? Is there some way to give up a little A in exchange for a lot of B? That’s a useful way to start the discussion.

But sometimes you have to choose — sometimes A and B are profoundly incompatible. That seems to be the case here. Consider the position of a repressive government that wants to spy on a citizen’s political speech, as compared to the position of the U.S. government when it wants to eavesdrop on a suspect’s conversations under a valid search warrant. The two positions are very different morally, but they are pretty much the same technologically. Which means that either both governments can eavesdrop, or neither can. We have to choose.

Secretary Clinton saw this tension, and, being a lawyer, she saw that law could not resolve it. So she expressed the hope that technology, the aspect she understood least, would offer a solution. This is a common pattern: Given a difficult technology policy problem, lawyers will tend to seek technology solutions and technologists will tend to seek legal solutions. (Paul Ohm calls this “Felten’s Third Law”.) It’s easy to reject non-solutions in your own area because you have the knowledge to recognize why they will fail; but there must be a solution lurking somewhere in the unexplored wilderness of the other area.

If we’re forced to choose — and we will be — what kind of Internet will we have? In Secretary Clinton’s words, “the world’s information infrastructure will become what we and others make of it.” We’ll have a free Internet, if we can keep it.