March 19, 2024

Web Tracking and User Privacy Workshop: Test Cases for Privacy on the Web

This guest post is from Nick Doty, of the W3C and UC Berkeley School of Information. As a companion post to my summary of the position papers submitted for last month’s W3C Do-Not-Track Workshop, hosted by CITP, Nick goes deeper into the substance and interaction during the workshop.

The level of interest and participation in last month’s Workshop on Web Tracking and User Privacy — about a hundred attendees spanning multiple countries, dozens of companies, a wide variety of backgrounds — confirms the broad interest in Do Not Track. The relatively straightforward technical approach with a catchy name has led to, in the US, proposed legislation at both the state and federal level and specific mention by the Federal Trade Commission (it was nice to have Ed Felten back from DC representing his new employer at the workshop), and comparatively rapid deployment of competing proposals by browser vendors. Still, one might be surprised that so many players are devoting such engineering resources to a relatively narrow goal: building technical means that allow users to avoid tracking across the Web for the purpose of compiling behavioral profiles for targeted advertising.

In fact, Do Not Track (in all its variations and competing proposals) is the latest test case for how new online technologies will address privacy issues. What mix of minimization techniques (where one might classify Microsoft’s Tracking Protection block lists) versus preference expression and use limitation (like a Do Not Track header) will best protect privacy and allow for innovation? Can parties agree on a machine-readable expression of privacy preferences (as has been heavily debated in P3P, GeoPriv and other standards work), and if so, how will terms be defined and compliance monitored and enforced? Many attendees were at the workshop not just to address this particular privacy problem — ubiquitous invisible tracking of Web requests to build behavioral profiles — but to grab a seat at the table where the future of how privacy is handled on the Web may be decided. The W3C, for its part, expects to start an Interest Group to monitor privacy on the Web and spin out specific work as new privacy issues inevitably arise, in addition to considering a Working Group to address this particular topic (more below). The Internet Engineering Task Force (IETF) is exploring a Privacy Directorate to provide guidance on privacy considerations across specs.

At a higher level, this debate presents a test case for the process of building consensus and developing standards around technologies like tracking protection or Do Not Track that have inspired controversy. What body (or rather, combination of bodies) can legitimately define preference expressions that must operate at multiple levels in the Web stack, not to mention serve the diverse needs of individuals and entities across the globe? Can the same organization that defines the technical design also negotiate semantic agreement between very diverse groups on the meaning of “tracking”? Is this an appropriate role for technical standards bodies to assume? To what extent can technical groups work with policymakers to build solutions that can be enforced by self-regulatory or governmental players?

Discussion at the recent workshop confirmed many of these complexities: though the agenda was organized to roughly separate user experience, technical granularity, enforcement and standardization, overlap was common and inevitable. Proposals for an “ack” or response header brought up questions of whether the opportunity to disclaim following the preference would prevent legal enforcement; whether not having such a response would leave users confused about when they had opted back in; and how granular such header responses should be. In defining first vs. third party tracking, user expectations, current Web business models and even the same-origin security policy could point the group in different directions.

We did see some moments of consensus. There was general agreement that while user interface issues were key to privacy, trying to standardize those elements was probably counterproductive but providing guidance could help significantly. Regarding the scope of “tracking”, the group was roughly evenly divided on what they would most prefer: a broad definition (any logging), a narrow definition (online behavioral advertising profiling only) or something in between (where tracking is more than OBA but excludes things like analytics or fraud protection, as in the proposal from the Center for Democracy and Technology). But in a “hum” to see which proposals workshop attendees opposed (“non-starters”) no one objected to starting with a CDT-style middle ground — a rather shocking level of agreement to end two days chock full of debate.

For tech policy nerds, then, this intimate workshop about a couple of narrow technical proposals was heady stuff. And the points of agreement suggest that real interoperable progress on tracking protection — the kind that will help the average end user’s privacy — is on the way. For the W3C, this will certainly be a topic of discussion at the ongoing meeting in Bilbao, and we’re beginning detailed conversations about the scope and milestones for a Working Group to undertake technical standards work.

Thanks again to Princeton/CITP for hosting the event, and to Thomas and Lorrie for organizing it: bringing together this diverse group of people on short notice was a real challenge, and it paid off for all of us. If you’d like to see more primary materials: minutes from the workshop (including presentations and discussions) are available, as are the position papers and slides. And the W3C will post a workshop report with a more detailed summary very soon.

Building a better CA infrastructure

As several Tor project authors, Ben Adida and many others have written, our certificate authority infrastructure has the flaw that any one CA, anywhere on the planet, can issue a certificate for any web site, anywhere else on the planet. This was tolerable when the only game in town was VeriSign, but now that’s just untenable. So what solutions are available?

First, some non-solutions: Extended validation certs do nothing useful. Will users be properly trained to look for the extra changes in browser behavior as to scream when they’re absent via a normal cert? Fat chance. Similarly, certificate revocation lists buy you nothing if you can’t actually download them (a notable issue if you’re stuck behind the firewall of somebody who wants to attack you).

A straightforward idea is to track the certs you see over time and generate a prominent warning if you see something anomalous. This is available as a fully-functioning Firefox extension, Certificate Patrol. This should be built into every browser.

In addition to your first-hand personal observations, why not leverage other resources on the network to make their own observations? For example, while Google is crawling the web, it can easily save SSL/TLS certificates when it sees them, and browsers could use a real-time API much like Google SafeBrowsing. A research group at CMU has already built something like this, which they call a network notary. In essence, you can have multiple network services, running from different vantage points in the network, all telling you whether the cryptographic credentials you got match what others are seeing. Of course, if you’re stuck behind an attacker’s firewall, the attacker will similarly filter out all these sites.

UPDATE: Google is now doing almost exactly what I suggested.

There are a variety of other proposals out there, notably trying to leverage DNSSEC to enhance or supplant the need for SSL/TLS certificates. Since DNSSEC provides more control over your DNS records, it also provides more control over who can issue SSL/TLS certificates for your web site. If and when DNSSEC becomes universally supported, this would be a bit harder for attacker firewalls to filter without breaking everything, so I certainly hope this takes off.

Let’s say that future browsers properly use all of these tricks and can unquestionably determine for you, with perfect accuracy, when you’re getting a bogus connection. Your browser will display an impressive error dialog and refuses to load the web site. Is that sufficient? This will certainly break all the hotel WiFi systems that want to redirect you to an internal site where they can charge you to access the network. (Arguably, this sort of functionality belongs elsewhere in the software stack, such as through IEEE 802.21, notably used to connect AT&T iPhones to the WiFi service at Starbucks.) Beyond that, though, should the browser just steadfastly refuse to allow the connection? I’ve been to at least one organization whose internal WiFi network insists that it proxy all of your https sessions and, in fact, issues fabricated certificates that you’re expected to configure your browser to trust. We need to support that sort of thing when it’s required, but again, it would perhaps best be supported by some kind of side-channel protocol extension, not by doing a deliberate MITM attack on the crypto protocol.

Corner cases aside, what if you’re truly in a hostile environment and your browser has genuinely detected a network adversary? Should the browser refuse the connection, or should there be some other option? And if so, what would that be? Should the browser perhaps allow the connection (with much gnashing of teeth and throbbing red borders on the window)? Should previous cookies and saved state be hidden away? Should web sites like Gmail and Facebook allow users to have two separate passwords, one for “genuine” login and a separate one for “Yes, I’m in a hostile location, but I need to send and receive email in a limited but still useful fashion?”

[Editor’s note: you may also be interested in the many prior posts on this topic by Freedom to Tinker contributors: 1, 2, 3, 4, 5, 6, 7, 8 — as well as the “Emerging Threats to Online Trust: The Role of Public Policy and Browser Certificates” event that CITP hosted in DC last year with policymakers, industry, and activists.]

A Free Internet, If We Can Keep It

“We stand for a single internet where all of humanity has equal access to knowledge and ideas. And we recognize that the world’s information infrastructure will become what we and others make of it. ”

These two sentences, from Secretary of State Clinton’s groundbreaking speech on Internet freedom, sum up beautifully the challenge facing our Internet policy. An open Internet can advance our values and support our interests; but we will only get there if we make some difficult choices now.

One of these choices relates to anonymity. Will it be easy to speak anonymously on the Internet, or not? This was the subject of the first question in the post-speech Q&A:

QUESTION: You talked about anonymity on line and how we have to prevent that. But you also talk about censorship by governments. And I’m struck by – having a veil of anonymity in certain situations is actually quite beneficial. So are you looking to strike a balance between that and this emphasis on censorship?

SECRETARY CLINTON: Absolutely. I mean, this is one of the challenges we face. On the one hand, anonymity protects the exploitation of children. And on the other hand, anonymity protects the free expression of opposition to repressive governments. Anonymity allows the theft of intellectual property, but anonymity also permits people to come together in settings that gives them some basis for free expression without identifying themselves.

None of this will be easy. I think that’s a fair statement. I think, as I said, we all have varying needs and rights and responsibilities. But I think these overriding principles should be our guiding light. We should err on the side of openness and do everything possible to create that, recognizing, as with any rule or any statement of principle, there are going to be exceptions.

So how we go after this, I think, is now what we’re requesting many of you who are experts in this area to lend your help to us in doing. We need the guidance of technology experts. In my experience, most of them are younger than 40, but not all are younger than 40. And we need the companies that do this, and we need the dissident voices who have actually lived on the front lines so that we can try to work through the best way to make that balance you referred to.

Secretary Clinton’s answer is trying to balance competing interests, which is what good politicians do. If we want A, and we want B, and A is in tension with B, can we have some A and some B together? Is there some way to give up a little A in exchange for a lot of B? That’s a useful way to start the discussion.

But sometimes you have to choose — sometimes A and B are profoundly incompatible. That seems to be the case here. Consider the position of a repressive government that wants to spy on a citizen’s political speech, as compared to the position of the U.S. government when it wants to eavesdrop on a suspect’s conversations under a valid search warrant. The two positions are very different morally, but they are pretty much the same technologically. Which means that either both governments can eavesdrop, or neither can. We have to choose.

Secretary Clinton saw this tension, and, being a lawyer, she saw that law could not resolve it. So she expressed the hope that technology, the aspect she understood least, would offer a solution. This is a common pattern: Given a difficult technology policy problem, lawyers will tend to seek technology solutions and technologists will tend to seek legal solutions. (Paul Ohm calls this “Felten’s Third Law”.) It’s easy to reject non-solutions in your own area because you have the knowledge to recognize why they will fail; but there must be a solution lurking somewhere in the unexplored wilderness of the other area.

If we’re forced to choose — and we will be — what kind of Internet will we have? In Secretary Clinton’s words, “the world’s information infrastructure will become what we and others make of it.” We’ll have a free Internet, if we can keep it.

There’s anonymity on the Internet. Get over it.

In a recent interview prominent antivirus developer Eugene Kaspersky decried the role of anonymity in cybercrime. This is not a new claim – it is touched on in the Commission on Cybersecurity for the 44th Presidency Report and Cybersecurity Act of 2009, among others – but it misses the mark. Any Internet design would allow anonymity. What renders our Internet vulnerable is primarily weakness of software security and authentication, not anonymity.

Consider a hypothetical of three Internet users: Alice, Bob, and Charlie. If Alice wants to communicate anonymously with Charlie, she may relay her messages through Bob. While Charlie knows Bob is an intermediary, Charlie does not know with whom he is ultimately communicating. For even greater anonymity Alice can pass her messages through multiple Bobs, and by applying cryptography she can ensure no individual Bob can piece together that she is communicating with Charlie. This basic approach to anonymity is remarkable in its independence of the Internet’s design: it only requires that some Bob(s) can and do run intermediary software. Even on an Internet where users could verify each other’s identity this means of anonymity would remain viable.

The sad state of software security – the latest DHS weekly bulletin alone identified over 40 “high severity” vulnerabilities – is what enables malicious users to exploit the Internet’s indelible capacity for anonymity. Modifying the prior hypothetical, suppose Alice now wants to spam, phish, denial of service (DoS) attack, or hack Charlie. After compromising Bob’s computer with malicious software (malware), Alice can send emails, host websites, and launch DoS attacks from it; Charlie knows Bob is apparently misbehaving, but has no means of discovering Alice’s role. Nearly all spam, phishing, and DoS attacks are now perpetrated with networks of compromised computers like Bob’s (botnets). At the writing of a July 2009 private sector report, just five botnets sourced nearly 75% of spam. Worse yet, botnets are increasingly self-perpetuating: spam and phishing websites propagate malware that compromises new computers for the botnet.

Shortcomings in authentication, the means of proving one’s identity either when necessary or at all times, are a secondary contributor to the Internet’s ills. Most applications rely on passwords, which are easily guessed or divulged through deception – the very mechanisms of most phishing and account hijacking. There are potential technical solutions that would enable a user to authenticate themselves without the risk of compromising accounts. But any approach will be undermined by weaknesses in underlying software security when a malicious party can trivially compromise a user’s computer.

The policy community is already trending towards acceptance of Internet anonymity and refocusing on software security and authentication; the recent White House Cyberspace Policy Review in particular emphasizes both issues. To the remaining unpersuaded, I can only offer at last a truism: There’s anonymity on the Internet. Get over it.

Net Neutrality: When is Network Management "Reasonable"?

Last week the FCC released its much-awaited Notice of Proposed Rulemaking (NPRM) on network neutrality. As expected, the NPRM affirms past FCC neutrality principles, and adds two more. Here’s the key language:

1. Subject to reasonable network management, a provider of broadband Internet access service may not prevent any of its users from sending or receiving the lawful content of the user’s choice over the Internet.

2. Subject to reasonable network management, a provider of broadband Internet access service may not prevent any of its users from running the lawful applications or using the lawful services of the user’s choice.

3. Subject to reasonable network management, a provider of broadband Internet access service may not prevent any of its users from connecting to and using on its network the user’s choice of lawful devices that do not harm the network.

4. Subject to reasonable network management, a provider of broadband Internet access service may not deprive any of its users of the user’s entitlement to competition among network providers, application providers, service providers, and content providers.

5. Subject to reasonable network management, a provider of broadband Internet access service must treat lawful content, applications, and services in a nondiscriminatory manner.

6. Subject to reasonable network management, a provider of broadband Internet access service must disclose such information concerning network management and other practices as is reasonably required for users and content, application, and service providers to enjoy the protections specified in this part.

That’s a lot of policy packed into (relatively) few words. I expect that my colleagues and I will have a lot to say about these seemingly simple rules over the coming weeks.

Today I want to focus on the all-purpose exception for “reasonable network management”. Unpacking this term might tell us a lot about how the proposed rule would operate.

Here’s what the NPRM says:

Reasonable network management consists of: (a) reasonable practices employed by a provider of broadband Internet access to (i) reduce or mitigate the effects of congestion on its network or to address quality-of-service concerns; (ii) address traffic that is unwanted by users or harmful; (iii) prevent the transfer of unlawful content; or (iv) prevent the unlawful transfer of content; and (b) other reasonable network management practices.

The key word is “reasonable”, and in that respect the definition is nearly circular: in order to be “reasonable”, a network management practice must be (a) “reasonable” and directed toward certain specific ends, or (b) “reasonable”.

In the FCC’s defense, it does seek comments and suggestions on what the definition should be, and it does say that it intends to make case-by-case determinations in practice, as it did in the Comcast matter. Further, it rejects a “strict scrutiny” standard of the sort that David Robinson rightly criticized in a previous post.

“Reasonable” is hard to define because in real life every “network management” measure will have tradeoffs. For example, a measure intended to block copyright-infringing material would in practice make errors in both directions: it would block X% (less than 100%) of infringing material, while as a side-effect also blocking Y% (more than 0%) of non-infringing material. For what values of X and Y is such a measure “reasonable”? We don’t know.

Of course, declaring a vague standard rather than a bright-line rule can sometimes be good policy, especially where the facts on the ground are changing rapidly and it’s hard to predict what kind of details might turn out to be important in a dispute. Still, by choosing a case-by-case approach, the FCC is leaving us mostly in the dark about where it will draw the line between “reasonable” and “unreasonable”.