November 29, 2024

Do Not Track: Not as Simple as it Sounds

Over the past few weeks, regulators have rekindled their interest in an online Do Not Track proposal in hopes of better protecting consumer privacy. FTC Chairman Jon Leibowitz told a Senate Commerce subcommittee last month that Do Not Track is “one promising area” for regulatory action and that the Commission plans to issue a report in the fall about “whether this is one viable way to proceed.” Senator Mark Pryor (D-AR), who sits on the subcommittee, is also reportedly drafting a new privacy bill that includes some version of this idea of empowering consumers with blanket opt-out powers over online tracking.

Details are sparse at this point about how a Do Not Track mechanism might actually be implemented. There are a variety of possible technical and regulatory approaches to the problem, each with its own difficulties and limitations, which I’ll discuss in this post.

An Adaptation of “Do Not Call”

Because of its name, Do Not Track draws immediate comparisons to arguably the most popular piece of consumer protection regulation ever instituted in the US—the National Do Not Call Registry. If the FTC were to take an analogous approach for online tracking, a consumer would register his device’s network identifier—its IP address—with the national registry. Online advertisers would then be prohibited from tracking devices that are identified by those IP addresses.

Of course, consumer devices rarely have persistent long-term IP addresses. Most ISPs assign IP addresses dynamically (using DHCP) and a single device might be assigned a new IP address every few minutes. Consumer devices often also share the same IP address at the same time (using NAT) so there’s no stable one-to-one mapping between IPs and devices. Things could be different with IPv6, where each device could have its own stable IP address, but the Do Not Call framework, directly applied, is not the best solution for today’s online world.

The comparison is still useful though, if only to caution against the assumption that Do Not Track will be as easy, or as successful, as Do Not Call. The differences between the problems at hand and the technologies involved are substantial.

A Registry of Tracking Domains

Back in 2007, a coalition of online consumer privacy groups lobbied for the creation of a national Do Not Track List. They proposed a reverse approach: online advertisers would be required to register with the FTC all domain names used to issue persistent identifiers to user devices. The FTC would then publish this list, and it would be up to the browser to protect users from being tracked by these domains. Notice that the onus here is fully on the browser—equipped with this list—to protect the user from being uniquely identified. Meanwhile, online advertisers would still have free rein to try any method they wish to track user behavior, so long as it happens from these tracking domains.

We’ve learned over the past couple of years that modern browsers, from a practical perspective, can be limited in their ability to protect the user from unique identification. The starkest example is the browser fingerprinting attack, popularized by the EFF earlier this year. In this attack, the tracking site runs a special script that gathers information about the browser’s configuration, which turns out to be unique enough to identify the browser instance in nearly every case. The attack takes advantage of the fact that much of the gathered information is used frequently for legitimate purposes—such as determining which plugins are available to the site—so a browser that blocks the release of this information would surely irritate the user. As these kinds of “side-channel” attacks grow in sophistication, major browser vendors might always be playing catch-up in the technical arms race, leaving most users vulnerable to some form of tracking by these domains.
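
To make this concrete, here is a deliberately minimal sketch of the arithmetic behind fingerprinting (in Python, standing in for the in-browser script; all attribute names and values are illustrative). Hashing configuration data that browsers expose for legitimate purposes yields a stable identifier, no cookie required:

```python
import hashlib

def fingerprint(attributes: dict) -> str:
    """Derive a stable identifier from routinely exposed browser data."""
    # Sort keys so the same configuration always hashes the same way.
    canonical = "|".join(f"{k}={v}" for k, v in sorted(attributes.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Illustrative values of the kind a tracking script might gather.
visitor = {
    "user_agent": "Mozilla/5.0 (Windows NT 6.1; rv:1.9.2) ...",
    "screen": "1920x1080x24",
    "timezone": "UTC-5",
    "plugins": "Flash 10.1; QuickTime 7.6; Java 1.6",
    "fonts": "Arial; Calibri; Georgia; Verdana",
}

# The same configuration produces the same ID on every visit.
print(fingerprint(visitor))
```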

The x-notrack Header

If we believe that browsers, on their own, will be unable to fully protect users, then any effective Do Not Track proposal will need to place some restraints on server tracking behavior. Browsers could send a signal to the tracking server to indicate that the user does not want this particular interaction to be tracked. The signaling mechanism could take the form of a standard pre-defined cookie field or, more likely, an HTTP header that marks the user’s tracking preference for each connection.

In the simplest case, the HTTP header—call it x-notrack—is a binary flag that can be turned on or off. The browser could enable x-notrack for every HTTP connection, or for connections to only third party sites, or for connections to some set of user-specified sites. Upon receiving the signal not to track, the site would be prevented, by FTC regulation, from setting any persistent identifiers on the user’s machine or using any other side-channel mechanism to uniquely identify the browser and track the interaction.
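
As a rough illustration of both halves of that exchange, here is a short Python sketch. The x-notrack header is the hypothetical one proposed above, and the URL and cookie value are placeholders:

```python
from urllib.request import Request, urlopen

# Client side: attach the hypothetical opt-out flag to a request.
req = Request("http://example.com/", headers={"x-notrack": "1"})
# urlopen(req)  # the server would see "x-notrack: 1" on this connection

# Server side (WSGI sketch): honor the flag before setting any
# persistent identifier.
def app(environ, start_response):
    opted_out = environ.get("HTTP_X_NOTRACK") == "1"
    headers = [("Content-Type", "text/plain")]
    if not opted_out:
        # Only users who have not opted out get a persistent identifier.
        headers.append(("Set-Cookie", "uid=abc123; Max-Age=31536000"))
    start_response("200 OK", headers)
    return [b"Hello"]
```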

While this approach seems simple, it could raise a few complicated issues. One issue is bifurcation: nothing would prevent sites from offering limited content or features to users who choose to opt-out of tracking. One could imagine a divided Web, where a user who turns on the x-notrack header for all HTTP connections—i.e. a blanket opt-out—would essentially turn off many of the useful features on the Web.

By being more judicious in the use of x-notrack, a user could permit silos of first-party tracking in exchange for individual feature-rich sites, while limiting widespread tracking by third parties. But many third parties offer useful services, like embedding videos or integrating social media features, and they might require that users disable x-notrack in order to access their services. Users could theoretically make a privacy choice for each third party, but such a reality seems antithetical to the motivations behind Do Not Track: to give consumers an easy mechanism to opt-out of harmful online tracking in one fell swoop.

The FTC could potentially remedy this scenario by including some provision for “tracking neutrality,” which would prohibit sites from unnecessarily discriminating against a user’s choice not to be tracked. I won’t get into the details here, but suffice it to say that crafting a narrow yet effective neutrality provision would be highly contentious.

Privacy Isn’t a Binary Choice

The underlying difficulty in designing a simple Do Not Track mechanism is the subjective nature of privacy. What one user considers harmful tracking might be completely reasonable to another. Privacy isn’t a single binary choice but rather a series of individually considered decisions, each depending on who the tracking party is, how much information can be combined, and what the user gets in return for being tracked. This makes the general concept of online Do Not Track—or any blanket opt-out regime—a fairly awkward fit. Users need simplicity, but whether simple controls can adequately capture the nuances of individual privacy preferences is an open question.

Another open question is whether browser vendors can eventually “win” the technical arms race against tracking technologies. If so, regulations might not be necessary, as innovative browsers could fully insulate users from unwanted tracking. While tracking technologies are currently winning this race, I wouldn’t call it a foregone conclusion.

The one thing we do know is this: Do Not Track is not as simple as it sounds. If regulators are serious about putting forth a proposal, and it sounds like they are, we need to start having a more robust conversation about the merits and ramifications of these issues.

New Search and Browsing Interface for the RECAP Archive

We have written in the past about RECAP, our project to help make federal court documents more easily accessible. We continue to upgrade the system, and we are eager for your feedback on a new set of functionality.

One of the most-requested RECAP features is a better web interface to the archive. Today we’re releasing an experimental system for searching and browsing, at archive.recapthelaw.org. There are also a couple of extra features that we’re eager to get feedback on. For example, you can subscribe to an RSS feed for any case in order to get updates when new documents are added to the archive. We’ve also included some basic tagging features that let anybody add tags to any case. We’re sure that there will be bugs to fix and improvements to make. Please let us know.
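
For the programmatically inclined, polling a case feed might look something like the sketch below. The feed URL is a placeholder; use whatever address the archive page shows for your case:

```python
# pip install feedparser
import feedparser

# Placeholder address; substitute the feed URL from your case's page.
FEED_URL = "http://archive.recapthelaw.org/feeds/case/12345/"

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    # Each entry would announce a newly archived document.
    print(entry.get("title"), entry.get("link"))
```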

The first version of the system was built by an enterprising team of students in Professor Ed Felten’s “Civic Technologies” course: Jen King, Brett Lullo, Sajid Mehmood, and Daniel Mattos Roberts. Dhruv Kapadia has done many of the subsequent updates. The links from the RECAP Archive pages point to files on our gracious host, the Internet Archive.

See, for example, the RECAP Archive page for United States of America v. Arizona, State of, et al. This is the Arizona District Court case in which the judge last week issued an order granting an injunction against several portions of the controversial immigration law. As you can see, some of the documents have a “Download” link that allows you to directly download the document from the Internet Archive, whereas others have a “Buy from PACER” link because no RECAP users have yet liberated the document.

A Major Internet Milestone: DNSSEC and SSL

On July 15th, a small but significant internet event occurred. On that day, years of planning culminated in the deployment of a cryptographic signature on the root DNS zone. To simplify greatly, this means that internet users will soon be able to have a much higher degree of trust in the hierarchical Domain Name System by utilizing the powers of recursion and cryptography. When a user’s computer is told that the IP address for “gmail.com” is 72.14.204.19, the user can be sure that this answer is true. This is important if you are someone such as a Chinese dissident who wants to reliably and securely reach gmail.com in order to communicate with your peers. The rollout of this throughout all domains, DNS resolvers, and client applications will take a little while, but the basic infrastructure is now in place.
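
For readers who want to see the machinery at work, here is a small sketch using the dnspython library. It sends a query with the DNSSEC (“DO”) bit set and checks whether the resolver reports the answer as authenticated; the resolver address is an assumption, and you must point it at one that actually validates:

```python
# pip install dnspython
import dns.flags
import dns.message
import dns.query

# Ask a (validating) resolver for gmail.com's address, requesting
# DNSSEC processing. The resolver address here is illustrative.
query = dns.message.make_query("gmail.com", "A", want_dnssec=True)
response = dns.query.udp(query, "8.8.8.8", timeout=5)

# The AD (Authenticated Data) flag means the resolver verified the
# signature chain from the answer up to the newly signed root.
validated = bool(response.flags & dns.flags.AD)
print("validated by resolver:", validated)
for rrset in response.answer:
    print(rrset)
```

Note that trusting the AD flag still means trusting the network path to that resolver, which is the “last mile” problem taken up below.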

DNSSEC mitigates a certain class of vulnerabilities that web users have long faced. Although it forecloses attacks at the domain name-to-IP address stage of requesting a web page, it does not necessarily foreclose attacks at other stages. For instance, an attacker that gets between you and the server you are trying to reach can simply claim that he is the server at 72.14.204.19. Our traditional way of protecting against this style of attack has been to rely on Certificate Authorities — trusted third parties who certify digital key-pairs only for the true owners of a given domain name. Thus, even if an attacker tries to execute one of these “man-in-the-middle” attacks, he won’t possess the secret portion of the digital key-pair that is required to prove that his communications come from the true gmail.com. Your browser checks for a certified corresponding public key in the process of setting up a secure SSL/TLS connection to https://gmail.com.
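
Here is a compact sketch of that CA-backed check using Python’s standard ssl module. The default settings do precisely the two things described above: chain the server’s certificate to a trusted authority and match it against the requested hostname:

```python
import socket
import ssl

# The default context verifies the CA signature chain and the hostname.
context = ssl.create_default_context()
with socket.create_connection(("gmail.com", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="gmail.com") as tls:
        cert = tls.getpeercert()
        print("issued to:", dict(x[0] for x in cert["subject"]))
        print("issued by:", dict(x[0] for x in cert["issuer"]))
        # A man in the middle claiming to be 72.14.204.19 cannot complete
        # this handshake without the private key behind a CA-signed
        # certificate for gmail.com.
```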

Unfortunately, there are several technical, operational, and jurisdictional shortcomings of the Certificate Authority model. As I discussed in an earlier post, many of these problems are not present in the hierarchical and delegated model of DNS. However, DNS does not inherently provide the ability to store domain name-to-key-pair information. But could it? At one of the recent DNSSEC deployment ceremonies, Vint Cerf noted:

More has happened here today than meets the eye. An infrastructure has been created for a hierarchical security system, which can be purposed and re-purposed in a number of different ways. And so I would predict that although we started out putting this system together to assure that the domain name lookups return valid internet addresses, that in the long run this hierarchical structure of trust will be applied to a number of other functions that require strong authentication. And so you will have seen a new major milestone in the internet story.

I believe that storing SSL/TLS keys in DNSSEC-secured DNS records will be the first significant “other function” that will emerge. An alternative to Certificate Authorities for domain-to-key mapping is sorely needed. There are two major practical hurdles to getting there: 1) We must define a standard for placing keys in DNS and 2) We must secure the “last mile” from the service provider’s DNS resolver to the end-user’s computer.

The first hurdle involves the type of standard-setting that the internet community is quite familiar with. On a technical level, it means that we need to collectively decide what these DNS records look like. The second hurdle involves building more functionality into end users’ software so that it can do cryptographic validation of DNS results rather than blindly trusting its upstream DNS resolver. There may be temporary ways to do this within web browser code, but ultimately it will probably have to be built into what is called the “stub resolver” — a local service running on your computer that usually just asks for the results from the upstream resolver.
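
To make the first hurdle concrete, here is a purely hypothetical sketch, since no such standard exists yet. Imagine a site published a hash of its TLS key under a well-known name in its DNSSEC-signed zone; a client lookup might then look like this (the record name and shape are inventions for illustration):

```python
# pip install dnspython
import dns.resolver

# Hypothetical record: a hash of example.com's TLS key published under
# a well-known name in its signed zone. Nothing here is standardized.
NAME = "_tlskey.example.com"

try:
    for rdata in dns.resolver.resolve(NAME, "TXT"):
        # A validating client would compare this published value against
        # the key the server offers during the TLS handshake.
        print(rdata)
except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
    print("no key record published for this name")
```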

It is important to note that none of this makes Certificate Authorities obsolete. Although the DNS-based approach replaces the most basic type of SSL certificates, the Certificate Authorities will continue to be the only entities that can offer validation of the real-world identity of site owners. The DNS-based approach and basic “domain validated” Certificate Authority certificates both verify only that whoever controls the domain name is the entity that your computer is communicating with, without saying who that is. In recent years, “Extended Validation” certificates (the ones that make your browser bar glow green) have begun to be offered by all major Certificate Authorities. These certificates require more rigorous validation of the identity of the owner, so that, for example, you know that the person who controls bankofamerica.com is really Bank of America Corporation.

At this year’s Black Hat and Defcon, Dan Kaminsky demonstrated some new software he is releasing that could make deploying DNSSEC easier in general, and that could also address the two main hurdles to placing keys in DNS. He readily admits that his particular implementation is not perfect, and has encouraged critiques and changes. [Update: His slides are available here.]

Hopefully, with the input of the many smart folks in the security, internet standards, and software development communities, we will see a production-quality DNSSEC-secured solution to domain-to-key authentication in the near future.

My Work at CITP This Year: Judicial Policy, Public Access, and The Electronic Court

Hi. My name is Ron Hedges. I am a Visiting Research Collaborator with the CITP for 2010-11.

Let me tell you a little about myself. I am a graduate of the University of Maryland and Georgetown University Law Center. I spent over twenty years as a United States Magistrate Judge and sat in Newark, NJ. I came to the Center through my work with the use and abuse of electronic information in civil litigation in the United States Courts. Several years ago, I wrote a decision on the subjects of “preservation” and “spoliation” of electronic information. That led me to The Sedona Conference, a think-tank of academics, attorneys, and judges who focus on electronic information and other things. Today, I’m on a Sedona advisory board and work on, among other things, confidentiality, public access, and electronic information in criminal actions. For information on Sedona, go to www.thesedonaconference.org.

This year, I hope to work with the Center to update something Sedona did a few years ago on confidentiality and public access in civil litigation. Our society prizes two conflicting values: openness in our judicial system and protection for matters of personal privacy and “protected” information. Examples of the latter are trade secrets and personal medical information. How we as a society reconcile openness and protection in civil litigation was the theme of The Sedona Guidelines on Confidentiality and Public Access, published in March of 2007. This document is not focused on electronic information and offers only general guidance on access to electronic information managed by courts. I hope to use my time at CITP to conduct a symposium on confidentiality and access and to move The Sedona Guidelines forward.

Another project for 2010-11 would be to consider the automation of the review of electronic information for “relevance” and “privilege.” Relevance is a simple, but often misunderstood, concept. To be relevant, information must tend to either prove – or disprove – something. Privilege is also simple, but often misunderstood. To be privileged (in a broad sense), information must be subject to either the “attorney-client privilege” or “work product” protection. Privileged information need not be turned over to an adversary and, if it is turned over, there can be serious consequences. Not surprisingly, human review for privilege is estimated to account for about half of the cost of litigation.

The “holy grail” of litigation is to come up with an automated process or processes for relevance and privilege review that is reasonable. The process must also be something that can be explained to laypeople (i.e., judges and lawyers). Research is being spearheaded by NIST, and I hope to have CITP sponsor a program on automated search that would feature, among others, Jason Baron of NARA and Maura Grossman of the Wachtell firm. They have led the NIST initiative and are prominent exponents of automated review.
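
To give a flavor of what automated review means at its most naive, here is a toy Python sketch that flags documents for privilege review using marker phrases and counsel’s addresses. Everything in it is illustrative; the systems under study are vastly more sophisticated:

```python
# Toy privilege screen: real research systems use statistical
# classifiers, not keyword lists. All names here are illustrative.
PRIVILEGE_MARKERS = (
    "attorney-client privileged",
    "privileged and confidential",
    "work product",
)
COUNSEL = {"jane.counsel@lawfirm.example"}

def needs_privilege_review(text: str, participants: set) -> bool:
    lowered = text.lower()
    if any(marker in lowered for marker in PRIVILEGE_MARKERS):
        return True
    # Communications involving counsel get a closer look.
    return bool(COUNSEL & participants)

print(needs_privilege_review("Privileged and Confidential: memo", set()))  # True
```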

Finally, I hope to offer a symposium or class to introduce technology-oriented folks like you to the intricacies of the law as it deals with electronic information.

Please give me your thoughts as we move toward the Fall semester.

Jailbreaking Copyright's Extended Scope

A bit late for the rule’s “triennial” cycle, the Librarian of Congress has released the sec. 1201(a)(1)(C) exemptions from the DMCA prohibitions on circumventing copyright access controls. For the next three years, people will not be “circumventing” if they “jailbreak” or unlock their smartphones, remix short portions of motion pictures on DVD (if they are college and university professors or media students, documentary filmmakers, or non-commercial video-makers), research the security of videogames, get balky obsolete dongled programs to work, or make an ebook read aloud. (I wrote about the hearings more than a year ago, when the movie studios demoed camcording a movie — that didn’t work to stop the exemption.)

Since I’ve criticized the DMCA’s copyright expansion, I was particularly interested in the inter-agency debate over EFF’s proposed jailbreak exemption. Even given the expanded “para-copyright” of anticircumvention, the Register of Copyrights and NTIA disagreed over how far the copyright holder’s monopoly should reach. The Register recommended that jailbreaking be exempted from circumvention liability, while NTIA supported Apple’s opposition to the jailbreak exemption.

According to the Register (PDF), Apple’s “access control [preventing the running of unapproved applications] does not really appear to be protecting any copyright interest.” Apple might have had business reasons for wanting to close its platform, including taking a 30% cut of application sales and curating the iPhone “ecosystem,” but those weren’t copyright reasons to bar the modification of 50 bytes of code.

NTIA saw it differently. In November 2009, after receiving preliminary recommendations from Register Peters, Asst. Secretary Larry Strickling wrote (PDF):

NTIA does not support this proposed exemption [for cell phone jailbreaking]…. Proponents argue that jailbreaking will support open communications platforms and the rights of consumers to take maximum advantage of wireless networks and associated hardware and software. Even if permitting cell phone “jailbreaking” could facilitate innovation, better serve consumers, and encourage the market to utilize open platforms, it might just as likely deter innovation by not allowing the developer to recoup its development costs and to be rewarded for its innovation. NTIA shares proponents’ enthusiasm for open platforms, but is concerned that the proper forum for consideration of these public policy questions lies before the expert regulatory agencies, the U.S. Department of Justice and the U.S. Congress.

The debate affects what an end-user buys when purchasing a product with embedded software, and how far copyright law can be leveraged to control that experience and the market. Is it, as Apple would have it, only the right to use the phone in the closed “ecosystem” as dictated by Apple, with only exit (minus termination fees) if you don’t like it there? Or is it a building block, around which the user can choose a range of complements from Apple and elsewhere? In the first case, we see the happenstance of software copyright locking together a vertically integrated or curated platform, forcing new entrants to build the whole stack in order to compete. In the second, we see opportunities for distributed innovation that starts at a smaller scale: someone can build an application without Apple’s approval, improving the user’s iPhone without starting from scratch.

NTIA would send these “public policy” questions to Congress or the Department of Justice (antitrust), but the Copyright Office and Librarian of Congress properly handled them here. “[T]he task of this rulemaking is to determine whether the availability and use of access control measures has already diminished or is about to diminish the ability of the public to engage in noninfringing uses of copyrighted works similar or analogous to those that the public had traditionally been able to make prior to the enactment of the DMCA,” the Register says. Pre-DMCA, copyright left room for reverse engineering for interoperability, for end-users and complementors to bust stacks and add value. Post-DMCA, this exemption helps to restore the balance toward noninfringing uses.

In a related vein, economists have been framing research into proprietary strategies for two-sided markets, in which a platform provider is mediating between two sets of users — such as iPhone’s end-users and its app developers. In their profit-maximizing interests, proprietors may want to adjust both price and other aspects of their platforms, for example selecting fewer app developers than a competitive market would support so each earns a scarcity surplus it can pay to Apple. But just because proprietors want a constrained environment does not mean that the law should support them, nor that end-users are better off when the platform-provider maximizes profits. Copyright protects individual works against unauthorized copying; it should not be an instrument of platform maintenance — not even when the platform is or includes a copyrighted work.