April 23, 2014

Do Not Track: Not as Simple as it Sounds

Over the past few weeks, regulators have rekindled their interest in an online Do Not Track proposal in hopes of better protecting consumer privacy. FTC Chairman Jon Leibowitz told a Senate Commerce subcommittee last month that Do Not Track is “one promising area” for regulatory action and that the Commission plans to issue a report in the fall about “whether this is one viable way to proceed.” Senator Mark Pryor (D-AR), who sits on the subcommittee, is also reportedly drafting a new privacy bill that includes some version of this idea: giving consumers blanket opt-out powers over online tracking.

Details are sparse at this point about how a Do Not Track mechanism might actually be implemented. There are a variety of possible technical and regulatory approaches to the problem, each with its own difficulties and limitations, which I’ll discuss in this post.

An Adaptation of “Do Not Call”

Because of its name, Do Not Track draws immediate comparisons to arguably the most popular piece of consumer protection regulation ever instituted in the US—the National Do Not Call Registry. If the FTC were to take an analogous approach for online tracking, a consumer would register his device’s network identifier—its IP address—with the national registry. Online advertisers would then be prohibited from tracking devices that are identified by those IP addresses.

Of course, consumer devices rarely have persistent long-term IP addresses. Most ISPs assign IP addresses dynamically (using DHCP) and a single device might be assigned a new IP address every few minutes. Consumer devices often also share the same IP address at the same time (using NAT) so there’s no stable one-to-one mapping between IPs and devices. Things could be different with IPv6, where each device could have its own stable IP address, but the Do Not Call framework, directly applied, is not the best solution for today’s online world.
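To make the mismatch concrete, here is a minimal sketch of an IP-keyed registry. Everything in it is hypothetical: the registry is just a set, the names Alice and Bob are invented, and the addresses come from a range reserved for documentation. It only illustrates how address churn could detach an opt-out from the device that requested it.

```python
# Hypothetical IP-keyed "do not track" registry (illustration only).
registry = set()

def register(ip):
    """Record an opt-out for whatever device currently holds this address."""
    registry.add(ip)

def is_protected(ip):
    """What an advertiser would check before tracking a connection."""
    return ip in registry

# Alice opts out while her laptop holds this (made-up) address.
register("203.0.113.7")

# Dynamic assignment: Alice's lease expires and she gets a new address,
# while her old address is handed to Bob, who never opted out.
alice_ip_after_renewal = "203.0.113.42"
bob_ip = "203.0.113.7"

print(is_protected(alice_ip_after_renewal))  # False: Alice lost her opt-out
print(is_protected(bob_ip))                  # True: Bob inherited it
```

NAT causes the opposite problem: every device behind one shared address would be opted out (or not) together, regardless of what each user wants.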

The comparison is still useful though, if only to caution against the assumption that Do Not Track will be as easy, or as successful, as Do Not Call. The differences between the problems at hand and the technologies involved are substantial.

A Registry of Tracking Domains

Back in 2007, a coalition of online consumer privacy groups lobbied for the creation of a national Do Not Track List. They proposed a reverse approach: online advertisers would be required to register with the FTC all domain names used to issue persistent identifiers to user devices. The FTC would then publish this list, and it would be up to the browser to protect users from being tracked by these domains. Notice that the onus here is fully on the browser—equipped with this list—to protect the user from being uniquely identified. Meanwhile, online advertisers would still have free rein to try any method they wish to track user behavior, so long as it happens from these tracking domains.
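On the browser side, the core of this approach might look something like the sketch below. The list contents and the function names (`is_tracking_host`, `should_strip_identifiers`) are my own assumptions for illustration; the 2007 proposal did not specify a matching algorithm, and a real browser would need to do much more than check a hostname.

```python
from urllib.parse import urlsplit

# Hypothetical registry entries; the real list would be published by the FTC.
TRACKING_DOMAINS = {"tracker.example", "ads.example.net"}

def is_tracking_host(host, registry=TRACKING_DOMAINS):
    """Return True if host is a registered domain or a subdomain of one."""
    labels = host.lower().rstrip(".").split(".")
    # Check the host and each parent domain: a.b.c, then b.c, then c.
    return any(".".join(labels[i:]) in registry for i in range(len(labels)))

def should_strip_identifiers(url):
    """Decide whether to withhold cookies and similar identifiers for url."""
    host = urlsplit(url).hostname or ""
    return is_tracking_host(host)

print(should_strip_identifiers("https://pixel.tracker.example/p.gif"))  # True
print(should_strip_identifiers("https://nottracker.example/"))          # False
```

Matching whole labels rather than raw string suffixes keeps an unrelated host such as nottracker.example from being caught by the entry for tracker.example.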

We’ve learned over the past couple of years that modern browsers, from a practical perspective, can be limited in their ability to protect the user from unique identification. The starkest example of this is the browser fingerprinting attack, which was popularized by the EFF earlier this year. In this attack, the tracking site runs a special script that gathers information about the browser’s configuration, which is unique enough to identify the browser instance in nearly every case. The attack takes advantage of the fact that much of the gathered information is used frequently for legitimate purposes (such as determining which plugins are available to the site), so a browser that blocks the release of this information would surely irritate the user. As these kinds of “side-channel” attacks grow in sophistication, major browser vendors might always be playing catch-up in the technical arms race, leaving most users vulnerable to some form of tracking by these domains.
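The idea behind fingerprinting can be sketched in a few lines. This is not the EFF's actual method; the attribute names and values below are invented, and a real script would collect them from the browser rather than from a hard-coded dictionary. The point is only that an identifier can be derived from configuration alone, with nothing stored on the user's machine.

```python
import hashlib
import json

def fingerprint(attributes):
    """Hash a dict of browser attributes into a short, repeatable identifier."""
    # Sort keys so the same attributes always serialize the same way.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Made-up configuration values for illustration.
browser_a = {
    "user_agent": "ExampleBrowser/1.0 (X11; Linux x86_64)",
    "plugins": ["Flash 10.1", "Java 1.6"],
    "fonts": ["Arial", "DejaVu Sans", "Times New Roman"],
    "screen": "1280x800x24",
    "timezone_offset": -300,
}
# A second browser that differs only in its installed fonts.
browser_b = dict(browser_a, fonts=["Arial", "Times New Roman"])

# Same configuration on a later visit yields the same identifier, no cookie needed.
print(fingerprint(browser_a) == fingerprint(dict(browser_a)))  # True
# A small configuration difference yields a different identifier.
print(fingerprint(browser_a) != fingerprint(browser_b))        # True
```

Clearing cookies does nothing here, which is why a list-based defense that only blocks stored identifiers would miss this kind of tracking.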

The x-notrack Header

If we believe that browsers, on their own, will be unable to fully protect users, then any effective Do Not Track proposal will need to place some restraints on server tracking behavior. Browsers could send a signal to the tracking server to indicate that the user does not want this particular interaction to be tracked. The signaling mechanism could be in the form of a standard pre-defined cookie field, or more likely, an HTTP header that marks the user’s tracking preference for each connection.

In the simplest case, the HTTP header—call it x-notrack—is a binary flag that can be turned on or off. The browser could enable x-notrack for every HTTP connection, or for connections to only third party sites, or for connections to some set of user-specified sites. Upon receiving the signal not to track, the site would be prevented, by FTC regulation, from setting any persistent identifiers on the user’s machine or using any other side-channel mechanism to uniquely identify the browser and track the interaction.
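Here is a rough sketch of the three browser policies just described, plus the check a compliant server would make. The mode names, the header value "1", and the helper functions are all assumptions of mine; no such header has been standardized. The `same_site` test is deliberately crude (it compares the last two hostname labels), whereas a real browser would need something like the Public Suffix List.

```python
from urllib.parse import urlsplit

def same_site(host_a, host_b):
    """Crude first-party test: compare the last two hostname labels."""
    return host_a.split(".")[-2:] == host_b.split(".")[-2:]

def request_headers(page_url, request_url, mode="third-party",
                    listed_hosts=frozenset()):
    """Return the extra headers the browser would attach to request_url."""
    page_host = urlsplit(page_url).hostname or ""
    host = urlsplit(request_url).hostname or ""
    if mode == "all":            # blanket opt-out on every connection
        opt_out = True
    elif mode == "third-party":  # opt out only when leaving the page's site
        opt_out = not same_site(page_host, host)
    elif mode == "listed":       # opt out only for user-specified hosts
        opt_out = host in listed_hosts
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {"x-notrack": "1"} if opt_out else {}

def may_set_identifier(headers):
    """Server side: a compliant site sets no identifier when the flag is on."""
    return headers.get("x-notrack") != "1"

page = "https://news.example/story"
first_party = request_headers(page, "https://cdn.news.example/a.js")
third_party = request_headers(page, "https://tracker.example.net/p.gif")
print(may_set_identifier(first_party))  # True
print(may_set_identifier(third_party))  # False
```

Note that the server-side check is purely voluntary in code; only regulation would make ignoring the flag costly.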

While this approach seems simple, it could raise a few complicated issues. One issue is bifurcation: nothing would prevent sites from offering limited content or features to users who choose to opt out of tracking. One could imagine a divided Web, where a user who turns on the x-notrack header for all HTTP connections (i.e., a blanket opt-out) would essentially turn off many of the useful features on the Web.

By being more judicious in the use of x-notrack, a user could permit silos of first-party tracking in exchange for individual feature-rich sites, while limiting widespread tracking by third parties. But many third parties offer useful services, like embedding videos or integrating social media features, and they might require that users disable x-notrack in order to access their services. Users could theoretically make a privacy choice for each third party, but such a reality seems antithetical to the motivation behind Do Not Track: to give consumers an easy mechanism to opt out of harmful online tracking in one fell swoop.

The FTC could potentially remedy this scenario by including some provision for “tracking neutrality,” which would prohibit sites from unnecessarily discriminating against a user’s choice not to be tracked. I won’t get into the details here, but suffice it to say that crafting a narrow yet effective neutrality provision would be highly contentious.

Privacy Isn’t a Binary Choice

The underlying difficulty in designing a simple Do Not Track mechanism is the subjective nature of privacy. What one user considers harmful tracking might be completely reasonable to another. Privacy isn’t a single binary choice but rather a series of individually-considered decisions that each depend on who the tracking party is, how much information can be combined and what the user gets in return for being tracked. This makes the general concept of online Do Not Track—or any blanket opt-out regime—a fairly awkward fit. Users need simplicity, but whether simple controls can adequately capture the nuances of individual privacy preferences is an open question.

Another open question is whether browser vendors can eventually “win” the technical arms race against tracking technologies. If so, regulations might not be necessary, as innovative browsers could fully insulate users from unwanted tracking. While tracking technologies are currently winning this race, I wouldn’t call it a foregone conclusion.

The one thing we do know is this: Do Not Track is not as simple as it sounds. If regulators are serious about putting forth a proposal, and it sounds like they are, we need to start having a more robust conversation about the merits and ramifications of these issues.

Comments

  1. James says:

    I think it’s quite possible that some companies might obey the do not track binary switch. However I am sure there are plenty who won’t. Blocking cookies helps but very few will understand why they shouldn’t log into a website which doesn’t respect their privacy.

    It’s really tough to figure out. I don’t have it figured out. I think that if there was any measure to put into place absolutely, it would be a requirement for all ISPs to protect or mask the IP identity of a user, however it turns out exactly the opposite has happened over time. Forum admins need to identify IPs to keep malicious types out of their properties.

    It is a tough call… I think the answer is in regulations. I just wanted to comment, sorry for my ramblings.

  2. Jim Brock says:

    Excellent and thoughtful post.

    The goal cannot be to find a technical guarantee that consumers will never be tracked. It’s not realistic to expect that browser or add-on makers have the incentive or the means to neutralize tracking methods like browser fingerprints or IP address tracking any time soon. As tracking proliferates across mobile devices and apps, these challenges are compounded.

    Rather, what matters is finding a technical framework for preference signaling that ad companies can easily implement and that is compatible with simple browser-based methods. Although there will always be some bad players who don’t abide by the rules, if ad companies can be “certified” as either compliant or not, and websites can use that certification to determine which tags are reputable, the footprint of non-reputable ad tracking companies can be greatly limited.

    In addition to finding a simple technical means for consumers to signal preferences, the FTC should focus on how to ensure that there is industry oversight on ad-company back-end processes that will always be opaque to users. Meaningful and independent reviews can support consumer confidence that preferences are effective.

    http://privacychoice.wordpress.com

  3. dr2chase says:

    Speaking from the POV of someone running a website, that collects no PII, we nonetheless have a use for IP addresses recorded in logs, and that is keeping track of the Bad Guys, who are persistently attempting to break in (when you see someone sending admin.php requests at a machine that has no PHP installed, that’s a bit of a clue).

  4. Maxim K says:

    I think we should not try to prevent online tracking but rather require disclosure of the tracking information.

    • rp says:

      Although it’s obviously difficult to enforce a rule that says “don’t aggregate or analyze the data you have,” I agree that the ideal makes sense. In so many cases information is innocently useful by itself, and only dangerous in combination with other chunks of also-individually-innocent information. If we trusted (either motivation or ability) any of the companies that gather data from users and consumers on the web, it would be an easy principle to state.

      I wonder if there’s a reasonably effective way to tag users’ data so that certain proscribed combinations of it will be immediately verifiable. Then checking do-not-track compliance would be fairly straightforward.

      Another possibility is the creation of honeytrap users — with the right knowledge of tracking systems it would be fairly easy (ha) to create a bot cloud of automated personas whose full profiles could be assembled only by proscribed aggregation and analysis. Anyone who makes use of that full profile would then face both appropriate sanctions and full discovery of their operation.

  5. Matt says:

    Mark Pryor is from AR, not AK.

  6. Lori says:

    One way in which do-not-track is not analogous to do-not-call: calls are inbound messages, tracking outbound. Do-not-track would be analogous not to do-not-call, but do-not-accumulate-lists-of-phone-numbers. Of course what you are proposing to regulate is the delivery of adware based on tracked information.

    One argument against this practice is that some of us (prepaid wireless broadband customers, and others) pay by the byte for internet access. For us, a server with a policy of making sure a payload of salesmanship is delivered prior to actual content is effectively putting its content behind a literal paywall, whether or not our browsers are effectively keeping the ads out of our literal faces. A relevant precedent is the fact that unsolicited faxes are [still?] strictly illegal based on the fact that they are a non-consensual financial burden on their recipients.

    Another plausible line of argument is an appeal to efficiency. Diluted signal-to-noise ratio, thanks to the ‘value subtracted’ business model that is advertising, is inefficient use of bandwidth. The Internet runs on electricity, and runaway energy consumption threatens global warming. Even amateur radio operators are required by law to make the most efficient possible use of both energy and bandwidth. Why should our increasingly wireless Internet be any different?

  7. The Heretic says:

    The problem is that any law or regulation or policy from the FTC would only be enforceable on US companies. Do you really think that some website from Russia (or China, or any other country for that matter) is going to respect US laws regarding any kind of ‘do not track’ or registering tracking servers?

    If they don’t have to, then it would only be a short time until a lot of websites would be hosted from servers in those countries in order to track people’s browsing habits online.

    Moreover, the tracking of my browsing habits by commercial enterprises in order to more effectively target me as a consumer is relatively benign compared to the tracking the government does in order to determine whether I am a threat to their power-base (whether it be a physical threat, or a political threat because I disagree with their policies). Do you really think the US government is going to honor any of these policies? Even if they did they can still gather and analyze all info that passes over the internet as they see fit and necessary.

    • rp says:

      As long as you want to do business in the US, you follow the rules. Sure, there will be a black market, but if the FTC were serious about enforcement any online payments involving a US entity would have to abide by the regs. MC/Visa and Paypal are pretty big targets.

  8. Lori says:

    The idea that surveillance by business is INHERENTLY benign relative to surveillance by government is a truly flawed idea, and if anything the belief that there’s real Power (and hence corruption) in the private sector is the truly heretical position in the current marketplace of ideas, dominated as it is by the teachings of neoclassical economics.

    Since de facto power is largely private, so must be de facto recourse against the same. For this reason, legislative remedies amount to so much pissing in the wind. Countermeasures must be technological, not legal or political. Also, it is time to write off privacy as a lost cause, aiming instead for symmetry. Answer proprietary data mining with public domain data mining. Answer market research on consumers with market research on producers and vendors. Answer video and audio surveillance with sousveillance.

  9. Steve R. says:

    The phrase “empowering consumers with blanket opt-out powers over online tracking” is a joke. It should not be the consumer who has to take a protective action; it is the data collectors who should be restrained from doing anything with that data.

    Data collection by firms is an obvious necessity, but they should not be entitled to do anything with that data beyond the immediate need for that data. The selling/renting/loaning/sharing data with so-called “partners” should be prohibited.

    • rp says:

      In ostensibly-civilized countries excluding the US, the situation 10 years ago was pretty much that people had to opt in, that you couldn’t resell or repurpose data without affirmative consent, and that you couldn’t coerce people into giving up information that wasn’t necessary for the transaction you were doing. Oh, yeah, and you could (eventually) go to jail for violating those rules. Then came the GWOT, and better datamining, and a bunch of rules so that you could pretend personal data was being safeguarded…

      Things seem to have been going in the wrong direction.