June 25, 2017

Archives for June 2017

Killing car privacy by federal mandate

The US National Highway Traffic Safety Administration (NHTSA) is proposing a requirement that every car should broadcast a cleartext message specifying its exact position, speed, and heading ten times per second. In comments filed in April, during the 90-day comment period, we (specifically, Leo Reyzin, Anna Lysyanskaya, Vitaly Shmatikov, Adam Smith, together with the CDT via Joseph Lorenzo Hall and Joseph Jerome) argued that this requirement will result in a significant loss to privacy. Others have aptly argued that the proposed system also has serious security challenges and cannot prevent potentially deadly malicious broadcasts, and that it will be outdated before it is deployed. In this post I focus on privacy, though I think security problems and resulting safety risks are also important to consider.

The basic summary of the proposal, known as Dedicated Short Range Communication (DSRC), is as follows. From the moment a car turns on and every tenth of a second until it shuts off, it will broadcast a so-called “basic safety message” (BSM) to within a minimum distance of 300m. The message will include position (with accuracy of 1.5m), speed, heading, acceleration, yaw rate, path history for the past 300m, predicted path curvature, steering wheel angle, car length and width rounded to 20cm precision, and a few other indicators. Each message will also include a temporary vehicle id (randomly generated and changed every five minutes), to enable receivers to tell whether they are hearing from the same car or from different cars.

Under the proposal, each message will be digitally signed. Each car will be provisioned with 20 certificates (and corresponding secret keys) per week, and will cycle through these certificates during the week, using each one for five minutes at a time. Certificates will be revocable; revocation is meant to guard against incorrect (malicious or erroneous) information in the broadcast messages, though there is no concrete proposal for how to detect such incorrect information.

It is not hard to see that if such a system were to be deployed, a powerful antenna could easily listen to messages from well over the 300m design radius (we’ve seen examples of design range being extended by two or three orders of magnitude through the use of good antennas with bluetooth and wifi). Combining data from several antennas, one could easily link messages together, figuring out where each car was parked, what path it took, and where it ended up. This information will often enable one to link the car to an individual–for example, by looking at the address where the car is parked at night.

The fundamental privacy problem with the proposal is that messages can be linked together even though they have no long-term ids. The linking is simplest, of course, when the temporary id does not change, which makes it easy to track a car for five minutes. When the temporary id changes, two consecutive messages can be easily linked using the high-precision position information they contain. One also doesn’t have to observe the exact moment that the temporary id changes: it is possible to link messages by a variety of so-called “quasi-identifiers,” such as car dimensions; position in relation to other cars; the relationship between acceleration, steering wheel angle, and yaw, which will differ for different models; variability in how different models calculate path history; repeated certificates; etc. You can read more about various linking methods in our comments; and in comments by the EFF.

Thus, by using an antenna and a laptop, one could put a neighborhood under ubiquitous real-time surveillance — a boon to stalkers and burglars. Well-resourced companies, crime bosses, and government agencies could easily surveill movements of a large population in real time for pennies per car per year.

To our surprise, the NHTSA proposal did not consider the cost of lost privacy in its cost-benefit analysis; instead, it considered only “perceived” privacy loss as a cost. The adjective “perceived” in this context is a convenient way to dismiss privacy concerns as figments of imagination, despite the fact that NHTSA-commissioned analysis found that BSM-based tracking would be quite easy.

What about the safety benefits of proposed technology? Are they worth the privacy loss? As the EFF and Brad Templeton (among others) have argued, the proposed mandate will take away money from other safety technologies that are likely to have broader applications and raise fewer privacy concerns. The proposed technology is already becoming outdated, and will be even more out of date by the time it is deployed widely enough to make any difference.

But, you may object, isn’t vehicle privacy already dead? What about license plate scanners, cell-phone-based tracking, or aerial tracking from drones? Indeed, all of these technologies are a threat to vehicle privacy. None of them, however, permits tracking quite as cheaply, undetectably, and pervasively. For example, license-plate scanners require visual contact and are more conspicuous that a hidden radio antenna would be. A report commissioned by NHTSA concluded that other approaches did not seem practical for aggregate tracking.

Moreover, it is important to avoid the fallacy of relative privation: even if there are other ways of tracking cars today, we should not add one more, which will be mandated by the government for decades to come. To fix existing privacy problems, we can work on technical approaches for making cell phones harder to track or on regulatory restrictions on the use of license plate scanners. Instead of creating new privacy problems that will persist for decades, we should be working on reducing the ones that exist.

Lessons of 2016 for U.S. Election Security

The 2016 election was one of the most eventful in U.S. history. We will be debating its consequences for a long time. For those of us who pay attention to the security and reliability of elections, the 2016 election teaches some important lessons. I’ll review some of them in this post.

First, though, let’s review what has not changed. The level of election security varies considerably from place to place in the United States, depending on management, procedures, and of course technology choices. Places that rely on paperless voting systems, such as touchscreen voting machines that record votes directly in computer memories (so-called DREs), are at higher risk, because of the malleability of computer memory and the lack of an auditable record of the vote that was seen directly by the voter. Much better are systems such as precinct-count optical scan, in which the voter marks a paper ballot and feeds the ballot through an electronic scanner, and the ballot is collected in a ballot box as a record of the vote. The advantage of such a system is that a post-election audit that compares a random sample of paper ballots to the corresponding electronic records can verify with high confidence that the election results are consistent with what voters saw. Of course, you have to make the audit a routine post-election procedure.

Now, on to the lessons of 2016.

The first lesson is that nation-state adversaries may be more aggressive than we had thought. Russia took aggressive action in advance of the 2016 U.S. election, and showed signs of preparing for an attack that would disrupt or steal the election. Fortunately they did not carry out such an attack–although they did take other actions to influence the election. In the future, we will have to assume the presence of aggressive, highly capable nation-state adversaries, which we knew to be possible in principle before, but now seem more likely.

The second lesson is that we should be paying more attention to attacks that aim to undermine the legitimacy of an election rather than changing the election’s result. Election-stealing attacks have gotten most of the attention up to now–and we are still vulnerable to them in some places–but it appears that external threat actors may be more interested in attacking legitimacy.

Attacks on legitimacy could take several forms. An attacker could disrupt the operation of the election, for example, by corrupting voter registration databases so there is uncertainty about whether the correct people were allowed to vote. They could interfere with post-election tallying processes, so that incorrect results were reported–an attack that might have the intended effect even if the results were eventually corrected. Or the attacker might fabricate evidence of an attack, and release the false evidence after the election.

Legitimacy attacks could be easier to carry out than election-stealing attacks, as well. For one thing, a legitimacy attacker will typically want the attack to be discovered, although they might want to avoid having the culprit identified. By contrast, an election-stealing attack must avoid detection in order to succeed. (If detected, it might function as a legitimacy attack.)

The good news is that steps like adopting auditable paper ballots and conducting routine post-election audits are useful against both election-stealing and legitimacy attacks. If we have strong evidence of voter intent, this will make election-stealing harder, and it will make falsified evidence of election-stealing less plausible. But attacks that aim to disrupt the election process may require different types of defenses.

One thing is certain: election workers have a very difficult job, and they need all of the help they can get, from the best technology to the best procedures, if we are going to reach the level of security we need.

Web Census Notebook: A new tool for studying web privacy

As part of the Web Transparency and Accountability Project, we’ve been visiting the web’s top 1 million sites every month using our open-source privacy measurement tool OpenWPM. This has led to numerous worrying findings such as the systematic abuse of newly introduced web features for fingerprinting, leading to better privacy tools and occasionally strong responses from browser vendors.

Enabling research is great — OpenWPM has led to 14 papers so far — but research is slow and requires expertise. To make our work more directly useful, today we’re announcing a new tool to study web privacy: a Jupyter notebook interface and a set of libraries to quickly answer most questions about web tracking by querying the the 500 GB of data we collect every month.

Jupyter notebook is an intuitive tool for data analysis using Python, and it’s what we use here internally for much of our own research. Notebooks are accessible with a simple web interface; yet the code, data, and memory persists on the server if you close the browser and return to it later (even from a different device). Notebooks combine code with visualizations, making them ideal for data exploration and analysis.

Who could benefit from this tool? We envision uses such as these:

  • Publishers could use our data to understand third-party tracking on their own websites.
  • Journalists could use our data to investigate and expose privacy-infringing practices.
  • Regulators and enforcement agencies could use our tool in investigations.
  • Creators of browser privacy tools could use our data to test their effectiveness.

Let’s look at an example that shows the feel of the interface. The code below computes the average number of embedded trackers on the top 100 websites in various categories such as “news” and “shopping”. It is intuitive and succinct. Without our interface, not only would the SQL version of this query be much more cumbersome, but it would require a ton of legwork and setup to even get to a point where you can write the query. Now you just need to point your browser at our notebook.

    for category, domains in census.first_parties.alexa_categories.items():
        avg = sum(1 for first_party in domains[:100]
                    for third_party in first_party.third_party_resources
                    if third_party.is_tracker) / 100
        print("Average number of trackers on %s sites: %.1f" % (category, avg))

The results confirm our finding that news sites have the most trackers, and adult sites the least. [1]

Here’s what happens behind the scenes:

  • census is a Python object that exposes all the relationships between websites and third parties as object attributes, hiding the messy details of the underlying database schema. Each first party is represented by a FirstParty object that gives access to each third-party resource (URI object) on the first party, and the ThirdParty that the URI belongs to. When the objects are accessed, they are instantiated automatically by querying the database.
  • census.first_parties is a container of FirstParty objects ordered by Alexa traffic rank, so you can easily analyze the top sites, or sites in the long tail, or specific sites. You can also easily slice the sites by category: in the example above, we iterate through each category of census.first_parties.alexa_categories.
  • There’s a fair bit of logic that goes into analyzing the crawl data which third parties are embedded on which websites, and cross-referencing that with tracking-protection lists to figure out which of those are trackers. This work is already done for you, and exposed via attributes such as ThirdParty.is_tracker.

Since the notebooks run on our server, we expect to be able to support only a limited number (a few dozen) at this point, so you need to apply for access. The tool is currently in beta as we smooth out rough edges and add features, but it is usable and useful. Of course, you’re welcome to run the notebook on your own server — the underlying crawl datasets are public, and we’ll release the code behind the notebooks soon. We hope you find this of use to you, and we welcome your feedback.


[1] The linked graph from our paper measures the number of distinct domains whereas the query above counts every instance of every tracker. The trends are the same in both cases, but the numbers are different. Here’s the output of the query:


Average number of third party trackers on computers sites: 41.0
Average number of third party trackers on regional sites: 68.8
Average number of third party trackers on recreation sites: 58.2
Average number of third party trackers on health sites: 38.4
Average number of third party trackers on news sites: 151.2
Average number of third party trackers on business sites: 55.0
Average number of third party trackers on kids_and_teens sites: 74.8
Average number of third party trackers on home sites: 94.5
Average number of third party trackers on arts sites: 108.6
Average number of third party trackers on sports sites: 86.6
Average number of third party trackers on reference sites: 43.8
Average number of third party trackers on science sites: 43.1
Average number of third party trackers on society sites: 73.5
Average number of third party trackers on shopping sites: 53.1
Average number of third party trackers on adult sites: 16.8
Average number of third party trackers on games sites: 70.5