January 18, 2017

Archives for January 2016

Updating the Defend Trade Secrets Act?

Despite statements to the contrary by sponsors and supporters in April 2014, August 2015, and October 2015, backers of the Defend Trade Secrets Act (DTSA) now aver that “cyber espionage is not the primary focus” of the legislation. At last month’s Senate Judiciary Committee hearing, the DTSA was instead supported by two different primary reasons: the rise of trade secret theft by rogue employees and the need for uniformity in trade secret law.

While a change in a policy argument is not inherently bad, the alteration of the core justification for a bill should be considered when assessing it. Perhaps the new position of DTSA proponents acknowledges the arguments by over 40 academics, including me, that the DTSA will not reduce cyberespionage. However, we also disputed these new rationales in that letter: the rogue employee is more than adequately addressed by existing trade secret law, and there will be less uniformity in trade secrecy under the DTSA because of the lack of federal jurisprudence.

The downsides — including weakened industry cybersecurity, abusive litigation against small entities, and resurrection of the anti-employee inevitable disclosure doctrine — remain. As such, I continue to oppose the DTSA as a giant trade secrecy policy experiment with little data to back up its benefits and much evidence of its costs.

Who Will Secure the Internet of Things?

Over the past several months, CITP-affiliated Ph.D. student Sarthak Grover and fellow Roya Ensafi been investigating various security and privacy vulnerabilities of Internet of Things (IoT) devices in the home network, to get a better sense of the current state of smart devices that many consumers have begun to install in their homes.

To explore this question, we purchased a collection of popular IoT devices, connected them to a laboratory network at CITP, and monitored the traffic that these devices exchanged with the public Internet. We initially expected that end-to-end encryption might foil our attempts to monitor the traffic to and from these devices. The devices we explored included a Belkin WeMo Switch, the Nest Thermostat, an Ubi Smart Speaker, a Sharx Security Camera, a PixStar Digital Photoframe, and a Smartthings hub.

What We Found: Be Afraid!

Many devices fail to encrypt at least some of the traffic that they send and receive. Investigating the traffic to and from these devices turned out to be much easier than expected, as many of the devices exchanged personal or private information with servers on the Internet in the clear, completely unencrypted.

We recently presented a summary of our findings to the Federal Trade Commission, last week at PrivacyCon.  The video of Sarthak’s talk is available from the FTC website, as well as on YouTube.  Among some of the more striking findings include:

  • The Nest thermostat was revealing location information of the home and weather station, including the user’s zip code, in the clear.  (Note: Nest promptly fixed this bug after we notified them.)
  • The Ubi uses unencrypted HTTP to communicate information to its portal, including voice chats, sensor readings (sound, temperature, light, humidity). It also communicates to the user using unencrypted email. Needless to say, much of this information, including the sensor readings, could reveal critical information, such as whether the user was home, or even movements within a house.
  • The Sharx security camera transmits video over unencrypted FTP; if the server for the video archive is outside of the home, this traffic could also be intercepted by an eavesdropper.
  • All traffic to and from the PixStar photoframe was sent unencrypted, revealing many user interactions with the device.

Traffic capture from Nest Thermostat in Fall 2015, showing user zip code and other information in cleartext.

Traffic capture from Ubi, which sends sensor values and states in clear text.

Some devices encrypt data traffic, but encryption may not be enough. A natural reaction to some of these findings might be that these devices should encrypt all traffic that they send and receive. Indeed, some devices we investigated (e.g., the Smartthings hub) already do so. Encryption may be a good starting point, but by itself, it appears to be insufficient for preserving user privacy.  For example, user interactions with these devices generate traffic signatures that reveal information, such as when power to an outlet has been switched on or off. It appears that simple traffic features such as traffic volume over time may be sufficient to reveal certain user activities.

In all cases, DNS queries from the devices clearly indicate the presence of these devices in a user’s home. Indeed, even when the data traffic itself is encrypted, other traffic sent in the clear, such as DNS lookups, may reveal not only the presence of certain devices in your home, but likely also information about both usage and activity patterns.

Of course, there is also the concern about how these companies may use and share the data that they collect, even if they manage to collect it securely. And, beyond the obvious and more conventional privacy and security risks, there are also potential physical risks to infrastructure that may result from these privacy and security problems.

Old problems, new constraints. Many of the security and privacy problems that we see with IoT devices sound familiar, but these problems arise in a new, unique context, which present unique challenges:

  • Fundamentally insecure. Manufacturers of consumer products have little interest in releasing software patches and may even design the device without any interfaces for patching the software in the first place.  There are various examples of insecure devices that ordinary users may connect to the network without any attempts to secure them (or any means of doing so).  Occasionally, these insecure devices can result in “stepping stones” into the home for attackers to mount more extensive attacks. A recent study identified more than 500,000 insecure, publicly accessible embedded networked devices.
  • Diverse. Consumer IoT settings bring a diversity of devices, manufacturers, firmware versions, and so forth. This diversity can make it difficult for a consumer (or the consumer’s ISP) to answer even simple questions such as exhaustively identifying the set of devices that are connected to the network in the first place, let alone detecting behavior or network traffic that might reveal an anomaly, compromise, or attack.
  • Constrained. Many of the devices in an IoT network are severely resource-constrained: the devices may have limited processing or storage capabilities, or even limited battery life, and they often lack a screen or intuitive user interface. In some cases, a user may not even be able to log into the device.  

Complicating matters, a user has limited control over the IoT device, particularly as compared to a general-purpose computing platform. When we connect a general purpose device to a network, we typically have at least a modicum of choice and control about what software we run (e.g., browser, operating system), and perhaps some more visibility or control into how that device interacts with other devices on the network and on the public Internet. When we connect a camera, thermostat, or sensor to our network, the hardware and software are much more tightly integrated, and our visibility into and control over that device is much more limited. At this point, we have trouble, for example, even knowing that a device might be sending private data to the Internet, let alone being able to stop it.

Compounding all of these problems, of course, is the access a consumer gives an untrusted IoT device to other data or devices on the home network, simply by connecting it to the network—effectively placing it behind the firewall and giving it full network access, including in many cases the shared key for the Wi-Fi network.

A Way Forward

Ultimately, multiple stakeholders may be involved with ensuring the security of a networked IoT device, including consumers, manufacturers, and Internet service providers. There remain many unanswered questions concerning both who is able to (and responsible for) securing these devices, but we should start the discussion about how to improve the security for networks with IoT devices.

This discussion will include both policy aspects (including who bears the ultimate responsibility for device insecurity, whether devices need to adopt standard practices or behavior, and for how long their manufacturers should continue to support them), as well as technical aspects (including how we design the network to better monitor and control the behavior of these often-insecure devices).

Devices should be more transparent. The first step towards improving security and privacy for IoT should be to work with manufacturers to improve the transparency of these IoT devices, so that consumers (and possibly ISPs) have more visibility into what software the devices are running, and what traffic they are sending and receiving. This, of course, is a Herculean effort, given the vast quantity and diversity of device manufacturers; an alternative would be trying to infer what devices are connected to the network based on their traffic behavior, but doing so in a way that is both comprehensive, accurate, and reasonably informative seems extremely difficult.

Instead, some IoT device manufacturers might standardize on a manifest protocol that announces basic information, such as the device type, available sensors, firmware version, the set of destinations the device expects to communicate with (and whether the traffic is encrypted), and so forth. (Of course, such a manifest poses its own security risks.)

Network infrastructure can play a role. Given such basic information, anomalous behavior that is suggestive of a compromise or data leak would be more evident to network intrusion detection systems and firewalls—in other words, we could bring more of our standard network security tools to bear on these devices, once we have a way to identify what the devices are and what their expected behavior should be. Such a manifest might also serve as a concise (and machine readable!) privacy statement; a concise manifest might be one way for consumers to evaluate their comfort with a certain device, even though it may be far from a complete privacy statement.

Armed with such basic information about the devices on the network, smart network switches would have a much easier way to implement network security policies. For example, a user could specify that the smart camera should never be able to communicate with the public Internet, or that the thermostat should only be able to interact with the window locks if someone is present.

Current network switches don’t provide easy mechanisms for consumers to either express or implement these types of policies. Advances in Software-Defined Networking (SDN) in software switches such as Open vSwitch may make it possible to implement policies that resolve contention for shared resources and conflicts, or to isolate devices on the network from one another, but even if that is a reasonable engineering direction, this technology will only take us part of the way, as users will ultimately need far better interfaces to both monitor network activity and express policies about how these devices should behave and exchange traffic.

Update [20 Jan 2015]: After significant press coverage, Nest has contacted the media to clarify that the information being leaked in cleartext was not the zip code of the thermostat, but merely the zip code that the user enters when configuring the device. (Clarifying statement here.) Of course, when would a user ever enter a zip code other than that of their home, where the thermostat was located?

The Web Privacy Problem is a Transparency Problem: Introducing the OpenWPM measurement tool

In a previous blog post I explored the success of our study, The Web Never Forgets, in having a positive impact on web privacy. To ensure a lasting impact, we’ve been doing monthly, automated 1-million-site measurement of tracking and privacy. Soon we’ll be releasing these datasets and our findings. But in this post I’d like to introduce our web measurement platform OpenWPM that we’ve built for this purpose. OpenWPM has been quickly gaining adoption — it has already been used by at least 6 other research groups, as well as journalists, regulators, and students for class projects. In this post, I’ll explain why we built OpenWPM, describe a previously unmeasured type of tracking we found using the tool, and show you how you can participate and contribute to this community effort.

This post is based on a talk I gave at the FTC’s PrivacyCon. You can watch the video online here.

Why monthly, large-scale measurements are necessary

In my previous post, I showed how measurements from academic studies can help improve online privacy, but I also pointed out how they can fall short. Measurement results often have an immediate impact on online privacy. Unless that impact leads to a technical, policy, or legal solution, the impact will taper off over time as the measurements age.

Technical solutions do not always exist for privacy violations. I discussed how canvas fingerprinting can’t be prevented without sacrificing usability in my previous blog post, but there are others as well. For example, it has proven difficult to find a satisfactory solution to the privacy concerns surrounding WebRTC’s access to local IPs. This is also highlighted in the unofficial W3C draft on Fingerprinting Guidance for Web Specification Authors, which states: “elimination of the capability of browser fingerprinting by a determined adversary through solely technical means that are widely deployed is implausible.”

It seems inevitable that measurement results will go out of date, for two reasons. Most obviously, there is a high engineering cost to running privacy studies. Equally important is the fact that academic papers in this area are published as much for their methodological novelty as for their measurement results. Updating the results of an earlier study is unlikely to lead to a publication, which takes away the incentive to do it at all. [1]

OpenWPM: our platform for automated, large-scale web privacy measurements

We built OpenWPM (Github, technical report), a generic platform for online tracking measurement. It provides the stability and instrumentation necessary to run many online privacy studies. Our goal in developing OpenWPM is to decrease the initial engineering cost of studies and make running a measurement as effortless as possible. It has already been used in several published studies from multiple institutions to detect and reverse engineer online tracking.

OpenWPM also makes it possible to run large-scale measurements with Firefox, a real consumer browser [2]. Large scale measurement lets us compare the privacy practices of the most popular sites to those in the long tail. This is especially important when observing the use of a tracking technique highlighted in a measurement study. For example, we can check if it’s removed from popular sites but added to less popular sites.

Transparency through measurement, on 1 million sites

We are using OpenWPM to run the Princeton Transparency Census, a monthly web-scale measurement of tracking techniques and privacy issues, comprising 1 million sites. With it, we will be able to detect and measure many of the known privacy violations reported by researchers so far: the use of stateful tracking mechanisms, browser fingerprinting, cookie synchronization, and more.

During the measurements, we’ll collect data in three categories: (1)  network traffic — all HTTP requests and response headers (2) client-side state — cookies, Flash cookies, etc. (3) execution traces — we trap and record targeted JavaScript API calls that have been known to be used for tracking. In addition to releasing all of the raw data collected during the census, we’ll release the results of our own automated analysis.

Alongside the 1 million site measurement, we are also running smaller, targeted measurements with different browser configurations. Examples include crawling deeper into the site or browsing with a privacy extension, such as Ghostery or AdBlock Plus. These smaller crawls will provide additional insight into the privacy threats faced by real users.

Detecting WebRTC local IP discovery

As a case study of the ease of introducing a new measurement into the infrastructure, I’ll walk through the steps I took to measure scripts using WebRTC to discover a machine’s local IP address [3]. For machines behind a home router, this technique may reveal an IP of the form 192.168.1.*. Users of corporate or university networks may return a unique local IP address from within that organization’s IP range.

A user’s local IP address adds additional information to a browser fingerprint. For example, it can be used as a way to differentiate multiple users behind a NAT without requiring browser state. The amount of identifying information it provides for the average user hasn’t been studied. However, both Chrome and Firefox [4] have implemented opt-in solutions to prevent the technique. The first reported use that I could find for this technique in the wild was a third-party on nytimes.com in July 2015.

After examining a demo script, I decided to record all property access and all method calls of the RTCPeerConnection interface, the primary interface for WebRTC. The additional instrumentation necessary for this interface is just a single line of Javascript in OpenWPM’s Firefox extension.

A preliminary analysis [5] of a 50,000 site pilot measurement from October 2015 suggests that WebRTC local IP discovery is used on the homepages of over 100 sites, from over 20 distinct scripts. Only 1 of these scripts would be blocked by EasyList or EasyPrivacy.

How can this be useful for you

We envision several ways researchers and other members of the community can make use of  OpenWPM and our measurements. I’ve listed them here from least involved to most involved.

(1) Use our measurement data for their own tools. In my analysis of canvas fingerprinting I mentioned that Disconnect incorporated our research results into their blocklist. We want to make it easy for privacy tools to make use of the analysis we run, by releasing analysis data in a structured, machine readable way.

(2) Use the data collected during our measurements, and build their own analysis on top of it. We know we’ll never be able to take the human element out of these studies. Detection methodologies will change, new features of the browser will be released and others will change. The depth of the Transparency measurements should make it easy test new ideas, with the option of contributing them back to the regular crawls.

(3) Use OpenWPM to collect and release their own data. This is the model we see most web privacy researchers opting for, and a model we plan to use for most of our own studies. The platform can be used and tweaked as necessary for the individual study, and the measurement results and data can be shared publicly after the study is complete.

(4) Contribute to OpenWPM through pull requests. This is the deepest level of involvement we see. Other developers can write new features into the infrastructure for their own studies or to be run as part of our transparency measurements. Contributions here will benefit all users of OpenWPM.

Over the coming months we will release new blog posts and research results on the measurements I’ve discussed in this post. You can follow our progress here on Freedom to Tinker, on Twitter @s_englehardt, and on our Github repository.

 

[1] Notable exceptions include the study of cookie respawning: 2009, 2011, 2011, 2014. and the statistics on stateful tracking use and third-party inclusion: 2009, 2012, 2012, 2012, 2015.

[2] Crawling with a real browser is important for two reasons: (1) it’s less likely to be detected as a bot, meaning we’re less likely to receive different treatment from a normal user, and (2) a real browser supports all the modern web features (e.g. WebRTC, HTML5 audio and video), plugins (e.g. Flash), and extensions (e.g. Ghostery, HTTPS Everywhere). Many of these additional features play a large role in the average user’s privacy online.

[3] There is an additional concern that WebRTC can be used to determine a VPN user’s actual IP address, however this attack is distinct from the one described in this post.

[4] uBlock Origin also provides an option to prevent WebRTC local IP discovery on Firefox.

[5] We are in the process of running and verifying this analysis on a our 1 million site measurements, and will release an updated analysis with more details in the future.