March 30, 2017

Who Will Secure the Internet of Things?

Over the past several months, CITP-affiliated Ph.D. student Sarthak Grover and fellow Roya Ensafi been investigating various security and privacy vulnerabilities of Internet of Things (IoT) devices in the home network, to get a better sense of the current state of smart devices that many consumers have begun to install in their homes.

To explore this question, we purchased a collection of popular IoT devices, connected them to a laboratory network at CITP, and monitored the traffic that these devices exchanged with the public Internet. We initially expected that end-to-end encryption might foil our attempts to monitor the traffic to and from these devices. The devices we explored included a Belkin WeMo Switch, the Nest Thermostat, an Ubi Smart Speaker, a Sharx Security Camera, a PixStar Digital Photoframe, and a Smartthings hub.

What We Found: Be Afraid!

Many devices fail to encrypt at least some of the traffic that they send and receive. Investigating the traffic to and from these devices turned out to be much easier than expected, as many of the devices exchanged personal or private information with servers on the Internet in the clear, completely unencrypted.

We recently presented a summary of our findings to the Federal Trade Commission, last week at PrivacyCon.  The video of Sarthak’s talk is available from the FTC website, as well as on YouTube.  Among some of the more striking findings include:

  • The Nest thermostat was revealing location information of the home and weather station, including the user’s zip code, in the clear.  (Note: Nest promptly fixed this bug after we notified them.)
  • The Ubi uses unencrypted HTTP to communicate information to its portal, including voice chats, sensor readings (sound, temperature, light, humidity). It also communicates to the user using unencrypted email. Needless to say, much of this information, including the sensor readings, could reveal critical information, such as whether the user was home, or even movements within a house.
  • The Sharx security camera transmits video over unencrypted FTP; if the server for the video archive is outside of the home, this traffic could also be intercepted by an eavesdropper.
  • All traffic to and from the PixStar photoframe was sent unencrypted, revealing many user interactions with the device.

Traffic capture from Nest Thermostat in Fall 2015, showing user zip code and other information in cleartext.

Traffic capture from Ubi, which sends sensor values and states in clear text.

Some devices encrypt data traffic, but encryption may not be enough. A natural reaction to some of these findings might be that these devices should encrypt all traffic that they send and receive. Indeed, some devices we investigated (e.g., the Smartthings hub) already do so. Encryption may be a good starting point, but by itself, it appears to be insufficient for preserving user privacy.  For example, user interactions with these devices generate traffic signatures that reveal information, such as when power to an outlet has been switched on or off. It appears that simple traffic features such as traffic volume over time may be sufficient to reveal certain user activities.

In all cases, DNS queries from the devices clearly indicate the presence of these devices in a user’s home. Indeed, even when the data traffic itself is encrypted, other traffic sent in the clear, such as DNS lookups, may reveal not only the presence of certain devices in your home, but likely also information about both usage and activity patterns.

Of course, there is also the concern about how these companies may use and share the data that they collect, even if they manage to collect it securely. And, beyond the obvious and more conventional privacy and security risks, there are also potential physical risks to infrastructure that may result from these privacy and security problems.

Old problems, new constraints. Many of the security and privacy problems that we see with IoT devices sound familiar, but these problems arise in a new, unique context, which present unique challenges:

  • Fundamentally insecure. Manufacturers of consumer products have little interest in releasing software patches and may even design the device without any interfaces for patching the software in the first place.  There are various examples of insecure devices that ordinary users may connect to the network without any attempts to secure them (or any means of doing so).  Occasionally, these insecure devices can result in “stepping stones” into the home for attackers to mount more extensive attacks. A recent study identified more than 500,000 insecure, publicly accessible embedded networked devices.
  • Diverse. Consumer IoT settings bring a diversity of devices, manufacturers, firmware versions, and so forth. This diversity can make it difficult for a consumer (or the consumer’s ISP) to answer even simple questions such as exhaustively identifying the set of devices that are connected to the network in the first place, let alone detecting behavior or network traffic that might reveal an anomaly, compromise, or attack.
  • Constrained. Many of the devices in an IoT network are severely resource-constrained: the devices may have limited processing or storage capabilities, or even limited battery life, and they often lack a screen or intuitive user interface. In some cases, a user may not even be able to log into the device.  

Complicating matters, a user has limited control over the IoT device, particularly as compared to a general-purpose computing platform. When we connect a general purpose device to a network, we typically have at least a modicum of choice and control about what software we run (e.g., browser, operating system), and perhaps some more visibility or control into how that device interacts with other devices on the network and on the public Internet. When we connect a camera, thermostat, or sensor to our network, the hardware and software are much more tightly integrated, and our visibility into and control over that device is much more limited. At this point, we have trouble, for example, even knowing that a device might be sending private data to the Internet, let alone being able to stop it.

Compounding all of these problems, of course, is the access a consumer gives an untrusted IoT device to other data or devices on the home network, simply by connecting it to the network—effectively placing it behind the firewall and giving it full network access, including in many cases the shared key for the Wi-Fi network.

A Way Forward

Ultimately, multiple stakeholders may be involved with ensuring the security of a networked IoT device, including consumers, manufacturers, and Internet service providers. There remain many unanswered questions concerning both who is able to (and responsible for) securing these devices, but we should start the discussion about how to improve the security for networks with IoT devices.

This discussion will include both policy aspects (including who bears the ultimate responsibility for device insecurity, whether devices need to adopt standard practices or behavior, and for how long their manufacturers should continue to support them), as well as technical aspects (including how we design the network to better monitor and control the behavior of these often-insecure devices).

Devices should be more transparent. The first step towards improving security and privacy for IoT should be to work with manufacturers to improve the transparency of these IoT devices, so that consumers (and possibly ISPs) have more visibility into what software the devices are running, and what traffic they are sending and receiving. This, of course, is a Herculean effort, given the vast quantity and diversity of device manufacturers; an alternative would be trying to infer what devices are connected to the network based on their traffic behavior, but doing so in a way that is both comprehensive, accurate, and reasonably informative seems extremely difficult.

Instead, some IoT device manufacturers might standardize on a manifest protocol that announces basic information, such as the device type, available sensors, firmware version, the set of destinations the device expects to communicate with (and whether the traffic is encrypted), and so forth. (Of course, such a manifest poses its own security risks.)

Network infrastructure can play a role. Given such basic information, anomalous behavior that is suggestive of a compromise or data leak would be more evident to network intrusion detection systems and firewalls—in other words, we could bring more of our standard network security tools to bear on these devices, once we have a way to identify what the devices are and what their expected behavior should be. Such a manifest might also serve as a concise (and machine readable!) privacy statement; a concise manifest might be one way for consumers to evaluate their comfort with a certain device, even though it may be far from a complete privacy statement.

Armed with such basic information about the devices on the network, smart network switches would have a much easier way to implement network security policies. For example, a user could specify that the smart camera should never be able to communicate with the public Internet, or that the thermostat should only be able to interact with the window locks if someone is present.

Current network switches don’t provide easy mechanisms for consumers to either express or implement these types of policies. Advances in Software-Defined Networking (SDN) in software switches such as Open vSwitch may make it possible to implement policies that resolve contention for shared resources and conflicts, or to isolate devices on the network from one another, but even if that is a reasonable engineering direction, this technology will only take us part of the way, as users will ultimately need far better interfaces to both monitor network activity and express policies about how these devices should behave and exchange traffic.

Update [20 Jan 2015]: After significant press coverage, Nest has contacted the media to clarify that the information being leaked in cleartext was not the zip code of the thermostat, but merely the zip code that the user enters when configuring the device. (Clarifying statement here.) Of course, when would a user ever enter a zip code other than that of their home, where the thermostat was located?

Supreme Court Takes Important GPS Tracking Case

This morning, the Supreme Court agreed to hear an appeal next term of United States v. Jones (formerly United States v. Maynard), a case in which the D.C. Circuit Court of Appeals suppressed evidence of a criminal defendant’s travels around town, which the police collected using a tracking device they attached to his car. For more background on the case, consult the original opinion and Orin Kerr’s previous discussions about the case.

No matter what the Court says or holds, this case will probably prove to be a landmark. Watch it closely.

(1) Even if the Court says nothing else, it will face the constitutionally of the use by police of tracking beepers to follow criminal suspects. In a pair of cases from the mid-1980’s, the Court held that the police did not need a warrant to use a tracking beeper to follow a car around on public, city streets (Knotts) but did need a warrant to follow a beeper that was moved indoors (Karo) because it “reveal[ed] a critical fact about the interior of the premises.” By direct application of these cases, the warrantless tracking in Jones seems constitutional, because it was restricted to movement on public, city streets.

Not so fast, said the D.C. Circuit. In Jones, the police tracked the vehicle 24 hours a day for four weeks. Citing the “mosaic theory often invoked by the Government in cases involving national security information,” the Court held that the whole can sometimes be more than the parts. Tracking a car continuously for a month is constitutionally different in kind not just degree from tracking a car along a single trip. This is a new approach to the Fourth Amendment, one arguably at odds with opinions from other Courts of Appeal.

(2) This case gives the Court the opportunity to speak generally about the Fourth Amendment and location privacy. Depending on what it says, it may provide hints for lower courts struggling with the government’s use of cell phone location information, for example.

(3) For support of its embrace of the mosaic theory, the D.C. Circuit cited a 1989 Supreme Court case, U.S. Department of Justice v. National Reporters Committee. In this case, which involved the Freedom of Information Act (FOIA) not the Fourth Amendment, the Court allowed the FBI to refuse to release compiled “rap sheets” about organized crime suspects, even though the rap sheets were compiled mostly from “public” information obtainable from courthouse records. In agreeing that the rap sheets nevertheless fell within a “personal privacy” exemption from FOIA, the Court embraced, for the first time, the idea that the whole may be worth more than the parts. The Court noted the difference “between scattered disclosure of the bits of information contained in a rap-sheet and revelation of the rap-sheet as a whole,” and found a “vast difference between the public records that might be found after a diligent search of courthouse files, county archives, and local police stations throughout the country and a computerized summary located in a single clearinghouse of information.” (FtT readers will see the parallels to the debates on this blog about PACER and RECAP.) In summary, it found that “practical obscurity” could amount to privacy.

Practical obscurity is an idea that hasn’t gotten much traction in the Courts since National Reporters Committee. But it is an idea well-loved by many privacy scholars, including myself, for whom it helps explain their concerns about the privacy implications of data aggregation and mining of supposedly “public” data.

The Court, of course, may choose a narrow route for affirming or reversing the D.C. Circuit. But if it instead speaks broadly or categorically about the viability of practical obscurity as a legal theory, this case might set a standard that we will be debating for years to come.

Tracking Your Every Move: iPhone Retains Extensive Location History

Today, Pete Warden and Alasdair Allan revealed that Apple’s iPhone maintains an apparently indefinite log of its location history. To show the data available, they produced and demoed an application called iPhone Tracker for plotting these locations on a map. The application allows you to replay your movements, displaying your precise location at any point in time when you had your phone. Their open-source application works with the GSM (AT&T) version of the iPhone, but I added changes to their code that allow it to work with the CDMA (Verizon) version of the phone as well.

When you sync your iPhone with your computer, iTunes automatically creates a complete backup of the phone to your machine. This backup contains any new content, contacts, and applications that were modified or downloaded since your last sync. Beginning with iOS 4, this backup also included is a SQLite database containing tables named ‘CellLocation’, ‘CdmaCellLocaton’ and ‘WifiLocation’. These correspond to the GSM, CDMA and WiFi variants of location information. Each of these tables contains latitude and longitude data along with timestamps. These tables also contain additional fields that appear largely unused on the CDMA iPhone that I used for testing — including altitude, speed, confidence, “HorizontalAccuracy,” and “VerticalAccuracy.”

Interestingly, the WifiLocation table contains the MAC address of each WiFi network node you have connected to, along with an estimated latitude/longitude. The WifiLocation table in our two-month old CDMA iPhone contains over 53,000 distinct MAC addresses, suggesting that this data is stored not just for networks your device connects to but for every network your phone was aware of (i.e. the network at the Starbucks you walked by — but didn’t connect to).

Location information persists across devices, including upgrades from the iPhone 3GS to iPhone 4, which appears to be a function of the migration process. It is important to note that you must have physical access to the synced machine (i.e. your laptop) in order to access the synced location logs. Malicious code running on the iPhone presumably could also access this data.

Not only was it unclear that the iPhone is storing this data, but the rationale behind storing it remains a mystery. To the best of my knowledge, Apple has not disclosed that this type or quantity of information is being stored. Although Apple does not appear to be currently using this information, we’re curious about the rationale for storing it. In theory, Apple could combine WiFi MAC addresses and GPS locations, creating a highly accurate geolocation service.

The exact implications for mobile security (along with forensics and law enforcement) will be important to watch. What is most surprising is that this granularity of information is being stored at such a large scale on such a mainstream device.