November 20, 2018

What Your ISP (Probably) Knows About You

Earlier this week, I came across a working paper from Professor Peter Swire, a highly respected attorney, professor, and policy expert. Swire’s paper, entitled “Online Privacy and ISPs”, argues that ISPs have limited visibility into users’ online activity for three reasons: (1) users are increasingly using many devices and connections, so any single ISP is the conduit of only a fraction of a typical user’s activity; (2) end-to-end encryption is becoming more pervasive, which limits ISPs’ ability to glean information about user activity; and (3) users are increasingly shifting to VPNs to send traffic.

An informed reader might surmise that this writeup relates to the reclassification of Internet service providers under Title II of the Telecommunications Act, which gives the FCC a mandate to protect private information that ISPs learn about their customers. This private information includes both personal information and information about a customer’s use of the service that is provided as a result of receiving service, sometimes called Customer Proprietary Network Information, or CPNI. One possible conclusion a reader might draw from this white paper is that ISPs have limited capability to learn information about customers’ use of their service and hence should not be subject to additional privacy regulations.

I am not taking a position in this policy debate, nor do I intend to make any normative statements about whether an ISP’s ability to see this type of user information is inherently “good” or “bad” (in fact, one might even argue that an ISP’s ability to see this information might improve network security, network management, or other services). Nevertheless, these debates should be based on a technical picture that is as accurate as possible.  In this vein, it is worth examining Professor Swire’s “factual description of today’s online ecosystem” that claims to offer the reader an “up-to-date and accurate understanding of the facts”. It is true that the report certainly contains many facts, but it also omits important details about the “online ecosystem”. Below, I fill in what I see as some important missing pieces. Much of what I discuss below I have also sent verbatim in a letter to the FCC Chairman. I hope that the original report will ultimately incorporate some of these points.

[Update (March 9): Swire notes in a response that the report itself doesn’t contain technical inaccuracies. Although there are certainly many points that are arguable, they are hard to disprove without better data, so it is difficult to “prove” the inaccuracies. Even if we take it as a given that there are no inaccuracies, that’s a very different thing than saying that the report tells the whole story.]

Claim 1: User Mobility Prevents a Single ISP from Observing a Significant Fraction of User Traffic

The report’s claim: Due to increased mobility, users are increasingly using many devices and connections, so any single ISP is the conduit of only a fraction of a typical user’s activity.

A missing piece: A single ISP can still track significant user activities from home network traffic and (as the user moves) through WiFi sharing services.

The report cites statistics from Cisco on the increase in mobile devices; these statistics do not offer precise information about how user traffic distributes across ISPs, but it’s reasonable to assume that users who are more mobile are not sending all of their traffic over a single access ISP.

Yet, a user’s increased mobility by no means implies that a single ISP cannot track users’ activities in their homes. Our previous research has shown that the traffic that users send in their home networks (typically through a single ISP) reveals significant information about user activity and behavior. The median home had about five connected devices at any given time. Simply by observing traffic patterns (i.e., without looking at any packet contents), we could determine the types of devices that users had in their homes, as well as how often (and how heavily) they used each device. In some cases, we could even determine when the user was likely to be home, based on diurnal traffic usage patterns. We could also determine the most popular domains that each home visited. The figure below shows examples of such popular domains.

Lots to learn from home network traffic. This example from our previous work shows what an ISP can learn from network traffic: the popular domains from 25 home networks. The figure shows the number of homes for which each domain appeared in the top 5 or top 10 most popular domains for that home.
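To make the kind of tally described above concrete, here is a minimal sketch in Python. It is not the analysis code from our study; the function names and the (home, domain) input format are assumptions made purely for illustration. Given a list of observed requests, it computes each home's most popular domains and then counts the number of homes in which each domain made that list.

```python
from collections import Counter, defaultdict


def top_domains_per_home(observations, n=5):
    """Return {home_id: [the n most requested domains for that home]}.

    `observations` is an iterable of (home_id, domain) pairs, one per
    observed request. This input format is a hypothetical simplification.
    """
    counts = defaultdict(Counter)
    for home_id, domain in observations:
        counts[home_id][domain] += 1
    return {
        home: [domain for domain, _ in counter.most_common(n)]
        for home, counter in counts.items()
    }


def homes_per_domain(top_by_home):
    """Count the number of homes in which each domain made the top-n list."""
    tally = Counter()
    for domains in top_by_home.values():
        tally.update(domains)
    return tally
```

Calling `homes_per_domain(top_domains_per_home(observations, n=5))` and again with `n=10` yields the two counts per domain that the figure reports. Note that nothing here requires packet contents: a list of destination domains per home is enough.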

 

Based on what we can observe from this traffic, it should come as no surprise that the data that we gathered—which is the same data an ISP can see—warrants special handling, due to its private nature. University Institutional Review Boards (IRBs) consider this type of work human subjects research because it “obtains (1) data through intervention or interaction with the individual; or (2) private, identifiable information”; indeed, we had to get special approval to even perform this study in the first place.

The report claims that user mobility may make it more difficult for a single ISP to track a user’s activity because a mobile user is more likely to connect through different ISPs. But another twist in this story makes me think that this claim deserves more careful examination: the rise of shared WiFi hotspots (such as Xfinity WiFi, which had deployed 10 million WiFi hotspots as of mid-2015, and which users had accessed 3.6 billion times) in some cases allows a single ISP to track mobile users more than it would otherwise be able to without such a service.

Incidentally, the report also says that “limited processing power and storage placed technical and cost limits [deep-packet inspection] capability”, but in the last mile, data rates are substantially lower and can thus permit DPI. For example, we had no trouble gathering all of the traffic data for our research on a simple, low-cost Netgear router running OpenWrt. Most home networks we have studied are sending traffic at only tens of megabits per second, even at peak rate. We have been able to perform packet capture on very resource-limited devices at these rates.

Claim 2: End-to-End Encryption Limits ISP Visibility into User Behavior

The report’s claim: End-to-end encryption on websites is increasingly pervasive; thus, ISPs have limited visibility into user behavior.

A missing piece: ISPs can observe user activity based on general traffic patterns (e.g., volumes), unencrypted portions of communication, and the large number of in-home devices that do not encrypt traffic.

Nearly all Internet-connected devices use the Domain Name System (DNS) to look up domain names for specific Internet destinations. These DNS lookups are generally “in the clear” (i.e., unencrypted) and can be particularly revealing. For example, in a recent study of traffic patterns from a collection of IoT devices, we observed that a Nest thermostat routinely performs a DNS lookup to frontdoor.nest.com, a popular digital photo frame routinely issues DNS queries to api.pix-star.com, and a popular IP camera routinely issues DNS queries to (somewhat ironically!) sharxsecurity.com. No sophisticated traffic analysis was required to identify the usage of these devices from plaintext DNS query traffic.
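To illustrate how little work this takes, here is a minimal sketch in Python that reads the queried hostname out of an unencrypted DNS query payload and looks it up in a small table. It is not the code from our study. The `DEVICE_HINTS` table and all function names are assumptions for illustration; the table entries simply restate the three examples above. The query builder exists only so the example is self-contained.

```python
import struct

# Illustrative mapping from queried names to device types, restating the
# examples in the text. A real observer would need a far larger table.
DEVICE_HINTS = {
    "frontdoor.nest.com": "Nest thermostat",
    "api.pix-star.com": "digital photo frame",
    "sharxsecurity.com": "IP camera",
}


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query (type A, class IN) so the sketch can run."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def extract_query_name(payload):
    """Return the hostname in the question section of a DNS query payload."""
    offset = 12  # the DNS header is always 12 bytes
    labels = []
    while True:
        length = payload[offset]
        if length == 0:  # a zero-length label terminates the name
            break
        if length & 0xC0:
            raise ValueError("compressed names are not handled in this sketch")
        offset += 1
        labels.append(payload[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels)


def guess_device(payload):
    """Map a DNS query payload to a device type, or 'unknown'."""
    return DEVICE_HINTS.get(extract_query_name(payload), "unknown")
```

The parser is a simple loop over length-prefixed labels; no decryption or statistical inference is involved, which is the point.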

Even when a device uses HTTPS to communicate with an Internet destination, the initial TLS handshake typically indicates the hostname that the device is communicating with in the Server Name Indication (SNI) field, which allows the server to present the client with the appropriate certificate for the domain that the client is attempting to reach. The SNI is transmitted in cleartext and naturally reveals information about the domains that a user’s devices are communicating with.
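As a rough illustration, the following Python sketch pulls the SNI hostname out of the first TLS record that a client sends. It is a simplified sketch, not a production parser: it assumes the entire ClientHello arrives in a single, untruncated record, and the function names are my own. The `build_client_hello` helper assembles a deliberately minimal hello message only so that the example can run on its own.

```python
import struct


def build_client_hello(hostname):
    """Assemble a minimal TLS ClientHello record with an SNI extension."""
    name = hostname.encode("ascii")
    # ServerName entry: name_type 0 (host_name), 2-byte length, then the name.
    server_name = b"\x00" + struct.pack("!H", len(name)) + name
    sni_data = struct.pack("!H", len(server_name)) + server_name
    # Extension header: type 0 (server_name), 2-byte length.
    extensions = struct.pack("!HH", 0, len(sni_data)) + sni_data
    body = (
        b"\x03\x03"                            # client_version
        + bytes(32)                            # random (zeros for the sketch)
        + b"\x00"                              # empty session_id
        + struct.pack("!H", 2) + b"\x13\x01"   # a single cipher suite
        + b"\x01\x00"                          # one compression method: null
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


def extract_sni(record):
    """Return the SNI hostname from a ClientHello record, or None."""
    # Byte 0 is the record type (0x16 = handshake); byte 5 is the
    # handshake message type (0x01 = ClientHello).
    if len(record) < 6 or record[0] != 0x16 or record[5] != 0x01:
        return None
    pos = 5 + 4                                            # record + handshake headers
    pos += 2 + 32                                          # client_version, random
    pos += 1 + record[pos]                                 # session_id
    pos += 2 + int.from_bytes(record[pos:pos + 2], "big")  # cipher_suites
    pos += 1 + record[pos]                                 # compression_methods
    if pos + 2 > len(record):
        return None                                        # no extensions present
    end = pos + 2 + int.from_bytes(record[pos:pos + 2], "big")
    pos += 2
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack("!HH", record[pos:pos + 4])
        pos += 4
        if ext_type == 0:  # server_name extension
            # Skip the list length (2 bytes) and name_type (1 byte).
            name_len = int.from_bytes(record[pos + 3:pos + 5], "big")
            return record[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None
```

Every field the parser walks over is sent before any encryption keys are negotiated, so an on-path observer needs nothing beyond the bytes on the wire.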

The report cites the deployment of HTTPS on many major websites as evidence that traffic from consumers is increasingly encrypted end-to-end. Yet, consumer networks are increasingly being equipped with Internet of Things (IoT) devices, many of which we have observed sending traffic entirely in cleartext. In fact, among the devices we have studied, cleartext communication was the norm, not the exception. While we all hope that many of these devices ultimately shift to encrypted communications, the current state of affairs is much different. Even in the longer term, it is possible that certain IoT devices may be so resource-limited as to make cryptography impractical, particularly in the case of low-cost IoT devices. The deployment of HTTPS on major websites is certainly encouraging for the state of privacy on the Internet in general, but it is a poor indicator of how much traffic from a home network is encrypted.

Claim 3: Users are Increasingly Using VPNs, Which Conceal User Activity from ISPs

The report’s claim: Users are increasingly using VPNs, which encrypt all traffic, including DNS, as traffic traverses the ISP; therefore, ISPs cannot see any user traffic.

A missing piece: DNS traffic sometimes goes to the ISP’s DNS server after it exits the VPN tunnel. Configuring certain devices to use VPNs may not be straightforward for many users.

Whether VPNs will prevent ISPs from seeing DNS traffic depends on the configuration of the VPN tunnel. A VPN is simply an encrypted tunnel that takes the original IP packet and encapsulates the packet in a new packet whose destination IP address is the tunnel endpoint. But, the IP address for DNS resolution is typically set by the Dynamic Host Configuration Protocol (DHCP). If the consumer uses the ISP’s DHCP server to configure the host in question (which most of us do), the client’s DNS server will still be the ISP’s DNS server, unless the client’s VPN software explicitly reconfigures the DNS server (many VPN clients do not).

In these cases, the ISP will continue to observe all of the user’s DNS traffic, even if the user configures a VPN tunnel: the DNS queries will exit the VPN tunnel and head right back to the ISP’s DNS server. It is often possible for a user to configure a device to not use the ISP’s DNS server, but this is by no means automatic and in certain cases (e.g., on IoT devices) it may be quite difficult. Even in cases where a VPN uses its own DNS resolver, the traffic for those queries by no means stays local: DNS cache misses can cause these queries to traverse many ISPs.
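The configuration logic described above can be captured in a toy model. The Python sketch below is an assumption-laden simplification, not a description of any particular VPN client: the `HostConfig` fields and function names are hypothetical, and the addresses come from ranges reserved for documentation. It only answers one question: given how the host was configured, does the resolver that receives the queries belong to the ISP?

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostConfig:
    """Hypothetical view of the settings that decide where DNS queries go."""
    dhcp_dns_server: str                     # resolver handed out by DHCP
    vpn_active: bool = False
    vpn_dns_override: Optional[str] = None   # resolver set by the VPN client, if any


def effective_resolver(cfg):
    """Return the resolver address the host will actually query."""
    if cfg.vpn_active and cfg.vpn_dns_override:
        return cfg.vpn_dns_override
    # Without an explicit override, the DHCP-assigned resolver stays in use,
    # whether or not a VPN tunnel is up.
    return cfg.dhcp_dns_server


def isp_sees_dns_queries(cfg, isp_resolvers):
    """True if the queries end up at one of the ISP's own resolvers."""
    return effective_resolver(cfg) in isp_resolvers


# Documentation-range addresses, used purely for illustration.
ISP_RESOLVERS = {"192.0.2.53"}
no_override = HostConfig(dhcp_dns_server="192.0.2.53", vpn_active=True)
with_override = HostConfig(
    dhcp_dns_server="192.0.2.53",
    vpn_active=True,
    vpn_dns_override="198.51.100.53",
)
```

In this model, `isp_sees_dns_queries(no_override, ISP_RESOLVERS)` is true even though the VPN is active, while the `with_override` configuration sends queries elsewhere. The model says nothing about which other networks those queries then cross.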

Traffic from VPNs doesn’t simply disappear: it merely resurfaces in another ISP that can subsequently monitor user activity. The opportunities for observing user traffic are substantial. For example, in a recent simple experiment that postdoc Philipp Winter performed, web requests from Tor exit relays to the Alexa top 1,000 websites traversed more than 350 Internet service providers; when the DNS lookups from these exit relays are also considered, the traffic traverses an additional 173 Internet service providers.

Furthermore, VPN clients are typically for desktop machines and, in some cases, mobile devices such as phones and tablets. As previously discussed, IoT devices in homes will continue to generate more traffic. Most such devices do not support VPN software. While it is conceivable that a user could set up an encrypted VPN tunnel from the home router and route all home traffic through a VPN, typical home gateways don’t easily support this functionality at this point, and configuring such a setup would be cumbersome for the typical user.

Conclusion

Policymakers, industry, and consumers should debate whether, when, and how the FCC should impose privacy protections for consumers. Robust debate, however, needs an understanding of the technical underpinnings that is as complete as possible. In this post, I have attempted to fill in what struck me as some missing pieces in Professor Swire’s discussion of ISPs’ ability to observe user activity in network traffic. The report implies that ISPs’ access to information about users’ online activity is neither “comprehensive” nor “unique”. Yet, an ISP is in a position to see much more user traffic from many more devices than other parties in the Internet ecosystem, and certainly much more than the paper would have the reader conclude. I hope that the original working paper is revised to reflect a more complete and balanced view of ISPs’ capabilities.

Comments

  1. After receiving some feedback, I realized that some aspects of the post could be misconstrued.

    Allow me to clarify a few points in the post.

    1. The data that I mentioned above is data that we gathered as part of an IRB-approved university study, using routers and equipment that we deployed as part of the BISmark project (http://projectbismark.net/). It is not data that any ISP gathered and/or provided to us. I have no idea, of course, whether any access ISPs *actually* gather the same kind of data or analyze it in the way we did (I expect that many do not, since, unlike people who are actually running networks, the point of our study was focused on studying user behavior in the home). The point that I tried to raise here—by way of concrete example—is that it’s _possible_ for ISPs to see this data. That’s not necessarily good or bad—it just *is*. This brings me to my next point of clarification…

    2. The fact that ISPs _can_ gather this kind of data is not necessarily a bad thing; this kind of data can actually help ISPs run their networks better. ISPs gather network traffic data for many purposes (malware detection, DDoS mitigation, planning and provisioning, etc.). There are many *good* reasons for ISPs to gather network traffic data. But, the main point of this post was not to discuss those reasons (of which there are many), but rather to discuss what is visible, even with encryption and VPNs. I think (until and unless there is evidence otherwise) it is fair and reasonable to assume that ISPs’ first motive is generally not to abuse privacy, but rather to gather data to run the network better. The fact that ISPs have this information, of course, makes it possible to see user activity, but I wouldn’t presume that they have any such motives. The post is meant to illustrate what is in the data, *NOT* what ISPs are or are not actually doing.

    3. My mention of Xfinity WiFi was meant as a (popular) example of a service that an ISP runs where it might be able to see data from the same user, even if that user is mobile. I chose the example not to single out that service, but because it’s one I happen to (happily and frequently!) use, from many different locations, so it was the first one that came to mind. It is worth mentioning that _many_ service providers support forms of mobile access (cellular providers of course being the most prominent, but any roaming WiFi service also falls into this category). Another prominent example would be cellular WiFi hotspots. My main point here was to illustrate that just because a user is mobile does not mean that they move off of the network of their “home” access ISP. Quite the opposite in many cases, in fact, and I expect these trends to only continue as mobile offerings become even more prominent. Again, this is not a bad thing in and of itself; it just reflects the situation. To suggest that mobility by itself somehow offers the user more privacy is often just not accurate.

  2. “it is fair and reasonable to assume that ISPs’ first motive is generally not to abuse privacy, but rather to gather data to run the network better.”

    I don’t know if that’s reasonable, as illustrated by Comcast’s HTTP ad-injection, super-cookies, etc. I think it is more objectively constructive and fair to treat ISPs as demonstrably-untrustworthy global adversaries. They are potentially privy to all of the traffic you generate, and their potential to violate our privacy is limited only by the effectiveness of our encryption and the social/consumer forms of redress we choose to put in place.