January 24, 2017

The Effects of the Forthcoming FCC Privacy Rules on Internet Security

Last week, the Federal Communications Commission (FCC) announced new privacy rules that govern how Internet service providers can share information about consumers with third parties.  One focus of this rulemaking has been on the use and sharing of so-called “Consumer Proprietary Network Information (CPNI)”—information about subscribers—for advertising. The Center for Information Technology Policy and the Center for Democracy and Technology jointly hosted a panel exploring this topic last May, and I have previously written on certain aspects of this issue, including what ISPs might be able to infer about user behavior, even if network traffic were encrypted.

Although the forthcoming rulemaking targets the collection, use, and sharing of customer data with “third parties”, an important—and oft-forgotten—facet of this discussion is that (1) ISPs rely on the collection, use, and sharing of CPNI to operate and secure their networks and (2) network researchers (myself included) rely on this data to conduct our research.  As one example of this work, discussed today in the Wall Street Journal, we used DNS domain registration data to identify cybercriminals before they launch attacks. Performing this research required access to all .com domain registrations. We have also developed algorithms that detect the misuse of DNS domain names by analyzing the DNS lookups themselves, and we have worked with ISPs to explore the relationship between Internet speeds and usage, which required access to byte-level usage data from individual customers. ISPs also rely on third parties, including Verisign and Arbor Networks, to detect and mitigate attacks; network equipment vendors also use traffic traces from ISPs to test new products and protocols. In summary, although the goal of the FCC’s rulemaking is to protect the use of consumer data, the rulemaking could have unintended negative consequences for the stability and security of the Internet, as well as for Internet innovation.

In response to the potential negative effects this rule could have on Internet security and networking research, I filed a comment with the FCC highlighting how network operators and researchers depend on this data to keep the network operating well, to keep it secure, and to foster continued innovation.  My comment in May highlights the types of data that Internet service providers (ISPs) collect, how they use those data for operational and research purposes, and the potential privacy concerns with each of these datasets.  In my comment, I exhaustively enumerate the types of data that ISPs collect; the following data types are particularly interesting because ISPs and researchers rely on them heavily, yet they also introduce certain privacy concerns:

  • IPFIX (“NetFlow”) data, which is the Internet traffic equivalent of call data records. IPFIX data is collected at a router and contains statistics about each traffic flow that traverses the router. It contains information about the “metadata” of each flow (e.g., the source and destination IP address, the start and end time of the flow). This data doesn’t contain “payload” information, but as previous research on information like telephone metadata has shown, a lot can be learned about a user from this kind of information. Nonetheless, this data has been used in research and security for many purposes, including (among other things) detecting botnets and denial of service attacks.
  • DNS query data, which contains information about the domain names that each IP address (i.e., customer) is looking up (e.g., from a Web browser, from an IoT device, etc.). DNS query data can be highly revealing, as we have shown in previous work. Yet, at the same time, DNS query data is also incredibly valuable for detecting Internet abuse, including botnets and malware.
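
To make the flow “metadata” discussion concrete, the sketch below shows the kind of payload-free analysis such records enable. It is a toy heuristic, not a production detector, and the `Flow` record with its field names is a simplified illustration rather than the actual IPFIX schema:

```python
from collections import namedtuple

# Simplified stand-in for an IPFIX flow record; real exports carry many
# more fields (ports, protocol, packet counts, timestamps, etc.).
Flow = namedtuple("Flow", "src_ip dst_ip start end n_bytes")

def suspected_dos_targets(flows, min_sources=1000):
    """Flag destinations contacted by an unusually large number of
    distinct sources: a crude volumetric-attack heuristic that needs
    only flow metadata, never packet payloads."""
    sources_per_dst = {}
    for f in flows:
        sources_per_dst.setdefault(f.dst_ip, set()).add(f.src_ip)
    return [dst for dst, srcs in sources_per_dst.items()
            if len(srcs) >= min_sources]
```

Real detectors weigh many more signals (ports, packet sizes, timing), but the point stands: metadata alone supports meaningful security analysis.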

Over the summer, I gave a follow-up presentation and filed follow-up comments (several of which were jointly authored with members of the networking and security research community) to help draw attention to how much Internet research depends on access to this type of data.  In early August, a group of us filed a comment with proposed wording for the upcoming rule. In this comment, we delineated the types of work that should be exempt from the upcoming rules. We argue that research should be exempt from the rulemaking if the research: (1) aims to promote the security, stability, and reliability of networks; (2) does not have the end goal of violating user privacy; (3) has benefits that outweigh the privacy risks; (4) takes steps to mitigate privacy risks; and (5) would be enhanced by access to the ISP data.  In delineating this type of research, our goal was to explicitly “carve out” researchers at universities and research labs without opening a loophole for third-party advertisers.

Of course, the exception notwithstanding, researchers also should be mindful of user privacy when conducting research. Just because a researcher is “allowed” to receive a particular data trace from an ISP does not mean that such data should be shared. For example, much network and security research is possible with de-identified network traffic data (e.g., data with anonymized IP addresses), or without packet “payloads” (i.e., the kind of traffic data collected with Deep Packet Inspection). Researchers and ISPs should always take care to apply data minimization techniques that limit the disclosure of private information to only the granularity that is necessary to perform the research. Various practices for minimization exist, such as hashing or removing IP addresses, aggregating statistics over longer time windows, and so forth. The network and security research communities should continue developing norms and standard practices for deciding when, how, and to what degree private data from ISPs can be minimized when it is shared.
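
As a minimal sketch of two of these minimization techniques, the code below shows keyed hashing of IP addresses (pseudonymization, which keeps records about the same customer linkable but is weaker than true anonymization) and aggregation of per-flow byte counts into coarser time windows. The function names and record layout here are illustrative assumptions, not any particular ISP’s format:

```python
import hashlib

def pseudonymize_ip(ip, salt):
    """Replace an IP address with a truncated keyed hash. Records for
    the same customer stay linkable, but the address itself is not
    exposed. A salted hash is pseudonymization, not strong anonymization."""
    return hashlib.sha256((salt + ip).encode()).hexdigest()[:16]

def aggregate_bytes(records, window=3600):
    """Coarsen (timestamp, customer, n_bytes) records into per-window
    totals per customer, discarding fine-grained timing information."""
    totals = {}
    for ts, customer, n_bytes in records:
        key = (customer, ts // window)
        totals[key] = totals.get(key, 0) + n_bytes
    return totals
```

When research requires the topological structure of addresses, prefix-preserving schemes such as Crypto-PAn are a common alternative to hashing.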

The FCC, ISPs, customers, and researchers should all care about the security, operation, and performance of the Internet.  Achieving these goals often involves sharing customer data with third parties, such as the network and security research community. As a member of the research community, I am looking forward to reading the text of the rule, which, if our comments are incorporated, will help preserve both customer privacy and the research that keeps the Internet secure and performing well.

The Interconnection Measurement Project

Building on the March 11 release of the “Revealing Utilization at Internet Interconnection Points” working paper, today, CITP is excited to announce the launch of the Interconnection Measurement Project. This unprecedented initiative includes the launch of a project-specific website and the ongoing collection, analysis, and release of capacity and utilization data from ISP interconnection points. CITP’s Interconnection Measurement Project uses the same method that I detailed in the working paper and includes the participation of seven ISPs—Bright House Networks, Comcast, Cox, Mediacom, Midco, Suddenlink, and Time Warner Cable.

The project website—which we aim to update regularly—includes additional views of the data that are not included in the working paper. The visualizations are organized into three categories: (1) Aggregate Views; (2) Regional Views; and (3) Views by Interconnect. The Aggregate Views provide peak utilization, growth in capacity and usage, as well as the distribution of peak utilization across interconnects and across participating ISPs, on a monthly basis across the entire data set. The Regional Views provide monthly peak utilization by region and distribution of peak utilization across interconnects by region. Finally, the Views by Interconnect provide details into daily per-link utilization statistics, as well as the distribution of peak utilization by link and by capacity, also on a monthly basis. The website visualizations also include an additional month of data (March 2016) beyond what the original working paper included. CITP plans to regularly update the visualizations with new data to provide a picture of how the Internet is evolving, and we will assess the project annually to ensure that the data, reports, and insights that we offer remain relevant.

The March data is consistent with the initial findings detailed in the working paper: that many interconnects have significant spare capacity, that this spare capacity exists both across ISPs in each region and in aggregate for any individual ISP, and that the aggregate utilization across interconnects is roughly 50 percent during peak periods.

The seven participating ISPs collectively account for about 50 percent of all US broadband subscribers. We at CITP hope that these ISPs are merely the pioneers of what may eventually become a much larger effort. As we continue to advance this field of research and deepen our understanding of traffic characteristics at interconnection points, we welcome the participation of even more ISPs as well as other network operators and edge providers in this important effort.

An Unprecedented Look into Utilization at Internet Interconnection Points

Measuring the performance of broadband networks is an important area of research, and efforts to characterize the performance of these networks continue to evolve. Measurement efforts to date have largely relied on in-home devices and are primarily designed to characterize access network performance. Yet, a user’s experience also depends on factors that lie upstream of ISP access networks, which is why measuring interconnection is so important. Unfortunately, as I have previously written, visibility into performance at the interconnection points to ISPs has been extremely limited, and efforts to date to characterize interconnection have largely been indirect, relying on inferences made at network endpoints.

Today, I am pleased to release analysis taken from direct measurement of Internet interconnection points, which represents an advance in this important field of research. To this end, I am releasing a working paper that includes data from seven Internet Service Providers (ISPs) who collectively serve approximately half of all US broadband subscribers.

Each ISP has installed a common measurement system from DeepField Networks to provide an aggregated and anonymized picture of interconnection capacity and utilization. Collectively, the measurement system captures data from 99% of the interconnection capacity for these participating ISPs, comprising more than 1,200 link groups. I have worked with these ISPs to expose interesting insights around this very important aspect of the Internet. Analysis and views of the dataset are available in my working paper, which also includes a full review of the method used.

The research community has long recognized the need for this foundational information, which will help us understand how capacity is provisioned across a number of ISPs and how content traverses the links that connect broadband networks together. 

Naturally, the proprietary nature of Internet interconnection prevents us from revealing everything that the public would like to see—notably, we can’t expose information about individual interconnects because both the existence and capacity of individual interconnects is confidential. Yet, even the aggregate views yield many interesting insights.

One of the most significant findings from the initial analysis of five months of data—from October 2015 through February 2016—is that aggregate capacity is roughly 50% utilized during peak periods (and never exceeds 66% for any individual participating ISP), as shown in the figure below. Moreover, aggregate capacity at the interconnects continues to grow to offset the growth of broadband data consumption.

Distribution of 95th percentile peak ingress utilization across all ISPs.
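
For readers unfamiliar with the convention, “95th percentile peak utilization” can be computed with a simple nearest-rank calculation over per-interval traffic samples; the busiest 5% of intervals are discarded so that one-off bursts do not dominate. The sketch below assumes 5-minute samples of bits transferred per link, a common but not universal setup:

```python
import math

def p95(samples):
    """Nearest-rank 95th percentile: sort and take the value at rank
    ceil(0.95 * n), ignoring the busiest 5% of intervals."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]

def peak_utilization(bits_per_interval, capacity_bps, interval_s=300):
    """Peak (95th percentile) utilization of a link: per-interval
    average bit rate divided by the link's capacity."""
    rates = [bits / interval_s for bits in bits_per_interval]
    return p95(rates) / capacity_bps
```

A link reported at 50% peak utilization is therefore at or below half its capacity in at least 95% of measurement intervals.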

I am very excited to provide this unique and unprecedented view into the Internet. It is in everyone’s interest to advance this field of research in a rigorous and thoughtful way.