October 1, 2022

Routing Attacks on Internet Services

by Yixin Sun, Annie Edmundson, Henry Birge-Lee, Jennifer Rexford, and Prateek Mittal

[In this post, we discuss a recent thread of research that highlights the insecurity of Internet services due to the underlying insecurity of Internet routing. We hope that this thread facilitates an important dialogue in the networking, security, and Internet policy communities to drive change and the adoption of secure mechanisms for Internet routing.]

The underlying infrastructure of the Internet comprises physical connections between more than 60,000 entities known as Autonomous Systems (such as AT&T and Verizon). Internet routing protocols such as the Border Gateway Protocol (BGP) govern how our communications are routed over a series of autonomous systems to form an end-to-end communication channel between a sender and receiver.

Unfortunately, Internet routing protocols were not designed with security in mind. The insecurity of BGP allows potential adversaries to manipulate how routing on the Internet occurs. For instance, see this recent real-world BGP attack against Mastercard, Visa, and Symantec. The insecurity of BGP is well known, and a number of protocols have been designed to secure Internet routing. However, we are a long way from large-scale deployment of secure Internet routing protocols.
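To see why such hijacks work at a technical level, recall that routers forward traffic along the most specific matching prefix. Below is a minimal, illustrative sketch in Go (the prefixes and origin names are made up) showing how an adversary who announces a more-specific prefix than the victim’s wins longest-prefix matching and thereby attracts the victim’s traffic.

```go
// A toy illustration of why BGP prefix hijacks work: routers forward on the
// longest (most specific) matching prefix, so a more-specific announcement
// beats the legitimate one. All prefixes and origins here are made up.
package main

import (
	"fmt"
	"net"
)

type route struct {
	prefix *net.IPNet
	origin string
}

// bestRoute returns the longest-prefix match for dst among the routes.
func bestRoute(routes []route, dst net.IP) *route {
	var best *route
	bestLen := -1
	for i := range routes {
		if routes[i].prefix.Contains(dst) {
			if l, _ := routes[i].prefix.Mask.Size(); l > bestLen {
				best, bestLen = &routes[i], l
			}
		}
	}
	return best
}

func mustCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

func main() {
	routes := []route{
		{mustCIDR("203.0.113.0/24"), "legitimate origin AS"},
		{mustCIDR("203.0.113.0/25"), "hijacker's more-specific announcement"},
	}
	r := bestRoute(routes, net.ParseIP("203.0.113.7"))
	fmt.Println("traffic to 203.0.113.7 follows:", r.origin) // the hijacker wins
}
```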

This status quo is unacceptable.

Historically, routing attacks have been viewed primarily as attacks on the availability of Internet applications. For example, an adversary can hijack Internet traffic destined for a victim application server and cause unavailability (see YouTube’s 2008 hijack). A secondary perspective concerns the confidentiality of unencrypted Internet communications. For example, an adversary can manipulate Internet routing to position itself on the communication path between a client and the application server and record unencrypted traffic: http://dyn.com/blog/mitm-internet-hijacking/

In this post, we argue that conventional wisdom significantly underestimates the vulnerabilities introduced by the insecurity of Internet routing. In particular, we discuss recent research results that exploit BGP insecurity to attack the Tor network, TLS encryption, and the Bitcoin network.

BGP attacks on anonymity systems/Tor: The Tor network is a deployed system for anonymous communication that aims to protect user identity (IP address) in online communications. The Tor network comprises over 7,000 relays which together carry terabytes of traffic every day. Tor serves millions of users, including political dissidents, whistle-blowers, law enforcement, intelligence agencies, journalists, businesses, and ordinary citizens concerned about the privacy of their online communications.

Tor clients redirect their communications through a series of proxies. Layered encryption is used such that each proxy observes only the identity of the previous hop and the next hop in the communication, and no single proxy observes the identities of both the client and the destination.

However, if an adversary can observe the traffic from the client to the Tor network, and from the Tor network to the destination, then it can leverage correlation between packet timing and sizes to infer the network identities of clients and servers (end-to-end timing analysis). Therefore, an adversary can first use BGP attacks to hijack or intercept Internet traffic towards the Tor network (Tor relays), and perform traffic analysis of encrypted communications to compromise user anonymity.

It is important to note that this timing analysis works even if the communication is encrypted. This illustrates an important point — the insecurity of Internet routing has important consequences for traffic-analysis attacks, which allow adversaries to infer sensitive information from communication meta-data (such as source IP, destination IP, packet size and packet timing), even if communication is encrypted.
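To make end-to-end timing analysis concrete, here is a minimal, illustrative sketch in Go (it is not the RAPTOR attack code, and the packet counts are invented). The adversary bins the packets it observes on each side of the anonymity network into fixed time windows and computes the Pearson correlation between the two series; flow pairs whose correlation is close to 1 are likely the same connection, no matter how the payload is encrypted.

```go
// A minimal sketch of end-to-end timing analysis: bin packet counts into
// fixed time windows on each side of the network and correlate the series.
// The traffic traces below are made up for illustration.
package main

import (
	"fmt"
	"math"
)

// pearson returns the Pearson correlation coefficient of two equal-length series.
func pearson(x, y []float64) float64 {
	n := float64(len(x))
	var sx, sy float64
	for i := range x {
		sx += x[i]
		sy += y[i]
	}
	mx, my := sx/n, sy/n
	var cov, vx, vy float64
	for i := range x {
		cov += (x[i] - mx) * (y[i] - my)
		vx += (x[i] - mx) * (x[i] - mx)
		vy += (y[i] - my) * (y[i] - my)
	}
	return cov / math.Sqrt(vx*vy)
}

func main() {
	// Hypothetical packet counts per 100 ms window, observed on the client
	// side and on the destination side of an encrypted, proxied connection.
	clientSide := []float64{12, 0, 7, 31, 2, 18, 0, 25}
	serverSide := []float64{11, 1, 8, 29, 2, 19, 0, 24} // same flow, slightly perturbed
	unrelated := []float64{3, 22, 0, 5, 17, 1, 9, 30}   // a different flow

	fmt.Printf("same flow:      r = %.3f\n", pearson(clientSide, serverSide))
	fmt.Printf("unrelated flow: r = %.3f\n", pearson(clientSide, unrelated))
}
```

Note that the encryption of the payload never enters the computation; only packet counts and timing do, which is why encrypting traffic does not defeat this attack.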

We introduced the threat of “Routing Attacks on Privacy in Tor” (RAPTOR attacks) at USENIX Security in 2015. We demonstrated the feasibility of RAPTOR attacks on the Tor network by performing real-world Internet routing manipulation in a controlled and ethical manner.  Interested readers can see the technical paper and our project webpage for more details.

Routing attacks challenge conventional beliefs about the security of anonymity systems, and they apply broadly to low-latency anonymous communication (including systems beyond Tor, such as I2P). Our work also motivates the design of anonymity systems that successfully resist the threat of Internet routing manipulation. The Tor project is already implementing design changes (such as Tor proposal 247 and Tor proposal 271) that make it harder for an adversary to infer and manipulate the client’s entry point (proxy) into the Tor network. Our follow-up work on Counter-RAPTOR defenses (presented at the IEEE Symposium on Security and Privacy in 2017) presents a monitoring framework that analyzes routing updates for the Tor network, which is being integrated into the Tor metrics portal.

BGP attacks on TLS/Digital Certificates: The Transport Layer Security (TLS) protocol allows a client to establish a secure communication channel with a destination website using cryptographic key exchange protocols. To prevent man-in-the-middle attacks, clients using TLS need to authenticate the public key corresponding to the destination site, such as a web server. Digital certificates issued by trusted Certificate Authorities (such as Let’s Encrypt) provide an authentic binding between a destination server and its public key, allowing a client to validate the destination server. Given the widespread use of TLS for secure Internet communications, the security of the digital certificate ecosystem is paramount.

We have shown that the process for obtaining digital certificates from trusted certificate authorities (called domain validation) is vulnerable to attack.

A domain owner submits a Certificate Signing Request (CSR) to a trusted Certificate Authority to obtain a digital certificate. The Certificate Authority must verify that the party submitting the request actually has control over the domains that are covered by that CSR. This process is known as domain control verification and is a core part of the Public Key Infrastructure (PKI) used in the TLS protocol.

In ongoing work, presented at the HotPETS workshop in 2017, we demonstrated the feasibility of exploiting BGP attacks to compromise the domain validation protocol. For example, HTTP domain verification is a common method of domain control verification that requires the domain owner to upload a string specified by the CA to a specific HTTP URL at the domain. The CA can then verify the domain via an HTTP GET request. However, an adversary can manipulate inter-domain routing via BGP attacks to intercept all traffic towards the victim web server, and successfully obtain a fraudulent digital certificate by spoofing an HTTP response corresponding to the CA challenge message. We have performed real-world Internet routing manipulation in a controlled and ethical manner to demonstrate the feasibility of these attacks. See our attack demonstration video for a demo.
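To make the vulnerability concrete, here is a minimal sketch of the CA’s side of HTTP-based domain control verification, modeled loosely on challenges such as ACME’s HTTP-01 (the URL path and names below are illustrative, not any real CA’s implementation). The key observation is that the CA’s HTTP GET travels over ordinary BGP-selected paths, so an adversary who hijacks or intercepts traffic to the domain can answer the challenge in the real server’s place.

```go
// A minimal sketch of the CA side of HTTP-based domain control verification.
// The path and token scheme are illustrative. The weakness described in the
// post: this GET is routed over ordinary BGP paths, so a BGP hijacker on the
// path to the domain can serve the expected token itself.
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// validateDomain fetches the challenge URL at the applicant's domain and
// checks that it serves the token the CA specified.
func validateDomain(domain, token string) (bool, error) {
	url := fmt.Sprintf("http://%s/.well-known/validation/%s", domain, token)
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	// The CA has no way to tell whether this response came from the real
	// server or from a BGP man-in-the-middle on the path to the domain.
	return strings.TrimSpace(string(body)) == token, nil
}

func main() {
	ok, err := validateDomain("example.com", "ca-chosen-random-token")
	fmt.Println(ok, err)
}
```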

This attack has significant consequences for the privacy of our online communications, as adversaries can use fraudulently obtained digital certificates to bypass the cryptographic protection offered by encryption. Our work is leading to the deployment of a suggested countermeasure (verification from multiple vantage points) at Let’s Encrypt. Please see the Let’s Encrypt deployment for more details.

So far, we have discussed our research results from Princeton University. Below, we briefly discuss research from Laurent Vanbever’s group at ETHZ and Sharon Goldberg’s group at Boston University, which has shown that inter-domain routing manipulation can be used to attack Bitcoin and to bypass legal protections.

BGP attacks on Crypto-currencies/Bitcoin: BGP manipulation can be used to perform two main types of attacks on crypto-currencies such as Bitcoin: (1) partitioning attacks, in which an adversary aims to disconnect a set of victim Bitcoin nodes from the network, and (2) delaying attacks, in which an adversary slows down the propagation of data towards victim Bitcoin nodes. Both of these attacks can cause economic loss to Bitcoin nodes.

BGP attacks for bypassing legal protections: Domestic communications between US citizens have legal protections against surveillance. However, adversaries can manipulate inter-domain routing such that the actual communication path involves a foreign country, which could invalidate the legal protections and allow large-scale surveillance of online communications.

Concluding Thoughts: The emergence of routing attacks on anonymity systems, Internet domain validation, and cryptocurrencies shows that conventional wisdom has significantly underestimated the attack surface introduced by the insecurity of Internet routing. It is imperative for critical Internet applications to be aware of the insecurity of Internet routing, and to analyze the resulting security threats.

Given the vulnerabilities in Internet routing, applications should consider domain-specific defense mechanisms for enhancing user security and privacy. Examples include our Counter-RAPTOR analytics for Tor and our multiple-vantage-point defense for domain validation. We hope that our work and the research discussed above enable this vision.

While it is important to design and deploy application-specific defenses for protecting our systems against routing attacks that exploit current insecure Internet infrastructure, it is even more important to rethink the status quo of insecure routing protocols. Our ultimate goal ought to be to fundamentally eliminate the insecurity in today’s Internet routing protocols by moving towards the adoption of secure countermeasures. How do we drive this change?

Is It Time for a Data Sharing Clearinghouse for Internet Researchers?

Today’s Senate hearing with Facebook’s Mark Zuckerberg will start a long discussion on data collection and privacy at Internet companies. Although the spotlight is currently on Facebook, we shouldn’t forget that the picture is broader: companies from device manufacturers to ISPs collect network traffic and use it for a variety of purposes.

The uses that we will hear about today largely concern the widespread collection of data about Internet users for targeted content delivery and advertising. Meanwhile, yesterday Facebook announced an initiative to share data with independent researchers to study social media’s impact on elections. At the same time that Facebook is being raked over the coals for sharing its data with “researchers” (Cambridge Analytica), it has announced a program to share its data with (presumably more “legitimate”) researchers.

Internet researchers depend on data. Sometimes, we can gather the data ourselves, using measurement tools deployed at the edge of the Internet (e.g., in home networks, on phones). In other cases, we need data from the companies that operate parts of the Internet, such as an Internet service provider (ISP), an Internet registrar, or an application provider (e.g., Facebook).

  • If incentives align, data flows to the researcher. Interacting with a company can work very well when goals are aligned. I’ve worked well with companies to develop new spam filtering algorithms, to develop new botnet detection algorithms, and to highlight empirical results that have informed policy debates.
  • If incentives do not align, then the researcher probably won’t get the data.  When research is purely technical, incentives often align. When the technical work crosses over into policy (as it does in areas like net neutrality, and as we are seeing with Facebook), there can be (insurmountable) hurdles to data access.

How an Internet Researcher Gets Data Today

How do Internet researchers get data from companies today? An Internet operator I know aptly characterizes the status quo:

“Show Internet operators you can do something useful, and they’ll give you data.”

Researchers get access to Internet data from companies in two ways: (1) working for the company (as an employee), or (2) working with the company (as an “independent” researcher).

Option #1: Work for a Company.

Working for a company offers privileged access to data, which can be used to mint impressive papers (irreproducibility aside) simply because nobody else has the same data. I have taken this approach myself on a number of occasions, having worked for an ISP (AT&T), a DNS company (Verisign), and an Internet security service provider (Damballa).

How this approach works. In the 2000s, research labs at AT&T and Sprint had privileged access to data, which gave rise to a proliferation of papers on “Everything You Wanted to Know About the Internet Backbone But Were Afraid to Ask”.  Today, the story repeats itself, except that the players are Google and Facebook, and the topic du jour is data-center networks.

Shortcomings of This Approach. Much research—from projects with a longer arc to certain policy-oriented questions—would never come to light if we relied only on company employees to do it. By the nature of their work, however, company employees lack independence: they lack both the autonomy to select problems and the ability to take positions or publish results that run counter to the company’s goals or priorities. This shortcoming may not matter if what the researcher wants to work on and what the company wants to accomplish are the same. For many technical problems, this is the case (although there is still a tendency for the technical community to develop tunnel vision around areas where there is an abundance of data, while neglecting other areas). But for many problems—ranging from those with a longer arc to deployment to those that may run counter to a company’s priorities—we can’t rely on industry to do the work.

Option #2: Work with a Company.

How this approach works. A researcher may instead work with a company, typically gaining privileged access to data for a particular project. Sometimes, we demonstrate the promise of a technique with some data that we can gather or bootstrap without any help, and use that initial study to pique the interest of a company, which may then share data with us to further develop the idea.

Shortcomings of this approach. Research done in collaboration with a company often has shortcomings similar to those of research done within a company’s walls. If the results of the research align with the company’s perspectives and viewpoints, then data sharing is copacetic. Even these cooperative settings pose some risks to researchers, who may create the perception that they are not independent merely by their association with the company. With purely technical research, the risks are lower, though still non-zero: for example, because the work depends on privileged data access, the researcher may still face challenges in presenting the research in a way that could help others reproduce it in the future.

With technical work that can inform or speak to policy questions, there are some concerns. First, certain types of research or results may never come to light—if a company doesn’t like the results that may come from the data analysis, then it may simply not share the data, or it may ask for “pre-publication review” of results based on that data (this practice is common for research conducted within companies as well). There is also a second, more subtle concern. Even when the work is technically watertight, a researcher can still face questions—fair or unfair—about the soundness of the work, due to the perceived motivations or agendas of the cooperating parties involved.

Current Data Sharing Approaches Are Good, But They Are Not Sufficient

The above methods for data sharing can work well for certain types of research. In my career, I have made hay playing by these rules—often by first demonstrating the viability of an idea with a smaller dataset that we gather ourselves, and then “pitching” the idea to a company.

Yet, in my experience these approaches have two shortcomings. The first relates to incentives. The second relates to privacy.

Problem #1: Incentives.

Certain types of work depend on access to Internet data, but the company that holds the data may not have a direct incentive to facilitate the research. Possible studies of Facebook’s effect on elections certainly fall into this category: the company simply may not like the results of the research.

But, there are plenty of other lines of research that fall into the category where incentives may not align. Other examples range from measurements of Internet capacity and performance as they relate to broadband regulation (e.g., net neutrality) to evaluation of an online platform’s content moderation algorithms and techniques. Lots of other work relating to consumer protection falls into this category as well. We have to rely on users and researchers measuring things at the edge of the network to figure out what’s going on; from this vantage point, certain activities may naturally slip under the radar more easily.

The current Internet data sharing roadmap doesn’t paint a rosy picture for research where incentives don’t align. Even when incentives do align, there can be perceptions of “capture”—effectively shilling an intellectual or technical finding in exchange for data access.

It is in the interests of everyone—the academics and their industry partners alike—to establish more formal modes of data exchange when either (1) there is a determination that the problem is important to study for the health of the Internet or for the benefit of consumers, or (2) there is the potential that the research will be perceived as not objective due to the nature of the data sharing agreement.

Problem #2: Privacy.

Sharing Internet data with researchers can introduce substantial privacy risks, and the need to share data with any researcher who works with a company should be evaluated carefully—ideally by an independent third party.

When helping develop the researcher exception to the FCC’s broadband privacy rules, I submitted a comment that proposed the following criteria for sharing ISP data with researchers:

  1. Purpose of research. The research aims to promote the security, stability, and reliability of networks, and has clear benefits for Internet innovation, operations, or security.
  2. Research goals do not violate privacy. The goals of the research do not include compromising consumer privacy.
  3. Privacy risks of data sharing are offset by benefits of the research. The risks of the data exchange are offset by the benefits of the research.
  4. Privacy risks of the data sharing are mitigated. Researchers should strive to use de-identified data wherever possible.
  5. The data adds value to the research. The research is enhanced by access to the data.

Yet, outlining the criteria is one thing. The thornier question (which we did not address!) is: Who gets to decide the answers?

Universities have institutional review boards that can help evaluate the merits of such a data sharing agreement. But, Cambridge Analytica might have the veneer of “research”, and a company may have no internal incentive to independently evaluate the data sharing agreement on its merits. In light of recent events, we may be headed towards the conclusion that such data-sharing agreements should always be vetted by independent third-party review. If the research doesn’t involve a university, however, the natural question is: Who is that third party?

Looking Ahead: Data Clearinghouses for Internet Data?

Certain types of Internet research—particularly those that involve thorny regulatory or policy questions—could benefit from an independent clearinghouse, where researchers could propose studies and experiments and have them evaluated and selected by an independent third party, based on their benefits and risks. Facebook is exploring this avenue in the limited setting of election integrity. This is an exciting step.

Moving forward, it will be interesting to see how Facebook’s meta-experiment on data sharing plays out, and whether it—or some variant—can serve as a model for Internet data sharing for other types of work writ large. In purely technical areas, such a clearinghouse could allow a broader range of researchers to explore, evaluate, reproduce and extend the types of work that for now remains largely irreproducible because data is under lock and key. For these questions, there could be significant benefit to the scientific community. In areas where the technical work or data analysis informs policy questions, the benefits to consumers could be even greater.

Oblivious DNS: Plugging the Internet’s Biggest Privacy Hole

by Annie Edmundson, Paul Schmitt, Nick Feamster

The recent news that Cloudflare is deploying its own DNS recursive resolver has once again raised hopes that users will enjoy improved privacy, since they can send DNS traffic encrypted to Cloudflare, rather than to their ISP. In this post, we explain why this approach only moves your private data from the ISP to (yet another) third party. You might trust that third party more than your ISP, but you still have to trust them. We then present an alternative design—Oblivious DNS—that prevents you from having to make that choice at all.

The Domain Name System (DNS)

When your client turns a domain name like google.com into an IP address, it relies on a recursive DNS resolver to do so. The operator of that resolver sees both your IP address and the domains that you query.

When you—or any of your devices—accesses the Internet, the first step is typically to look up a domain name (e.g., “google.com”, “princeton.edu”) in the Domain Name System (DNS) to determine which Internet address to contact. The DNS is, in essence, a phone book for the Internet’s domain names.

Clients that you operate—including your browser, your smartphone, and any IoT device in your home—send a DNS query for each domain name to a so-called “recursive DNS resolver”. On a typical home network, the default recursive DNS resolver may be operated by your Internet service provider (ISP) (e.g., Comcast, Verizon). Other entities such as Google and Quad9 also operate “open” recursive resolvers that anyone can use, with the idea that these alternatives give users another option for resolving DNS queries besides their ISP. Such alternatives have been useful in the past for circumventing censorship.

DNS: The Internet’s Biggest Privacy Hole

DNS queries are typically sent in cleartext, and they can reveal significant information that an Internet user may want to keep private, including the websites that user is visiting, the IP address or subnet of the device that issued the initial query, and even the types of devices that a user has in his or her home network. For example, our previous research has shown that DNS lookups can be used to de-anonymize traffic from the Tor network.

Because the queries and responses are unencrypted, any third party who can observe communication between a client and a recursive resolver, or between a recursive resolver and an authoritative server, may also be able to observe various steps in the DNS resolution process.

Operators of recursive DNS resolvers—typically your ISP, but more generally whoever the user relies on to resolve recursive DNS queries (e.g., Google)—may see individual IP addresses (which may correspond to an ISP subscriber, or perhaps an individual end device) coupled with the fully qualified domain name that accompanies the query. Extensions to DNS such as EDNS0 Client Subnet may even reveal information about the user’s IP address or subnet to authoritative DNS servers higher in the DNS hierarchy.

Existing Approaches

Existing proposed standards, including DNS Query Name Minimization and DNS over TLS, protect certain aspects of user privacy.

Yet, these approaches do not prevent the operator of the recursive DNS server from learning which IP addresses are issuing queries for particular domain names—the fundamental problem with DNS privacy:

  • DNS Query Name Minimization provides a mechanism whereby DNS servers that are authoritative for different parts of the DNS name hierarchy do not learn the full DNS query (see the sketch after this list). For example, a server that is authoritative for all of *.com would not necessarily learn about a query for maps.google.com, but would only learn that a client needs to resolve some name under google.com. Yet, this mechanism doesn’t prevent the recursive DNS resolver from learning the full DNS query and the IP address of the client that issued the query.
  • DNS over TLS provides a mechanism for encrypting DNS queries. Yet, even with DNS over TLS, the recursive resolver still needs to decrypt the initial query so that it can resolve the query for the client. It still does not prevent the recursive DNS resolver from learning the query and the IP address that sent the query.
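To make query name minimization more concrete, the sketch below (a simplification, in Go, of the behavior specified in RFC 7816; the helper name is our own) computes what a minimizing resolver would send to a server authoritative for a given zone: one label more than the zone itself, rather than the full name.

```go
// A sketch of the idea behind DNS query name minimization (RFC 7816): ask
// each authoritative server only for the next label it needs, not the full
// name being resolved.
package main

import (
	"fmt"
	"strings"
)

// minimizedQuery returns the query name sent to a server authoritative for
// `zone` while resolving `fqdn`: one label more than the zone itself.
func minimizedQuery(fqdn, zone string) string {
	labels := strings.Split(strings.TrimSuffix(fqdn, "."), ".")
	zoneLabels := 0
	if zone != "." {
		zoneLabels = len(strings.Split(strings.TrimSuffix(zone, "."), "."))
	}
	keep := zoneLabels + 1
	if keep > len(labels) {
		keep = len(labels)
	}
	return strings.Join(labels[len(labels)-keep:], ".") + "."
}

func main() {
	for _, zone := range []string{".", "com.", "google.com."} {
		fmt.Printf("server for %-13q sees a query for %q\n",
			zone, minimizedQuery("maps.google.com.", zone))
	}
}
```

Running this prints that the root servers see only a query for com., the .com servers see google.com., and only Google’s own name servers see the full name maps.google.com.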

Third parties have recently been standing up new DNS resolvers that claim to respect user privacy: Quad9 (9.9.9.9) and Cloudflare (1.1.1.1) operate open DNS recursive resolvers that claim to purge information about user queries. Cloudflare additionally supports DNS over HTTPS, which (like DNS over TLS) ensures that your DNS queries are encrypted from your browser to its recursive DNS resolver.

Yet, in all of these cases, a user has no guarantee that the information an operator learns will not be retained, for operational or other purposes. Once such information is retained, of course, it may become vulnerable to other threats to user privacy, including data requests from law enforcement. In short, these services transfer the point of trust from your ISP to some other third party, but you still have to trust that third party.

Oblivious DNS

While you may have good reason to trust a provider that claims to purge all information about your DNS queries, we believe that users shouldn’t even have to make that choice in the first place.

The goal of Oblivious DNS (ODNS) is to ensure that no single party observes both the DNS query and the IP address (or subnet) that issued the query. ODNS runs as an overlay of sorts on conventional DNS; it requires no changes to any DNS infrastructure that is already deployed.

Oblivious DNS (ODNS) adds a custom stub resolver at the client to obfuscate the original query, which the authoritative server for ODNS can decrypt. But, the ODNS authoritative server never sees the IP address of the client that issued the query.

Oblivious DNS (ODNS) operates similarly to conventional DNS, but has two new components: 1) each client runs a local ODNS stub resolver, and 2) we add an ODNS authoritative zone that also operates as a recursive DNS resolver. The figure illustrates the basic approach.

When a client application initiates a DNS lookup, the client’s stub resolver obfuscates the domain that the client is requesting (via symmetric encryption), resulting in the recursive resolver being unaware of the requested domain. The authoritative name server for ODNS separates the clients’ identities from their corresponding DNS requests, such that the name servers cannot learn who is requesting specific domains. The steps taken in the ODNS protocol are as follows:

  1. When the client generates a request for www.foo.com, its stub resolver generates a session key k, encrypts the requested domain, and appends the TLD .odns, resulting in {www.foo.com}k.odns.
  2. The client forwards this request, with the session key encrypted under the .odns authoritative server’s public key ({k}PK) in the “Additional Information” record of the DNS query, to the recursive resolver, which then forwards it to the authoritative name server for .odns.
  3. The authoritative server for .odns decrypts the session key with its private key, and subsequently decrypts the requested domain with the session key.
  4. The authoritative server forwards a recursive DNS request to the appropriate name server for the original domain, which then returns the answer to the ODNS authoritative server.
  5. The ODNS authoritative server returns the answer (with both the domain and IP address encrypted) to the recursive resolver, which forwards it on to the client’s stub resolver. In turn, the stub resolver decrypts both the domain and the IP address.

Other name servers see incoming DNS requests, but they see only the IP address of the ODNS authoritative server (acting as a recursive resolver), which effectively proxies the DNS request for the original client. These steps correspond to the following figure.
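As a concrete illustration of steps 1 and 2 above, here is a simplified sketch of the stub resolver’s obfuscation step, written in Go in the spirit of our prototype but not taken from its code. It encrypts the queried domain under a fresh session key with AES-GCM, encodes the result as a label under .odns, and wraps the session key under the .odns server’s public key with RSA-OAEP; the particular ciphers and encoding are illustrative choices, not necessarily what a deployed system would use.

```go
// A simplified sketch of how an ODNS stub resolver could obfuscate a query:
// encrypt the requested domain under a fresh session key k, and encrypt k
// under the .odns authoritative server's public key.
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"encoding/base32"
	"fmt"
)

// obfuscateQuery returns the {www.foo.com}k.odns query name and {k}PK.
func obfuscateQuery(domain string, serverPub *rsa.PublicKey) (qname string, wrappedKey []byte, err error) {
	// Fresh 128-bit session key k for this query.
	k := make([]byte, 16)
	if _, err = rand.Read(k); err != nil {
		return
	}
	block, err := aes.NewCipher(k)
	if err != nil {
		return
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err = rand.Read(nonce); err != nil {
		return
	}
	// {www.foo.com}k, encoded into a DNS label-safe alphabet.
	ct := gcm.Seal(nonce, nonce, []byte(domain), nil)
	enc := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(ct)
	qname = enc + ".odns"
	// {k}PK, carried alongside the query (the Additional Information record).
	wrappedKey, err = rsa.EncryptOAEP(sha256.New(), rand.Reader, serverPub, k, nil)
	return
}

func main() {
	priv, _ := rsa.GenerateKey(rand.Reader, 2048)
	qname, wrapped, err := obfuscateQuery("www.foo.com", &priv.PublicKey)
	fmt.Println(qname, len(wrapped), err)
}
```

The recursive resolver sees only the opaque label and the client’s IP address, while the .odns authoritative server sees the decrypted domain but only the recursive resolver’s IP address, which is the separation the design aims for.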

Prototype Implementation and Preliminary Evaluation

We implemented a prototype in Go to evaluate the feasibility of deploying ODNS, as well as the performance costs of using ODNS as compared to conventional DNS. We implemented an ODNS stub resolver and an authoritative name server that can also issue recursive queries.

ODNS adds 10-20 milliseconds to the resolution time for an uncached DNS query.

We first compared the overhead of an ODNS query to that of a conventional DNS query. We issued DNS queries to the Alexa Top 10,000 domains using both ODNS and conventional DNS. The CDF below shows that ODNS adds about 10-20 milliseconds to each query. Of course, in practice DNS makes extensive use of caching, and this experiment shows a worst-case scenario. We expect the overhead in practice to be much smaller.

Along these lines, we also measured how ODNS would affect a typical Internet user’s browsing experience by evaluating the overhead of a full web page load, which involves fetching the page and conducting any subsequent DNS lookups for embedded objects and resources in the page. We fetched popular web pages that have a lot of content using ODNS and compared the results to performing the same operations with conventional DNS.

Overhead for loading an entire web page is minimal.

In each group in the bar plot, the left bar shows the page load time using conventional DNS and the right bar shows the time using ODNS. We see that there is no significant difference in page load time between ODNS and conventional DNS, because DNS lookups contribute minimal overhead to the entire page load process. As before, these experiments were run with a “cold cache”, and in practice we expect the overhead to be even less.

Summary and Next Steps

The past several years have seen much (warranted) concern over the privacy risks that DNS queries expose. Existing approaches that allow users to use alternative DNS resolvers are a helpful step, but in some sense they merely shift the trust from the user’s ISP to another party. We believe that a better end state is one where the user doesn’t have to place trust in the operator of any DNS recursive resolver. Towards this goal, we have built ODNS to help decouple clients’ identities from their corresponding DNS queries, and have implemented a prototype. We are now working on a larger-scale implementation, deployment, and evaluation. Additional information on ODNS can be found at our project website. We welcome any feedback and comments. We are ready to explore opportunities for broader deployment, and we are actively seeking partners to help us deploy ODNS resolvers in operational settings.

When The Choice Is To Delete Facebook Or Buy A Loaf Of Bread

By Julieanne Romanosky and Marshini Chetty

In the last week, there has been a growing debate around Facebook and privacy. On Twitter, the newly formed #deletefacebook movement calls for users who are upset over the misuse of data from over 50 million Facebook accounts by Cambridge Analytica to rid themselves of the platform altogether. But as others have noted, deleting Facebook may not be an easy option for everyone on the platform, because in some countries, Facebook is the Internet. In fact, in 63 countries around the world, Facebook has introduced the Free Basics platform, which includes Facebook and offers marginalized users limited “free” browsing on the Internet. More importantly, our recent study, jointly conducted with the University of Maryland [5], suggests that for low-income users, deleting Facebook and Free Basics could be the difference between being able to afford a loaf of bread or not.

What is Facebook’s Free Basics and why is it being used by low-income users?: Free Basics was launched in 2013 by Facebook with the goal of connecting rural and low-income populations to the Internet for the first time. While Free Basics appears as a single app, it is actually a platform for hosting a variety of data-charge-free or “zero-rated” applications, and the available content changes depending on the country and unpaid partnerships with local service providers; i.e., no two Free Basics offerings are the same. However, all versions provide access to a lite version of Facebook (with no images or video) and select other third-party apps such as Bing and Wikipedia. Educational materials, news, and weather reports dominate the application topics in Free Basics across countries. Other apps cover health care, job listings, search engines, and classifieds. Here is what the app interface looks like in South Africa:

Free Basics in South Africa

What did we do to investigate Facebook and Free Basics usage?: We interviewed 35 Free Basics users in South Africa, one of the countries where the platform is offered. We spoke to a combination of current low-income users and non-regular student users. Including both groups in our study allowed us to form a more comprehensive understanding of the impact of zero-rated services, the factors that affect their adoption, and their possible use in more developed countries than if we had studied only users, only non-users, or only those who were unconnected and low-income. Both groups were asked to talk about their online habits (i.e., time spent online, which websites or apps they used, etc.), how much money they typically spent on Internet access, and how, if at all, they worked to keep their mobile Internet costs down.

How do low-income users use Facebook’s Free Basics?: We found that the low-income users on Free Basics, in particular, were able to cut their mobile data costs significantly, with one participant in our study exclaiming that they could now afford a loaf of bread with the money saved from being online for “free”. The service also drove users to the “free” apps included in the platform, even when they preferred other apps that were not “free” to use. Interestingly, none of the participants who used Free Basics regularly were “unconnected” users who had never been online prior to using the platform. Instead, these participants had been using the Internet as paying customers, but had heard about the platform from others (often through word of mouth) as a way to save on Internet costs. For these users, deleting Facebook and its related resources would be like deleting a lifeline in an already expensive data landscape. The platform was not without limitations for our participants, however. Since our participants were already online, they were also very conscious of the fact that the apps included in the platform were, in their perception, “second-rate”—for instance, the Facebook app on the platform does not include images or video unless users pay for them.

New Jersey Takes Up Net Neutrality: A Summary, and My Experiences as a Witness

On Monday afternoon, I testified before the New Jersey State Assembly Committee on Science, Technology, and Innovation, which is chaired by Assemblyman Andrew Zwicker, who also happens to represent Princeton’s district.

On the committee agenda were three bills related to net neutrality.

Let’s quickly review the recent events. In December 2017, the Federal Communications Commission (FCC) rolled back the now-famous 2015 Open Internet Order, which required Internet service providers (ISPs) to abide by several so-called “bright line” rules, which can be summarized as: (1) no blocking lawful Internet traffic; (2) no throttling or degrading the performance of lawful Internet traffic; (3) no paid prioritization of one type of traffic over another; and (4) transparency about network management practices that may affect the forwarding of traffic. In addition to these rules, the 2015 order also re-classified Internet service as a “Title II” telecommunications service—placing it under the jurisdiction of the FCC’s rulemaking authority—overturning the “Title I” information service classification that ISPs previously enjoyed.

The distinction of Title I vs. Title II classification is nuanced and complicated, as I’ve previously discussed. Re-classification of ISPs as a Title II service certainly comes with a host of complicated regulatory strings attached.  It also places the ISPs in a different regulatory regime than the content providers (e.g., Google, Facebook, Amazon, Netflix).

The rollback of the Open Internet Order reverted not only the ISPs’ Title II classification, but also the four “bright line” rules. In response, many states have recently been considering and enacting their own net neutrality legislation, including Washington, Oregon, California, and now New Jersey. Generally speaking, these state laws are far less complicated than the original FCC order. They typically involve re-instating the FCC’s bright-line rules, but entirely avoid the question of Title II classification.

On Monday, the New Jersey State Assembly considered three bills relating to net neutrality. Essentially, all three bills amount to providing financial and other incentives for ISPs to abide by the bright-line rules. The bills require ISPs to follow these rules as a condition for:

  1. securing any contract with the state government (which can often be a significant revenue source);
  2. gaining access to utility poles (which is necessary for deploying infrastructure);
  3. municipal consent (which is required to occupy a city’s right-of-way).

I testified at the hearing, and I also submitted written testimony, which you can read here. This was my first time testifying before a legislative committee; it was an interesting and rewarding experience. Below, I’ll briefly summarize the hearing and my testimony (particularly in the context of the other testifying witnesses), as well as my experience as a testifying witness (complete with some lessons learned).

My Testimony

Before I wrote my testimony, I thought hard about what a computer scientist with my expertise could bring to the table as a testifying expert. I focused my testimony on three points:

  • No blocking and no throttling are technically simple to implement. One of the arguments made by those opposed to the legislation is that different state laws on blocking and throttling could become exceedingly difficult to implement, particularly if each state has its own laws. In short, the argument is that state laws could create a complex regulatory “patchwork” that is burdensome to implement. If we were considering a version of the FCC’s several-hundred-page Open Internet Order in each state, I might tend to agree. But the New Jersey bills are simple and concise: each is only a couple of pages. The bills basically say “don’t block or throttle lawful content”. There are clear carve-outs for illegal traffic, attack traffic, and so forth. My comments essentially focused on the simplicity of implementation, and on the fact that we need not fear a patchwork of laws if the default is a simple rule that prevents blocking or throttling. In my oral testimony, I added (mostly for color) that the Internet is, by the way, already a patchwork of tens of thousands of independently operated networks in countries around the world, and that our protocols support carrying Internet traffic over a variety of physical media, from optical networks to wireless networks to carrier pigeon. I also took the opportunity to make the point that, by the way, ISPs are, in a relative sense, pretty good actors in this space right now, in contrast to content providers who have regularly blocked access to content either for anti-competitive reasons or as a condition for doing business in certain countries.
  • Prioritization can be useful for certain types of traffic, but it is distinct from paid prioritization. Some ISPs have been arguing recently that prohibiting paid prioritization would prohibit (among other things) the deployment of high-priority emergency services over the Internet. Of course, anyone who has taken an undergraduate networking course will have learned about prioritization (e.g., Weighted Fair Queueing), as well as how prioritization (and even shaping) can improve application performance, by ensuring that interactive, delay-sensitive applications such as gaming are not queued behind lower-priority bulk transfers, such as a cloud backup. Yet, prioritization of certain classes of applications over others is a different matter from paid prioritization, whereby one customer might pay an ISP for higher priority over competing traffic. I discussed the differences at length. I also talked about prioritization and paid prioritization more generally: it’s not just about what a router does, but about who has access to what infrastructure. The bills address “prioritization” merely as a packet scheduling exercise—a router services one queue of packets at a faster rate than another queue. But there are plenty of other ways that some content can be made to “go faster” than others; one such example is the deployment of content across a so-called Content Delivery Network (CDN)—a distributed network of content caches that are close to users. Some application or content providers may enjoy an unfair advantage (“priority”) over others merely by virtue of the infrastructure they have access to. Neither the repealed FCC rules nor the state bills say anything about this type of prioritization, which could be applied in anti-competitive ways. Finally, I talked about how prioritization is a bit of a red herring as long as there is spare capacity. Again, in an undergraduate networking course, we talk about resource allocation concepts such as max-min fairness, where every sender gets the capacity it requires as long as capacity exceeds total demand. Thus, it is also important to ensure that ISPs and application providers continue to add capacity, both in their networks and at the interconnects between their networks.
  • Transparency is important for consumers, but figuring out exactly what ISPs should expose, in a way that’s meaningful to consumers and not unduly burdensome, is technically challenging. Consumers have a right to know about the service that they are purchasing from their ISP, as well as whether (and how well) that service can support different applications. Disclosure of network management practices and performance certainly makes good sense on the surface, but here the devil is in the details. An ISP could be very specific in its disclosure. It could say, for example, that it has deployed a token bucket filter of a certain size, fill rate, and drain rate, and detail the places in its network where such mechanisms are deployed (see the sketch after this list). This would constitute a disclosure of a network management practice, but it would be meaningless for consumers. On the other hand, other disclosures might be so vague as to be useless; a statement from the ISP that it might throttle certain types of high-volume traffic at times of high demand might not help a consumer figure out how certain applications will perform. In this sense, paragraph 226 of the Restoring Internet Freedom Order, which talks about consumers’ need to understand how the network is delivering service for the applications they care about, is spot on. There’s only one problem with that provision: technically, ISPs would have a hard time doing this without direct access to the client or server side of an application. In short: transparency is challenging. To be continued.
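For illustration, here is a minimal token-bucket filter of the kind such a disclosure might describe (the parameters are made up): the bucket holds up to a configured burst size of tokens, refills at a sustained rate, and forwards a packet only if enough tokens remain. The point is that an ISP could disclose these three numbers with complete precision and still tell a consumer almost nothing about how their applications will actually perform.

```go
// A minimal token-bucket filter sketch. Parameters are illustrative, not a
// description of any actual ISP's configuration.
package main

import (
	"fmt"
	"time"
)

type tokenBucket struct {
	size     float64 // bucket depth: bytes of burst allowed
	fillRate float64 // sustained rate: bytes per second
	tokens   float64
	last     time.Time
}

// allow refills the bucket for the elapsed time, then spends n tokens
// (bytes) if enough are available.
func (tb *tokenBucket) allow(n float64, now time.Time) bool {
	tb.tokens += now.Sub(tb.last).Seconds() * tb.fillRate
	if tb.tokens > tb.size {
		tb.tokens = tb.size
	}
	tb.last = now
	if tb.tokens >= n {
		tb.tokens -= n
		return true
	}
	return false
}

func main() {
	// Roughly 1 Mbps sustained with a 15 KB burst, starting with a full bucket.
	tb := &tokenBucket{size: 15000, fillRate: 125000, tokens: 15000, last: time.Now()}
	// The first ten 1500-byte packets (the burst) pass; later back-to-back
	// packets are dropped until the bucket refills at the sustained rate.
	for i := 0; i < 12; i++ {
		fmt.Printf("packet %2d allowed: %v\n", i+1, tb.allow(1500, time.Now()))
	}
}
```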

The Hearing and Vote

The hearing itself was interesting. Several witnesses testified in opposition to the bills, including Jon Leibowitz of Davis Polk (retained by Internet service providers) and a representative from US Telecom. The arguments against the bills were primarily legal and business-oriented. Essentially, the legal argument against the bills is that the states should leave this problem to the federal government. The arguments are (roughly) as follows: (1) the Restoring Internet Freedom Order pre-empts state action; (2) the Federal Trade Commission has this well in hand, now that ISPs are back in Title I territory (and as a former commissioner, Leibowitz would know well the types of authority that the FTC has to bring such cases, as well as the many cases it has brought against Google, Facebook, and others); (3) the state laws will create a patchwork of laws and introduce regulatory uncertainty, making it difficult for the ISPs to operate efficiently and creating uncertainty for future investment.

The arguments in opposition to the bills are orthogonal to the points I made in my own testimony. In particular, I disclaimed any legal expertise on pre-emption. I was, however, able to comment on whether I thought the second and third arguments held water from a technical perspective. While the second point about FTC authority is mostly a legal question, I understood enough about the FTC Act, and the circumstances under which the FTC brings cases, to comment on whether the bills in question technically give consumers more power than they might otherwise have with just the FTC rules in place. My perspective was that they do, although this is a really interesting case of the muddy distinction between technology and the law: to really dive into the arguments on this point, it helps to know a bit about both. I was also able to comment on the “patchwork” assertion from a technical perspective, as I discussed above.

At the end of the hearing, there was a committee vote on all three bills. It was interesting to see both the voting process and the commentary that each committee member offered with their votes. In the end, there were two abstentions, with the rest in favor. The members who abstained did so largely on the legal question concerning state pre-emption—perhaps foreshadowing the next round of legal battles.

Lessons Learned

Through this experience, I once again saw the value in having technologists at the table in these forums, where the laws that govern the future of the Internet are being written and decided on. I learned a couple of important lessons, which I’ve briefly summarized below.

My job was to bring technical clarity, not to advocate policy. As a witness, one technically picks a side, and in these settings, even when making technical points, one is typically doing so to serve one side of a policy or legal argument. Naturally, given my arguments, I registered as a witness in favor of the legislation.

However, and importantly: that doesn’t mean my job was to advocate policy. As a technologist, my role as a witness was to explain to the lawmakers technical concepts that could help them make better sense of the various arguments from others in the room. Additionally, I steered clear of rendering legal opinions, and where my comments did rely on legal frameworks, I made it clear that I was not an expert in those matters, but was speaking on technical points within the context of the laws as I understood them. Finally, when figuring out how to frame my testimony, I consulted many people: the lawmakers, my colleagues at Princeton, and even the ISPs themselves. In all cases, I asked these stakeholders about the topics I might focus on, as opposed to asking what, specifically, I should say. I thought hard about what a computer scientist could bring to the discussion, and about ensuring that what I said was technically accurate.

A simple technical explanation is of utmost importance. In such a committee hearing, advocates and lobbyists abound (on both sides); technologists are rare. I suspect I was the only technologist in the room. Additionally, most of the people in the room have jobs that involve making arguments to serve a particular stakeholder. In doing so, they may muddy the waters, either accidentally or intentionally. To advance their arguments, some people may even say things that are blatantly false (thankfully that didn’t happen on Monday, but I’ve seen it happen in similar forums). Perhaps surprisingly, such discourse can fly by completely unnoticed, because the people in the room—especially the decision-makers—don’t have as deep an understanding of the technology as the technologists. Technologists need to be in the room, to shed light and to call foul—and, importantly, to do so using accessible language and examples that non-technical policy-makers can understand.