July 15, 2018

Routing Attacks on Internet Services

by Yixin Sun, Annie Edmundson, Henry Birge-Lee, Jennifer Rexford, and Prateek Mittal

[In this post, we discuss a recent thread of research that highlights the insecurity of Internet services due to the underlying insecurity of Internet routing. We hope that this thread facilitates important dialog in the networking, security, and Internet policy communities to drive change and adoption of secure mechanisms for Internet routing]

The underlying infrastructure of the Internet comprises physical connections between more than 60,000 entities known as Autonomous Systems (such as AT&T and Verizon). Internet routing protocols such as the Border Gateway Protocol (BGP) govern how our communications are routed over a series of autonomous systems to form an end-to-end communication channel between a sender and receiver.

Unfortunately, Internet routing protocols were not designed with security in mind. The insecurity in the BGP protocol allows potential adversaries to manipulate how routing on the Internet occurs. For example, see this recent real-world example of BGP attacks against Mastercard, Visa, and Symantec. The insecurity of BGP is well known, and a number of protocols have been designed to secure Internet routing. However, we are a long ways away from large-scale deployment of secure Internet routing protocols.  

This status quo is unacceptable.

Historically, routing attacks have been viewed primarily from the perspective of an attack on availability of Internet applications.  For example, an adversary can hijack Internet traffic towards a victim application server and cause unavailability (see YouTube’s 2008 hijack). A secondary perspective is that of confidentiality of unencrypted Internet communications. For example, an adversary can manipulate Internet routing to position itself on the communication path between a client and the application server and record unencrypted traffic: http://dyn.com/blog/mitm-internet-hijacking/

In this post, we  argue that conventional wisdom significantly underestimates the vulnerabilities introduced due to insecurity of Internet routing. In particular, we discuss recent research results that exploit BGP insecurity to attack the Tor network, TLS encryption, and the Bitcoin network.

BGP attacks on anonymity systems/Tor: The Tor network is a deployed system for anonymous communication that aims to protect user identity (IP address) in online communications. The Tor network comprises of over 7,000 relays which together carry terabytes of traffic every day. Tor serves millions of users, including political dissidents, whistle-blowers, law-enforcement, intelligence agencies, journalists, businesses and ordinary citizens concerned about the privacy of their online communications.

Tor clients redirect their communications via a series of proxies for anonymous communication. Layered encryption is used such that each proxy only observes the identity of the previous hop and the next hop in the communication, and no single proxy observes the identities of both the client and the destination.

However, if an adversary can observe the traffic from the client to the Tor network, and from the Tor network to the destination, then it can leverage correlation between packet timing and sizes to infer the network identities of clients and servers (end-to-end timing analysis). Therefore, an adversary can first use BGP attacks to hijack or intercept Internet traffic towards the Tor network (Tor relays), and perform traffic analysis of encrypted communications to compromise user anonymity.

It is important to note that this timing analysis works even if the communication is encrypted. This illustrates an important point — the insecurity of Internet routing has important consequences for traffic-analysis attacks, which allow adversaries to infer sensitive information from communication meta-data (such as source IP, destination IP, packet size and packet timing), even if communication is encrypted.

We introduced the threat of “Routing Attacks on Privacy in Tor” (RAPTOR attacks) at USENIX Security in 2015. We demonstrated the feasibility of RAPTOR attacks on the Tor network by performing real-world Internet routing manipulation in a controlled and ethical manner.  Interested readers can see the technical paper and our project webpage for more details.

Routing attacks challenge conventional beliefs about security of anonymity systems, and also have broad applicability to low-latency anonymous communication (including systems beyond Tor, such as I2P). Our work also motivates the design of anonymity systems that successfully resist the threat of Internet routing manipulation. The Tor project is already implementing design changes (such as Tor proposal 247 and Tor proposal 271) that make it harder for an adversary to infer and manipulate the client’s entry point (proxy) into the Tor network. Our follow-up work on Counter-RAPTOR defenses (presented at the IEEE Security and Privacy Symposium in 2017) presents a monitoring framework to analyze routing updates for the Tor network, which is being integrated into the Tor metrics portal.

BGP attacks on TLS/Digital Certificates: The Transport Layer Security (TLS) protocol allows a client to establish a secure communication channel with a destination website using cryptographic key exchange protocols. To prevent man-in-the-middle attacks, clients using the TLS protocols need to authenticate the public key corresponding to the destination site, such as a web-server. Digital certificates issued by trusted Certificate Authorities (such as Let’s Encrypt) provide an authentic binding between destination server and its public key, allowing a client to validate the destination server. Given the widespread use of TLS for secure Internet communications, the security of the digital certificate ecosystem is paramount.  

We have shown that the process for obtaining digital certificates from trusted certificate authorities (called domain validation) is vulnerable to attack.

A domain owner can perform a Certificate Signing Request (CSR) to a trusted Certificate Authority to obtain a digital certificate.  The Certificate Authority must verify that the party submitting the request actually has control over the domains that are covered by that CSR. This process is known as domain control verification and is a core part of the Public Key Infrastructure (PKI) used in the TLS protocol.

In our ongoing work in progress, presented at the HotPETS workshop in 2017, we demonstrated the feasibility of exploiting BGP attacks to compromise the domain validation protocol. For example,  HTTP domain verification is a common method of domain control verification that requires the domain owner to upload a string specified by the CA to a specific HTTP URL at the domain. The CA can then verify the domain via a HTTP GET request. However, an adversary can manipulate inter-domain routing via BGP attacks to intercept all traffic towards the victim web-server, and successfully obtain a fraudulent digital certificate by spoofing a HTTP response corresponding to the CA challenge message. We have performed real-world Internet routing manipulation in a controlled and ethical manner to demonstrate the feasibility of these attacks. See our attack demonstration video for a demo.

This attack has significant consequences for privacy of our online communications, as adversaries can bypass cryptographic protection offered by encryption using fraudulently obtained digital certificates. Our work is leading to deployment of suggested countermeasures (verification from multiple vantage points) at Let’s Encrypt. Please see the Let’s Encrypt deployment for more details.

So far, we have discussed our research results from Princeton University. Below, I’ll briefly discuss research from Laurent Vanbever’s group at ETHZ and Sharon Goldberg’s Group at Boston University that have shown that it is possible to use inter-domain routing manipulation for attacking Bitcoin and for bypassing legal protections.

BGP attacks on Crypto-currencies/Bitcoin: BGP manipulation can be used to perform two main types of attacks on crypto-currencies such as Bitcoin: (1) partitioning attacks, in which an adversary aims to disconnect a set of victim Bitcoin nodes from the network, or (2) delaying attacks, in which an adversary can slow down the propagation of data towards victim Bitcoin nodes. Both of these attacks result in potential economic loss to Bitcoin nodes.

BGP attacks for bypassing legal protections: Domestic communications between US citizens have legal protections against surveillance. However, adversaries can manipulate inter-domain routing such that the actual communication path involves a foreign country, which could invalidate the legal protections and allow large-scale surveillance of online communications.

Concluding Thoughts:  The emergence of routing attacks on anonymity systems, Internet domain validation, and cryptocurrencies showcases that conventional wisdom has significantly underestimated the attack surface introduced due to the insecurity of Internet routing. It is imperative for critical Internet applications to be aware of the insecurity of Internet routing, and analyze the resulting security threats.

Given the vulnerabilities in Internet routing, applications should consider domain specific defense mechanisms for enhancing user security and privacy. Examples include our Counter-RAPTOR analytics for Tor and Multiple vantage point defense for domain validation). We hope that our work, and the research discussed above is an enabler for this vision.

While it is important to design and deploy application-specific defenses for protecting our systems against routing attacks that exploit current insecure Internet infrastructure, it is even more important to rethink the status quo of insecure routing protocols. Our ultimate goal ought to be to fundamentally eliminate the insecurity in today’s Internet routing protocols by moving towards the adoption of secure countermeasures. How do we drive this change?

Differential Privacy is Vulnerable to Correlated Data — Introducing Dependent Differential Privacy

[This post is joint work with Princeton graduate student Changchang Liu and IBM researcher Supriyo Chakraborty. See our paper for full details. — Prateek Mittal ]

The tussle between data utility and data privacy

Information sharing is important for realizing the vision of a data-driven customization of our environment. Data that were earlier locked up in private repositories are now being increasingly shared for enabling new context-aware applications, better monitoring of population statistics, and facilitating academic research in diverse fields. However, sharing personal data gives rise to serious privacy concerns as the data can contain sensitive information that a user might want to keep private. Thus, while on one hand, it is imperative to release utility-providing information, on the other hand, the privacy of users whose data is being shared also needs to be protected.

What is differential privacy?

Among existing data privacy metrics, differential privacy (DP) has emerged has a rigorous mathematical framework for defining and preserving privacy, and has received considerable attention from the privacy community. DP is used for protecting the privacy of individual users’ data while publishing aggregate query responses over statistical databases, and guarantees that the distribution of query responses changes only slightly with the addition or deletion of a single tuple (entry or record) in the database. Thus, after observing the query response, an adversary is limited in its ability to infer sensitive data about any tuple and in turn about the user contributing the tuple. For example, if the government aims to publish the average salary of individuals in New Jersey, DP can reveal the approximate average salary of this region while protecting the privacy of any individual user’s salary. DP achieves this balance between utility and privacy by adding noise (perturbation) to the true query result, for example, via the Laplace perturbation mechanism.

Differential privacy is vulnerable to data correlation!

To provide its guarantees, DP implicitly assumes that the data tuples in the database, each from a different user, are all independent. This is a weak assumption, especially because tuple dependence occurs naturally in datasets due to social interactions between users. For example, consider a social network graph with nodes representing users, and edges representing friendship relationships. An adversary can infer the existence of a social link between two users that are not explicitly connected in the graph using the existence of edges among other users in the graph ( since two friends are likely to have many other friends in common). Similarly, private attributes in a user’s record can be inferred by exploiting the public attributes of other users sharing similar interests. An adversary can easily infer a user’s susceptibility to a contagious disease using access to the noisy query result and also the fact that the user’s immediate family members are part of the database. In an era of big-data, the tuples within a database present rich semantics and complex structures that inevitably lead to data correlation. Therefore, data correlation, i.e., the probabilistic dependence relationship among tuples in the database, which is implicitly ignored in DP, can seriously deteriorate the privacy properties of DP.

Previous work has investigated this problem and proposed several interesting privacy metrics, such as zero-knowledge privacy, pufferfish privacy, membership privacy, inferential privacy, etc. However, the privacy community is lacking effective perturbation mechanisms that can achieve these proposed privacy metrics.

In a paper that we presented at the Network and Distributed System Security Symposium (NDSS 2016), we first demonstrate a Bayesian attack on differentially private mechanisms using real-world datasets (such as Gowalla location data). Our attack exploits the correlation between location information and social information of users in the Gowalla dataset. We show that it is possible to infer a user’s sensitive location information from differentially private responses by exploiting her social relationships. When data is correlated, DP underestimates the amount of noise that must be added to achieve a desired bound on the adversary’s ability to make sensitive inferences. Therefore, correlation (or dependency) among data tuples would break the expected privacy guarantees.

How can we improve differential privacy to protect correlated data? Introducing dependent differential privacy.

Overall, our work generalizes the classic differential privacy framework to explicitly consider correlations in datasets, thus enhancing the applicability of the approach to a wide range of datasets and applications. We formalize the notion of dependent differential privacy (DDP); DDP aims to explicitly incorporate probabilistic dependency constraints among tuples in the privacy formulation. We also propose a perturbation mechanism to achieve the privacy guarantees in DDP; our approach extends the conventional Laplace perturbation mechanism by incorporating the correlation between data tuples. To do so, we introduce the concept of dependent coefficient between two tuples, which aims to capture the correlation between data tuples from the perspective of privacy. (The dependent coefficient between two tuples evaluates the ratio between the dependent indistinguishability of one tuple induced by the other tuple and the self indistinguishability of the tuple itself.) Our perturbation mechanism can be applied for arbitrary data queries and we validated its effectiveness using real-world datasets.

What are future research directions for dependent differential privacy?

First, to form a deeper understanding of our dependent differential privacy framework, it would be interesting to explore the application of standard concepts in differential privacy to our framework, such as local sensitivity and smooth sensitivity. Second, the effectiveness of our proposed perturbation mechanism depends on how well the dependence among data can be modeled and computed. One limitation of our work is that the dependence coefficient is exactly known to both the adversary and the database owner. How to accurately compute the dependence coefficient and deal with potential underestimation would be an interesting future work (note that the overestimation of dependence coefficient continues to provide rigorous DDP guarantees). Finally, in our dependent differential privacy framework, we only considered pairwise correlations between tuples in a database. An important research direction is to generalize the pairwise correlations that we considered in our paper to joint correlations in the dataset.