July 19, 2018

Oblivious DNS: Plugging the Internet’s Biggest Privacy Hole

by Annie Edmundson, Paul Schmitt, Nick Feamster

The recent news that Cloudflare is deploying their own DNS recursive resolver has once again raised hopes that users will enjoy improved privacy, since they can send DNS traffic encrypted to Cloudflare, rather than to their ISP. In this post, we explain why this approach only moves your private data from the ISP to (yet another) third party. You might trust that third party more than your ISP, but you still have to trust them.  In this post, we present an alternative design—Oblivious DNS—that prevents you from having to make that choice at all.

The Domain Name System (DNS)

When your client turns a domain name like google.com into an IP address, it relies on a recursive DNS resolver to do so. The operator of that resolver sees both your IP address and the domains that you query.

When you—or any of your devices—accesses the Internet, the first step is typically to look up a domain name (e.g., “google.com”, “princeton.edu”) in the Domain Name System (DNS) to determine which Internet address to contact. The DNS is, in essence a phone book for the Internet’s domain names.

Clients that you operate—including your browser, your smartphone, and any IoT device in your home—sends a DNS query for each domain name to a so-called “recursive DNS resolver”.On a typical home network, the default recursive DNS resolver may be operated by your Internet service provider (ISP) (e.g., Comcast, Verizon). Other entities such as Google and Quad9 also operate “open” recursive resolvers that anyone can use, with the idea that these alternative recursive resolvers give users another option for resolving DNS queries other than their ISP. Such alternatives have been useful in the past for circumventing censorship.

DNS: The Internet’s Biggest Privacy Hole

DNS queries are typically sent in cleartext, and they can reveal significant information that an Internet user may want to keep private, including the websites that user is visiting, the IP address or subnet of the device that issued the initial query, and even the types of devices that a user has in his or her home network. For example, our previous research has shown that DNS lookups can be used to de-anonymize traffic from the Tor network.

Because the queries and responses are unencrypted, any third party who can observe communication between a client and a recursive resolver, a recursive resolver, or an authoritative server may also be able to observe various steps in the DNS resolution process.

Operators of recursive DNS resolvers—typically your ISP, but typically whoever the user relies on to resolve recursive DNS queries (e.g., Google) may see individual IP addresses (which may correspond to an ISP subscriber, or perhaps an individual end-device) coupled with the fully qualified domain name that accompanies the query. Even in the case of authoritative resolvers, extensions to DNS such as EDNS0 Client Subnet may reveal information about the user’s IP address or subnet to authoritative DNS servers higher in the DNS hierarchy.

Existing Approaches

Existing proposed standards, including DNS Query Name Minimization and DNS over TLS protect certain aspects of user privacy.

Yet, these approaches do not prevent the operator of the recursive DNS server from learning which IP addresses are issuing queries for particular domain names—the fundamental problem with DNS privacy:

  • DNS Query Name Minimization provides a mechanism that the DNS servers that are authoritative for different parts of the DNS name hierarchy would not learn about the full DNS query. For example, a server that is authoritative for all of *.com would not necessarily learn about a query for maps.google.com, but would only learn that a client needs to resolve some subdomain of google.com. Yet, this mechanism doesn’t prevent the recursive DNS resolver from learning the full DNS query and the IP address of the client that issued the query.
  • DNS over TLS provides a mechanism for encrypting DNS queries. Yet, even with DNS over TLS, the recursive resolver still needs to decrypt the initial query so that it can resolve the query for the client.  It still does not prevent the recursive DNS resolver from learning the query and the IP address that send the query.

Third parties have recently been standing up new DNS resolvers that claim to respect user privacy: Quad9 (9.9.9.9) and Cloudflare’s 1.1.1.1 operate such open DNS recursive resolvers that claim to purge information about user queries. Cloudflare additionally support DNS over HTTPS, which (like DNS over TLS) will ensure that your DNS queries are encrypted from your browser to its recursive DNS resolver.

Yet, in all of these cases, a user has no guarantee that information that an operator learns might be retained, for operational or other purposes. Once such information is retained, of course, it may become vulnerable to other threats to user privacy, including data requests from law enforcement. In short, these services transfer the point of trust from your ISP to some other third party, but you still have to trust that third party.

Oblivious DNS

While you may have good reason to trust a provider that claims to purge all information about your DNS queries, we believe that user’s shouldn’t even have to make that choice in the first place.

The goal of Oblivious DNS (ODNS) is to ensure that no single party observes both the DNS query and the IP address (or subnet) that issued the query. ODNS runs as an overlay of sorts on conventional DNS; it requires no changes to any DNS infrastructure that is already deployed.

Oblivious DNS (ODNS) adds a custom stub resolver at the client to obfuscate the original query, which the authoritative server for ODNS can decrypt. But, the ODNS authoritative server never sees the IP address of the client that issued the query.

Oblivious DNS (ODNS) operates similarly to conventional DNS, but has two new components: 1) each client runs a local ODNS stub resolver, and 2) we add an ODNS authoritative zone that also operates as a recursive DNS resolver. The figure illustrates the basic approach.

When a client application initiates a DNS lookup, the client’s stub resolver obfuscates the domain that the client is requesting (via symmetric encryption), resulting in the recursive resolver being unaware of the requested domain. The authoritative name server for ODNS separates the clients’ identities from their corresponding DNS requests, such that the name servers cannot learn who is requesting specific domains. The steps taken in the ODNS protocol are as follows:

  1. When the client generates a request for www.foo.com, its stub resolver generates a session key k, encrypts the requested domain, and appends the TLD domain .odns, resulting in {www.foo.com}k.odns.
  2. The client forwards this request, with the session key encrypted under the .odns authoritative server’s public key ({k}PK) in the “Additional Information” record of the DNS query to the recursive resolver, which then forwards it to the authoritative name server for .odns.
  3. The authoritative server for ODNS queries decrypts the session key with its private key and subsequently decrypts the requested domain with the session key.
  4. The authoritative server forwards a recursive DNS request to the appropriate name server for the original domain, which then returns the answer to the ODNS authoritative server.
  5. The ODNS authoritative server can thus return the answer (with both the domain and IP address encrypted) to the recursive resolver, which forwards it on to the client’s stub resolver.  In turn, the stub resolver can decrypt both the domain and the IP address.

Other name servers see incoming DNS requests, but these only see the IP address of the ODNS recursive resolver, which effectively proxies the DNS request for the original client. These steps correspond to the following figure.

Prototype Implementation and Preliminary Evaluation

We implemented a prototype in Go to evaluate the feasibility of deploying ODNS, as well as the performance costs of using ODNS as compared to conventional DNS. We implemented an ODNS stub resolver and implemented an authoritative name server that can also issue recursive queries.

ODNS adds 10-20 milliseconds to the resolution time for an uncached DNS query.

We first compared the performance of running an ODNS  query overhead to that of conventional DNS. We issued DNS queries to the Alexa Top 10,000 domains using both ODNS and conventional DNS. The CDF below shows that ODNS adds about 10-20 milliseconds to each query. Of course, in practice DNS makes extensive use of caching, and this experiment shows a worst-case scenario. We expect the overhead in practice to be much smaller.

Along these lines, we also measured how ODNS would affect a typical Internet user’s browsing experience by evaluating the overhead of a full web page load, which involves fetching the page, and conducting any subsequent DNS lookups for embedded objects and resources in the page. We fetched popular web pages that have a lot of content using ODNS and compared the results to performing the same operations with ODNS.

Overhead for loading an entire web page is minimal.

In each group in the bar plot, the left bar in the figure is using conventional DNS and the right bar represents the time it takes using ODNS. We see that there is no significant difference in page load time between ODNS and conventional DNS because DNS lookups contribute minimal overhead to the entire page load process. As before, these experiments were run with a “cold cache”, and in practice we expect the overhead to be even less.

Summary and Next Steps

The past several years have seen much (warranted) concerns over the privacy risks that DNS queries expose. Existing approaches that allow users to use alternative DNS resolvers are a helpful step, but in some sense they merely shift the trust from the user’s ISP to another party. We believe that a better end state is one where the user doesn’t have to place trust in the operator of any DNS recursive resolver. Towards this goal, we have built ODNS to help decouple clients’ identities with their corresponding DNS queries, and have implemented a prototype. As ongoing work, we are working on a larger-scale implementation, deployment and evaluation. Additional information on ODNS can be found at our project website. We welcome any feedback and comments. We are ready to explore opportunities for broader deployment, and we are actively seeking partners to help us deploy ODNS resolvers in operational settings.

How have In-Flight Web Page Modification Practices Changed over the Past Ten Years?

When we browse the web, there are many parties and organizations that can see which websites we visit, because they sit on the path between web clients (our computers and mobile devices), and the web servers hosting the sites we request. Most obviously, Internet Service Providers (ISPs) are responsible for transmitting our web traffic, but reports (e.g. [1], [2], [3]) have shown that they may also inject ads into users’ requested web pages to increase revenue. Other parties may also intercept our web traffic for a wide variety of reasons: content-distribution networks (or CDNs) receive requests for websites that are geographically farther away to speed up response time, enterprise software and programs running on our devices may check incoming websites for added security or privacy before passing the website to our browser, and malicious adversaries may attempt to inject malware into requested web content before we receive it.

 

In 2007, a research group at the University of Washington conducted a study to measure how often these web page modifications occur in practice, and to determine who is responsible for the modifications. Web page modifications were identified using a small piece of software embedded in a test web page, a so-called “web tripwire”, that compared a known good representation of the web page with the version of the test web page users saw in their browsers. The researchers then attributed the modifications to ISPs, malicious attackers, and client software such as ad blockers, using IP addresses and by finding identifying keywords in the injected web content. They found that only about 1.3% of participating web clients saw page modifications. But much about how we interact with and browse the web has changed over the past ten years. More specifically, with the emergence of mobile technologies and new network parties such as CDNs, it is important to learn if and how these new developments have affected in-flight modification practices.

 

We invite you to take part in our research study. Following the same setup as the UW study, we have created a test web page containing a “web tripwire”. If it detects any in-flight page modifications in our test page, it sends us a copy of the modified version of our web page that your browser received. We minimize the information that we collect to detect page modifications. In addition to page modification data, we only record information that web servers normally record, such as IP address, browser type, date and time of page request, and a cookie to differentiate between users. We will permanently remove any personal information found in the page modifications before sending the modification data to our servers.

 

By participating in this study, you are helping us gather information crucial for guiding research and building tools to improve web privacy. If you’re willing to contribute to our study, it’s as simple as visiting our test web page: http://stormship.cs.princeton.edu. If possible, we also ask you to visit our page through multiple different devices and browsers, as this will help diversify our collected data. Our test page contains more details about our study, and we will post our results there when we have completed our measurements.

Please reach out to or with any questions, concerns, or feedback. We greatly appreciate your help in our efforts to improve web privacy!

Routing Detours: Can We Avoid Nation-State Surveillance?

Since 2013, Brazil has taken significant steps to build out their networking infrastructure to thwart nation-state mass surveillance.  For example, the country is deploying a 3,500-mile fiber cable from Fortaleza, Brazil to Portugal; they’ve switched their government email system from Microsoft Outlook to a state-built system called Expresso; and they now have the largest IXP ecosystem in the world.  All of these measures aim to prevent the country’s Internet traffic from traversing the United States, thereby preventing the United States from conducting surveillance on their citizens’ data.  But Brazil isn’t the only country that has concerns about their Internet traffic passing through the United States.  Deutsche Telekom lobbied for tougher privacy protection by keeping German traffic within its national borders.  Canadian traffic has been found to routinely pass through the United States, which is a violation of Canadian network sovereignty.  Russian president Putin has called for “better protection of communication networks” and passed a law that requires foreign companies to keep Russian users’ data on servers inside the country.  

To quantify which countries Internet traffic traverses and measure how successful any particular country might be at detouring its traffic around known surveillance states, we actively measured and analyzed the traffic originating in five different countries: Brazil, Netherlands, Kenya, India, and the United States.  

  • First, to understand the current state of transnational routing (the “status quo”), we measured the country-level traffic paths for the Alexa Top 100 domains in each respective country using RIPE Atlas probes and the MaxMind geolocation service.  
  • Next, we measured how successful clients in Brazil, Netherlands, Kenya, India, and the United States might be at avoiding other countries of interest using open DNS resolvers and using an overlay network.  

The rest of this post summarizes these two parts of the study and highlights some of the results.

The Status Quo: Even Local Traffic Can Detour through Surveillance States

Despite the extreme efforts of certain countries to “keep local traffic local”, and in particular to avoid having traffic traverse the United States, our measurement study indicates that these goals have not yet been reached, for two reasons: 1) lack of domain hosting diversity and 2) lack of routing diversity.

Lack of Hosting Diversity. We find that hosting for many popular websites lacks diversity. We found that about half of the Alexa Top 100 domains are hosted in a single country; in these cases, a user cannot avoid the domain’s hosting country when accessing it.  In many cases, even popular local websites are hosted outside the country where citizens are trying to access them.  For example, more than 50% of the top domains in Brazil and India are hosted in the United States; in total, about 50% of the .br domains are hosted outside Brazil. More hosting diversity, as could be enabled with CDNs, would allow for the potential to avoid more countries more often.

Lack of Geographic Diversity. Internet paths also lack geographic diversity: about half of the paths originating in Kenya to the most popular Kenyan websites traverse the United States or Great Britain.  Much of this phenomenon is due to “tromboning,” whereby an Internet path starts and ends in a country, yet transits an intermediate country; for example, about 13% of the paths that we explored from RIPE Atlas probes in Brazil to the top domains in Brazil trombone through the United States. More than 50% of the paths from the Netherlands to their top domains transit the United States, and about half of Kenyan paths traverse the United States and Great Britain.

Towards User-Controlled Routing Detours

We next asked whether clients could take advantage of the fact that many popular websites are georeplicated, coupled with a client’s ability to selectively “bounce” packets through overlay nodes, might give some users opportunities to avoid certain countries. We studied whether users could exploit open DNS resolvers to discover hosting diversity, and overlay network relays to intentionally introduce routing detours. Previous work in overlay networks, such as RON, tries to route around failures, whereas our work tries to route around countries.  Our results show that in some cases, users can select paths to specifically avoid certain countries; in cases where local traffic leaves the country only to return (a phenomenon sometimes called “tromboning”), the use of local relays can sometimes ensure that local traffic stays within the country.  For example, without these techniques, Brazilian traffic transited Spain, Italy, France, Great Britain, Argentina, Ireland (among others). Using even a modest overlay network deployment of 12 relays across 10 countries (Brazil, United States, Ireland, Germany, Spain, France, Singapore, Japan, South Korea, and Australia), clients in Brazil could completely avoid these countries for the top 100 domains.  The overlay network can also be used to keep local traffic local; the percentage of tromboning paths from Brazil decreases from 13.2% of domestic paths to 9.7%.

Unfortunately, some of the more prominent surveillance states are also some of the least avoidable countries.  Most countries depend highly on the United States for connectivity to other locations on the Internet.  Neither Brazil, India, Kenya, nor the Netherlands can completely avoid the United States with the country avoidance techniques.  The inability of these techniques to successfully avoid the United States typically results from the lack of hosting diversity for many websites, which are solely hosted in the United States. Using the overlay network, both Brazilian and Netherlands clients were able to avoid the United States for about 65% of sites; even in these cases, the United States is completely unavoidable for about 10% of sites.  Traffic from Kenya can avoid the United States for only about 40% of the top domains.  On the other hand, the United States can avoid every other country for all sites, with the exception of France and the Netherlands which the United States can nonetheless avoid for 99% of the top 100 domains.  

More Information and Next Steps

A more detailed writeup is available on the RANSOM project website (https://ransom.cs.princeton.edu/). Encouraged by the ability to use overlay networks to avoid surveillance states in certain cases, we are in the process of designing and building a RANSOM prototype. We welcome feedback on this project as we embark on the next steps.