June 24, 2017

The Effect of DNS on Tor’s Anonymity

This blog post is joint work with Benjamin Greschbach, Tobias Pulls, Laura M. Roberts, and Nick Feamster.

With almost two million daily users and roughly 7,000 relays, the Tor network is the largest anonymity network operating today. The Tor Project maintains a privacy-enhanced version of the popular Firefox web browser—Tor Browser—that bounces its network traffic over the Tor network. By using Tor Browser, users can protect their web browsing from government surveillance, stalkers, advertisement-injecting ISPs, and nosy neighbors. Tor works by decoupling a user’s identity (i.e., her IP address, which reveals where in the world she is) from her activity (e.g., visiting Facebook). When using Tor, your ISP can no longer learn what websites you visit, but it does know that you are using Tor. The website you visit (e.g., Facebook) will also know that you are using Tor, but it will not learn your IP address. The EFF maintains a great interactive diagram illustrating information leakage in several scenarios.

While the use of Tor constitutes a significant privacy gain over off-the-shelf web browsers, it is no panacea, and the Tor Project is upfront about its limitations. These limitations are not news to the research community. It is well understood that low-latency anonymity networks such as Tor cannot protect against so-called global passive adversaries: adversaries able to monitor both the network traffic that enters the network and the traffic that exits it. Such an adversary can run a correlation attack, meaning that it can match packets that go into the network to packets that leave it. In other words, it can link a client’s identity (her IP address) to her activity (e.g., visiting Facebook) and thus break anonymity. By design, Tor cannot protect against these global passive adversaries. However, there is still merit in understanding which organizations are in a position to launch such attacks and how practical these attacks are. Only then can we gain an understanding of how much of a real-world threat these attacks constitute.
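To make this concrete, here is a toy sketch of timing-based flow correlation, assuming an adversary that records packet timestamps on both sides of the network. The traces, client names, and matching rule below are invented for illustration; real attacks use far more traffic and far more sophisticated statistics.

```python
# Toy illustration of the intuition behind a correlation attack: an adversary
# that sees traffic entering and leaving the anonymity network tries to match
# flows by their timing patterns. All timestamps and names are made up.

def gap_signature(timestamps):
    """Inter-packet gaps of a flow, used as a crude timing fingerprint."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def similarity(ingress, egress, tolerance=0.05):
    """Fraction of inter-packet gaps that agree within `tolerance` seconds."""
    gaps_in, gaps_out = gap_signature(ingress), gap_signature(egress)
    matches = sum(1 for a, b in zip(gaps_in, gaps_out) if abs(a - b) <= tolerance)
    return matches / max(len(gaps_in), 1)

# Flows seen at clients' ISPs (the identity side of the network) ...
ingress_flows = {
    "client_a": [0.00, 0.10, 0.45, 0.52],
    "client_b": [0.00, 0.30, 0.31, 0.90],
}
# ... and one flow seen at a web server (the activity side), shifted by delay.
egress_flow = [0.20, 0.31, 0.66, 0.72]

best = max(ingress_flows, key=lambda c: similarity(ingress_flows[c], egress_flow))
print("best matching client:", best)  # links an identity to the activity
```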

The research community has put a lot of effort into modeling correlation attacks and their effect on Tor users. To date, these studies have mostly modeled web traffic, i.e., HTTP over port 80 and HTTPS over port 443, which makes sense because web traffic likely constitutes the bulk of a typical Tor user’s online activity. However, there is more to network traffic than just web traffic. Before a browser can fetch a web page such as Facebook, it has to send out a DNS request to learn that the domain facebook.com is reachable via its IP address 173.252.120.68 (among others). DNS traffic is ubiquitous, both over and outside of Tor, so how does this relate to correlation attacks? DNS is a very different beast than HTTP, which is what most research has studied. HTTP traffic travels from a user’s exit relay directly to one or more web servers. DNS traffic, however, can traverse very different paths, depending on how exit relays (the relays in the Tor network that provide the link between Tor and the Internet) are configured. In particular, we find that DNS traffic often traverses networks that are entirely different from the networks that the subsequent web traffic traverses. This means that past research likely underestimated the threat of correlation attacks: entities on the Internet such as ISPs, autonomous systems, or Internet exchange points can monitor some DNS traffic, but not the web traffic, coming out of the Tor network, and can potentially use that DNS traffic to deanonymize Tor users.
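The DNS step is easy to observe for yourself with a few lines of Python; the addresses returned will of course vary over time and by vantage point.

```python
import socket

# Before any HTTP(S) traffic can flow, the domain must be resolved to an IP
# address. Outside of Tor, your ISP's resolver typically sees this request;
# with Tor, the exit relay (or its configured resolver) performs it instead.
addresses = {info[4][0] for info in socket.getaddrinfo("facebook.com", 443)}
print(addresses)
```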

Overview of concept

Past traffic correlation studies have focused on linking the TCP stream entering the Tor network to the one(s) exiting the network. We show that an adversary can also link the associated DNS traffic, which can be exposed to many more autonomous systems than the TCP stream.

After many months of work, we have published a research paper on how DNS affects Tor’s anonymity. In our work, we show how an adversary can combine monitored DNS requests with well-understood website fingerprinting attacks to create a new type of DNS-enhanced correlation attack, or DNS-enhanced website fingerprinting attack, depending on how you look at it. We know of several organizations that are in a position to mount these attacks, notably Google. Because it operates a popular open DNS resolver, the company can already observe many DNS requests. Additionally, Google can monitor some network traffic that is entering the Tor network: for example, via Google Fiber, via guard relays that are occasionally run in Google’s cloud, and formerly via the meek pluggable transport’s Google App Engine backend, which is now defunct. Our results show that by monitoring DNS traffic, an attacker can enhance a website fingerprinting attack to be highly reliable: when our new, combined attack identifies a website, it is highly likely to be correct, and this holds for a large number of websites on the Internet. Furthermore, we show that our new type of attack works well on websites that are visited only infrequently over the Tor network as a whole. Presumably, this includes censored websites in different regions of the world as well as websites for whistleblowers and activists, who are in many ways the Tor users most in need of protection. Beyond further motivating the deployment of website fingerprinting defenses in Tor, our attack has implications for the design of such defenses: they should probably be stronger than previously thought.
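The sketch below conveys the intuition behind combining the two signals: a website fingerprinting classifier produces candidate sites from the encrypted traffic entering the network, and the DNS requests observed leaving exits are used to discard candidates that are not corroborated. The classifier, traces, and domain names here are placeholders, not the components used in our paper.

```python
# Sketch of a DNS-enhanced website fingerprinting attack: keep only those
# classifier guesses that are consistent with domains seen in DNS requests
# leaving the Tor network. Everything below is a stand-in for illustration.

def fingerprint_candidates(encrypted_trace):
    """Hypothetical classifier: sites whose traffic pattern matches the trace."""
    return {"example-forum.org", "news.example.com", "blog.example.net"}

def observed_domains(dns_capture):
    """Domains the adversary saw in (unencrypted) DNS requests from exits."""
    return {query.lower().rstrip(".") for query in dns_capture}

candidates = fingerprint_candidates(encrypted_trace=None)
dns_seen = observed_domains(["news.example.com.", "cdn.example.net."])

# Guesses corroborated by DNS: far fewer false positives than fingerprinting alone.
print(candidates & dns_seen)  # {'news.example.com'}
```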

Our study also provides new insight into the current state of exit relays in the Tor network: we find that a significant fraction of exit relays send DNS requests to Google’s public resolvers. Google sees about one-third of all DNS requests that exit the Tor network—an alarmingly high fraction for a single company, particularly because Tor’s very design avoids centralized points of control and observation. Although Tor itself is reasonably decentralized, our work shows that the same cannot be said for the wider ecosystem that Tor exists in. We also simulate the Internet-scale implications of our work using the Tor Path Simulator (TorPS). This allows us to explore what-if scenarios, such as the implications for Tor users if all exit relays were to run their own local resolvers. Our results show that while running local DNS resolvers on exit relays has its benefits, it also means that DNS requests are highly exposed to network-level adversaries.
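A toy version of such a what-if analysis might look as follows. Only the roughly one-third Google figure mirrors our measurements; the other weights are made up, and the real study uses TorPS with actual consensus data rather than three hand-picked numbers.

```python
# Toy what-if analysis: given an assumed split of how exit relays resolve DNS,
# estimate the share of Tor's DNS requests one observer could see. The split
# below is illustrative only (the ~0.33 Google share mirrors the figure above).

exit_resolver_share = {          # fraction of exit traffic per configuration
    "google_public_dns": 0.33,   # exits forwarding queries to 8.8.8.8
    "isp_resolver":      0.40,   # exits using their hosting provider's resolver
    "local_resolver":    0.27,   # exits running their own local resolver
}

def visible_fraction(observed_configs):
    """Share of exit DNS requests visible to whoever operates these resolvers."""
    return sum(exit_resolver_share[c] for c in observed_configs)

print(round(visible_fraction({"google_public_dns"}), 2))                  # 0.33
print(round(visible_fraction({"google_public_dns", "isp_resolver"}), 2))  # 0.73
```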

So what does our work mean for Tor users? We believe that there is no immediate cause for concern. Adversaries that can already monitor large fractions of the Internet—for many people, the biggest threat—will not do any better with our attack. Instead, we investigate how “semi-global” adversaries can get the most out of the data they have. Finally, the Tor Project is already working on techniques to make website fingerprinting attacks harder.

Our project page has more information about our study. Together with the research paper, which is currently under review, we are also publishing our code, data, and detailed instructions for replicating our work.

Improving Bitcoin’s Privacy and Scalability with TumbleBit

Last week we unveiled TumbleBit, a new anonymous payments scheme that addresses two major technical challenges faced by Bitcoin today: (1) scaling Bitcoin to meet increasing use, and (2) protecting the privacy of payments made via Bitcoin. Our proof-of-concept source code and a pre-print of the latest version of our paper were both posted online last week. In this post, I’ll walk through the motivation and the design of TumbleBit, and explain how it achieves the following three important features:

  1. TumbleBit works with today’s Bitcoin protocol. No changes to Bitcoin are required.
  2. TumbleBit payments are processed off of the Bitcoin blockchain, helping Bitcoin scale to higher transaction velocity and volume. Like Bitcoin’s on-blockchain transactions, the safety and security of payments sent via TumbleBit do not require trust in any third party. The TumbleBit payment hub cannot steal a user’s bitcoins.
  3. TumbleBit provides anonymity for its off-blockchain payments even against the TumbleBit service. The exact property we guarantee is unlinkability (as explained in detail below).

Bitcoin Anonymity Concerns

Bitcoin is the ultimate transparent currency. Each transaction reveals the value of bitcoin that was transferred, and the addresses (Bitcoin identities) of both the party transferring the payment and the party receiving the payment. Even worse, these transaction details are recorded in an immutable and massively-replicated ledger which can be read by anyone, i.e. Bitcoin’s blockchain. Feeding this concern is a burgeoning blockchain analytics and surveillance industry.

This presents problems for businesses that want to use Bitcoin but don’t want to open their books to competitors via the blockchain. It also presents problems for regular users. Much has been made of companies using social networking profiles to make employment decisions, and a transparent currency would make this even worse. Just as common wisdom suggests users should be empowered to choose appropriate privacy settings on social networks, users also need tools that limit the exposure of their private financial and purchasing history on Bitcoin’s blockchain.

Bitcoin Scalability Concerns

Regarding scalability, Bitcoin faces a problem that many protocol designers only dream of having: too many users want to use it. “On Scaling Decentralized Blockchains (A Position Paper)” explains the issue as follows:

“Today’s representative blockchain such as Bitcoin takes 10 min or longer to confirm transactions, achieves 7 transactions/sec maximum throughput. In comparison, a mainstream payment processor such as Visa credit card confirms a transaction within seconds, and processes 2000 transactions/sec on average, with a peak rate of 56,000 transactions/sec [11]. Clearly, a large gap exists between where Bitcoin is today, and the scalability of a mainstream payment processor.”

Applying Metcalfe’s law to Bitcoin suggests that the more people who use Bitcoin, the more useful Bitcoin becomes. Thus, scalability in terms of both volume (how many transactions can be confirmed per second) and velocity (how long a transaction takes to be confirmed) significantly limits the utility of Bitcoin.

How Does TumbleBit Work?

TumbleBit is implemented as a layer built on top of Bitcoin, and thus is fully compatible with today’s Bitcoin protocol. (This is analogous to how Tor is built as a layer on top of IP.) TumbleBit payments are sent via an intermediary called the Tumbler. In our paper, we describe how the core TumbleBit protocol can be run in two modes. The first mode is a classic Bitcoin tumbler (also called a mixing service). In this post I will only address the second mode, where TumbleBit is used as an unlinkable payment hub.

A Bitcoin payment hub is a system that allows users to make off-blockchain transactions (via an intermediary, aka the payment hub) with the same level of security as on-blockchain transactions. Users join the payment hub by “opening a payment channel” that escrows bitcoins in an on-blockchain Bitcoin transaction. Next, users send payments to each other off-blockchain. These payments can be high volume and fast, because they are not subject to the time it takes to confirm transactions on Bitcoin’s blockchain. Finally, a user “closes its payment channel” via an on-blockchain transaction that reflects its new balance of bitcoins after all off-blockchain payments have been made.

In order to provide anonymity, TumbleBit requires all its users to open payment channels (in the “Escrow Phase”) and to close their payment channels (in the “Cash-Out Phase”)*. These two TumbleBit phases are relatively slow, because they require each user to post an on-blockchain Bitcoin transaction that takes about 10 minutes to be confirmed. Between these two phases, however, is the “Payment Phase”, in which users send fast off-blockchain payments. TumbleBit payments can complete in seconds, thus scaling the velocity of Bitcoin payments. Perhaps more importantly, off-blockchain TumbleBit payments do not take up space on Bitcoin’s blockchain. This allows Bitcoin to process many off-blockchain transactions for only two on-blockchain transactions (i.e., the transactions that open and close the payment channel), scaling the maximum transaction volume that Bitcoin can handle.
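A minimal bookkeeping sketch of this lifecycle, assuming a hypothetical 1 BTC escrow and 0.01 BTC payments, shows why the blockchain footprint stays at two transactions no matter how many payments are made. The real protocol enforces these balances with Bitcoin scripts and the cryptography described below; this is only the accounting.

```python
# Sketch of the payment-channel lifecycle TumbleBit builds on. Amounts are
# illustrative; only the open and close steps ever touch the blockchain.

class PaymentChannel:
    def __init__(self, escrow_btc):
        self.escrow = escrow_btc      # Escrow Phase: funds locked on-blockchain
        self.spent = 0.0
        self.onchain_txs = 1          # the escrow transaction itself

    def pay(self, amount_btc):
        """Payment Phase: instant off-blockchain payment, no blockchain footprint."""
        assert self.spent + amount_btc <= self.escrow, "channel exhausted"
        self.spent += amount_btc

    def cash_out(self):
        """Cash-Out Phase: one final on-blockchain transaction settles the balance."""
        self.onchain_txs += 1
        return self.escrow - self.spent

alice = PaymentChannel(escrow_btc=1.00)
for _ in range(50):                   # fifty fast off-blockchain payments ...
    alice.pay(0.01)
print(round(alice.cash_out(), 8), alice.onchain_txs)  # 0.5 BTC returned, 2 on-chain txs
```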

Payment hubs have a long history in Bitcoin. TumbleBit’s main innovation is to provide unlinkability.

Unlinkability is defined as follows. Informally, unlinkability ensures that no one can tell which payer (Alice) paid which payee (Bob) during the Payment Phase of TumbleBit. More formally, the amount of bitcoin each user escrowed on the blockchain during the “Escrow Phase” is public, and so is the amount of bitcoin cashed out by each user during the “Cash-Out Phase”. Let an interaction graph be any mapping of payments from the payers using TumbleBit (Alice1, …, AliceN) to the payees using TumbleBit (Bob1, …, BobM). An interaction graph is compatible if it explains the transfer of funds from TumbleBit’s “Escrow Phase” to TumbleBit’s “Cash-Out Phase”. Unlinkability requires that all compatible interaction graphs be equally likely, and the TumbleBit protocol ensures exactly this. The anonymity provided by TumbleBit grows with the number of compatible interaction graphs.
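A brute-force toy example makes the notion of a compatible interaction graph concrete. The payment counts below are invented, and every payment is assumed to be of the same one-unit denomination; the Tumbler, seeing only the public totals, cannot tell the resulting graphs apart.

```python
from collections import Counter
from itertools import product

# Public information: how many (fixed-denomination) payments each payer made
# and each payee received. The actual who-paid-whom mapping stays hidden.
sent     = {"Alice1": 2, "Alice2": 1}
received = {"Bob1": 1, "Bob2": 2}

payments = [payer for payer, n in sent.items() for _ in range(n)]
payees = list(received)

graphs = set()
for choice in product(payees, repeat=len(payments)):
    if Counter(choice) == Counter(received):       # totals add up on both sides
        edges = tuple(sorted(Counter(zip(payments, choice)).items()))
        graphs.add(edges)                          # one compatible interaction graph

print(len(graphs))        # number of compatible interaction graphs
for graph in sorted(graphs):
    print(dict(graph))    # e.g. {('Alice1', 'Bob2'): 2, ('Alice2', 'Bob1'): 1}
```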

To achieve unlinkability without compromising the security and safety of Bitcoin payments, TumbleBit must also ensure that the hub can’t steal bitcoins from the users or “print money” for itself. To do this, TumbleBit uses two interleaved protocols: the puzzle-promise protocol and the RSA-puzzle-solver protocol. I give a brief overview of each protocol here. Our paper has the full details.

TumbleBit Walk-Through

TumbleBit replaces on-blockchain payments with off-blockchain puzzle solving, where Alice pays Bob by providing Bob with the solution to a puzzle. The puzzle is generated through an interaction between Bob and the Tumbler (the puzzle-promise protocol) and solved through an interaction between Alice and the Tumbler (the puzzle-solver protocol). Each time a puzzle is solved, 1 bitcoin is transferred from Alice to the Tumbler, and each time Bob learns a solution, 1 bitcoin is transferred from the Tumbler to Bob. A TumbleBit puzzle z is simply an RSA evaluation of its solution value ϵ under the Tumbler’s RSA public key (pk, N), i.e.,

z = ϵᵖᵏ mod N
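To make the formula concrete, here is a toy evaluation with an insecurely small textbook RSA key; in TumbleBit, the Tumbler holds the matching secret key sk, and the puzzle solution ϵ is what ultimately unlocks a payment.

```python
# Toy RSA puzzle with deliberately tiny, insecure numbers.
N, pk, sk = 3233, 17, 2753          # textbook toy key: N = 61 * 53
epsilon = 42                        # the puzzle solution

z = pow(epsilon, pk, N)             # the puzzle: z = epsilon^pk mod N
assert pow(z, sk, N) == epsilon     # only the holder of sk can recover epsilon
print(z)
```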

In the puzzle-promise protocol, Bob is convinced that he has received a valid puzzle z, and that the solution to this puzzle will allow him to increase the value of his escrow by 1 bitcoin. The puzzle-solver protocol allows the Tumbler to take 1 bitcoin from Alice’s escrow if and only if the hub reveals the solution to an RSA puzzle of Alice’s choice. Putting these two protocols together, Alice can buy puzzle solutions from the Tumbler using the puzzle-solver protocol, which is a cryptographically enforced fair exchange in which 1 bitcoin (from Alice) is exchanged for 1 puzzle solution (from the Tumbler). By purchasing the solution to Bob’s puzzle, Alice in essence pays 1 bitcoin to Bob via the Tumbler.

Naturally, if Alice gives the Tumbler the exact same puzzle that Bob got from the Tumbler, the Tumbler will know that Alice is paying Bob. To solve this problem, we borrow a technique called RSA blinding from Chaum’s original eCash paper, ‘Blind Signatures for Untraceable Payments’. Bob blinds the puzzle (by multiplying it with a random value) before sending it to Alice. This way, no one other than Alice and Bob, not even the Tumbler, can link Bob’s (unblinded) puzzle to the (blinded) puzzle that Alice pays the Tumbler to solve. This blinding is information-theoretically secure. When Bob receives the solution to the blinded puzzle, he can unblind it and learn the solution to the original RSA puzzle that the Tumbler issued to him.
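Continuing with the toy key from above, the blinding and unblinding steps look roughly like this; the random value r and all the numbers are illustrative, and the real protocol of course uses full-size RSA keys.

```python
# Toy RSA blinding (Python 3.8+ for pow(r, -1, N), the modular inverse).
N, pk, sk = 3233, 17, 2753
z = pow(42, pk, N)                       # Bob's puzzle from the puzzle-promise protocol

r = 123                                  # Bob's random blinding value, coprime to N
z_blinded = (z * pow(r, pk, N)) % N      # the blinded puzzle Alice hands to the Tumbler

# The Tumbler solves the blinded puzzle without recognizing it as Bob's.
solution_blinded = pow(z_blinded, sk, N)          # equals 42 * r mod N

# Bob strips off the blinding factor to recover the original solution.
epsilon = (solution_blinded * pow(r, -1, N)) % N
print(epsilon)                           # 42
```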

This blinding allows Alice to pay Bob by paying the Tumbler for the solution to a blinded RSA puzzle, without revealing to the Tumbler which puzzle is being solved. All the Tumbler observes is Alice buying solutions to random-looking puzzles that it never issued. Thus, the Tumbler cannot link the solution of any puzzle to an issued puzzle.

The Future of the TumbleBit Project

TumbleBit is a rapidly evolving project. We are currently developing our proof-of-concept implementation into production-ready software, adding functionality to our core protocols, and exploring new applications.

A major milestone for our protocol was using our proof-of-concept TumbleBit implementation (available on GitHub) to perform a test of the TumbleBit protocol on Bitcoin’s blockchain, tumbling 800 Bitcoin addresses to 800 new Bitcoin addresses. Having demonstrated that TumbleBit works at scale on today’s Bitcoin, our next goal is to develop TumbleBit into software that the average Bitcoin user can use to protect their privacy when paying with Bitcoin (see our software roadmap on GitHub).

We are also working to improve our protocols and find new uses for them. In the current TumbleBit protocol, all payments must be of the same denomination. If this denomination is set to 0.01 bitcoin and Alice wants to pay Bob 0.05 bitcoin, she needs to make five payments. At the moment we are exploring extensions to our protocols that allow TumbleBit payments of arbitrary denominations while still providing unlinkability. We are also exploring other applications for TumbleBit, such as increasing the privacy of the Lightning Network or performing unlinkable cross-chain trades.


* TumbleBit is currently a unidirectional payment hub: each payment channel can either send funds or receive funds. Users who wish to do both must open two distinct TumbleBit payment channels.

No silver bullet: De-identification still doesn't work

Paul Ohm’s 2009 article Broken Promises of Privacy spurred a debate in legal and policy circles on the appropriate response to computer science research on re-identification techniques. In this debate, the empirical research has often been misunderstood or misrepresented. A new report by Ann Cavoukian and Daniel Castro is full of such inaccuracies, despite its claims of “setting the record straight.”

In a response to this piece, Ed Felten and I point out eight of our most serious points of disagreement with Cavoukian and Castro. The thrust of our arguments is that (i) there is no evidence that de-identification works either in theory or in practice and (ii) attempts to quantify its efficacy are unscientific and promote a false sense of security by assuming unrealistic, artificially constrained models of what an adversary might do.