January 17, 2017

The AT&T Deal Is About the Data

Most of the mainstream media coverage of the proposed AT&T acquisition of Time Warner has missed an important risk. Much of the discussion has focused on the potential market power the combined entity would have to raise prices, limit choice or otherwise disadvantage consumers.

A primary motivation for the deal, however, as readers of Freedom to Tinker well understand, is the desire to access more and deeper data about consumer behavior. The motivation to combine companies is not monopolistic control, but rather a timely effort to become a player in the lucrative, $77 billion world of targeted digital advertising, now controlled by Google and Facebook.

Some media, especially those covering the FCC and FTC, have begun detailing the data privacy issues raised by the deal. Hopefully, mainstream media will soon follow suit.

Here are some links:

BNA: FCC Privacy Rules Could Hamper AT&T-Time Warner Data Mining

Inside Sources: Mega Mergers Like AT&T-Time Warner are Becoming a Problem for Privacy Regulation

Bloomberg: Privacy Rule Imperils Data Riches as AT&T Pursues Time Warner

Fortune: Media Companies Want U.S. to Force AT&T-Time Warner to Share Customer Data

Are you really anonymous online?

As you browse the internet, online advertisers track nearly every site you visit, amassing a trove of information on your habits and preferences. When you visit a news site, they might see you’re a fan of basketball, opera and mystery novels, and accordingly select ads tailored to your tastes. Advertisers use this information to create highly personalized experiences, but they typically don’t know exactly who you are. They observe only your digital trail, not your identity itself, and so you might feel that you’ve retained a degree of anonymity.

In new work with Ansh Shukla, Sharad Goel and Arvind Narayanan, we show that these anonymous web browsing records can in fact often be tied back to real-world identities. (Check out our demo, and see if we can figure out who you are.)

At a high level, our approach is based on a simple observation. Each person has a highly distinctive social network, comprised of family and friends from school, work, and various stages throughout one’s life. As a consequence, the set of links in your Facebook and Twitter feeds is likewise highly distinctive, and clicking on these links leaves a tell-tale mark in your browsing history.

Given only the set of web pages an individual has visited, we determine which social media feeds are most similar to it, yielding a list of candidate users who likely generated that web browsing history. In this manner, we can tie a person’s real-world identity to the near complete set of links they have visited, including links that were never posted on any social media site. This method requires only that one click on the links appearing in their social media feeds, not that they post any content.

Carrying out this strategy involves two key challenges, one theoretical and one engineering. The theoretical problem is quantifying how similar a specific social media feed is to a given web browsing history. One simple similarity measure is the fraction of links in the browsing history that also appear in the feed. This metric works reasonably well in practice, but it overstates similarity for large feeds, since those simply contain more links. We instead take an alternative approach. We posit a stylized, probabilistic model of web browsing behavior, and then compute the likelihood a user with that social media feed generated the observed browsing history. It turns out that this method is approximately equivalent to scaling the fraction of history links that appear in the feed by the log of the feed size.

The engineering challenge is identifying the most similar feeds in real time. Here we turn to Twitter, since Twitter feeds (in contrast to Facebook) are largely public. However, even though the feeds are public, we cannot simply create a local copy of Twitter against which we can run our queries. Instead we apply a series of heuristics to dramatically reduce the search space. We then combine caching techniques with on-demand network crawls to construct the feeds of the most promising candidates. On this reduced candidate set, we apply our similarity measure to produce the final results. Given a browsing history, we can typically carry out this entire process in under 60 seconds.

Our initial tests indicate that for people who regularly browse Twitter, we can deduce their identity from their web browsing history about 80% of the time. Try out our web application, and let us know if it works on you!

Improving Bitcoin’s Privacy and Scalability with TumbleBit

Last week we unveiled TumbleBit, a new anonymous payments scheme that addresses two major technical challenges faced by Bitcoin today: (1) scaling Bitcoin to meet increasing use, and (2) protecting the privacy of payments made via Bitcoin. Our proof-of-concept source code and a pre-print of the latest version of our paper were both posted online last week. In this post, I’ll walk through the motivation and the design of TumbleBit, and explain how it achieves the following three important features:

  1. TumbleBit works with today’s Bitcoin protocol. No changes to Bitcoin are required.
  2. TumbleBit payments are processed off of the Bitcoin blockchain, helping Bitcoin scale to higher transaction velocity and volume. Like Bitcoin’s on-blockchain transactions, the safety and security of payments sent via TumbleBit do not require trust in any third party. The TumbleBit payment hub can not steal a user’s Bitcoins.
  3. TumbleBit provides anonymity for its off-blockchain payments even against the TumbleBit service. The exact property we guarantee is unlinkability (as explained in detail below).

Bitcoin Anonymity Concerns

Bitcoin is the ultimate transparent currency. Each transaction reveals the value of bitcoin that was transferred, and the addresses (Bitcoin identities) of both the party transferring the payment and the party receiving the payment. Even worse, these transaction details are recorded in an immutable and massively-replicated ledger which can be read by anyone, i.e. Bitcoin’s blockchain. Feeding this concern is a burgeoning blockchain analytics and surveillance industry.

This presents problems for businesses that want to use Bitcoin, but don’t want to open their books to competitors via the blockchain. It also presents problems for regular users. Much has been made of companies using social networking profiles to make employment decisions and a transparent currency would make this even worse. Just as common wisdom suggests users should be empowered to choose appropriate privacy settings on social networks, users also need to tools to that limit the exposure of their private financial and purchasing history on Bitcoin’s blockchain.

Bitcoin Scalability Concerns

Regarding scalability, Bitcoin faces a problem that many protocol designers only dream of having: too many users want to use it. “On Scaling Decentralized Blockchains (A Position Paper)” explains the issue as follows:

“Today’s representative blockchain such as Bitcoin takes 10 min or longer to confirm transactions, achieves 7 transactions/sec maximum throughput. In comparison, a mainstream payment processor such as Visa credit card confirms a transaction within seconds, and processes 2000 transactions/sec on average, with a peak rate of 56,000 transactions/sec [11]. Clearly, a large gap exists between where Bitcoin is today, and the scalability of a mainstream payment processor.”

Applying Metcalfe’s law to Bitcoin suggests that the more people who use Bitcoin the more useful Bitcoin becomes. Thus scalability both in terms of volume (how many transactions can be confirmed per second) and velocity (how long a transaction takes to be confirmed) are significant limiters of the utility of Bitcoin.

How Does TumbleBit Work?

TumbleBit is implemented as a layer built on top of Bitcoin, and thus is fully compatible with today’s Bitcoin protocol. (This is analogous to how Tor is built as a layer on top of IP.) TumbleBit payments are sent via an intermediary called the Tumbler. In our paper, we describe how the core TumbleBit protocol can be run in in two modes. The first mode is a classic Bitcoin tumbler (also called a mixing service). In this post I will only address the second mode, where TumbleBit is used as an unlinkable payment hub.

A Bitcoin payment hub is a system that allows users to make off-blockchain transactions (via an intermediary, aka the payment hub) with the same level of security as on-blockchain transactions. Users join the payment hub by “opening a payment channel” that escrows bitcoins in an on-blockchain Bitcoin transaction. Next, users send payments to each other off-blockchain. These payments can be high volume and fast, because they are not subject to the time it takes to confirm transactions on Bitcoin’s blockchain. Finally, a user “closes its payment channel” via an on-blockchain transaction that reflects its new balance of bitcoins after all off-blockchain payments have been made.

In order to provide anonymity, TumbleBit requires all its users to open payment channels (in the “Escrow Phase”) and to close their payment channels (in the “Cash-Out Phase”)*. These two TumbleBit phases are relatively slow, because they require each user to post on-blockchain Bitcoin transaction that takes about 10 minutes to be confirmed. However, between these two phases in the “Payment Phase” where users send fast off-blockchain payments. TumbleBit payments can complete in seconds, thus scaling the velocity of Bitcoin payments. Perhaps more importantly, off-blockchain TumbleBit payments do not take up space on Bitcoin’s blockchain. This allows Bitcoin to process many off-blockchain transactions for only two on-blockchain transactions (i.e. the transactions which open and close the payment channel), scaling the maximum transaction volume which Bitcoin can handle.

Payment hubs have a long history in Bitcoin. TumbleBit’s main innovation is to provide unlinkability.

Unlinkability is defined as follows. Informally, unlinkability ensures that no one can tell which payer (Alice) paid which payee (Bob) during the payment phase of TumbleBit. More formally, the amount of bitcoin each user escrowed on the blockchain during the Escrow Phase is public. The amount of bitcoin cashed-out by each user during the “Cash-Out Phase” is also public. Let an interaction graph be any mapping of payments from the payers using TumbleBit (Alice1, …, AliceN) to payees using TumbleBit (Bob1, …, BobM). An interaction graph is compatible if it explains the transfer of funds from TumbleBit’s “Escrow Phase” to TumbleBit’s “Cash-Out Phase”. Unlinkability requires all compatible interaction graphs to be equally likely. TumbleBit payments are unlinkable because the TumbleBit protocol ensures that all compatible interaction graphs are equally likely. The anonymity provided by TumbleBit grows with the number of compatible interaction graphs.

To achieve unlinkability without compromising the security and safety of Bitcoin payments, TumbleBit must also ensure that the hub can’t steal bitcoins from the users or “print money” for itself. To do this, TumbleBit uses two interleaved protocols, the puzzle-promise-protocol and the RSA-puzzle-solver protocol. I give a brief overview of each protocol here. Our paper has the full details.

TumbleBit Walk-Through

TumbleBit replaces on-blockchain payments with off-blockchain puzzle solving, where Alice pays Bob by providing Bob with the solution to a puzzle. The puzzle is generated through interaction between Bob and the Tumbler (the puzzle-promise protocol) solved through an interaction between Alice and the Tumbler (the puzzle-solver protocol). Each time a puzzle is solved, 1 bitcoin is transferred from Alice to the Tumbler and each time Bob learns a solution 1 bitcoin is transferred from the Tumbler to Bob. A TumbleBit puzzle z is simply an RSA evaluation of its solution value ϵ under the RSA public key (pk, N) of the Tumbler, ie

z = ϵᵖᵏ mod N

In the puzzle-promise protocol, Bob is convinced that he has received an valid puzzle $z$, and that the solution to this puzzle will allow Bob to increase the value of his escrow by 1 bitcoin. The puzzle-solver protocol allows the Tumbler to take 1 Bitcoin from Alice’s escrow if and only if the hub reveals the solution to an RSA-puzzle of Alice’s choice. Putting these two protocols together, Alice can buy puzzle solutions from the Tumbler using the puzzle-solver protocol, which is a cryptographically-enforced fair exchange where 1 Bitcoin (from Alice) is exchanged for 1 puzzle solution (from the Tumbler). By purchasing a solution to Bob’s puzzle, Alice in essence pays 1 Bitcoin to Bob via the Tumbler.

Naturally, if Alice gives the Tumbler the exact same puzzle that Bob got from the Tumbler, the Tumbler will know Alice is paying Bob. To solve this problem, we borrow a technique called RSA blinding from Chaum’s original eCash paper ‘Blind Signatures for Untraceable Payments’. Bob blinds the puzzle (by multiplying it with a random value) before sending it to Alice. This way, no one other than Alice and Bob, not even the Tumbler can link Bob’s (unblinded) puzzle to (blinded) puzzle that Alice will pay the Tumbler to solve. This blinding is information theoretically secure. When Bob receives the solution to the blinded puzzle, he can unblind it and learn the solution to the original RSA puzzle which the Tumbler issued to him.

This allows Alice to pay Bob by paying the Tumbler for a solution to a blinded RSA puzzle without revealing to the Tumbler which puzzle is being solved. All the Tumbler observes is Alice buying solutions to random puzzles that the Tumbler never issued. Thus, the Tumbler can not link the solution to any puzzle to an issued puzzle.

The Future of the TumbleBit Project

TumbleBit is a rapidly evolving project. We are currently developing our proof of concept implementation into production ready software, adding increased functionality into our core protocols and exploring new applications.

A major milestone for our protocol was using our proof of concept TumbleBit implementation (available on github) to perform a test of TumbleBit protocol on Bitcoin’s blockchain tumbling 800 Bitcoin addresses to 800 new Bitcoin addresses. Having demonstrated that TumbleBit works at scale on today’s Bitcoin, our next goal is to develop TumbleBit into software the average Bitcoin user can use to protect their privacy when paying with Bitcoin (see our software roadmap on github).

We are also working to improve our protocols and find new uses for them. In the current TumbleBit protocol, all payments must be of the same domination. If this domination is set to 0.01 Bitcoin, and Alice wants to pay Bob 0.05 she needs to make five payments. At the moment we are exploring extensions to our protocols to allow for TumbleBit payments of arbitrary denominations while still providing unlinkability. We are also exploring other applications for TumbleBit, like increasing the privacy of the Lightning Network or performing unlinkable cross-chain trading.

* TumbleBit is currently a unidirectional payment hub: each payment channel can either send funds or receive funds. Users who wish to do both must open two distinct TumbleBit payment channels.