August 22, 2017

When the cookie meets the blockchain

Cryptocurrencies are portrayed as a more anonymous and less traceable method of payment than credit cards. So if you shop online and pay with Bitcoin or another cryptocurrency, how much privacy do you have? In a new paper, we show just how little.

Websites, including shopping sites, typically embed dozens of third-party trackers. These third parties track sensitive details of payment flows, such as the items you add to your shopping cart and their prices, regardless of how you choose to pay. Crucially, we find that many shopping sites leak enough information about your purchase that trackers can link it uniquely to the payment transaction on the blockchain. From there, there are well-known ways to link that transaction to the rest of your Bitcoin wallet addresses. You can protect yourself by using browser extensions such as Adblock Plus and uBlock Origin, and by using Bitcoin anonymity techniques like CoinJoin. These measures help, but we find that linkages are still possible.

An illustration of the full scope of our attack. Consider three websites that happen to have the same embedded tracker. Alice makes purchases and pays with Bitcoin on the first two sites, and logs in on the third. Merchant A leaks a QR code of the transaction’s Bitcoin address to the tracker, merchant B leaks a purchase amount, and merchant C leaks Alice’s PII. Such leaks are commonplace today, and usually intentional. The tracker links these three purchases based on Alice’s browser cookie. Further, the tracker obtains enough information to uniquely (or near-uniquely) identify coins on the Bitcoin blockchain that correspond to the two purchases. However, Alice took the precaution of putting her bitcoins through CoinJoin before making purchases. Thus, either transaction individually could not have been traced back to Alice’s wallet, but there is only one wallet that participated in both CoinJoins, and is hence revealed to be Alice’s.
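
The last step of the illustration, where only one wallet took part in both CoinJoins, amounts to a set intersection. The following is a minimal sketch of that idea, not the procedure from the paper; the wallet labels and the `candidate_wallets` helper are hypothetical, and it assumes the tracker has already worked out which wallets participated in each CoinJoin.

```python
# Hypothetical data: for each purchase tied to the same browser cookie,
# the set of wallets that contributed inputs to the CoinJoin funding it.
def candidate_wallets(coinjoin_participants):
    """Intersect the participant sets of every CoinJoin linked to one cookie."""
    sets = [set(p) for p in coinjoin_participants]
    if not sets:
        return set()
    return set.intersection(*sets)


purchase_a = {"wallet_alice", "wallet_bob", "wallet_carol"}
purchase_b = {"wallet_alice", "wallet_dave", "wallet_erin"}

suspects = candidate_wallets([purchase_a, purchase_b])
print(suspects)  # {'wallet_alice'}
```

With a single purchase the candidate set is the whole CoinJoin; each additional linked purchase can only shrink it.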

Using the privacy measurement tool OpenWPM, we analyzed 130 e-commerce sites that accept Bitcoin payments, and found that 53 of these sites leak transaction details to trackers. Many, but not all, of these leaks are by design, to enable advertising and analytics. Further, 49 sites leak personal identifiers to trackers: names, emails, usernames, and so on. This combination means that trackers can link real-world identities to Bitcoin addresses. To be clear, all of this leaked data is sitting in the logs of dozens of tracking companies, and the linkages can be done retroactively using past purchase data.
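
To make the linking step concrete, here is a rough sketch of how a leaked price and timestamp could narrow down candidate transactions. It is an illustration under stated assumptions, not the matching procedure from the paper: the `TxOutput` record, the `match_purchase` helper, the one-hour window, the 1% tolerance, and the exchange rate are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxOutput:
    txid: str
    value_satoshi: int
    timestamp: int  # Unix seconds


def match_purchase(outputs, price_usd, usd_per_btc, seen_at,
                   window_s=3600, tolerance=0.01):
    """Return outputs whose value and time are close to a leaked purchase."""
    # Convert the leaked fiat price to satoshis (1 BTC = 100,000,000 satoshis).
    expected = round(price_usd / usd_per_btc * 100_000_000)
    slack = expected * tolerance
    return [o for o in outputs
            if abs(o.value_satoshi - expected) <= slack
            and abs(o.timestamp - seen_at) <= window_s]


outputs = [
    TxOutput("tx1", 1_050_000, 1_503_400_100),  # right amount, right time
    TxOutput("tx2", 1_050_000, 1_503_500_000),  # right amount, too late
    TxOutput("tx3", 2_000_000, 1_503_400_050),  # wrong amount
]
matches = match_purchase(outputs, price_usd=42.0, usd_per_btc=4000.0,
                         seen_at=1_503_400_000)
print([o.txid for o in matches])  # ['tx1']
```

When a merchant leaks the payment address itself, no such search is needed: the address identifies the transaction directly.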

On a subset of these sites, we made real purchases using bitcoins that we first “mixed” using the CoinJoin anonymity technique.[1] We found that a tracker that observed two of our purchases — a common occurrence — would be able to identify our Bitcoin wallet 80% of the time. In our paper, we present the full details of our attack as well as a thorough analysis of its effectiveness.

Our findings are a reminder that systems without provable privacy properties may have unexpected information leaks and lurking privacy breaches. When multiple such systems interact, the leaks can be even more subtle. Anonymity in cryptocurrencies seems especially tricky, because it inherits the worst of both data anonymization (sensitive data must be publicly and permanently stored on the blockchain) and anonymous communication (privacy depends on subtle interactions arising from the behavior of users and applications).

[1] In this experiment we used 1–2 rounds of mixing. We provide evidence in the paper that while a higher mixing depth decreases the effectiveness of the attack, it doesn’t defeat it. There’s room for a more careful study of the tradeoffs here.

Web Census Notebook: A new tool for studying web privacy

As part of the Web Transparency and Accountability Project, we’ve been visiting the web’s top 1 million sites every month using our open-source privacy measurement tool OpenWPM. This has led to numerous worrying findings, such as the systematic abuse of newly introduced web features for fingerprinting. Those findings have in turn led to better privacy tools and, occasionally, strong responses from browser vendors.

Enabling research is great (OpenWPM has led to 14 papers so far), but research is slow and requires expertise. To make our work more directly useful, today we’re announcing a new tool to study web privacy: a Jupyter notebook interface and a set of libraries to quickly answer most questions about web tracking by querying the 500 GB of data we collect every month.

Jupyter notebook is an intuitive tool for data analysis using Python, and it’s what we use here internally for much of our own research. Notebooks are accessible through a simple web interface, yet the code, data, and memory persist on the server if you close the browser and return later (even from a different device). Notebooks combine code with visualizations, making them ideal for data exploration and analysis.

Who could benefit from this tool? We envision uses such as these:

  • Publishers could use our data to understand third-party tracking on their own websites.
  • Journalists could use our data to investigate and expose privacy-infringing practices.
  • Regulators and enforcement agencies could use our tool in investigations.
  • Creators of browser privacy tools could use our data to test their effectiveness.

Let’s look at an example that shows the feel of the interface. The code below computes the average number of embedded trackers on the top 100 websites in various categories such as “news” and “shopping”. It is intuitive and succinct. Without our interface, not only would the SQL version of this query be much more cumbersome, but it would require a ton of legwork and setup to even get to a point where you can write the query. Now you just need to point your browser at our notebook.

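    # census is provided by the notebook environment.
    for category, domains in census.first_parties.alexa_categories.items():
        # Count every tracker resource on the category's top 100 sites, then
        # average per site (100.0 forces float division on Python 2 as well).
        avg = sum(1 for first_party in domains[:100]
                    for third_party in first_party.third_party_resources
                    if third_party.is_tracker) / 100.0
        print("Average number of third party trackers on %s sites: %.1f" % (category, avg))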

The results confirm our finding that news sites have the most trackers, and adult sites the least. [1]

Here’s what happens behind the scenes:

  • census is a Python object that exposes all the relationships between websites and third parties as object attributes, hiding the messy details of the underlying database schema. Each first party is represented by a FirstParty object that gives access to each third-party resource (URI object) on the first party, and the ThirdParty that the URI belongs to. When the objects are accessed, they are instantiated automatically by querying the database.
  • census.first_parties is a container of FirstParty objects ordered by Alexa traffic rank, so you can easily analyze the top sites, or sites in the long tail, or specific sites. You can also easily slice the sites by category: in the example above, we iterate through each category of census.first_parties.alexa_categories.
  • There’s a fair bit of logic that goes into analyzing the crawl data to determine which third parties are embedded on which websites, and cross-referencing that with tracking-protection lists to figure out which of those are trackers. This work is already done for you, and exposed via attributes such as ThirdParty.is_tracker.
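
To give a feel for the lazy instantiation described above, here is a toy sketch. The class names FirstParty, URI, and ThirdParty and the is_tracker attribute come from the description; everything else (the in-memory `FAKE_CRAWL` and `FAKE_BLOCKLIST` stand-ins, the `QUERY_LOG` list, and the use of `functools.cached_property`) is an assumption made for illustration and is not how the real library is implemented.

```python
from functools import cached_property  # Python 3.8+

# Hypothetical stand-ins for the crawl database and a tracking-protection list.
FAKE_CRAWL = {
    "news.example": ["https://t.example/a.js", "https://cdn.example/lib.js"],
    "shop.example": ["https://t.example/pixel.gif"],
}
FAKE_BLOCKLIST = {"t.example"}
QUERY_LOG = []  # records each simulated database query


class ThirdParty:
    def __init__(self, domain):
        self.domain = domain

    @property
    def is_tracker(self):
        return self.domain in FAKE_BLOCKLIST


class URI:
    def __init__(self, url):
        self.url = url
        # Host part of the URL, e.g. "t.example".
        self.third_party = ThirdParty(url.split("/")[2])

    @property
    def is_tracker(self):
        return self.third_party.is_tracker


class FirstParty:
    def __init__(self, domain):
        self.domain = domain

    @cached_property
    def third_party_resources(self):
        # Runs once, on first access; later accesses reuse the cached list.
        QUERY_LOG.append(self.domain)
        return [URI(url) for url in FAKE_CRAWL[self.domain]]


site = FirstParty("news.example")
trackers = [r.url for r in site.third_party_resources if r.is_tracker]
trackers_again = [r.url for r in site.third_party_resources if r.is_tracker]
print(trackers, QUERY_LOG)
```

Creating a FirstParty costs nothing; the simulated query runs only when third_party_resources is first read, which is why `QUERY_LOG` ends up with a single entry despite two accesses.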

Since the notebooks run on our server, we expect to be able to support only a limited number (a few dozen) at this point, so you need to apply for access. The tool is currently in beta as we smooth out rough edges and add features, but it is usable and useful. Of course, you’re welcome to run the notebook on your own server: the underlying crawl datasets are public, and we’ll release the code behind the notebooks soon. We hope you find it useful, and we welcome your feedback.

[1] The linked graph from our paper measures the number of distinct domains whereas the query above counts every instance of every tracker. The trends are the same in both cases, but the numbers are different. Here’s the output of the query:

Average number of third party trackers on computers sites: 41.0
Average number of third party trackers on regional sites: 68.8
Average number of third party trackers on recreation sites: 58.2
Average number of third party trackers on health sites: 38.4
Average number of third party trackers on news sites: 151.2
Average number of third party trackers on business sites: 55.0
Average number of third party trackers on kids_and_teens sites: 74.8
Average number of third party trackers on home sites: 94.5
Average number of third party trackers on arts sites: 108.6
Average number of third party trackers on sports sites: 86.6
Average number of third party trackers on reference sites: 43.8
Average number of third party trackers on science sites: 43.1
Average number of third party trackers on society sites: 73.5
Average number of third party trackers on shopping sites: 53.1
Average number of third party trackers on adult sites: 16.8
Average number of third party trackers on games sites: 70.5
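
The difference between the two ways of counting mentioned in this footnote can be shown with a toy example. The list of domains below is made up for illustration and does not come from the crawl data.

```python
# Hypothetical tracker resources loaded by one page, listed by third-party domain.
tracker_hits = ["t.example", "t.example", "ads.example", "t.example"]

instances = len(tracker_hits)              # every instance of every tracker
distinct_domains = len(set(tracker_hits))  # distinct tracker domains only

print(instances, distinct_domains)  # 4 2
```

A page that loads many resources from the same tracker inflates the first number but not the second, which is why the two measures give different values for the same sites.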

How to buy physical goods using Bitcoin with improved security and privacy

Bitcoin has found success as a decentralized digital currency, but it is only one step toward decentralized digital commerce. Indeed, creating decentralized marketplaces and mechanisms is a nascent and active area of research. In a new paper, we present escrow protocols for cryptocurrencies that bring us closer to decentralized commerce.

In any online sale of physical goods, there is a circular dependency: the buyer only wants to pay once he receives his goods, but the seller only wants to ship them once she’s received payment. This is a problem regardless of whether one pays with bitcoins or with dollars, and the usual solution is to utilize a trusted third party. Credit card companies play this role, as do platforms such as Amazon and eBay. Crucially, the third party must be able to mediate in case of a dispute and determine whether the seller gets paid or the buyer receives a refund.

A key requirement for successful decentralized marketplaces is to weaken the role of such intermediaries, both because they are natural points of centralization and because unregulated intermediaries have tended to prove untrustworthy. In the infamous Silk Road marketplace, buyers would send payment to Silk Road, which would hold it in escrow. Note that escrow is necessary because it is not possible to reverse cryptocurrency transactions, unlike credit card payments. If all went well, Silk Road would forward the money to the seller; otherwise, it would mediate the dispute. Time and time again, the operators of these marketplaces have absconded with the funds in escrow, underscoring that this isn’t a secure model.

Lately, various services have begun to offer a more secure version of escrow payment. They use 2-of-3 multisignature transactions, in which the buyer, the seller, and a trusted third party each hold one key. The buyer pays into a multisignature address that requires any two of these three keys to sign before the money can be spent. If the buyer and seller are in agreement, they can jointly issue payment. If there’s a dispute, the third party mediates. The third party and the winner of the dispute will then use their respective keys to issue a payout transaction to the winner.

This escrow protocol has two nice features. First, if there’s no dispute, the buyer and seller can settle without involving the third party. Second, the third party cannot run away with the money as it only holds one key, while two are necessary to spend the escrowed funds.
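
The 2-of-3 rule can be modeled in a few lines. This sketch captures only the spending policy, not Bitcoin scripts or real signatures; the party labels and the `can_spend` helper are hypothetical names chosen for illustration.

```python
PARTIES = {"buyer", "seller", "mediator"}
THRESHOLD = 2  # any two of the three keys must sign


def can_spend(signers):
    """Return True if enough distinct key holders have signed."""
    valid = set(signers) & PARTIES  # ignore unknown or duplicate signers
    return len(valid) >= THRESHOLD


print(can_spend({"buyer", "seller"}))     # True: settle without the mediator
print(can_spend({"mediator"}))            # False: mediator cannot take the funds
print(can_spend({"mediator", "seller"}))  # True: dispute resolved for the seller
print(can_spend({"buyer"}))               # False: one party alone cannot spend
```

The first two calls correspond to the two features just described; the last shows that a single party acting alone, whichever one it is, cannot move the escrowed funds.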

Until now, the escrow conversation has generally stopped here. But in our paper we ask several further important questions. To start, there are privacy concerns. Unless the escrow protocol is carefully designed, anyone observing the blockchain might be able to spot escrow transactions. They might even be able to tell which transactions were disputed, and connect those to specific buyers and sellers.

In a previous paper, we showed that using multisignatures to split control over a wallet leads to major privacy leaks, and we advocated using threshold signatures instead of multisignatures. It turns out that using multisignatures for escrow has similar negative privacy implications. While using 2-of-3 threshold signatures instead of multisignatures would solve the privacy problem, it would introduce other undesirable features in the context of escrow as we explain in the paper.

Moreover, the naive escrow protocol above has a gaping security flaw: even though the third party cannot steal the money, it can refuse to mediate any disputes and thus keep the money locked up.

In addition to these privacy and security requirements, we study group escrow. In such a system, the transacting parties may choose multiple third parties from among a set of escrow service providers and have them mediate disputes by majority vote. Again, we analyze both the privacy and the security of the resulting schemes, as well as the details of group formation and communication.
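
As a rough illustration of mediation by majority vote, the sketch below tallies mediator decisions. It is one possible reading of "majority vote" and not a protocol from the paper: the `resolve_dispute` helper and mediator names are hypothetical, and it assumes a strict majority of all chosen mediators is required, so a mediator who never responds simply does not count toward either side.

```python
from collections import Counter


def resolve_dispute(votes, n_mediators):
    """votes maps mediator name -> 'buyer' or 'seller'.

    Returns the winner if one side has a strict majority of all
    n_mediators chosen mediators, otherwise None (no decision yet).
    """
    tally = Counter(votes.values())
    if not tally:
        return None
    winner, count = tally.most_common(1)[0]
    return winner if count > n_mediators / 2 else None


# Three mediators chosen; the third never responds.
print(resolve_dispute({"m1": "buyer", "m2": "buyer"}, n_mediators=3))   # buyer
print(resolve_dispute({"m1": "buyer", "m2": "seller"}, n_mediators=3))  # None
```

Under this assumption, a dispute can still be settled when a minority of mediators is unresponsive, whereas a single mediator who refuses to act leaves a two-party dispute unresolved.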

Our goal in this paper is not to provide a definitive set of requirements for escrow services. We spoke with many Bitcoin escrow companies in the course of our research — it’s a surprisingly active space — and realized that there is no single set of properties that works for every use-case. For example, we’ve looked at privacy as a desirable property so far, but buyers may instead want to be able to examine the blockchain and identify how often a given seller was involved in disputes. In our paper, we present a toolbox of escrow protocols as well as a framework for evaluating them, so that anyone can choose the protocol that best fits their needs and be fully aware of the security and privacy implications of that choice.

We’ll present the paper at the Financial Cryptography conference in two weeks.