October 3, 2022

BlockSci: a platform for blockchain science and exploration

The Bitcoin blockchain — currently 140GB and growing — contains a massive amount of data that can give us insights into the Bitcoin ecosystem, including how users, businesses, and miners operate. Today we’re announcing BlockSci, an open-source software tool that enables fast and expressive analysis of Bitcoin’s and many other blockchains, and an accompanying working paper that explains its design and applications. Our Jupyter notebook demonstrates some of BlockSci’s capabilities.

Current tools for blockchain analysis depend on general-purpose databases that have full support for transactions. But that’s unnecessary for blockchain analysis where the data structures are append-only. We take advantage of this observation in the design of our custom in-memory blockchain database as well as an analysis library.

BlockSci’s core infrastructure is written in C++ and optimized for speed. (For example, traversing every transaction input and output on the Bitcoin blockchain takes only 10.3 seconds on our r4.2xlarge EC2 machine.) To make analysis more convenient, we provide Python bindings and a Jupyter notebook interface. This interface is slower, but is ideal for exploratory analyses and allows users to quickly iterate when developing new queries.

The code below shows the convenience of traversing the blockchain using straightforward Python idioms, built-in currency conversion using historical exchange-rate data, and the use of pandas DataFrames for analysis and visualization..

fees = [sum(block.fees) for block in chain.range('2017')]
times = [block.time for block in chain.range('2017')]
converter = blocksci.CurrencyConverter()
df = pandas.DataFrame({"Fee":fees}, index=times)
df = converter.satoshi_to_currency_df(df, chain)

When plotted, it results in the following graph showing the average transaction fee per block:

BlockSci uses a custom data format; it comes with a parser that generates this data from the serialized blockchain format recorded by cryptocurrency nodes such as bitcoind. The parser supports incremental updates when new blocks are received, and making it easy to stay up to date with the latest version of the blockchain. We’ve used BlockSci to analyze Bitcoin, Bitcoin Cash, Litecoin, Namecoin, Dash, and ZCash; many other cryptocurrencies make no changes to the blockchain format, and so should be supported with no changes to BlockSci.

In our working paper, we present four analyses that show BlockSci’s usefulness for answering research questions. We show how multisignatures unfortunately weaken privacy and confidentiality; we apply the cluster intersection attack to Dash, a privacy-focused altcoin; we analyze inefficiencies in the usage of block space; and we present improved methods for estimating of how often coins change possession as opposed to just being shuffled around.

Here’s an illustrative example. Exploratory graph analysis using BlockSci allowed us to discover a behavioral pattern in the usage of multisignatures that weakens security. Multisignatures are a security-enhancing mechanism that distribute control of an address over a number of different public keys. Surprisingly, we found that users often negate this security by moving their funds from a multisig address to a regular address and then back again after a period of a few hours to days. We think this happens when users are changing the access control policy on their wallet, although it is unclear why they transfer their funds to a regular address in the interim, and not directly to the new multisig address. This pattern of behavior has led over $12 million dollars to be left insecure over the course of  over 22,000 transactions. What users may not appreciate is that the temporary weakening of security is advertised to potential attackers on the blockchain.

There’s far more to explore on public blockchains. BlockSci is publicly available now, and we hope you’ll find it useful. It is easy to get started using the EC2 image we’ve released, which includes the Bitcoin blockchain data in addition to the tool. BlockSci is open-source, and we welcome contributions. This is an alpha release; we’re continuing to improve it and the interface may change a bit in future releases. We look forward to working with the community and to hearing about other creative uses of the data and the tool.

When the cookie meets the blockchain

Cryptocurrencies are portrayed as a more anonymous and less traceable method of payment than credit cards. So if you shop online and pay with Bitcoin or another cryptocurrency, how much privacy do you have? In a new paper, we show just how little.

Websites including shopping sites typically have dozens of third-party trackers per site. These third parties track sensitive details of payment flows, such as the items you add to your shopping cart, and their prices, regardless of how you choose to pay. Crucially, we find that many shopping sites leak enough information about your purchase to trackers that they can link it uniquely to the payment transaction on the blockchain. From there, there are well-known ways to further link that transaction to the rest of your Bitcoin wallet addresses. You can protect yourself by using browser extensions such as Adblock Plus and uBlock Origin, and by using Bitcoin anonymity techniques like CoinJoin. These measures help, but we find that linkages are still possible.

 

An illustration of the full scope of our attack. Consider three websites that happen to have the same embedded tracker. Alice makes purchases and pays with Bitcoin on the first two sites, and logs in on the third. Merchant A leaks a QR code of the transaction’s Bitcoin address to the tracker, merchant B leaks a purchase amount, and merchant C leaks Alice’s PII. Such leaks are commonplace today, and usually intentional. The tracker links these three purchases based on Alice’s browser cookie. Further, the tracker obtains enough information to uniquely (or near-uniquely) identify coins on the Bitcoin blockchain that correspond to the two purchases. However, Alice took the precaution of putting her bitcoins through CoinJoin before making purchases. Thus, either transaction individually could not have been traced back to Alice’s wallet, but there is only one wallet that participated in both CoinJoins, and is hence revealed to be Alice’s.

 

Using the privacy measurement tool OpenWPM, we analyzed 130 e-commerce sites that accept Bitcoin payments, and found that 53 of these sites leak transaction details to trackers. Many, but not all, of these leaks are by design, to enable advertising and analytics. Further, 49 sites leak personal identifiers to trackers: names, emails, usernames, and so on. This combination means that trackers can link real-world identities to Bitcoin addresses. To be clear, all of this leaked data is sitting in the logs of dozens of tracking companies, and the linkages can be done retroactively using past purchase data.

On a subset of these sites, we made real purchases using bitcoins that we first “mixed” using the CoinJoin anonymity technique.[1] We found that a tracker that observed two of our purchases — a common occurrence — would be able to identify our Bitcoin wallet 80% of the time. In our paper, we present the full details of our attack as well as a thorough analysis of its effectiveness.

Our findings are a reminder that systems without provable privacy properties may have unexpected information leaks and lurking privacy breaches. When multiple such systems interact, the leaks can be even more subtle. Anonymity in cryptocurrencies seems especially tricky, because it inherits the worst of both data anonymization (sensitive data must be publicly and permanently stored on the blockchain) and anonymous communication (privacy depends on subtle interactions arising from the behavior of users and applications).

[1] In this experiment we used 1–2 rounds of mixing. We provide evidence in the paper that while a higher mixing depth decreases the effectiveness of the attack, it doesn’t defeat it. There’s room for a more careful study of the tradeoffs here.

Breaking, fixing, and extending zero-knowledge contingent payments

The problem of fair exchange arises often in business transactions — especially when those transactions are conducted anonymously over the internet. Alice would like to buy a widget from Bob, but there’s a circular problem: Alice refuses to pay Bob until she receives the widget whereas Bob refuses to send Alice the widget until he receives payment.

In a previous post, we described the fair-exchange problem and solutions for buying physical goods using Bitcoin. Today is the first of two posts in which we focus on purchasing digital goods and services. In a new paper together with researchers from City University of New York and IMDEA Software Institute, Madrid, we show that Zero-knowledge contingent payments (ZKCP), a well known protocol for the fair exchange of digital goods over the blockchain is insecure in its current form. We show how to fix ZKCP, and also extend it to a new class of problems. [Read more…]

How to buy physical goods using Bitcoin with improved security and privacy

Bitcoin has found success as a decentralized digital currency, but it is only one step toward decentralized digital commerce. Indeed, creating decentralized marketplaces and mechanisms is a nascent and active area of research. In a new paper, we present escrow protocols for cryptocurrencies that bring us closer to decentralized commerce.

In any online sale of physical goods, there is a circular dependency: the buyer only wants to pay once he receives his goods, but the seller only wants to ship them once she’s received payment. This is a problem regardless of whether one pays with bitcoins or with dollars, and the usual solution is to utilize a trusted third party. Credit card companies play this role, as do platforms such as Amazon and eBay. Crucially, the third party must be able to mediate in case of a dispute and determine whether the seller gets paid or the buyer receives a refund.

A key requirement for successful decentralized marketplaces is to weaken the role of such intermediaries, both because they are natural points of centralization and because unregulated intermediaries have tended to prove untrustworthy. In the infamous Silk Road marketplace, buyers would send payment to Silk Road, which would hold it in escrow. Note that escrow is necessary because it is not possible to reverse cryptocurrency transactions, unlike credit card payments. If all went well, Silk Road would forward the money to the seller; otherwise, it would mediate the dispute. Time and time again, the operators of these marketplaces have absconded with the funds in escrow, underscoring that this isn’t a secure model.

Lately, there have been various services that offer a more secure version of escrow payment. Using 2-of-3 multisignature transactions, the buyer, seller, and a trusted third party each hold one key. The buyer pays into a multisignature address that requires that any two of these three keys sign in order for the money to be spent. If the buyer and seller are in agreement, they can jointly issue payment. If there’s a dispute, the third party mediates. The third party and the winner of the dispute will then use their respective keys to issue a payout transaction to the winner.

This escrow protocol has two nice features. First, if there’s no dispute, the buyer and seller can settle without involving the third party. Second, the third party cannot run away with the money as it only holds one key, while two are necessary spend the escrowed funds.

Until now, the escrow conversation has generally stopped here. But in our paper we ask several further important questions. To start, there are privacy concerns. Unless the escrow protocol is carefully designed, anyone observing the blockchain might be able to spot escrow transactions. They might even be able to tell which transactions were disputed, and connect those to specific buyers and sellers.

In a previous paper, we showed that using multisignatures to split control over a wallet leads to major privacy leaks, and we advocated using threshold signatures instead of multisignatures. It turns out that using multisignatures for escrow has similar negative privacy implications. While using 2-of-3 threshold signatures instead of multisignatures would solve the privacy problem, it would introduce other undesirable features in the context of escrow as we explain in the paper.

Moreover, the naive escrow protocol above has a gaping security flaw: even though the third party cannot steal the money, it can refuse to mediate any disputes and thus keep the money locked up.

In addition to these privacy and security requirements, we study group escrow. In such a system, the transacting parties may choose multiple third parties from among a set of escrow service providers and have them mediate disputes by majority vote. Again, we analyze both the privacy and the security of the resulting schemes, as well as the details of group formation and communication.

Our goal in this paper is not to provide a definitive set of requirements for escrow services. We spoke with many Bitcoin escrow companies in the course of our research — it’s a surprisingly active space — and realized that there is no single set of properties that works for every use-case. For example, we’ve looked at privacy as a desirable property so far, but buyers may instead want to be able to examine the blockchain and identify how often a given seller was involved in disputes. In our paper, we present a toolbox of escrow protocols as well as a framework for evaluating them, so that anyone can choose the protocol that best fits their needs and be fully aware of the security and privacy implications of that choice.

We’ll present the paper at the Financial Cryptography conference in two weeks.

Bitcoin is unstable without the block reward

With Miles Carlsten, Harry Kalodner, and Matt Weinberg, I have a new paper titled On the instability of Bitcoin without the block reward, which Harry will present at ACM CCS next week. The paper predicts that miner incentives will start to go haywire as Bitcoin rewards shift from block rewards to transaction fees, based on theoretical results that closely match up with findings from our new Bitcoin mining simulator.

Bitcoin provides two incentives for miners: block rewards and transaction fees. Currently the vast majority of miner revenues come from block rewards, but in the long run they will come primarily from transaction fees as block rewards dwindle. This design decision has been discussed a lot, but in terms of monetary policy and hardly ever in terms of security. There has been an implicit belief that the transition to transaction fees will not affect the security and stability of the block chain, and in particular that it is immaterial whether miners receive (say) 25 bitcoins as a fixed reward or 25 bitcoins in expectation via transaction fees.

We reexamine this assumption in our paper, and our findings make disturbing news for the future security of Bitcoin and many other cryptocurrencies. Our key insight is that with only transaction fees, the variance of the miner reward is very high due to the randomness of the block arrival time, and it becomes attractive to fork a “wealthy” block to “steal” the rewards therein. [1]

keyinsight2


The figure shows a scenario where forking might be more profitable than extending the longest chain. See the paper for a full explanation.

Here’s how things could go wrong. Due to the possibility of profitable forking, the default strategy is no longer best; we lay out a menagerie of interesting and bizarre strategies in the paper. The most worrisome is “undercutting,” where miners capture as little of the available transaction fees as they can get away with, leaving the rest in the pool as an incentive for the next miner to extend their block rather than a competing block.

We also show rigorously that selfish mining gets worse when block rewards are replaced by transaction fees, motivated by the following intuition: if you happen to mine a new block just seconds after the last one was found, you gain nothing by publishing, so you might as well keep it for selfish mining in case you get lucky. The variance in transaction fees enables strategies like this that simply don’t make sense when the block reward is fixed.

If miners switch to these deviant strategies, the blockchain will be much less secure because of the mining power wasted due to constant forking, undercutting, and withholding of found blocks.

We derive most of our results in two separate ways: analytically, i.e., using game theory, and with a new mining simulator that we created. This gives us added confidence in our findings. For example, in one setting, the theory predicts a rather grotesque equilibrium involving on the Lambert W function, with the proof running to several pages. Sure enough, in our simulations of the same setting, the Lambert miner does best. We hope that our analytical techniques as well as our simulator will be useful to other researchers. We have made the simulator code open-source.

What is the impact of our findings? The Bitcoin community will probably need to respond to this problem in the long run, potentially via a fork, to discourage deviant strategies. We aren’t predicting that deviant strategies will arise in the short term, and there is a long runway for mitigation steps to be rolled out. The fact that blocks have filled up due to their 1MB limit decreases the variance of transaction fees between different blocks, and this mitigates the problem somewhat, although it is far from a complete and satisfactory solution. For example, at the time of writing our paper, the previous 1000 blocks included per-block transaction fees ranging from 0.03 BTC to 4.51, with a mean of 0.49 and standard deviation of 0.25 (over half the mean!). So simply maintaining the block-size limit probably won’t resolve the underlying issues.

At a deeper level, our results suggest a fundamental rethinking of the role of block rewards in cryptocurrency design. The prevailing view is that the block reward is a necessary but temporary evil to achieve an initial allocation of coins in the absence of a central authority. The transaction-fee regime is seen as the ideal steady state of the system. But our work shows that incentivizing compliant miner behavior in the transaction fee regime is a significantly more daunting task than in the block reward regime. So perhaps designers of new cryptocurrencies should make the block reward permanent and accept monetary inflation as inevitable. Transaction fees would still exist, but merely as an incentive for miners to include transactions in their blocks.

One final point: there is a science of designing economic incentives so that rational players will behave in a desired way, and it’s called mechanism design. Creators of cryptocurrencies (as well as creators of applications such as the DAO) are essentially doing mechanism design. But mechanism design is hard, and our paper is the latest among many to point out that the mechanisms embedded in cryptocurrencies have flaws. Yet, sadly, the cryptocurrency community is currently disjoint from the mechanism design community. That is why I’m thrilled that mechanism design expert Matt Weinberg, who’s behind all the sophisticated theory in our paper, is joining Princeton’s faculty next semester. Expect more research from us on the mechanism design of cryptocurrencies!

[1] The problems uncover arise not because transaction fees may arrive erratically, but because blocks inevitably arrive unpredictably.  We model transaction fees as arriving at a uniform rate. The rate is non-uniform in practice, which is an additional complication. This is a theme throughout our paper: we show that undesirable behaviors will arise even in simplified, “clean” models. This is bad news both because we think things will probably be worse in practice and because we want cryptocurrency mining games to be analytically tractable. Our work shows that in a transaction-fee regime, predicting behavior will be fiendishly complex.

Update: see also Bryan Ford’s response to this post (and paper).