July 18, 2019

What’s new with BlockSci, Princeton’s blockchain analysis tool

Six months ago we released the initial version of BlockSci, a fast and expressive tool to analyze public blockchains. In the accompanying paper we explained how we used it to answer scientific questions about security, privacy, miner behavior, and economics using blockchain data. BlockSci has a number of other applications including forensics and as an educational tool.

Since then we’ve heard from a number of researchers and developers who’ve found it useful, and there’s already a published paper on ransomware that has made use of it. We’re grateful for the pull requests and bug reports on GitHub from the community. We’ve also used it to deep-dive into some of the strange corners of blockchain data. We’ve made enhancements including a 5x speed improvement over the initial version (which was already several hundred times faster than previous tools).

Today we’re happy to announce BlockSci 0.4.5, which has a large number of feature enhancements and bug fixes. As just one example, Bitcoin’s SegWit update introduces the concept of addresses that have different representations but are equivalent; tools such as blockchain.info are confused by this and return incorrect (or at least unexpected) values for the balance held by such addresses. BlockSci handles these nuances correctly. We think BlockSci is now ready for serious use, although it is still beta software. Here are a number of ideas on how you can use it in your projects or contribute to its development.

We plan to release talks and tutorials on BlockSci, and improve its documentation. I’ll give a brief talk about it at the MIT Bitcoin Expo this Saturday; then Harry Kalodner and Malte Möser will join me for a BlockSci tutorial/workshop at MIT on Monday, March 19, organized by the Digital Currency Initiative and Fidelity Labs. Videos of both events will be available.

We now have two priorities for the development of BlockSci. The first is to make it possible to implement almost all analyses in Python with the speed of C++. To enable this we are building a function composition interface to automatically translate Python to C++. The second is to better support graph queries and improved clustering of the transaction graph. We’ve teamed up with our colleagues in the theoretical computer science group to adapt sophisticated graph clustering algorithms to blockchain data. If this effort succeeds, it will be a foundational part of how we understand blockchains, just as PageRank is a fundamental part of how we understand the structure of the web. Stay tuned!