October 23, 2017

Bitcoin’s history deserves to be better preserved

Much of Bitcoin’s development has happened in the open, transparently, through the mailing list and the bitcoin-dev IRC channel. The third-party website BitcoinStats maintains logs of the bitcoin-dev IRC chats. [1] This resource has proved useful and is linked to by other sources such as the Bitcoin wiki.

When reading a blog post about the 2013 Bitcoin fork, I noticed something strange about a BitcoinStats discussion that it linked to. Digging around, I found that the Wayback Machine version of the BitcoinStats log was different; the log had been changed at some point. I was curious whether only this conversation had been truncated or whether other logs had changed as well.

After scraping both the current version of the BitcoinStats website and the Wayback Machine’s archived copies, I found that many pages differ from their archived versions. For example, in the log for January 11, 2016, many entries from the user ‘Lightsword’ are now blank. The number and nature of the errors suggest a bug in the backend of the BitcoinStats website rather than malicious censorship of certain conversations. There may not be a complete history of the IRC channels anywhere, as the Wayback Machine also has holes in its coverage.
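For readers who want to repeat this kind of comparison, here is a minimal sketch in Python. It is not the script I used; it assumes the two versions of a page have already been downloaded and reduced to plain text with one log entry per line. The function name `diff_log` and the sample entries are hypothetical.

```python
import difflib
from typing import List


def diff_log(archived: str, current: str) -> List[str]:
    """Return the entries that differ between two versions of a log page.

    Lines prefixed with "-" exist only in the archived copy; lines
    prefixed with "+" exist only in the current copy.
    """
    diff = difflib.unified_diff(
        archived.splitlines(),
        current.splitlines(),
        fromfile="wayback",
        tofile="current",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    )
    # The first two lines are the "---"/"+++" file headers; "@@" lines
    # mark hunks. Keep only the added and removed entries.
    body = list(diff)[2:]
    return [line for line in body if line.startswith(("+", "-"))]


# Hypothetical sample data: the second entry has lost its message text.
archived_page = "12:00 alice hello\n12:01 bob hi there\n12:02 alice bye"
current_page = "12:00 alice hello\n12:01 bob\n12:02 alice bye"

for line in diff_log(archived_page, current_page):
    print(line)
```

An entry that was blanked shows up as a removed line followed by a shorter added line with the same timestamp and username, which makes this kind of damage easy to spot across many pages.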

It is unfortunate that artifacts of Bitcoin’s development history are being lost. There is value in knowing how critical decisions were made in the frantic hours of the 2013 fork. An important part of learning from history is having access to historical data. Decisions that shape what Bitcoin is today were originally discussed on IRC, and those decisions will continue to shape Bitcoin. Understanding what went right and what went wrong can inform future technology and community design.

The lesson is that online communities must make deliberate efforts to preserve important digital artifacts. Often this is merely a matter of picking the right technology. If GitHub were to disappear tomorrow, none of Bitcoin’s code history would be lost, thanks to git’s decentralized and distributed nature. All of Bitcoin’s transaction history is likewise famously replicated and resilient to corruption or loss.

Preserving the IRC logs would not be difficult. The community could distribute the logs via BitTorrent, as Wikipedia does with its content. Another option is to use the form the Wayback Machine provides to ensure the archiving of a page (to minimize effort, one could automate the invocation of this functionality). Given how important this data is and how little effort preserving it would take, doing so seems worthwhile.
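Automating the Wayback Machine option could look roughly like the sketch below. It relies on the "Save Page Now" endpoint, which accepts a page address appended to `https://web.archive.org/save/`; rate limits and the exact response behavior are not covered here and should be checked before running this at scale. The log URL template is a placeholder, not the real BitcoinStats address scheme, and the function names are hypothetical.

```python
import urllib.request
from datetime import date, timedelta
from typing import Iterator

SAVE_ENDPOINT = "https://web.archive.org/save/"

# Placeholder URL scheme: substitute the real address of the daily logs.
LOG_URL_TEMPLATE = "https://logs.example.org/bitcoin-dev/{day:%Y/%m/%d}"


def daily_log_urls(start: date, end: date,
                   template: str = LOG_URL_TEMPLATE) -> Iterator[str]:
    """Yield one log page URL per day from start to end, inclusive."""
    day = start
    while day <= end:
        yield template.format(day=day)
        day += timedelta(days=1)


def save_url(page_url: str) -> str:
    """Build the Save Page Now request URL for a page."""
    return SAVE_ENDPOINT + page_url


def archive_page(page_url: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture a page; return the HTTP status."""
    request = urllib.request.Request(
        save_url(page_url),
        headers={"User-Agent": "irc-log-archiver-sketch"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example use (performs network requests, so it is left commented out):
# for url in daily_log_urls(date(2016, 1, 10), date(2016, 1, 12)):
#     print(url, archive_page(url))
```

Run once a day from a scheduled job, a script along these lines would capture each new log page shortly after it is published, which also narrows the window in which a page can change without an archived copy to compare against.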

[1] IRC as a whole has a culture of ephemerality, and so Freenode, the server that hosts the bitcoin-dev IRC channel, doesn’t provide logs.