January 24, 2017

Bitcoin’s history deserves to be better preserved

Much of Bitcoin’s development has happened in the open, through the mailing list and the bitcoin-dev IRC channel. The third-party website BitcoinStats maintains logs of the bitcoin-dev IRC chats. [1] This resource has proved useful and is linked to by other sources such as the Bitcoin wiki.

When reading a blog post about the 2013 Bitcoin fork, I noticed something strange about a discussion on BitcoinStats that was linked from it. Digging around, I found that the Wayback Machine version of the log from BitcoinStats is different; the log had been changed at some point. I was curious whether only this conversation had been truncated, or whether other logs had changed too.

After scraping the current version of the BitcoinStats website and the Wayback Machine versions, I found that many pages differ from their Wayback Machine version. For example, in the log for January 11, 2016, many entries for the user ‘Lightsword’ are now blank. The number and nature of the errors make it appear that there might be a bug in the backend of the BitcoinStats website, rather than malicious censorship of certain conversations. There may not be a complete history of the IRC channels anywhere, as the Wayback Machine also has holes in its coverage.
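The comparison described above can be sketched in a few lines. This is only an illustration, not the script I used: it assumes each log line has the hypothetical form "HH:MM nick message", and the nicknames and messages in the usage example are made up.

```python
from collections import defaultdict


def parse(lines):
    """Group messages by (time, nick). Assumes a hypothetical 'HH:MM nick message' layout."""
    entries = defaultdict(list)
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue  # not a chat entry in the assumed format
        time, nick = parts[0], parts[1]
        message = parts[2].strip() if len(parts) == 3 else ""
        entries[(time, nick)].append(message)
    return entries


def blanked_entries(archived_lines, current_lines):
    """Return (time, nick, old_message) for entries that had text in the archive but are empty now."""
    archived = parse(archived_lines)
    current = parse(current_lines)
    blanked = []
    for (time, nick), old_messages in archived.items():
        new_messages = current.get((time, nick), [])
        for i, old in enumerate(old_messages):
            new = new_messages[i] if i < len(new_messages) else None
            if old and new == "":
                blanked.append((time, nick, old))
    return blanked


# Made-up example data, not real log content.
archived = ["12:01 alice hello", "12:02 carol example message", "12:03 bob ok"]
current = ["12:01 alice hello", "12:02 carol", "12:03 bob ok"]
print(blanked_entries(archived, current))
```

Entries that are missing entirely from the current version are ignored here; only entries that are still present but empty are reported.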

It is unfortunate that artifacts of Bitcoin’s development history are being lost. There is value in knowing how critical decisions were made in frantic hours of the 2013 fork. An important part of learning from history is having access to historical data. Decisions that shape what Bitcoin is today were originally discussed on IRC, and those decisions will continue to shape Bitcoin. Understanding what went right and what went wrong can inform future technology and community design.

The lesson is that online communities must make deliberate efforts to preserve important digital artifacts. Often this is merely a matter of picking the right technology. If GitHub were to disappear tomorrow, none of Bitcoin’s code history would be lost, thanks to git’s decentralized and distributed nature. All of Bitcoin’s transaction history is likewise famously replicated and resilient to corruption or loss.

Preserving the IRC logs would not be difficult. The community could distribute the logs via BitTorrent, as Wikipedia does with its content. Another option is to use the form the Wayback Machine provides to ensure the archiving of a page (to minimize effort, one could automate the invocation of this functionality). Given how important preserving this data is and how easy it is, it seems worthwhile.
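As a rough sketch of the automation mentioned above: the Wayback Machine accepts "Save Page Now" requests at https://web.archive.org/save/ followed by the page URL. The log URL layout below (a base URL plus year/month/day) is a placeholder assumption, not BitcoinStats's actual scheme, and the service may rate-limit or reject automated requests.

```python
import urllib.request
from datetime import date, timedelta

SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_request_url(page_url):
    """Build the Save Page Now URL for a page."""
    return SAVE_ENDPOINT + page_url


def daily_log_urls(base_url, start, end):
    """Yield one URL per day. The base_url/YYYY/MM/DD layout is hypothetical."""
    day = start
    while day <= end:
        yield f"{base_url}/{day:%Y/%m/%d}"
        day += timedelta(days=1)


def request_archive(page_url, timeout=30):
    """Ask the Wayback Machine to capture a page and return the HTTP status code."""
    with urllib.request.urlopen(save_request_url(page_url), timeout=timeout) as response:
        return response.status


# Placeholder base URL; nothing is fetched until request_archive() is called.
urls = list(daily_log_urls("https://example.org/logs", date(2016, 1, 10), date(2016, 1, 12)))
print([save_request_url(u) for u in urls])
```

Run from a daily scheduled job, a loop over request_archive() would keep the archive current without anyone having to submit the form by hand.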

[1] IRC as a whole has a culture of ephemerality, and so Freenode, the server that hosts the bitcoin-dev IRC channel, doesn’t provide logs.

Ancestry.com can use your DNA to target ads

With the reduction in costs of genotyping technology, genetic genealogy has become accessible to more people. Various websites such as Ancestry.com offer genetic genealogy services. Users of these services are mailed an envelope with a DNA collection kit, in which users deposit their saliva. The users then mail their kits back to the service and their samples are processed. The genealogy company will try to match the user’s DNA against other users in its genealogy and genetic database. As these services become more popular, we need more public discourse about the implications of releasing our genetic information to commercial enterprises.

Given that genetic information can be very sensitive, I found that the privacy policy of Ancestry’s DNA services has some surprising disclosures about how they could use your genetic information.

Here are some excerpts with the worrying parts in bold:

Subject to the restrictions described in this Privacy Statement and applicable law, we may use personal information for any reasonable purpose related to the business, including to communicate with you, to provide you information about Ancestry’s and AncestryDNA’s products and services, to respond to your requests, to update our product offerings, to improve the content and User experience on the AncestryDNA Website, to help you and others discover more about your family, to let you know about offers of interest from AncestryDNA or Ancestry, and to prepare and perform demographic, benchmarking, advertising, marketing, and promotional studies.

To distribute advertisements: AncestryDNA strives to show relevant advertisements. To that end, AncestryDNA may use the information you provide to us, as well as any analyses we perform, aggregated demographic information (such as women between the ages of 45-60), anonymized data compared to data from third parties, or the placement of cookies and other tracking technologies… In these ways, AncestryDNA can display relevant ads on the AncestryDNA Website, third party websites, or elsewhere.

The privacy policy gives Ancestry permission to use its users’ genetic information for advertising purposes. When I inquired with Ancestry, they pointed to the following part of their privacy policy:

We do not provide advertisers with access to individual account information. AncestryDNA does not sell, rent or otherwise distribute the personal information you provide us to these advertisers unless you have given us your consent to do so.

However, it is not clear how your personal information can be used to display “relevant ads” unless either Ancestry operates as an ad network itself or Ancestry communicates some personal information to third party advertisers in order to target the ads. Below, I expand on concerns raised by this privacy policy:

Users may “consent” to the use of their genetic data unknowingly. The privacy policy says Ancestry can distribute users’ private information if Ancestry gets permission first. That permission could be granted by a dialog that users click through without much thought. Research has shown that users are already desensitized to privacy and security warnings.

Even if only Ancestry is using the personal information to target ads, the data might accidentally find its way to third parties. Researchers have demonstrated how it can be difficult to avoid information leakage through URLs or cookies or more sophisticated attacks. If Ancestry categorizes its users according to their genetic traits and then stores and transfers these categories in cookies and URL parameters (a common practice for the analogous “behavioral segment” categories used for many targeted ads), then the genetic data can easily leak to third parties.

The genetic data collected by these services may endanger the privacy of users and their families. A genome is not something easily made unlinkable. Only 33 bits of entropy are necessary to uniquely identify a person. The DNA profiles used by law enforcement in the US today take samples from 13 locations on the genome, and have about 54 bits of entropy. The test that Ancestry uses samples 700,000 locations on the genome, which will likely have much more than 33 bits of entropy. In fact, I believe this is enough entropy to compromise not only an individual’s privacy, but also the privacy of family members. With the 13 CODIS locations, law enforcement can already do familial searches for close family members. I hope to touch on the familial aspects of DNA privacy at a later date. The compromise of familial privacy is in part what makes collecting and distributing DNA even more sensitive than just collecting an individual’s full name or address.
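The 33-bit figure follows from the size of the world population: singling out one person among N requires about log2(N) bits. A quick check, assuming a round population of 7.5 billion (my assumption, not a figure from any source cited here):

```python
import math


def bits_to_identify(population):
    """Bits of entropy needed to single out one individual in a population."""
    return math.log2(population)


WORLD_POPULATION = 7_500_000_000  # assumed round figure

bits = bits_to_identify(WORLD_POPULATION)
print(f"{bits:.1f} bits, so {math.ceil(bits)} whole bits suffice")

# By the same reasoning, 54 bits can tell apart 2**54 distinct profiles.
print(2 ** 54)
```

This calculation only says how many bits are needed; it does not show how the 54-bit estimate for law enforcement profiles is derived.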

Genetic data can be used to discriminate against people on the basis of characteristics they cannot control. More than identity, DNA data may allow someone to infer behavior and health attributes. Major concerns about the impact of genetic information on employment and health insurance led Congress to pass the Genetic Information Nondiscrimination Act, which makes it illegal to use genetics to decide hiring or health insurance pricing. However, GINA may not effectively deter people who 1) are not employers or insurers (e.g., landlords discriminating in their choice of tenants, which is prohibited by California state law but not by the federal provisions in GINA); 2) do not believe they will be caught; or 3) are not aware that they are discriminating, as discussed next.

Unintentional discrimination may occur. The big data report from the White House warns that the “increasing use of algorithms to make eligibility decisions must be carefully monitored for potential discriminatory outcomes for disadvantaged groups, even absent discriminatory intent.” An algorithm that takes genetic information as an input likely will lead to results that differ based on genes. This outcome already discriminates on the basis of genetics, and because genes are correlated with other sensitive attributes, it can also discriminate on the basis of characteristics such as race or health status. The discrimination occurs whether or not the algorithm’s user intended it.

Does cloud mining make sense?

[Paul Ellenbogen is a second year Ph.D. student at Princeton who’s been looking into the economics and game theory of Bitcoin, among other topics. He’s a coauthor of our recent paper on Namecoin and namespaces. — Arvind Narayanan]

Currently, if I wanted to mine Bitcoin I would need to buy specialized hardware, called application-specific integrated circuits (ASICs). I would need to find space for my hardware, which could take up a considerable amount of space. I might need to install a new cooling system into the facility to dissipate the considerable amounts of heat generated by the hardware.

Or I could buy a cloud mining contract. Cloud mining companies bill themselves as companies that take care of all of the gritty details and allow the consumer to directly buy hash power with dollars. Most cloud mining companies offer contracts for varying term lengths, ranging from a few weeks to perpetuity. For example, I could pay $300 and receive one terahash per second for the next year. As soon as the cloud hashing provider receives my money, they start up a miner, or allocate me existing cycles, and I should start earning bitcoins in short order. Sounds easy, right?
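To judge a contract like this, one can estimate the expected payout: a miner's expected share of blocks equals its share of the network hash rate, and Bitcoin targets one block every ten minutes. The sketch below uses assumed inputs (the network hash rate and the block reward are placeholders, not measurements) and ignores transaction fees, maintenance fees, and growth in network hash rate, all of which matter in practice.

```python
BLOCKS_PER_DAY = 24 * 60 / 10  # one block per ten minutes on average


def expected_btc(my_hashrate, network_hashrate, block_reward, days):
    """Expected bitcoins mined, assuming both hash rates stay constant."""
    share = my_hashrate / network_hashrate
    return share * BLOCKS_PER_DAY * block_reward * days


def break_even_price(contract_cost_usd, btc_earned):
    """Exchange rate (USD per BTC) at which the contract just pays for itself."""
    return contract_cost_usd / btc_earned


# Assumed inputs for illustration only.
MY_HASHRATE = 1e12         # 1 terahash per second, as in the example contract
NETWORK_HASHRATE = 4e17    # hypothetical network total
BLOCK_REWARD = 25          # assumed block subsidy in BTC

earned = expected_btc(MY_HASHRATE, NETWORK_HASHRATE, BLOCK_REWARD, days=365)
print(earned, break_even_price(300, earned))
```

Because the network hash rate has historically grown over time, a constant-rate estimate like this one should be read as an upper bound on what the contract returns.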

Cloud mining has a bad track record. Many cloud mining services have closed up shop and run off with customer money. Examples include PBmining, lunaminer, and cloudminr.io. Gavin Andresen, a Bitcoin Core developer, once speculated that cloud mining doesn’t make any sense and that most of these services will end up as scams.

Cloud mining has been a popular front for Ponzi schemes, investment frauds where old customers or investors are paid with the money of new customers. In the case of cloud mining Ponzi schemes, bitcoins to pay old contracts are furnished from the payment of new customers. Ponzi schemes tend to collapse when the flow of new customers dries up, or when a large number of customers try to cash out. Cloud mining is a particularly appealing target for Ponzi schemes because the second failure case, cashing out, is not an option for those holding mining contracts. The contracts stipulate a return of bitcoins determined by hash rate. This means Ponzi scheme operators only need to keep recruiting new users for as long as possible. Bitcointalk user Puppet points out a set of 7 useful criteria for spotting cloud mining scams. Out of the 42 operations Puppet examines, they identify 30 operations as scams, 14 of which have already ceased operation.
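The first failure case can be illustrated with a toy model. It is a deliberately simplified sketch with made-up numbers: the operator does no mining at all, contracts never expire, and every contract is owed the same payout each period.

```python
def periods_until_collapse(new_customers_per_period, contract_price, payout_per_period):
    """Return the first period in which the operator cannot cover payouts, or None."""
    reserves = 0.0
    active_contracts = 0
    for period, new_customers in enumerate(new_customers_per_period, start=1):
        active_contracts += new_customers
        reserves += new_customers * contract_price       # income from new sales
        reserves -= active_contracts * payout_per_period  # payouts to all contracts
        if reserves < 0:
            return period
    return None


# Three periods of recruitment, then the flow of new customers dries up.
inflow = [10, 10, 10] + [0] * 10
print(periods_until_collapse(inflow, contract_price=100, payout_per_period=20))
```

In this model the operator stays solvent only while sales keep pace with the growing payout obligations, which is why recruitment is the only thing the scheme needs to sustain.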

Yet cloud mining persists. That so many cloud mining operations end up being scams may appeal to our basic business intuition. Compare a cloud miner to a traditional bitcoin miner. A traditional bitcoin miner mines bitcoins and sells them on an exchange at the current market rate. It seems that the only way for a cloud miner to do better than a traditional bitcoin miner selling bitcoins at market price is at the expense of the cloud mining customer. It appears there is no way for both the cloud miner and their customer to walk away better off.

Yet cloud mining and at least some interest in cloud mining persist. I would like to offer some possible scenarios where cloud mining may deliver the hashes that customers order.

Hired guns? Papers that propose attacks against bitcoin often pose “An attacker with X% of the hash power could do Y.” For example, in selfish mining, as first described by Eyal and Sirer, with 33% of the mining power an attacker could force the rest of the network to mine on top of their blocks. Cloud miners could be used for block withholding attacks too. An important feature of many of these attacks is that the mining power need not be used all the time. These attacks would require flexibility in the mining software the attackers are using, as most off-the-shelf mining software (thankfully) does not have these attacks built in. Most cloud mining setups I have looked at don’t allow for enough flexibility to launch attacks, nor are the contract periods on most services short enough. Cloud mining customers typically have a simple web interface, and in the best case are able to choose which pools they join, but they do not have any sort of scriptable direct interface to the mining hardware. At the moment, cloud miners are probably not supporting themselves by executing attacks for others.

Regulatory loophole? Individuals may try to use cloud mining to circumvent Bitcoin regulations, such as know-your-customer. If I want to turn my dollars into bitcoins, I can buy bitcoins at an exchange, but that exchange would have to know my true identity in order to comply with regulations. Unscrupulous individuals may not want their identity linked to their cash flow and reported to the government. Cloud mining operators and unscrupulous customers may try to skirt these regulations by claiming that cloud mining operations are not exchanges or banks but merely rent out computer hardware like any cloud computing provider, meaning they do not need to comply with banking regulation. It is unlikely this would be viable long term, or even short term, as regulators would become wise to these sorts of regulatory loopholes and close them. This paragraph is the most speculative on my part, as I am neither a regulator nor a lawyer, so I don’t have expertise to draw on from either of those fields.

Financial instrument? Currently most bitcoin miners take on two roles, managing the mining hardware and managing the financial risk involved in mining. A more compelling justification for cloud miners’ existence is that cloud mining contracts allow a cloud mining provider to avoid volatility in the exchange rate of bitcoin and the variability in the hash rate. Cloud mining is a means of hedging risk. If cloud miners can enter contracts to provide a certain hash rate to a customer for a length of time, the cloud miner does not need to concern themselves with the exchange rate or the hash rate once the contract begins. It then becomes the job of the customer contracting the cloud miner to manage the risk presented by volatility in the exchange rate. This would allow the cloud miner to specialize in buying, configuring, and maintaining mining hardware, and other individuals to specialize in managing risk related to bitcoin. As the financial instruments surrounding cryptocurrencies become more sophisticated, a terahash could become just another cryptocurrency security that is traded.


Acknowledgment: I would like to thank Joseph Bonneau for contributing the “cloud mining as a means of managing risk” concept.