November 25, 2015


Provisions: how Bitcoin exchanges can prove their solvency

Millions of Bitcoin users keep their bitcoins with online exchanges (e.g. Coinbase, Kraken), which hold the coins on their customers’ behalf. They present an interface that looks somewhat like an online bank, allowing users to log in and request payments to other users or withdrawals. For many users this approach makes a lot more sense than the traditional approach of storing private keys on your laptop or phone and interacting with the Bitcoin network directly. Online exchanges require no software installation, enable a familiar password-based authentication model, and can guard against the risk of losing funds with a stolen laptop. Online exchanges can also improve the scalability and efficiency of Bitcoin by settling many logical transactions between users without actually moving funds on the block chain.

Of course, users must trust these exchanges not to get hacked or simply abscond with their money, both of which happened frequently in the early days of Bitcoin (nearly half of exchanges studied in a 2013 research paper failed). Famously, Mt. Gox was the largest online exchange until 2014 when it lost most of its customers’ funds under murky circumstances.

It has long been a goal of the Bitcoin community for exchanges to be able to cryptographically prove solvency—that is, to prove that they still control enough bitcoins to cover all of their customers’ accounts. Greg Maxwell first proposed an approach using Merkle trees in 2013, but this requires revealing (at a minimum) the total value of the exchange’s assets and which addresses the exchange controls. Exchanges have specifically cited these privacy risks as a reason they have not deployed proofs of solvency, relying on trusted audit instead.
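
To make the privacy cost concrete, here is a minimal sketch of a summed Merkle tree over customer balances, in the spirit of the Merkle-tree approach mentioned above. The hashing layout, padding rule, and all function names are illustrative assumptions, not the original proposal's specification. Note that the root carries the total of all balances in the clear, which is exactly the kind of disclosure exchanges object to.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts (the layout is an assumption)."""
    m = hashlib.sha256()
    for p in parts:
        m.update(len(p).to_bytes(4, "big"))
        m.update(p)
    return m.digest()

def leaf(customer_id: str, balance: int):
    # A node is a (balance_sum, hash) pair.
    return (balance, h(b"leaf", customer_id.encode(), balance.to_bytes(8, "big")))

def parent(left, right):
    total = left[0] + right[0]
    return (total, h(b"node", total.to_bytes(8, "big"), left[1], right[1]))

def build(leaves):
    """Return all levels, leaves first; odd levels get a zero-balance pad node."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [(0, h(b"pad"))]
            levels[-1] = cur
        levels.append([parent(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def prove(levels, index):
    """Collect (sibling, sibling_is_left) pairs from the leaf up to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path

def verify(customer_id, balance, path, root):
    """A customer recomputes the root from their own balance and the path."""
    node = leaf(customer_id, balance)
    for sibling, sibling_is_left in path:
        if sibling[0] < 0:  # a negative sibling could hide liabilities
            return False
        node = parent(sibling, node) if sibling_is_left else parent(node, sibling)
    return node == root

# Hypothetical accounts, balances in satoshis.
accounts = [("alice", 5), ("bob", 7), ("carol", 1)]
levels = build([leaf(c, b) for c, b in accounts])
root = levels[-1][0]
bob_ok = verify("bob", 7, prove(levels, 1), root)
```

Each customer checks only their own path, but everyone who sees the root learns the total liabilities (`root[0]`), and every path also exposes the balance sums of neighboring subtrees.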

In a new paper presented this month at CCS (co-authored with Gaby G. Dagher, Benedikt Bünz, Jeremy Clark and Dan Boneh), we present Provisions, the first cryptographic proof-of-solvency with strong privacy guarantees. Our protocol is suitable for Bitcoin but would work for most other cryptocurrencies (e.g. Litecoin, Ethereum). Our protocol hides the total assets and liabilities of the exchange, proving only that assets are strictly greater than liabilities. If desired, the value of this surplus can be proven. Provisions also hides all customer balances and hides which Bitcoin addresses the exchange controls within a configurable anonymity set of other addresses on the block chain. The proofs are large, but reasonable to compute on a daily basis (in the tens of GB for a large exchange, computable in about an hour). Best of all, it is very simple and fast for each user to verify that they have been correctly included. We can even extend the protocol to prevent collusion between exchanges. The details are in the paper, the full version of which is now online.

While our Provisions protocol removes the privacy concerns of performing a cryptographic proof-of-solvency, there are still some practical deployment questions because the proof requires the exchange to compute using its private keys. Exchanges rightly go to great lengths to protect these keys, often keeping them offline and/or in hardware security modules. Performing a regular solvency proof requires careful thinking about the right internal procedure for accessing these keys.

These deployment questions can be solved. We hope that cryptographic proofs of solvency will soon be expected of upstanding exchanges. Incidents like that of Mt. Gox have greatly damaged public perception of the entire Bitcoin ecosystem. While solvency proofs can’t prevent exchange compromises, they would have made Mt. Gox’s troubles public earlier and more clearly. They would also shore up confidence in today’s exchanges which are (presumably) solvent.

Taking a step back, solvency proofs are yet another example where we can replace an expensive and trust-laden process in the offline world (financial inspection by a trusted auditor) with a “trustless” cryptographic protocol. It’s always exciting to take a new step in that direction. There remain limits as to what cryptography can do though. Critically, solvency proofs do not create a binding obligation to pay. A malicious exchange could complete a Provisions proof and then immediately abscond with all of the money. For this reason, some form of government regulation of online exchanges makes sense. Though regulation is dreaded by many in the Bitcoin community, it appears to be on the horizon. Bills have been proposed in several states, largely aimed at exchanges. Interestingly, the model regulatory framework proposed by the Conference of State Bank Supervisors in September already mentions cryptographic solvency proofs as a means of demonstrating solvency. We hope this recommendation is enacted in law and that solvency proofs become a tool to avoid the cost of the heavyweight auditing requirements traditionally demanded of banks, while simultaneously increasing transparency for exchange customers.


Classified material in the public domain: what’s a university to do?

Yesterday I posted some thoughts about Purdue University’s decision to destroy a video recording of my keynote address at its Dawn or Doom colloquium. The organizers had gone dark, and a promised public link was not forthcoming. After a couple of weeks of hoping to resolve the matter quietly, I did some digging and decided to write up what I learned. I posted on the web site of the Century Foundation, my main professional home:

It turns out that Purdue has wiped all copies of my video and slides from university servers, on grounds that I displayed classified documents briefly on screen. A breach report was filed with the university’s Research Information Assurance Officer, also known as the Site Security Officer, under the terms of Defense Department Operating Manual 5220.22-M. I am told that Purdue briefly considered, among other things, whether to destroy the projector I borrowed, lest contaminants remain.

I was, perhaps, naive, but pretty much all of that came as a real surprise.

Let’s rewind. Information Assurance? Site Security?

These are familiar terms elsewhere, but new to me in a university context. I learned that Purdue, like a number of its peers, has a “facility security clearance” to perform classified U.S. government research. The manual of regulations runs to 141 pages. (Its terms forbid uncleared trustees to ask about the work underway on their campus, but that’s a subject for another day.) The pertinent provision here, spelled out at length in a manual called Classified Information Spillage, requires “sanitization, physical removal, or destruction” of classified information discovered on unauthorized media.

Two things happened in rapid sequence around the time I told Purdue about my post.

First, the university broke a week-long silence and expressed a measure of regret:

UPDATE: Just after posting this item I received an email from Julie Rosa, who heads strategic communications for Purdue. She confirmed that Purdue wiped my video after consulting the Defense Security Service, but the university now believes it went too far.

“In an overreaction while attempting to comply with regulations, the video was ordered to be deleted instead of just blocking the piece of information in question. Just FYI: The conference organizers were not even aware that any of this had happened until well after the video was already gone.”

“I’m told we are attempting to recover the video, but I have not heard yet whether that is going to be possible. When I find out, I will let you know and we will, of course, provide a copy to you.”

Then Edward Snowden tweeted the link, and the Century Foundation’s web site melted down. It now redirects to Medium, where you can find the full story.

I have not heard back from Purdue today about recovery of the video. It is not clear to me how recovery is even possible, if Purdue followed Pentagon guidelines for secure destruction. Moreover, although the university seems to suggest it could have posted most of the video, it does not promise to do so now. Most importantly, the best that I can hope for here is that my remarks and slides will be made available in redacted form — with classified images removed, and some of my central points therefore missing. There would be one version of the talk for the few hundred people who were in the room on Sept. 24, and for however many watched the live stream, and another version left as the only record.

For our purposes here, the most notable questions have to do with academic freedom in the context of national security. How did a university come to “sanitize” a public lecture it had solicited, on the subject of NSA surveillance, from an author known to possess the Snowden documents? How could it profess to be shocked to find that spillage is going on at such a talk? The beginning of an answer came, I now see, in the question and answer period after my Purdue remarks. A post-doctoral research engineer stood up to ask whether the documents I had put on display were unclassified. “No,” I replied. “They’re classified still.” Eugene Spafford, a professor of computer science there, later attributed that concern to “junior security rangers” on the faculty and staff. But the display of Top Secret material, he said, “once noted, … is something that cannot be unnoted.”

Someone reported my answer to Purdue’s Research Information Assurance Officer, who reported in turn to Purdue’s representative at the Defense Security Service. By the terms of its Pentagon agreement, Purdue decided it was now obliged to wipe the video of my talk in its entirety. I regard this as a rather devout reading of the rules, which allowed Purdue to “realistically consider the potential harm that may result from compromise of spilled information.” The slides I showed had been viewed already by millions of people online. Even so, federal funding might be at stake for Purdue, and the notoriously vague terms of the Espionage Act hung over the decision. For most lawyers, “abundance of caution” would be the default choice. Certainly that kind of thinking is commonplace, and sometimes appropriate, in military and intelligence services.

But universities are not secret agencies. They cannot lightly wear the shackles of a National Industrial Security Program, as Purdue agreed to do. The values at their core, in principle and often in practice, are open inquiry and expression.

I do not claim I suffered any great harm when Purdue purged my remarks from its conference proceedings. I do not lack for publishers or public forums. But the next person whose talk is disappeared may have fewer resources.

More importantly, to my mind, Purdue has compromised its own independence and that of its students and faculty. It set an unhappy precedent, even if the people responsible thought they were merely following routine procedures.

One can criticize the university for its choices, and quite a few have since I published my post. What interests me is how nearly the results were foreordained once Purdue made itself eligible for Top Secret work.

Think of it as a classic case of mission creep. Purdue invited the secret-keepers of the Defense Security Service into one cloistered corner of campus (“a small but significant fraction” of research in certain fields, as the university counsel put it). The trustees accepted what may have seemed a limited burden, confined to the precincts of classified research.

Now the security apparatus claims jurisdiction over the campus (“facility”) at large. The university finds itself “sanitizing” a conference that has nothing to do with any government contract.

I am glad to see that Princeton takes the view that “[s]ecurity regulations and classification of information are at variance with the basic objectives of a University.” It does not permit faculty members to do classified work on campus, which avoids Purdue’s “facility” problem. And even so, at Princeton and elsewhere, there may be an undercurrent of self-censorship and informal restraint against the use of documents derived from unauthorized leaks.

Two of my best students nearly dropped a course I taught a few years back, called “Secrecy, Accountability and the National Security State,” when they learned the syllabus would include documents from Wikileaks. Both had security clearances, for summer jobs, and feared losing them. I told them I would put the documents on Blackboard, so they need not visit the Wikileaks site itself, but the readings were mandatory. Both, to their credit, stayed in the course. They did so against the advice of some of their mentors, including faculty members. The advice was purely practical. The U.S. government will not give a clear answer when asked whether this sort of exposure to published secrets will harm job prospects or future security clearances. Why take the risk?

Every student and scholar must decide for him- or herself, but I think universities should push back harder, and perhaps in concert. There is a treasure trove of primary documents in the archives made available by Snowden and Chelsea Manning. The government may wish otherwise, but that information is irretrievably in the public domain. Should a faculty member ignore the Snowden documents when designing a course on network security architecture? Should a student write a dissertation on modern U.S.-Saudi relations without consulting the numerous diplomatic cables on Wikileaks? To me, those would be abdications of the basic duty to seek out authoritative sources of knowledge, wherever they reside.

I would be interested to learn how others have grappled with these questions. I expect to write about them in my forthcoming book on surveillance, privacy and secrecy.

Ancestry can use your DNA to target ads

With the reduction in costs of genotyping technology, genetic genealogy has become accessible to more people. Various websites such as Ancestry offer genetic genealogy services. Users of these services are mailed an envelope with a DNA collection kit, in which users deposit their saliva. The users then mail their kits back to the service and their samples are processed. The genealogy company will try to match the user’s DNA against other users in its genealogy and genetic database. As these services become more popular, we need more public discourse about the implications of releasing our genetic information to commercial enterprises.

Given that genetic information can be very sensitive, I found that the privacy policy of Ancestry’s DNA services has some surprising disclosures about how they could use your genetic information.

Here are some excerpts with the worrying parts in bold:

Subject to the restrictions described in this Privacy Statement and applicable law, we may use personal information for any reasonable purpose related to the business, including to communicate with you, to provide you information about Ancestry’s and AncestryDNA’s products and services, to respond to your requests, to update our product offerings, to improve the content and User experience on the AncestryDNA Website, to help you and others discover more about your family, to let you know about offers of interest from AncestryDNA or Ancestry, and to prepare and perform demographic, benchmarking, advertising, marketing, and promotional studies.

To distribute advertisements: AncestryDNA strives to show relevant advertisements. To that end, AncestryDNA may use the information you provide to us, as well as any analyses we perform, aggregated demographic information (such as women between the ages of 45-60), anonymized data compared to data from third parties, or the placement of cookies and other tracking technologies… In these ways, AncestryDNA can display relevant ads on the AncestryDNA Website, third party websites, or elsewhere.

The privacy policy gives Ancestry permission to use its users’ genetic information for advertising purposes. When I inquired with Ancestry, they pointed to the following part of their privacy policy:

We do not provide advertisers with access to individual account information. AncestryDNA does not sell, rent or otherwise distribute the personal information you provide us to these advertisers unless you have given us your consent to do so.

However, it is not clear how your personal information can be used to display “relevant ads” unless either Ancestry operates as an ad network itself or Ancestry communicates some personal information to third party advertisers in order to target the ads. Below, I expand on concerns raised by this privacy policy:

Users may “consent” to the use of their genetic data unknowingly. The privacy policy says Ancestry can distribute users’ private information if Ancestry gets permission first. That permission could be granted by a dialog that users click through without much thought. Research has shown that users are already desensitized to privacy and security warnings.

Even if only Ancestry is using the personal information to target ads, the data might accidentally find its way to third parties. Researchers have demonstrated how it can be difficult to avoid information leakage through URLs or cookies or more sophisticated attacks. If Ancestry categorizes its users according to their genetic traits and then stores and transfers these categories in cookies and URL parameters (a common practice for the analogous “behavioral segment” categories used for many targeted ads), then the genetic data can easily leak to third parties.
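
As a rough illustration of the URL leak described above, the sketch below shows what a third party could read from a Referer header if a site put a user category in a query parameter. The domain `dna.example`, the parameter name `segment`, and the value `trait-123` are all made up for illustration; nothing here reflects how Ancestry's site actually works. It also assumes the browser sends the full page URL as the referrer, which a restrictive referrer policy would prevent.

```python
from typing import List
from urllib.parse import urlsplit, parse_qs

def segments_visible_to_third_party(referer: str, param: str = "segment") -> List[str]:
    """Return the values of a (hypothetical) category parameter in a Referer URL."""
    query = urlsplit(referer).query
    return parse_qs(query).get(param, [])

# The page URL that a browser might send as Referer when loading an embedded
# third-party script or image (hypothetical URL).
referer = "https://dna.example/results?segment=trait-123&page=2"
leaked = segments_visible_to_third_party(referer)
print(leaked)
```

The third party needs no cooperation from the first-party site: the category arrives with the request for its own resource.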

The genetic data collected by these services may endanger the privacy of users and their families. A genome is not something easily made unlinkable. Only 33 bits of entropy are necessary to uniquely identify a person. The DNA profiles used by law enforcement in the US today take samples from 13 locations on the genome, and have about 54 bits of entropy. The test that Ancestry uses samples 700,000 locations on the genome, which will likely have much more than 33 bits of entropy. In fact, I believe this is enough entropy to compromise not only an individual’s privacy, but also the privacy of family members. With the 13 CODIS locations, law enforcement can already do familial searches for close family members. I hope to touch on the familial aspects of DNA privacy at a later date. The compromise of familial privacy is in part what makes collecting and distributing DNA even more sensitive than just collecting an individual’s full name or address.
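
The 33-bit figure follows from simple arithmetic: it is the smallest number of bits that can give every living person a distinct label. A quick check, assuming a world population of roughly 7.3 billion (an approximate figure chosen for illustration):

```python
import math

WORLD_POPULATION = 7_300_000_000  # assumed round figure

def bits_to_identify(population: int) -> int:
    """Smallest number of bits with at least `population` distinct values."""
    return math.ceil(math.log2(population))

bits = bits_to_identify(WORLD_POPULATION)
print(bits)

# A 54-bit profile has vastly more possible values than there are people.
profiles_per_person = 2**54 / WORLD_POPULATION
```

Since 2^32 is about 4.3 billion and 2^33 is about 8.6 billion, 32 bits fall short and 33 suffice.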

Genetic data can be used to discriminate against people on the basis of characteristics they cannot control. More than identity, DNA data may allow someone to infer behavior and health attributes. Major concerns about the impact of genetic information on employment and health insurance led Congress to pass the Genetic Information Nondiscrimination Act, which makes it illegal to use genetics to decide hiring or health insurance pricing. However, GINA may not effectively deter people who 1) are not employers or insurers (e.g., landlords discriminating in their choice of tenants, which is prohibited by California state law but not by the federal provisions in GINA); 2) do not believe they will be caught; or 3) are not aware that they are discriminating, as discussed next.

Unintentional discrimination may occur. The big data report from the White House warns that the “increasing use of algorithms to make eligibility decisions must be carefully monitored for potential discriminatory outcomes for disadvantaged groups, even absent discriminatory intent.” An algorithm that takes genetic information as an input likely will lead to results that differ based on genes. This outcome already discriminates on the basis of genetics, and because genes are correlated with other sensitive attributes, it can also discriminate on the basis of characteristics such as race or health status. The discrimination occurs whether or not the algorithm’s user intended it.


What should we do about re-identification? A precautionary approach to big data privacy

Computer science research on re-identification has repeatedly demonstrated that sensitive information can be inferred even from de-identified data in a wide variety of domains. This has posed a vexing problem for practitioners and policy makers. If the absence of “personally identifying information” cannot be relied on for privacy protection, what are the alternatives? Joanna Huey, Ed Felten, and I tackle this question in a new paper “A Precautionary Approach to Big Data Privacy”. Joanna presented the paper at the Computers, Privacy & Data Protection conference earlier this year.



We can de-anonymize programmers from coding style. What are the implications?

In a recent post, I talked about our paper showing how to identify anonymous programmers from their coding styles. We used a combination of lexical features (e.g., variable name choices), layout features (e.g., spacing), and syntactic features (i.e., grammatical structure of source code) to represent programmers’ coding styles. The previous post focused on the overall results and techniques we used. Today I’ll talk about applications and explain how source code authorship attribution can be used in software forensics, plagiarism detection, copyright or copyleft investigations, and other domains.
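
For a feel of what lexical and layout features look like, here is a toy extractor. It is only a sketch under simplifying assumptions: the four features and their names are invented for illustration, the identifier regex also picks up keywords and words inside strings or comments, and it computes no syntactic features at all. It is not the feature set from the paper.

```python
import re
from statistics import mean

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def style_features(source: str) -> dict:
    """Compute a few crude lexical and layout statistics for a source file."""
    lines = source.splitlines()
    nonblank = [ln for ln in lines if ln.strip()]
    idents = IDENT.findall(source)  # crude: includes keywords
    return {
        # Lexical: naming habits.
        "avg_identifier_length": mean(len(i) for i in idents) if idents else 0.0,
        "underscore_ratio": sum("_" in i for i in idents) / len(idents) if idents else 0.0,
        # Layout: whitespace habits.
        "tab_indent_ratio": sum(ln.startswith("\t") for ln in nonblank) / len(nonblank) if nonblank else 0.0,
        "blank_line_ratio": (len(lines) - len(nonblank)) / len(lines) if lines else 0.0,
    }

sample = "int main() {\n\tint my_var = 0;\n\n\treturn my_var;\n}\n"
features = style_features(sample)
```

Vectors like this, computed for known authors, could be fed to any standard classifier and compared against the vector of an anonymous file.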



Android WebView security and the mobile advertising marketplace

Freedom to Tinker readers are probably aware of the current controversy over Google’s handling of ongoing security vulnerabilities in its Android WebView component. What sounds at first like a routine security problem turns out to have some deep challenges. Let’s start by filling in some background and build up to the big problem they’re not talking about: Android advertising.


Anonymous programmers can be identified by analyzing coding style

Every programmer learns to code in a unique way which results in distinguishing “fingerprints” in coding style. These fingerprints can be used to compare the source code of known programmers with an anonymous piece of source code to find out which one of the known programmers authored the anonymous code. This method can aid in finding malware programmers or detecting cases of plagiarism. In a recent paper, we studied this question, which we call source-code authorship attribution. We introduced a principled method with a robust feature set and achieved a breakthrough in accuracy.



Verizon’s tracking header: Can they do better?

Verizon’s practice of injecting a unique ID into the HTTP headers of traffic originating on their wireless network has alarmed privacy advocates and researchers. Jonathan Mayer detailed how this header is already being used by third-parties to create zombie cookies. In this post, I summarize just how much information Verizon collects and shares under their marketing programs. I’ll show how the implementation of the header makes previous tracking methods trivial and explore the possibility of a more secure design.



How cookies can be used for global surveillance

Today we present an updated version of our paper [0] examining how the ubiquitous use of online tracking cookies can allow an adversary conducting network surveillance to target a user or surveil users en masse. In the initial version of the study, summarized below, we examined the technical feasibility of the attack. Now we’ve made the attack model more complete and nuanced as well as analyzed the effectiveness of several browser privacy tools in preventing the attack. Finally, inspired by Jonathan Mayer and Ed Felten’s The Web is Flat study, we incorporate the geographic topology of the Internet into our measurements of simulated web traffic and our adversary model, providing a more realistic view of how effective this attack is in practice.
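
The core intuition can be sketched in a few lines: if a tracker's cookie travels in the clear, a passive observer can group page visits by cookie value even when the user's IP address changes. This is a deliberately simplified illustration, not the attack model from the paper; the observation format and all values below are made up.

```python
from collections import defaultdict

def link_by_cookie(observations):
    """Group visited sites by tracker cookie ID.

    observations: iterable of (source_ip, visited_site, tracker_cookie_id)
    tuples, as a passive network observer might record them (assumed format).
    """
    sites_by_cookie = defaultdict(set)
    for _ip, site, cookie in observations:
        sites_by_cookie[cookie].add(site)
    return dict(sites_by_cookie)

# Made-up traffic: cookie "abc" appears from two different IP addresses.
observed = [
    ("10.0.0.1", "news.example", "abc"),
    ("192.0.2.7", "health.example", "abc"),
    ("10.0.0.1", "shop.example", "xyz"),
]
linked = link_by_cookie(observed)
```

Here the two visits carrying cookie `abc` are tied to one browser despite the different source addresses, while the IP address alone would have wrongly grouped `news.example` with `shop.example`.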


Striking a balance between advertising and ad blocking

In the news, we have a consortium of French publishers, which somehow includes several major U.S. corporations (Google, Microsoft), attempting to sue AdBlock Plus developer Eyeo, a German firm with developers around the world. I have no idea of the legal basis for their case, but it’s all about the money. AdBlock Plus and the closely related AdBlock are among the most popular Chrome extensions, by far, and publishers will no doubt claim huge monetary damages around presumed “lost income”.