August 19, 2017

When the cookie meets the blockchain

Cryptocurrencies are portrayed as a more anonymous and less traceable method of payment than credit cards. So if you shop online and pay with Bitcoin or another cryptocurrency, how much privacy do you have? In a new paper, we show just how little.

Websites, including shopping sites, typically embed dozens of third-party trackers each. These third parties track sensitive details of payment flows, such as the items you add to your shopping cart and their prices, regardless of how you choose to pay. Crucially, we find that many shopping sites leak enough information about your purchase to trackers that they can link it uniquely to the payment transaction on the blockchain. From there, well-known techniques can link that transaction to the rest of your Bitcoin wallet addresses. You can protect yourself by using browser extensions such as Adblock Plus and uBlock Origin, and by using Bitcoin anonymity techniques like CoinJoin. These measures help, but we find that linkages are still possible.
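To make the linking step concrete, here is a minimal Python sketch. It is not the method from our paper; it assumes the tracker has seen a purchase amount (already converted to satoshis) and a checkout timestamp, and has a list of on-chain transaction outputs to search. The `Output` type, the `candidate_outputs` function, and the tolerance and time-window parameters are hypothetical. If exactly one output survives the filter, the purchase is linked uniquely.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Output:
    """A hypothetical, simplified view of one blockchain transaction output."""
    txid: str
    address: str
    value_sat: int   # amount paid to this output, in satoshis
    time: int        # block or first-seen time, as a Unix timestamp


def candidate_outputs(
    outputs: List[Output],
    leaked_value_sat: int,
    leaked_time: int,
    value_tolerance_sat: int = 0,
    window_s: int = 3600,
) -> List[Output]:
    """Return outputs whose value and time are consistent with a leaked purchase.

    Assumes (for illustration only) that the payment appears on-chain within
    `window_s` seconds after the tracker observed the checkout page.
    """
    return [
        o
        for o in outputs
        if abs(o.value_sat - leaked_value_sat) <= value_tolerance_sat
        and leaked_time <= o.time <= leaked_time + window_s
    ]


# Toy data: values, addresses, and txids are made up for illustration.
chain = [
    Output("tx1", "addrA", 1_234_567, 1_500_000_100),
    Output("tx2", "addrB", 1_234_567, 1_500_090_000),  # same amount, a day later
    Output("tx3", "addrC", 2_000_000, 1_500_000_200),
]
matches = candidate_outputs(chain, leaked_value_sat=1_234_567, leaked_time=1_500_000_000)
print([o.txid for o in matches])  # ['tx1']
```

In practice, exchange-rate conversion and fees would call for a nonzero tolerance, which can admit more candidates; the sketch simply returns all of them and leaves it to the caller to check whether the match is unique.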

An illustration of the full scope of our attack. Consider three websites that happen to embed the same tracker. Alice makes purchases and pays with Bitcoin on the first two sites, and logs in on the third. Merchant A leaks a QR code of the transaction’s Bitcoin address to the tracker, merchant B leaks a purchase amount, and merchant C leaks Alice’s personally identifiable information (PII). Such leaks are commonplace today, and usually intentional. The tracker links these three visits using Alice’s browser cookie. Further, the tracker obtains enough information to uniquely (or near-uniquely) identify the coins on the Bitcoin blockchain that correspond to the two purchases. However, Alice took the precaution of putting her bitcoins through CoinJoin before making purchases. Thus neither transaction alone could have been traced back to Alice’s wallet, but only one wallet participated in both CoinJoins, and it is therefore revealed to be Alice’s.
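The last step of the illustration is a set intersection. A minimal Python sketch, assuming the tracker can enumerate, for each purchase, the set of wallets that contributed inputs to the CoinJoin its coins came from; the `link_wallet` function and the wallet labels are hypothetical.

```python
from functools import reduce
from typing import Iterable, Optional, Set


def link_wallet(coinjoin_participants: Iterable[Set[str]]) -> Optional[str]:
    """Intersect the participant sets of the CoinJoins behind purchases that
    one tracker has already linked to the same browser cookie.

    Returns the single wallet common to every set, or None if the
    intersection is empty or still ambiguous.
    """
    sets = list(coinjoin_participants)
    if not sets:
        return None
    common = reduce(lambda a, b: a & b, sets)
    return next(iter(common)) if len(common) == 1 else None


# Hypothetical participants of the CoinJoins preceding Alice's two purchases.
coinjoin_a = {"alice", "bob", "carol", "dave"}
coinjoin_b = {"alice", "erin", "frank", "grace"}
print(link_wallet([coinjoin_a, coinjoin_b]))  # alice
```

Either set alone leaves several candidates; only the combination of purchases, joined by the cookie, narrows it to one. If other wallets also took part in both CoinJoins, the intersection stays ambiguous and this sketch returns None.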

Using the privacy measurement tool OpenWPM, we analyzed 130 e-commerce sites that accept Bitcoin payments, and found that 53 of these sites leak transaction details to trackers. Many, but not all, of these leaks are by design, to enable advertising and analytics. Further, 49 sites leak personal identifiers to trackers: names, emails, usernames, and so on. This combination means that trackers can link real-world identities to Bitcoin addresses. To be clear, all of this leaked data is sitting in the logs of dozens of tracking companies, and the linkages can be done retroactively using past purchase data.
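Why does the combination of leaks matter? A tracker’s logs are keyed by its own cookie, so a simple group-by joins leaks from different sites. A minimal Python sketch, assuming each log record is a `(cookie, site, field, value)` tuple; the record layout and field names such as `email` and `btc_address` are hypothetical, not any real tracker’s schema.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (cookie_id, site, field, value): a hypothetical, simplified tracker log record.
Record = Tuple[str, str, str, str]


def profiles_by_cookie(log: List[Record]) -> Dict[str, Dict[str, List[str]]]:
    """Group every leaked field by the tracker's cookie ID."""
    profiles: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for cookie, _site, field, value in log:
        profiles[cookie][field].append(value)
    return profiles


def identities_to_addresses(log: List[Record]) -> Dict[str, List[str]]:
    """Map each leaked email to the Bitcoin addresses seen under the same cookie."""
    linked: Dict[str, List[str]] = {}
    for fields in profiles_by_cookie(log).values():
        for email in fields.get("email", []):
            linked[email] = list(fields.get("btc_address", []))
    return linked


# Toy log mirroring the illustration: three sites, one cookie.
log = [
    ("c42", "merchant-a.example", "btc_address", "1ExampleAddr"),
    ("c42", "merchant-b.example", "amount_btc", "0.0123"),
    ("c42", "merchant-c.example", "email", "alice@example.com"),
    ("c99", "merchant-a.example", "btc_address", "1OtherAddr"),
]
print(identities_to_addresses(log))  # {'alice@example.com': ['1ExampleAddr']}
```

Nothing in this join has to happen at purchase time: because the records persist in the tracker’s logs, it can run over historical data.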

On a subset of these sites, we made real purchases using bitcoins that we first “mixed” using the CoinJoin anonymity technique.[1] We found that a tracker that observed two of our purchases — a common occurrence — would be able to identify our Bitcoin wallet 80% of the time. In our paper, we present the full details of our attack as well as a thorough analysis of its effectiveness.

Our findings are a reminder that systems without provable privacy properties may have unexpected information leaks and lurking privacy breaches. When multiple such systems interact, the leaks can be even more subtle. Anonymity in cryptocurrencies seems especially tricky, because it inherits the worst of both data anonymization (sensitive data must be publicly and permanently stored on the blockchain) and anonymous communication (privacy depends on subtle interactions arising from the behavior of users and applications).

[1] In this experiment we used 1–2 rounds of mixing. We provide evidence in the paper that while a higher mixing depth decreases the effectiveness of the attack, it doesn’t defeat it. There’s room for a more careful study of the tradeoffs here.

Getting serious about research ethics in computer science

Digital technology mediates our public and private lives. That makes computer science a powerful discipline, but it also means that ethical considerations are essential in the development of these technologies. Not all new developments are welcomed by users; consider a patent application by Facebook describing a way for the company to identify its users’ emotions through the cameras on their devices. A critical approach to developing digital technologies, guided by philosophical and ethical principles, will allow interventions that improve society in meaningful ways.

The Center for Information Technology Policy recently organized a conference to discuss research ethics in different computer science communities, such as machine learning, security, and Internet measurement. This blog post is the first in a series that summarizes and builds on the panel discussions at the conference.

Prof. Arvind Narayanan points out that computer science sub-communities have traditionally developed their own community standards about what is considered to be ethical. See, for example, responsible vulnerability disclosure standards in information security, or the Menlo Report for the Internet measurement discipline. This allows norms and standards to be tailored to the needs of sub-disciplines. However, the growing responsibilities of researchers and sub-communities, which arise from the increasing power and reach of computer science, are sometimes met with confusion. There is a tendency to see ethical considerations as a “policy issue” to be dealt with by others.

Prof. Melissa Lane of the University Center for Human Values points out that while ethics is rooted in understanding community standards and norms, these do not exhaust it, as researchers in computer science and other fields are sometimes tempted to think. Rather, the academic study of ethics provides the tools to critically reflect on these norms and to challenge existing and new practices. A meaningful computer science research ethics therefore does not just translate existing norms into functional requirements, but explores how values are enabled, operationalized, or stifled through technology. A careful analysis of a particular context may even uncover values that were previously taken for granted or not even recognized as norms. Think, for example, of “disattendability”: the idea of going about your business without anyone tracking you or paying attention to you. We usually take this for granted in the physical world, but on the Internet, ad trackers, among others, actively violate this norm on an ongoing basis. By understanding the effects of design choices and methodologies, ethics guides technology designers to choose the most appropriate approach among the available alternatives.

Ethics is known for its somewhat conflicting theories, such as consequentialism (“the ends justify the means”) and deontology (“Act in such a way that you treat humanity […] never merely as a means to an end, but always at the same time as an end”). Prof. Susan Brison cautions against an approach that simply takes an ethical theory and applies it to a technology. She raised the question of whether computer science research and data science may require new types of ethics, or evolved theories. Digital data is changing the underlying properties of information, and this challenges our traditional ways of thinking in important ways. For example, micro-targeting bespoke political messages to individuals undermines the ability to let ‘good speech’ drown out ‘bad speech’, which is a foundational idea behind the concept of freedom of speech.

In my research, I’ve found that ethical guidelines can be incomplete, inaccessible, or conflicting, and existing legal statutes from previous technological eras may not be directly applicable to current technology. This has resulted in computer science communities being somewhat confused about their ethical and legal responsibilities. The upcoming posts in this series will explore some of the ethical standards in machine learning, security, algorithmic transparency, and Internet measurement. We welcome any feedback to move this discussion forward at a crucial time for the ethics of computer science.

See the introduction to the conference here.