No silver bullet: De-identification still doesn’t work

Paul Ohm’s 2009 article Broken Promises of Privacy spurred a debate in legal and policy circles on the appropriate response to computer science research on re-identification techniques. In this debate, the empirical research has often been misunderstood or misrepresented. A new report by Ann Cavoukian and Daniel Castro is full of such inaccuracies, despite its claims of “setting the record straight.”

In a response to their report, Ed Felten and I detail our eight most serious disagreements with Cavoukian and Castro. The thrust of our arguments is that (i) there is no evidence that de-identification works either in theory or in practice, and (ii) attempts to quantify its efficacy are unscientific and promote a false sense of security by assuming unrealistic, artificially constrained models of what an adversary might do. [Read more…]

Wickr: Putting the “non” in anonymity

[Let’s welcome new CITP blogger Pete Zimmerman, a first-year graduate student in the computer security group at Princeton. — Arvind Narayanan]

Following the revelations of wide-scale surveillance by US intelligence agencies and their allies, a myriad of services offering end-to-end encrypted communications have cropped up to take advantage of the increasing demand for privacy from surveillance. When coupled with anonymity, end-to-end encryption can prevent a central service provider from obtaining any information about its users or their communications.  However, maintaining anonymity is difficult while simultaneously offering a straightforward way for users to find each other.

Enter Wickr. This startup offers a simple app featuring “military grade encryption” of text, photo, video, and voice messages, as well as anonymous registration for its users. Wickr claims that it cannot identify who has registered with the service or which of its users are communicating with each other. During registration, users enter their email address and/or phone number (non-Wickr IDs). The app uses a cryptographic hash function (SHA-256 in this case) to derive “anonymous” Wickr IDs from the non-Wickr IDs. Wickr IDs are then stored server-side and used for discovery. When your friends want to find you, they enter your phone number or email address, which is put through the same hash function, yielding the same output (your Wickr ID). Wickr looks this up in its database to determine whether you’ve registered and, if so, facilitates message exchange. This process simplifies the discovery of other users, supposedly without giving Wickr the ability to identify the users of its anonymous service.

The problem here is that while it’s not always possible to determine the input to a hash function given the output, we can exploit the fact that the same input always yields the same output. If the number of possible inputs is small, we can simply try all of them. Unfortunately, this is a recurring theme in a variety of applications, and it stems from a common misunderstanding of cryptography: hash functions provide no effective one-wayness when the input space is small enough to enumerate. A great explanation of the use of cryptographic hash functions in attempts to anonymize data can be found here.
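
To make the weakness concrete, here is a minimal Python sketch of the attack. The phone number, its format, and the ID scheme are hypothetical stand-ins for illustration, not Wickr’s actual implementation:

    import hashlib

    # A hypothetical "anonymous" ID derived as described above:
    # the SHA-256 hash of a user's phone number.
    target_id = hashlib.sha256(b"+15551234567").hexdigest()

    # Because the same input always yields the same output, an attacker
    # who obtains the ID can simply enumerate the small input space.
    # A single area code and prefix leaves only 10^7 candidates.
    for n in range(10_000_000):
        candidate = "+1555{:07d}".format(n).encode()
        if hashlib.sha256(candidate).hexdigest() == target_id:
            print("Recovered phone number:", candidate.decode())
            break

This loop finishes in minutes on commodity hardware, and even the full space of ten-digit phone numbers (10^10 candidates) is well within reach of a determined attacker.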
[Read more…]

The importance of anonymous cryptocurrencies

Recently I collaborated on Mixcoin, a set of proposals for improving Bitcoin’s anonymity. A natural question to ask is: why do this research? Before I address that, an even more basic question is whether Bitcoin is already anonymous. You may have seen back-and-forth arguments on this question. So which is it?

An analogy with Internet anonymity is useful. The Bitcoin protocol doesn’t require users to provide identities, just as the Internet Protocol doesn’t. This is what people usually mean when they say Bitcoin is anonymous. But that statement by itself tells us little. A meaningful answer cannot be found at the protocol level; it comes from analyzing the ecosystem of services that develops around the protocol. [*]

[Read more…]

Unlocking Hidden Consensus in Legislatures

A legislature is a small group with a big impact. Even for people who will never be part of one, the mechanics of a legislature matter — when they work well, we all benefit, and when they work poorly, we all lose out.

At the same time, with several hundred participants, legislatures are large enough to face some of the same collective action problems and risks that arise in larger groups.

As an example, consider the U.S. Congress. Many members might prefer to reduce agricultural subsidies (a view shared by most Americans, in polls), but few are eager to stick their necks out and attract the concentrated ire of the agricultural lobby. Most Americans care more about other things, and since large agricultural firms lobby and contribute heavily to protect the subsidies, individual members face a high political cost for acting on their own (and their constituents’) views on the issue.

As another case in point, nearly all Congressional Republicans have signed Grover Norquist’s Taxpayer Protection Pledge, promising never to raise taxes at all (not even minimally, in exchange for massive spending cuts). But every plausible path to a balanced budget involves a combination of spending cuts and tax increases. Some Republicans may privately want to put a tax increase on the table as they negotiate with Democrats to balance the budget — in which case there may be majority support for a civically fruitful compromise involving both tax increases and spending cuts. But any one Republican who defects from the pledge will face well-financed retaliation — because, and only because, with few or no defectors, Norquist can afford to respond to each one.

These are classic collective action problems. What if, as in other areas of life, we could build an online platform to help solve them?

[Read more…]

New Research Result: Bubble Forms Not So Anonymous

Today, Joe Calandrino, Ed Felten and I are releasing a new result regarding the anonymity of fill-in-the-bubble forms. These forms, popular for their use with standardized tests, require respondents to select answer choices by filling in a corresponding bubble. Contradicting a widespread implicit assumption, we show that individuals create distinctive marks on these forms, allowing use of the marks as a biometric. Using a sample of 92 surveys, we show that an individual’s markings enable unique re-identification within the sample set more than half of the time. The potential impact of this work is as diverse as use of the forms themselves, ranging from cheating detection on standardized tests to identifying the individuals behind “anonymous” surveys or election ballots.

If you’ve taken a standardized test or voted in a recent election, you’ve likely used a bubble form. Filling in a bubble doesn’t provide much room for inadvertent variation. As a result, the marks on these forms superficially appear to be largely identical, and minor differences may look random and not replicable. Nevertheless, our work suggests that individuals may complete bubbles in a sufficiently distinctive and consistent manner to allow re-identification. Consider the following bubbles from two different individuals:

These individuals have visibly different stroke directions, suggesting a means of distinguishing between them. While variation between bubbles may be limited, stroke direction and other subtle features permit differentiation between respondents. If we can learn an individual’s characteristic features, we may use those features to identify that individual’s forms in the future.

To test the limits of our analysis approach, we obtained a set of 92 surveys and extracted 20 bubbles from each of those surveys. We set aside 8 bubbles per survey to test our identification accuracy and trained our model on the remaining 12 bubbles per survey. Using image processing techniques, we identified the unique characteristics of each training bubble and trained a classifier to distinguish between the surveys’ respondents. We applied this classifier to the remaining test bubbles from a respondent. The classifier orders the candidate respondents based on the perceived likelihood that they created the test markings. We repeated this test for each of the 92 respondents, recording where the correct respondent fell in the classifier’s ordered list of candidate respondents.
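
To make the evaluation procedure concrete, here is a schematic sketch in Python with scikit-learn. The features and classifier here are synthetic placeholders (each simulated respondent is a characteristic feature vector plus noise), not the image-processing features or the specific classifier from our paper:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    n_resp, n_train, n_test, n_feat = 92, 12, 8, 16
    style = rng.normal(size=(n_resp, n_feat))  # each respondent's marking "style"

    def bubbles(r, k):
        # Simulated feature vectors for k bubbles from respondent r.
        return style[r] + 0.5 * rng.normal(size=(k, n_feat))

    X_train = np.vstack([bubbles(r, n_train) for r in range(n_resp)])
    y_train = np.repeat(np.arange(n_resp), n_train)
    clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

    ranks = []
    for r in range(n_resp):
        # Combine per-bubble scores into one score per candidate respondent,
        # then record where the true respondent falls in the ordered list.
        scores = clf.predict_proba(bubbles(r, n_test)).sum(axis=0)
        order = np.argsort(scores)[::-1]
        ranks.append(int(np.where(order == r)[0][0]) + 1)

    ranks = np.array(ranks)
    print("top-1:", (ranks == 1).mean(), "top-3:", (ranks <= 3).mean())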

If bubble marking patterns were completely random, a classifier could do no better than randomly guessing a test set’s creator, with an expected accuracy of 1/92 ≈ 1%. Our classifier achieves over 51% accuracy. The classifier is rarely far off: the correct answer falls in the classifier’s top three guesses 75% of the time (vs. 3% for random guessing) and its top ten guesses more than 92% of the time (vs. 11% for random guessing). We conducted a number of additional experiments exploring the information available from marked bubbles and potential uses of that information. See our paper for details.

Additional testing—particularly using forms completed at different times—is necessary to assess the real-world impact of this work. Nevertheless, the strength of these preliminary results suggests both positive and negative implications depending on the application. For standardized tests, the potential impact is largely positive. Imagine that a student takes a standardized test, performs poorly, and pays someone to repeat the test on his behalf. Comparing the bubble marks on both answer sheets could provide evidence of such cheating. A similar approach could detect third-party modification of certain answers on a single test.

The possible impact on elections using optical scan ballots is more mixed. One positive use is to detect ballot box stuffing—our methods could help identify whether someone replaced a subset of the legitimate ballots with a set of fraudulent ballots completed by herself. On the other hand, our approach could help an adversary with access to the physical ballots or scans of them to undermine ballot secrecy. Suppose an unscrupulous employer uses a bubble form employment application. That employer could test the markings against ballots from an employee’s jurisdiction to locate the employee’s ballot. This threat is more realistic in jurisdictions that release scans of ballots.

Appropriate mitigation of this issue is somewhat application specific. One option is to treat surveys and ballots as if they contain identifying information and avoid releasing them more widely than necessary. Alternatively, modifying the forms to mask marked bubbles can remove identifying information but, among other risks, may remove evidence of respondent intent. Any application demanding anonymity requires careful consideration of options for preventing creation or disclosure of identifying information. Election officials in particular should carefully examine trade-offs and mitigation techniques if releasing ballot scans.

This work provides another example in which implicit assumptions resulted in a failure to recognize a link between the output of a system (in this case, bubble forms or their scans) and potentially sensitive input (the choices made by individuals completing the forms). Joe discussed a similar link between recommendations and underlying user transactions two weeks ago. As technologies advance or new functionality is added to systems, we must explicitly re-evaluate these connections. The release of scanned forms combined with advances in image analysis raises the possibility that individuals may inadvertently tie themselves to their choices merely by how they complete bubbles. Identifying such connections is a critical first step in exploiting their positive uses and mitigating negative ones.

This work will be presented at the 2011 USENIX Security Symposium in August.

Electronic Voting Researcher Arrested Over Anonymous Source

Updates: 8/28 Alex Halderman: Indian E-Voting Researcher Freed After Seven Days in Police Custody
8/26 Alex Halderman: Indian E-Voting Researcher Remains in Police Custody
8/24 Ed Felten: It’s Time for India to Face its E-Voting Problem
8/22 Rop Gonggrijp: Hari is in jail :-(

About four months ago, Ed Felten blogged about a research paper in which Hari Prasad, Rop Gonggrijp, and I detailed serious security flaws in India’s electronic voting machines. Indian election authorities have repeatedly claimed that the machines are “tamperproof,” but we demonstrated important vulnerabilities by studying a machine provided by an anonymous source.

The story took a disturbing turn a little over 24 hours ago, when my coauthor Hari Prasad was arrested by Indian authorities demanding to know the identity of that source.

At 5:30 Saturday morning, about ten police officers arrived at Hari’s home in Hyderabad. They questioned him about where he got the machine we studied, and at around 8 a.m. they placed him under arrest and proceeded to drive him to Mumbai, a 14-hour journey.

The police did not state a specific charge at the time of the arrest, but it appears to be a politically motivated attempt to uncover our anonymous source. The arresting officers told Hari that they were under “pressure [from] the top,” and that he would be left alone if he would reveal the source’s identity.

Hari was allowed to use his cell phone for a time, and I spoke with him as he was being driven by the police to Mumbai: (Video on YouTube)

The Backstory

India uses paperless electronic voting machines nationwide, and the Election Commission of India, the country’s highest election authority, has often stated that the machines are “perfect” and “fully tamper-proof.” Despite widespread reports of election irregularities and suspicions of electronic fraud, the Election Commission has never permitted security researchers to complete an independent evaluation nor allowed the public to learn crucial technical details of the machines’ inner workings. Hari and others in India repeatedly offered to collaborate with the Election Commission to better understand the security of the machines, but they were not permitted to complete a serious review.

Then, in February of this year, an anonymous source approached Hari and offered a machine for him to study. This source requested anonymity, and we have honored this request. We have every reason to believe that the source had lawful access to the machine and made it available for scientific study as a matter of conscience, out of concern over potential security problems.

Later in February, Rop Gonggrijp and I joined Hari in Hyderabad and conducted a detailed security review of the machine. We discovered that, far from being tamperproof, it suffers from a number of weaknesses. There are many ways that dishonest election insiders or other criminals with physical access could tamper with the machines to change election results. We illustrated two ways that this could happen by constructing working demonstration attacks and detailed these findings in a research paper, Security Analysis of India’s Electronic Voting Machines. The paper recently completed peer review and will appear at the ACM Computer and Communications Security conference in October.

Our work has sparked a heated debate in India. Many commentators have called for the machines to be scrapped, and 16 political parties representing almost half of the Indian parliament have expressed serious concerns about the use of electronic voting.

Earlier this month at EVT/WOTE, the leading international workshop for electronic voting research, two representatives from the Election Commission of India joined in a panel discussion with Narasimha Rao, a prominent Indian electronic voting critic, and me. (I will blog more about the panel in coming days.) After listening to the two sides argue over the security of India’s voting machines, 28 leading experts in attendance signed a letter to the Election Commission stating that “India’s [electronic voting machines] do not today provide security, verifiability, or transparency adequate for confidence in election results.”

Nevertheless, the Election Commission continues to deny that there is a security problem. Just a few days ago, Chief Election Commissioner S.Y. Quraishi told reporters that the machines “are practically totally tamper proof.”

Effects of the Arrest

This brings us to today’s arrest. Hari is spending Saturday night in a jail cell, and he told me he expects to be interrogated by the authorities in the morning. Hari has retained a lawyer, who will be flying to Mumbai in the next few hours and who hopes to be able to obtain bail within days. Hari seemed composed when I spoke to him, but he expressed great concern for his wife and children, as well as for the effect his arrest might have on other researchers who might consider studying electronic voting in India.

If any good has come from this, it’s that there has been an outpouring of support for Hari. He has received positive messages from people all over India.

Unfortunately, the entire issue distracts from the primary problem: India’s electronic voting machines have fundamental security flaws, and do not provide the transparency necessary for voters to have confidence in elections. To fix these problems, the Election Commission will need help from India’s technical community. Arresting and interrogating a key member of that community is enormously counterproductive.


Professor J. Alex Halderman is a computer scientist at the University of Michigan.

Pseudonyms: The Natural State of Online Identity

I’ve been writing recently about the problems that arise when you try to use cryptography to verify who is at the other end of a network connection. The cryptographic math works, but that doesn’t mean you get the identity part right.

You might think, from this discussion, that crypto by itself does nothing — that cryptographic security can only be bootstrapped from some kind of real-world identity verification. That’s the way it works for website certificates, where a certificate authority has to check your bona fides before it will issue you a certificate.

But this intuition turns out to be wrong. There is one thing that crypto can do perfectly, without any real-world support: providing pseudonyms. Indeed, crypto is so good at supporting pseudonyms that we can practically say that pseudonyms are the natural state of identity online.

To explain why this is true, I need to offer a gentle introduction to a basic crypto operation: digital signatures. Suppose John Doe (“JD”) wants to use digital signatures. First, JD needs to create a private cryptographic key, which he does by generating some random numbers and combining them according to a special geeky recipe. The result is a unique private key that only JD knows. Next, JD uses a certain procedure to determine the public key that corresponds to his private key. He announces the public key to everyone. The math guarantees that (1) JD’s public key is unique and corresponds to JD’s private key, and (2) a person who knows JD’s public key can’t figure out JD’s private key.

Now JD can make digital signatures. If JD wants to “sign” a certain message M, he combines M with JD’s private key in a special way, and the result is JD’s “signature on M”. Now anybody can verify the signature, using JD’s public key. Only JD can make the signature, because only JD knows JD’s private key; but anybody can verify the signature.
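
As a concrete sketch, here is this entire flow using Ed25519 signatures from the Python cryptography library. The post doesn’t prescribe a particular algorithm; Ed25519 is just one convenient choice:

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # JD generates a private key from randomness; only he holds it.
    private_key = Ed25519PrivateKey.generate()
    # The corresponding public key can be announced to everyone.
    public_key = private_key.public_key()

    # JD signs a message M with his private key...
    message = b"a message M"
    signature = private_key.sign(message)

    # ...and anybody who knows the public key can verify the signature.
    try:
        public_key.verify(signature, message)
        print("signature verifies: made by the key's owner")
    except InvalidSignature:
        print("signature does not verify")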

At no point in this process does JD tell anybody who he is — I called him “John Doe” for a reason. Indeed, JD’s public key is a perfect pseudonym: it conveys nothing about JD’s actual identity, yet it has a distinct “owner” whose presence can be verified. (“You’re really the person who created this public key? Then you should be able to make a signature on the message ‘squeamish ossifrage’ for me….”)

Using this method, anybody can make up a fresh pseudonym whenever they want. If you can generate random numbers and do some math (or have your computer do those things for you), then you can make a fresh pseudonym. You can make as many as you want, without needing to coordinate with anybody. This is all easy to do.

These methods, pseudonyms and signatures, are used even in cases where we want to verify somebody’s real-world identity. When you connect to (say) https://mail.google.com, Google’s web server gives you its public key — a pseudonym — along with a digital certificate that attests that that public key — that pseudonym — belongs to Google Inc. Binding public keys — pseudonyms — to real-world identities is tedious and messy, but of course this is often necessary in practice.
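
You can watch this binding happen with a few lines of Python’s standard ssl module. This is just a sketch that fetches and prints the certificate fields; the exact output varies by certificate:

    import socket
    import ssl

    # Connect and retrieve the certificate that binds the server's
    # public key -- the pseudonym -- to a real-world identity.
    ctx = ssl.create_default_context()
    with socket.create_connection(("mail.google.com", 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname="mail.google.com") as tls:
            cert = tls.getpeercert()

    print(cert["subject"])  # who the certificate authority says this key belongs to
    print(cert["issuer"])   # the certificate authority making the attestation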

Online, identities are hard to manage. Pseudonyms are easy.