Recently I was part of a collaboration on Mixcoin, a set of proposals for improving Bitcoin’s anonymity. A natural question to ask is: why do this research? Before I address that, an even more basic question is whether or not Bitcoin is already anonymous. You may have seen back-and-forth arguments on this question. So which […]
Unlocking Hidden Consensus in Legislatures
A legislature is a small group with a big impact. Even for people who will never be part of one, the mechanics of a legislature matter — when they work well, we all benefit, and when they work poorly, we all lose out. At the same time, with several hundred participants, legislatures are large enough […]
New Research Result: Bubble Forms Not So Anonymous
Today, Joe Calandrino, Ed Felten and I are releasing a new result regarding the anonymity of fill-in-the-bubble forms. These forms, popular for their use with standardized tests, require respondents to select answer choices by filling in a corresponding bubble. Contradicting a widespread implicit assumption, we show that individuals create distinctive marks on these forms, allowing use of the marks as a biometric. Using a sample of 92 surveys, we show that an individual’s markings enable unique re-identification within the sample set more than half of the time. The potential impact of this work is as diverse as use of the forms themselves, ranging from cheating detection on standardized tests to identifying the individuals behind “anonymous” surveys or election ballots.
If you’ve taken a standardized test or voted in a recent election, you’ve likely used a bubble form. Filling in a bubble doesn’t provide much room for inadvertent variation. As a result, the marks on these forms superficially appear to be largely identical, and minor differences may look random and not replicable. Nevertheless, our work suggests that individuals may complete bubbles in a sufficiently distinctive and consistent manner to allow re-identification. Consider the following bubbles from two different individuals:
These individuals have visibly different stroke directions, suggesting a means of distinguishing between both individuals. While variation between bubbles may be limited, stroke direction and other subtle features permit differentiation between respondents. If we can learn an individual’s characteristic features, we may use those features to identify that individual’s forms in the future.
To test the limits of our analysis approach, we obtained a set of 92 surveys and extracted 20 bubbles from each of those surveys. We set aside 8 bubbles per survey to test our identification accuracy and trained our model on the remaining 12 bubbles per survey. Using image processing techniques, we identified the unique characteristics of each training bubble and trained a classifier to distinguish between the surveys’ respondents. We applied this classifier to the remaining test bubbles from a respondent. The classifier orders the candidate respondents based on the perceived likelihood that they created the test markings. We repeated this test for each of the 92 respondents, recording where the correct respondent fell in the classifier’s ordered list of candidate respondents.
If bubble marking patterns were completely random, a classifier could do no better than randomly guessing a test set’s creator, with an expected accuracy of 1/92 ? 1%. Our classifier achieves over 51% accuracy. The classifier is rarely far off: the correct answer falls in the classifier’s top three guesses 75% of the time (vs. 3% for random guessing) and its top ten guesses more than 92% of the time (vs. 11% for random guessing). We conducted a number of additional experiments exploring the information available from marked bubbles and potential uses of that information. See our paper for details.
Additional testing—particularly using forms completed at different times—is necessary to assess the real-world impact of this work. Nevertheless, the strength of these preliminary results suggests both positive and negative implications depending on the application. For standardized tests, the potential impact is largely positive. Imagine that a student takes a standardized test, performs poorly, and pays someone to repeat the test on his behalf. Comparing the bubble marks on both answer sheets could provide evidence of such cheating. A similar approach could detect third-party modification of certain answers on a single test.
The possible impact on elections using optical scan ballots is more mixed. One positive use is to detect ballot box stuffing—our methods could help identify whether someone replaced a subset of the legitimate ballots with a set of fraudulent ballots completed by herself. On the other hand, our approach could help an adversary with access to the physical ballots or scans of them to undermine ballot secrecy. Suppose an unscrupulous employer uses a bubble form employment application. That employer could test the markings against ballots from an employee’s jurisdiction to locate the employee’s ballot. This threat is more realistic in jurisdictions that release scans of ballots.
Appropriate mitigation of this issue is somewhat application specific. One option is to treat surveys and ballots as if they contain identifying information and avoid releasing them more widely than necessary. Alternatively, modifying the forms to mask marked bubbles can remove identifying information but, among other risks, may remove evidence of respondent intent. Any application demanding anonymity requires careful consideration of options for preventing creation or disclosure of identifying information. Election officials in particular should carefully examine trade-offs and mitigation techniques if releasing ballot scans.
This work provides another example in which implicit assumptions resulted in a failure to recognize a link between the output of a system (in this case, bubble forms or their scans) and potentially sensitive input (the choices made by individuals completing the forms). Joe discussed a similar link between recommendations and underlying user transactions two weeks ago. As technologies advance or new functionality is added to systems, we must explicitly re-evaluate these connections. The release of scanned forms combined with advances in image analysis raises the possibility that individuals may inadvertently tie themselves to their choices merely by how they complete bubbles. Identifying such connections is a critical first step in exploiting their positive uses and mitigating negative ones.
This work will be presented at the 2011 USENIX Security Symposium in August.
Electronic Voting Researcher Arrested Over Anonymous Source
About four months ago, Ed Felten blogged about a research paper in which Hari Prasad, Rop Gonggrijp, and I detailed serious security flaws in India’s electronic voting machines. Indian election authorities have repeatedly claimed that the machines are “tamperproof,” but we demonstrated important vulnerabilities by studying a machine provided by an anonymous source.
The story took a disturbing turn a little over 24 hours ago, when my coauthor Hari Prasad was arrested by Indian authorities demanding to know the identity of that source.
At 5:30 Saturday morning, about ten police officers arrived at Hari’s home in Hyderabad. They questioned him about where he got the machine we studied, and at around 8 a.m. they placed him under arrest and proceeded to drive him to Mumbai, a 14 hour journey.
The police did not state a specific charge at the time of the arrest, but it appears to be a politically motivated attempt to uncover our anonymous source. The arresting officers told Hari that they were under “pressure [from] the top,” and that he would be left alone if he would reveal the source’s identity.
Hari was allowed to use his cell phone for a time, and I spoke with him as he was being driven by the police to Mumbai:
The Backstory
India uses paperless electronic voting machines nationwide, and the Election Commission of India, the country’s highest election authority, has often stated that the machines are “perfect” and “fully tamper-proof.” Despite widespread reports of election irregularities and suspicions of electronic fraud, the Election Commission has never permitted security researchers to complete an independent evaluation nor allowed the public to learn crucial technical details of the machines’ inner workings. Hari and others in India repeatedly offered to collaborate with the Election Commission to better understand the security of the machines, but they were not permitted to complete a serious review.
Then, in February of this year, an anonymous source approached Hari and offered a machine for him to study. This source requested anonymity, and we have honored this request. We have every reason to believe that the source had lawful access to the machine and made it available for scientific study as a matter of conscience, out of concern over potential security problems.
Later in February, Rop Gonggrijp and I joined Hari in Hyderabad and conducted a detailed security review of the machine. We discovered that, far from being tamperproof, it suffers from a number of weaknesses. There are many ways that dishonest election insiders or other criminals with physical access could tamper with the machines to change election results. We illustrated two ways that this could happen by constructing working demonstration attacks and detailed these findings in a research paper, Security Analysis of India’s Electronic Voting Machines. The paper recently completed peer review and will appear at the ACM Computer and Communications Security conference in October.
Our work has produced a hot debate in India. Many commentators have called for the machines to be scrapped, and 16 political parties representing almost half of the Indian parliament have expressed serious concerns about the use of electronic voting.
Earlier this month at EVT/WOTE, the leading international workshop for electronic voting research, two representatives from the Election Commission of India joined in a panel discussion with Narasimha Rao, a prominent Indian electronic voting critic, and me. (I will blog more about the panel in coming days.) After listening to the two sides argue over the security of India’s voting machines, 28 leading experts in attendance signed a letter to the Election Commission stating that “India’s [electronic voting machines] do not today provide security, verifiability, or transparency adequate for confidence in election results.”
Nevertheless, the Election Commission continues to deny that there is a security problem. Just a few days ago, Chief Election Commissioner S.Y. Quraishi told reporters that the machines “are practically totally tamper proof.”
Effects of the Arrest
This brings us to today’s arrest. Hari is spending Saturday night in a jail cell, and he told me he expects to be interrogated by the authorities in the morning. Hari has retained a lawyer, who will be flying to Mumbai in the next few hours and who hopes to be able to obtain bail within days. Hari seemed composed when I spoke to him, but he expressed great concern for his wife and children, as well as for the effect his arrest might have on other researchers who might consider studying electronic voting in India.
If any good has come from this, it’s that there has been an outpouring of support for Hari. He has received positive messages from people all over India.
Unfortunately, the entire issue distracts from the primary problem: India’s electronic voting machines have fundamental security flaws, and do not provide the transparency necessary for voters to have confidence in elections. To fix these problems, the Election Commission will need help from India’s technical community. Arresting and interrogating a key member of that community is enormously counterproductive.
—
Professor J. Alex Halderman is a computer scientist at the University of Michigan.
Pseudonyms: The Natural State of Online Identity
I’ve been writing recently about the problems that arise when you try to use cryptography to verify who is at the other end of a network connection. The cryptographic math works, but that doesn’t mean you get the identity part right.
You might think, from this discussion, that crypto by itself does nothing — that cryptographic security can only be bootstrapped from some kind of real-world identity verification. That’s the way it works for website certificates, where a certificate authority has to check your bona fides before it will issue you a certificate.
But this intuition turns out to be wrong. There is one thing that crypto can do perfectly, without any real-world support: providing pseudonyms. Indeed, crypto is so good at supporting pseudonyms that we can practically say that pseudonyms are the natural state of identity online.
To explain why this is true, I need to offer a gentle introduction to a basic crypto operation: digital signatures. Suppose John Doe (“JD”) wants to use digital signatures. First, JD needs to create a private cryptographic key, which he does by generating some random numbers and combining them according to a special geeky recipe. The result is a unique private key that only JD knows. Next, JD uses a certain procedure to determine the public key that corresponds to his private key. He announces the public key to everyone. The math guarantees that (1) JD’s public key is unique and corresponds to JD’s private key, and (2) a person who knows JD’s public key can’t figure out JD’s private key.
Now JD can make digital signatures. If JD wants to “sign” a certain message M, he combines M with JD’s private key in a special way, and the result is JD’s “signature on M”. Now anybody can verify the signature, using JD’s public key. Only JD can make the signature, because only JD knows JD’s private key; but anybody can verify the signature.
At no point in this process does JD tell anybody who he is — I called him “John Doe” for a reason. Indeed, JD’s public key is a perfect pseudonym: it conveys nothing about JD’s actual identity, yet it has a distinct “owner” whose presence can be verified. (“You’re really the person who created this public key? Then you should be able to make a signature on the message ‘squeamish ossifrage’ for me….”)
Using this method, anybody can make up a fresh pseudonym whenever they want. If you can generate random numbers and do some math (or have your computer do those things for you), then you can make a fresh pseudonym. You can make as many as you want, without needing to coordinate with anybody. This is all easy to do.
These methods, pseudonyms and signatures, are used even in cases where we want to verify somebody’s real-world identity. When you connect to (say) https://mail.google.com, Google’s web server gives you its public key — a pseudonym — along with a digital certificate that attests that that public key — that pseudonym — belongs to Google Inc. Binding public keys — pseudonyms — to real-world identities is tedious and messy, but of course this is often necessary in practice.
Online, identities are hard to manage. Pseudonyms are easy.