October 6, 2022

Archives for June 2011

Supreme Court Takes Important GPS Tracking Case

This morning, the Supreme Court agreed to hear an appeal next term of United States v. Jones (formerly United States v. Maynard), a case in which the D.C. Circuit Court of Appeals suppressed evidence of a criminal defendant’s travels around town, which the police collected using a tracking device they attached to his car. For more background on the case, consult the original opinion and Orin Kerr’s previous discussions about the case.

No matter what the Court says or holds, this case will probably prove to be a landmark. Watch it closely.

(1) Even if the Court says nothing else, it will face the constitutionality of the use by police of tracking beepers to follow criminal suspects. In a pair of cases from the mid-1980s, the Court held that the police did not need a warrant to use a tracking beeper to follow a car around on public, city streets (Knotts) but did need a warrant to follow a beeper that was moved indoors (Karo) because it “reveal[ed] a critical fact about the interior of the premises.” By direct application of these cases, the warrantless tracking in Jones seems constitutional, because it was restricted to movement on public, city streets.

Not so fast, said the D.C. Circuit. In Jones, the police tracked the vehicle 24 hours a day for four weeks. Citing the “mosaic theory often invoked by the Government in cases involving national security information,” the court held that the whole can sometimes be more than the parts. Tracking a car continuously for a month is constitutionally different in kind, not just degree, from tracking a car along a single trip. This is a new approach to the Fourth Amendment, one arguably at odds with opinions from other Courts of Appeals.

(2) This case gives the Court the opportunity to speak generally about the Fourth Amendment and location privacy. Depending on what it says, it may provide hints for lower courts struggling with the government’s use of cell phone location information, for example.

(3) For support of its embrace of the mosaic theory, the D.C. Circuit cited a 1989 Supreme Court case, U.S. Department of Justice v. Reporters Committee for Freedom of the Press. In this case, which involved the Freedom of Information Act (FOIA), not the Fourth Amendment, the Court allowed the FBI to refuse to release compiled “rap sheets” about organized crime suspects, even though the rap sheets were compiled mostly from “public” information obtainable from courthouse records. In agreeing that the rap sheets nevertheless fell within a “personal privacy” exemption from FOIA, the Court embraced, for the first time, the idea that the whole may be worth more than the parts. The Court noted the difference “between scattered disclosure of the bits of information contained in a rap-sheet and revelation of the rap-sheet as a whole,” and found a “vast difference between the public records that might be found after a diligent search of courthouse files, county archives, and local police stations throughout the country and a computerized summary located in a single clearinghouse of information.” (FtT readers will see the parallels to the debates on this blog about PACER and RECAP.) In summary, it found that “practical obscurity” could amount to privacy.

Practical obscurity is an idea that hasn’t gotten much traction in the courts since Reporters Committee. But it is an idea well-loved by many privacy scholars, including myself, for whom it helps explain their concerns about the privacy implications of data aggregation and mining of supposedly “public” data.

The Court, of course, may choose a narrow route for affirming or reversing the D.C. Circuit. But if it instead speaks broadly or categorically about the viability of practical obscurity as a legal theory, this case might set a standard that we will be debating for years to come.

What Gets Redacted in Pacer?

In my research on privacy problems in PACER, I spent a lot of time examining PACER documents. In addition to researching the problem of “bad” redactions, I was also interested in learning about the pattern of redactions generally. To this end, my software looked for two redaction styles. One is the “black rectangle” redaction method I described in my previous post. This method sometimes fails, but most of these redactions were done successfully. The more common method (around two-thirds of all redactions) involves replacing sensitive information with strings of XXs.

Out of the 1.8 million documents it scanned, my software identified around 11,000 documents that appeared to have redactions. Many of them could be classified automatically (for example “123-45-xxxx” is clearly a redacted Social Security number, and “Exxon” is a false positive) but I examined several thousand by hand.
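As a rough illustration of what “classified automatically” could look like, here is a minimal Python sketch. It is not the software used for this research; the patterns, labels, and the `classify` function are assumptions made up for the example.

```python
import re

# Hypothetical patterns; the actual rules used in the research are not described.
# Matches SSN-shaped strings whose last group is x'ed out, e.g. "123-45-xxxx".
SSN_REDACTED = re.compile(r"\b(?:\d{3}|[xX]{3})-(?:\d{2}|[xX]{2})-[xX]{4}\b")
# Any run of two or more x's, the trigger for a candidate redaction.
X_RUN = re.compile(r"[xX]{2,}")


def classify(token: str) -> str:
    """Assign a coarse label to a token that may contain an x-style redaction."""
    if SSN_REDACTED.search(token):
        return "ssn"
    if not X_RUN.search(token):
        return "no_redaction"
    # A run of x's inside an ordinary word (e.g. "Exxon") is most likely
    # not a redaction at all.
    if token.isalpha() and not X_RUN.fullmatch(token):
        return "false_positive"
    # Everything else is left for a human to look at.
    return "needs_manual_review"


for sample in ["123-45-xxxx", "Exxon", "XXXXXXXX", "hello"]:
    print(sample, "->", classify(sample))
```

Anything the rules cannot settle falls through to `needs_manual_review`, which mirrors the split between automatic classification and examination by hand.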

Here is the distribution of the redacted documents I found.

Type of Sensitive Information        No. of Documents
Social Security number               4315
Bank or other account number         675
Address                              449
Trade secret                         419
Date of birth                        290
Unique identifier other than SSN     216
Name of person                       129
Phone, email, IP address             60
National security related            26
Health information                   24
Miscellaneous                        68
Total                                6208

To reiterate the point I made in my last post, I didn’t have access to a random sample of the PACER corpus, so we should be cautious about drawing any precise conclusions about the distribution of redacted information in the entire PACER corpus.

Still, I think we can draw some interesting conclusions from these statistics. It’s reasonable to assume that the distribution of redacted sensitive information is similar to the distribution of sensitive information in general. That is, assuming that parties who redact documents do a decent job, this list gives us a (very rough) idea of what kinds of sensitive information can be found in PACER documents.

The most obvious lesson from these statistics is that Social Security numbers are by far the most common type of redacted information in PACER. This is good news, since it’s relatively easy to build software to automatically detect and redact Social Security numbers.
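To give a sense of why this is relatively easy, here is a minimal Python sketch that masks hyphenated nine-digit numbers in the usual 3-2-4 layout. The pattern and the `redact_ssns` function are assumptions for illustration only; a real tool would also need to handle numbers written without hyphens and text buried in scanned images.

```python
import re

# Assumed format: three digits, two digits, four digits, separated by hyphens.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssns(text: str) -> str:
    """Replace every SSN-shaped string with a fixed mask."""
    return SSN.sub("XXX-XX-XXXX", text)


print(redact_ssns("Debtor SSN 123-45-6789, phone 555-123-4567."))
```

The phone number in the example is left alone because its digit groups do not fit the 3-2-4 layout.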

Another interesting case is the “address” category. Almost all of the redacted items in this category—393 out of 449—appear in the District of Columbia District. Many of the documents relate to search warrants and police reports, often in connection with drug cases. I don’t know if the high rate of redaction reflects the different mix of cases in the DC District, or an idiosyncratic redaction policy voluntarily pursued by the courts and/or the DC police but not by officials in other districts. It’s worth noting that the redaction of addresses doesn’t appear to be required by the federal redaction rules.

Finally, there’s the category of “trade secrets,” which is a catch-all term I used for documents whose redactions appear to be confidential business information. Private businesses may have a strong interest in keeping this information confidential, but the public interest in such secrecy here is less clear.

To summarize, out of 6208 redacted documents, there are 4315 Social Security numbers that can be redacted automatically by machine, 449 addresses whose redaction doesn’t seem to be required by the rules of procedure, and 419 “trade secrets” whose release will typically only harm the party who fails to redact it.

That leaves around 1000 documents that would expose risky confidential information if not properly redacted, or about 0.05 percent of the 1.8 million documents I started with. A thousand documents is worth taking seriously (especially given that there are likely to be tens of thousands in the full PACER corpus). The courts should take additional steps to monitor compliance with the redaction rules and sanction parties who fail to comply with them, and they should explore techniques to automate the detection of redaction failures in these categories.
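The arithmetic behind that estimate can be checked in a few lines of Python. The figures come from the table above; the subtraction assumes the three categories do not overlap within a document, which is a simplification.

```python
# Figures taken from the table of redacted documents.
total_redacted = 6208
ssn = 4315
addresses = 449
trade_secrets = 419
scanned = 1_800_000

# Documents left after setting aside the three categories discussed above.
remaining = total_redacted - ssn - addresses - trade_secrets
share = remaining / scanned * 100

print(remaining)        # 1025
print(f"{share:.3f}%")  # 0.057%
```

The result, a little over 0.05 percent, is in line with the rounded figure of around 1000 documents.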

But at the same time, a sense of perspective is important. This tiny fraction of PACER documents with confidential information in them is a cause for concern, but it probably isn’t a good reason to limit public access to the roughly 99.9 percent of documents that contain no sensitive information and may be of significant benefit to the public.

Thanks again to Carl Malamud and Public.Resource.Org for their support of my research.

Universities in Brazil are too closed to the world, and that's bad for innovation

When Brazilian president Dilma Rousseff visited China at the beginning of May, she came back with some good news (maybe too good to be entirely true), including the announcement that Foxconn, the largest maker of electronic components, will invest US$12 billion to open a large industrial plant in the country. The goal is to produce iPads and other key electronic components locally.

The announcement was praised, and quickly made the headlines of all major newspapers. There is certainly reason for excitement. Brazil missed important waves of economic development, including industrialization (which only really happened in the 1940s) and the semiconductor wave, an industry that has shown but a few signs of development in the country until now.

The president’s news also included the announcement that Foxconn would hire 100 thousand employees for the new plant, 20% of them engineers. The numbers raised skepticism, for various reasons. Not only do they seem exaggerated, but Brazil simply does not have 20,000 engineers available for hire. In 2008, the number of engineers in the country was 750,000, and the projection is that, if growth rates continue at the same level, there will be a deficit of engineers in the coming years.

The situation increases the pressure on universities to train engineers and also to cope with the demands of development and innovation. This is a complex debate, but it is worth focusing on one aspect of the Brazilian university system: its isolation from the rest of the world. In short, Brazilian universities, both in terms of students and faculty, are almost entirely made up of Brazilians. As an example, at the University of Sao Paulo (USP), the largest and most important university in the country, only 2.8% of a total of 56,000 students are international. In most other universities the number of international students tends to be even smaller. Regarding faculty, the situation is no different. There have been a few recent efforts by some institutions (mostly private) to increase the number of international professors. But there is still a long way to go.

The low degree of internationalization is already causing problems. For instance, it makes it difficult for Brazilian universities to score well in world rankings. By way of example, no Brazilian university has ever been included in the top 200 universities of the Times Higher Education World Ranking, a ranking that pays special attention to internationalization efforts.

Even if rankings might not be the main issue, the fact that the university system is essentially inward-looking does create problems, making innovation harder. For instance, many of the engineers for Foxconn’s new plant might end up being hired abroad. If some sort of integration is not established with Brazilian universities, that will be a missed opportunity for transferring technology or developing local capacity.

The challenges of integrating such a large operation with universities are huge. Even for small-scale cooperation, it turns out that the majority of universities in Brazil are unprepared to deal with international visitors, either students or faculty. For an international professor to be formally hired by a local university, she will in most cases have to validate her degree in Brazil. The validation process can be Kafkaesque, requiring lots of paperwork (including “sworn translations”) and time, often months or years. This poses a challenge not only for professors seeking to teach in Brazil, but also for Brazilians who obtained a degree abroad and return home. Local boards of education do not recognize international degrees, regardless of whether they have been awarded by Princeton or the Free University of Berlin. Students return home formally with the same academic credentials they had before obtaining a degree abroad. The market often recognizes the value of international degrees, but the university system does not.

The challenges are also visible at the very practical level. Most universities do not have an office in charge of foreign admissions or international faculty or students. Many professors who venture into the Brazilian university system will go through the process without formal support, counting on the efforts and enthusiasm of local peer professors who undertake the work of dealing with the details of the visit (obtaining a visa or work permit, or the long bureaucratic steps to get the visitor’s salary actually paid).

The lack of internationalization is bad for innovation. As pointed out by Princeton’s computer science professor Kai Li during a recent conference on technology cooperation between the US and China organized by the Center for Information Technology Policy, the presence of international students and faculty in US universities has been crucial for innovation. Kai emphasizes the importance of maintaining an ecosystem for innovation, which not only attracts the best students to local universities, but helps retain them after graduation. Many will work on research, create start-ups or get jobs in the tech industry. The same point was made by Lawrence Lessig at his recent G8 talk in France, where he claimed that a great deal of innovation in the US was made by “outsiders”.

Another important aspect of the lack of internationalization in Brazil is the lack of institutional support. Government funding organizations, such as CAPES, CNPQ, Fapesp and others, play an important role. But Brazil still lacks both public and private institutions aimed specifically at promoting integration, Brazilian culture and international exchange (along the lines of Fulbright, the Humboldt Foundation, or institutes like Cervantes, Goethe or the British Council).

As mentioned by Volker Grassmuck, a German media studies professor who spent 18 months as a researcher at the University of Sao Paulo: “The Brazilian funding institutions do have grants for visiting researchers, but the application has to be sent locally by the institution. At the end of my year in Sao Paulo I applied to FAPESP, the research funding agency of the Sao Paulo state, but it did not work out, since my research group did not have a research project formalized there”.

He compares the situation with German universities, saying that “when I started teaching at Paderborn University, which is a young (founded in 1972), mid-sized (15,000 students) university in a small town, the first time I walked across campus, I heard Indian, Vietnamese, Chinese, Arabic, Turkish and Spanish. At USP during the entire year I never heard anything but Portuguese”. (see Volker’s full interview below)

Of course any internationalization process at this point has to be very well planned. In Brazil, 25% of the universities are public and 75% private. There is still a huge deficit of places for local students, even with the university population growing quite fast in the past 6 years. In 2004 Brazil had 4.1 million university students. In 2010, the number reached 6.5 million. However, only 20% of young students in Brazil find a place in the university system, compared with 43% in Chile or 61% in Argentina. The country still struggles to provide access to its own students at universities. But at the same time, the effort of internationalization should not be understood as competing with expanding access. The challenge for Brazil is actually to do both things at the same time: expanding access to local students, and promoting internationalization. If Brazil wants to play a role as an important emerging economy, that’s the way to go (no one said it would be easy!). One thing should not exclude the other.

In this sense, João Victor Issler, an economics professor at EPGE (the Graduate School of Economics at Fundação Getulio Vargas), has a pragmatic view about the issue. He says: “inasmuch as Brazil develops economically, it will inexorably increase the openness of the university system. I am not saying that there should not be specific initiatives to increase internationalization, but an isolated process will be limited. More important than the internationalization of students and faculty is opening the economy to commerce and finance, a process that will directly affect long-term economic development and all its variables: education, innovation and the work force”. João Victor’s point is important. If internationalization follows development, there is already some catching up to do. The country has developed significantly in the past 16 years, but that has not corresponded to any significant improvement in the internationalization of universities.

A few strategies might help achieve more openness on the part of Brazilian universities, without necessarily competing with the goal of expanding access to local students. One of them is the use of ICTs for international collaboration. Another is providing support to what is already working. But there is more that could be done to improve internationalization. Here is a short list:

a) Development organizations such as the World Bank or the Inter-American Development Bank (IDB) can play an important role. Once the internationalization goal is defined, they could provide the necessary support, in partnership with local institutions.

b) Pay attention to the basics: creating specific departments to centralize support for international students and faculty. They should be responsible for the strategy, but also help with practical matters, such as visas, travel, and coping with the local bureaucracy.

c) The majority of Brazilian universities’ websites are only in Portuguese. Even the webpage of the International Cooperation Commission at the University of Sao Paulo is mostly in Portuguese, and many of the English links are broken.

d) Increase the use of Information and Communication Technologies (ICTs) as a tool for cooperation and for integrating students and faculty with international projects. Increasing distance learning programs and cooperation mediated by ICTs is a no-brainer.

e) Create a prize system for internationalization projects, to be awarded every few years to the educational institution that best advanced that goal.

f) Consider a policy-effective tax break to the private sector (which might include private universities), in exchange for developing successful research centers that include an international component.

g) Brazilian organizations funding research should seek to increase support to international researchers and professors who would like to develop projects in Brazil.

h) Regional integration is the low-hanging fruit. Attracting the best students from other Latin American countries is an opportunity to kickstart international cooperation.

i) Map what is already in place, identifying what is working in terms of internationalization and supporting its expansion.

j) Brazil needs an innovation research lab. Large investment packages, such as the government support to Foxconn’s new plant, should include integration with universities and the creation of a public/private research center, focused on innovation.

Below are the complete interviews with Volker Grassmuck and João Victor Issler, with their perspectives on the issue.

Interview with Volker Grassmuck

Volker is currently a lecturer at Paderborn University. He spent 18 months in Brazil as a visiting researcher affiliated with the University of Sao Paulo. His visit contributed significantly to the Brazilian copyright reform debate. He partnered with local researchers and law professors (as well as artists and NGOs) to develop an innovative compensation system for artists, which has become part of the copyright reform debate.

1) How do you think the Brazilian Universities are prepared to receive students and professors/researchers from abroad?

I did not experience any special provisions for foreigners at USP. The inviting professor has to navigate university bureaucracy for the visiting researcher just as for any Brazilian researcher. I did experience a number of bizarre situations, but these were not specific to me, but the same for all in our research group.

E.g.: In order to receive my grant I was forced to open an account with the only bank that has an office on the USP Leste campus. The money from Ford Foundation was already there, and it was exactly the same amount that was supposed to be transferred to my account on the same day of each month. But every single month I had to remind the person in our group in charge of administrative issues that the money had not arrived. She would then go to the university administration to pick up a check that physically had to be carried to the bank to deposit it there. If the single person in the administration in charge was ill this would be delayed until that person came back.

Another path a foreigner can pursue is to apply for a professorship at a Brazilian university. I looked into this while I was there and got advice from a few people who had actually done this. A prerequisite would be “revalidating” my German Ph.D. This is a long procedure, requiring originals and copies of diploma, grades etc. authenticated by the Brazilian Consulate, a copy of the dissertation, maybe even a translation into Portuguese, an examination similar to the original Ph.D. examination plus some extras (e.g. “didactics”) that you don’t have at a German university and a fee, in the case of USP, of R$ 1,530.00. In other words, Brazilian academy does not trust Free University of Berlin to issue valid Ph.Ds and requires me to essentially go through the whole Ph.D. procedure all over again. And then I would be able to take a “public competition”, which is yet another procedure unlike anything required by a German university.

2) What is the situation in the German universities? Are they prepared for, and do they receive, foreign students and professors/researchers?

Being German I have not experienced being a foreign student or researcher here. But here are some impressions: When I started teaching at Paderborn University, which is a young (founded in 1972), mid-sized (15,000 students) university in a small town, the first time I walked across campus, I heard Indian, Vietnamese, Chinese, Arabic, Turkish and Spanish. At USP during the entire year I never heard anything but Portuguese, except in the language course where there were people from other Latin American countries, two women from Spain and one visiting researcher from the US. Staff at Paderborn is less international, but once or twice a week there is a presentation by a guest speaker from a university in Europe or beyond.

This is anecdotal, of course. I’m sure objective numbers would show a different picture. The Centrum für Hochschulentwicklung (CHE) does a regular ranking of German universities. It includes their international orientation. This year’s result: the business faculties at universities of applied science are leading with 50%. Only 35% of universities got ranked as being internationally oriented, with sociology and political sciences being the weakest. http://www.che-ranking.de/

I wonder how Brazilian universities would rank by the same standards.

3) Do you think there is a connection between innovation and foreign students at local universities?

No doubt about it. I did see an international orientation in two forms: 1. People read the international literature in the fields I’m interested in. But without having actual people to enter into a dialogue with, this often remains a reproduction or at best an application of innovations to Brazil. 2. People travel and study abroad. A few students and professors travel extensively. Some students from our group went to Bolivia, Mozambique, France during my year there. So there is a certain internationalization “from Brazil” but my overwhelming impression was that there is very little academic internationalization “of Brazil.”

Interview with João Victor Issler

João Victor Issler is an economics professor at the Fundação Getulio Vargas Graduate School of Economics, who has been closely following the recent internationalization efforts. His full bio here.

a) How do you see the presence of international students and faculty at the Brazilian universities?

The presence of both is quite rare. There are a few isolated efforts here and there by a few groups. For example, in Economics, we have PUC-Rio (Pontifical Catholic University at Rio) and IMPA (National Institute for Pure and Applied Mathematics), which have students from Argentina, Chile, Peru etc. at their masters and Ph.D. levels. Our school, EPGE (FGV Graduate School of Economics), hires professors outside Brazil, but we do not have specific incentives for international students. Beyond Economics, I know that the University of Sao Paulo is seeking to attract international students, but it is hard to tell at what schools and how many.

b) Foxconn announced it will open a new plant in Brazil, and will hire 20,000 engineers for that. We clearly don’t have that many engineers. Do you think that the internationalization of universities could help the country to build better capacity for developing its tech-industry?

The numbers announced cannot be trusted. In any case, the general perception is that there is a deficit of engineers in Brazil. The tech-market, however, is an endogenous variable, correlated to our GDP per capita, the level of education of the workforce, number of houses with access to drinkable water, infrastructure, etc. Inasmuch as Brazil develops economically, it will inexorably increase the openness of the university system. I am not saying that there should not be specific initiatives to increase internationalization, but an isolated process will be limited. More important than the internationalization of students and faculty is opening the economy to commerce and finance, a process that will directly affect long-term economic development and all its variables: education, innovation and the work force.

c) In other countries, there are institutions such as the Goethe Institute, or the Humboldt Foundation in Germany, that end up attracting international talents. The same goes for the US, with the Fulbright program. Why not in Brazil?

Germany and other European countries face problems due to the shape of their age demographic pyramid, whose base is small compared to the top. They have the capacity to offer university places beyond what German students need. Thus, it is possible to attract international students, in order to fill the present capacity. It is hard to say how this structure will evolve. They might reduce the installed capacity, or increase the search for international students. And they are looking for Brazilian students, for instance, especially engineers. Generally, developed countries tend to attract better (and wealthier) students than developing countries, which explains this movement towards Germany, the US or Canada. To me, the US is the most important model regarding the higher education industry. In the beginning of the 20th Century, there were already many Japanese and Chinese students at universities in the US and Europe. With the development of Japan, this movement decreased at the end of the Century. Brazil today (for instance, the University of Sao Paulo) attracts a few good students from Latin America. And it could attract more if we develop faster than the rest of the region. In Brazil, CAPES (for which I was an advisor until recently) plays a role similar to that of the institutions you mentioned. They are engaged in several bilateral agreements for students and professors. This openness is certainly positive. For students and professors, it is important to consider the hierarchy and quality: the best students tend to go to the US and Europe. We end up with the middle, and others go to countries where the development level is lower. As I mentioned, I don’t believe it is possible to change this pattern unilaterally, unless we want to apply huge public resources on that. In my view, it is not a priority, given the current levels of subsidies already applied to higher education in comparison with fundamental education in Brazil.

d) In your opinion, and considering the experience of EPGE, what are the advantages or disadvantages of increasing internationalization at Brazilian universities? Would that reduce space for Brazilians?

Increasing the universe of choice always improves the final results. Therefore, I see only advantages and I don’t see how we can be against internationalization. However, as I mentioned, I believe that a unilateral process will be limited in its ability to change higher education in Brazil (and also its impact on innovation and technology). Opening universities might not reduce the places for Brazilians, provided it is an organized and planned movement, correlated to our development level. If it is unilateral, then there can indeed be a loss for Brazilian students and professors.

e) Finally, do you see a relation between innovation and the internationalization of universities?

Yes, I do think the relation is positive between the two variables, but I don’t think it is possible to take any of them as isolated variables.

Deceptive Assurances of Privacy?

Earlier this week, Facebook expanded the roll-out of its facial recognition software to tag people in photos uploaded to the social networking site. Many observers and regulators responded with privacy concerns; EFF offered a video showing users how to opt out.

Tim O’Reilly, however, takes a different tack:

Face recognition is here to stay. My question is whether to pretend that it doesn’t exist, and leave its use to government agencies, repressive regimes, marketing data mining firms, insurance companies, and other monolithic entities, or whether to come to grips with it as a society by making it commonplace and useful, figuring out the downsides, and regulating those downsides.

…We need to move away from a Maginot-line like approach where we try to put up walls to keep information from leaking out, and instead assume that most things that used to be private are now knowable via various forms of data mining. Once we do that, we start to engage in a question of what uses are permitted, and what uses are not.

O’Reilly’s point, and face-recognition technology, is bigger than Facebook. Even if Facebook swore off the technology tomorrow, it would be out there, and likely used against us unless regulated. Yet we can’t decide on the proper scope of regulation without understanding the technology and its social implications.

By taking these latent capabilities (Riya was demonstrating them years ago; the NSA probably had them decades earlier) and making them visible, Facebook gives us more feedback on the privacy consequences of the tech. If part of that feedback is “ick, creepy” or worse, we should feed that into regulation for the technology’s use everywhere, not just in Facebook’s interface. Merely hiding the feature in the interface, while leaving it active in the background, would be deceptive: it would give us a false assurance of privacy. For all its blundering, Facebook seems to be blundering in the right direction now.

Compare the furor around Dropbox’s disclosure “clarification”. Dropbox had claimed that “All files stored on Dropbox servers are encrypted (AES-256) and are inaccessible without your account password,” but recently updated that to the weaker assertion: “Like most online services, we have a small number of employees who must be able to access user data for the reasons stated in our privacy policy (e.g., when legally required to do so).” Dropbox had signaled “encrypted”: absolutely private, when it meant only relatively private. Users who acted on the assurance of complete secrecy were deceived; now those who know the true level of relative secrecy can update their assumptions and adapt behavior more appropriately.

Privacy-invasive technology and the limits of privacy-protection should be visible. Visibility feeds more and better-controlled experiments to help us understand the scope of privacy, publicity, and the space in between (which Woody Hartzog and Fred Stutzman call “obscurity” in a very helpful draft). Then, we should implement privacy rules uniformly to reinforce our social choices.

New Research Result: Bubble Forms Not So Anonymous

Today, Joe Calandrino, Ed Felten and I are releasing a new result regarding the anonymity of fill-in-the-bubble forms. These forms, popular for their use with standardized tests, require respondents to select answer choices by filling in a corresponding bubble. Contradicting a widespread implicit assumption, we show that individuals create distinctive marks on these forms, allowing use of the marks as a biometric. Using a sample of 92 surveys, we show that an individual’s markings enable unique re-identification within the sample set more than half of the time. The potential impact of this work is as diverse as use of the forms themselves, ranging from cheating detection on standardized tests to identifying the individuals behind “anonymous” surveys or election ballots.

If you’ve taken a standardized test or voted in a recent election, you’ve likely used a bubble form. Filling in a bubble doesn’t provide much room for inadvertent variation. As a result, the marks on these forms superficially appear to be largely identical, and minor differences may look random and not replicable. Nevertheless, our work suggests that individuals may complete bubbles in a sufficiently distinctive and consistent manner to allow re-identification. Consider the following bubbles from two different individuals:

These individuals have visibly different stroke directions, suggesting a means of distinguishing between the two. While variation between bubbles may be limited, stroke direction and other subtle features permit differentiation between respondents. If we can learn an individual’s characteristic features, we may use those features to identify that individual’s forms in the future.

To test the limits of our analysis approach, we obtained a set of 92 surveys and extracted 20 bubbles from each of those surveys. We set aside 8 bubbles per survey to test our identification accuracy and trained our model on the remaining 12 bubbles per survey. Using image processing techniques, we identified the unique characteristics of each training bubble and trained a classifier to distinguish between the surveys’ respondents. We applied this classifier to the remaining test bubbles from a respondent. The classifier orders the candidate respondents based on the perceived likelihood that they created the test markings. We repeated this test for each of the 92 respondents, recording where the correct respondent fell in the classifier’s ordered list of candidate respondents.
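The evaluation protocol described above can be sketched in a few lines of Python. This is only an illustration of the train/test split and the ranked-candidate bookkeeping, not the method from our paper: the feature vectors are synthetic random data, the nearest-centroid classifier is a simple stand-in, and names such as `NUM_FEATURES`, `make_synthetic_surveys`, and `top_k_rate` are hypothetical.

```python
import random

NUM_RESPONDENTS = 92
BUBBLES_PER_SURVEY = 20
TRAIN_PER_SURVEY = 12   # the remaining 8 bubbles per survey are held out for testing
NUM_FEATURES = 4        # hypothetical; the real feature set is not specified here


def make_synthetic_surveys(rng):
    """Synthetic stand-in for extracted bubble features (not real survey data)."""
    surveys = []
    for _ in range(NUM_RESPONDENTS):
        # Each respondent gets a personal "style"; each bubble is a noisy copy of it.
        style = [rng.uniform(-1, 1) for _ in range(NUM_FEATURES)]
        bubbles = [[s + rng.gauss(0, 0.3) for s in style]
                   for _ in range(BUBBLES_PER_SURVEY)]
        surveys.append(bubbles)
    return surveys


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_of_true_respondent(surveys):
    """For each respondent, return the 1-based rank of the correct candidate."""
    train_centroids = [centroid(b[:TRAIN_PER_SURVEY]) for b in surveys]
    ranks = []
    for true_id, bubbles in enumerate(surveys):
        test_centroid = centroid(bubbles[TRAIN_PER_SURVEY:])
        # Order all candidates from most to least similar to the test bubbles.
        ordered = sorted(range(len(surveys)),
                         key=lambda c: sq_dist(test_centroid, train_centroids[c]))
        ranks.append(ordered.index(true_id) + 1)
    return ranks


def top_k_rate(ranks, k):
    """Fraction of respondents whose correct identity is in the top k guesses."""
    return sum(r <= k for r in ranks) / len(ranks)


surveys = make_synthetic_surveys(random.Random(0))
ranks = rank_of_true_respondent(surveys)
for k in (1, 3, 10):
    print(f"top-{k}: {top_k_rate(ranks, k):.2f}")
```

The numbers this prints reflect only the synthetic data and say nothing about real bubble forms; the point is the shape of the experiment, in which each respondent's held-out bubbles are scored against every candidate and the position of the correct candidate is recorded.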

If bubble marking patterns were completely random, a classifier could do no better than randomly guessing a test set’s creator, with an expected accuracy of 1/92 ≈ 1%. Our classifier achieves over 51% accuracy. The classifier is rarely far off: the correct answer falls in the classifier’s top three guesses 75% of the time (vs. 3% for random guessing) and its top ten guesses more than 92% of the time (vs. 11% for random guessing). We conducted a number of additional experiments exploring the information available from marked bubbles and potential uses of that information. See our paper for details.
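The random-guessing baselines quoted above follow from simple arithmetic: a uniformly random ordering of 92 candidates places the correct respondent in the top k positions with probability k/92. A minimal check, where the helper name `random_top_k` is ours and purely illustrative:

```python
from fractions import Fraction

NUM_RESPONDENTS = 92


def random_top_k(k, n=NUM_RESPONDENTS):
    """Chance that a uniformly random ordering puts the true respondent in the top k."""
    return Fraction(k, n)


for k in (1, 3, 10):
    print(f"top-{k} by chance: {float(random_top_k(k)):.1%}")
```

Rounded to whole percentages, these give the roughly 1%, 3%, and 11% figures used for comparison.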

Additional testing—particularly using forms completed at different times—is necessary to assess the real-world impact of this work. Nevertheless, the strength of these preliminary results suggests both positive and negative implications depending on the application. For standardized tests, the potential impact is largely positive. Imagine that a student takes a standardized test, performs poorly, and pays someone to repeat the test on his behalf. Comparing the bubble marks on both answer sheets could provide evidence of such cheating. A similar approach could detect third-party modification of certain answers on a single test.

The possible impact on elections using optical scan ballots is more mixed. One positive use is to detect ballot box stuffing: our methods could help identify whether someone replaced a subset of the legitimate ballots with fraudulent ballots that she completed herself. On the other hand, our approach could help an adversary with access to the physical ballots, or scans of them, undermine ballot secrecy. Suppose an unscrupulous employer uses a bubble form employment application. That employer could test the markings against ballots from an employee’s jurisdiction to locate the employee’s ballot. This threat is more realistic in jurisdictions that release scans of ballots.

Appropriate mitigation of this issue is somewhat application-specific. One option is to treat surveys and ballots as if they contain identifying information and avoid releasing them more widely than necessary. Alternatively, modifying the forms to mask marked bubbles can remove identifying information but, among other risks, may remove evidence of respondent intent. Any application demanding anonymity requires careful consideration of options for preventing creation or disclosure of identifying information. Election officials in particular should carefully examine trade-offs and mitigation techniques if releasing ballot scans.

This work provides another example in which implicit assumptions resulted in a failure to recognize a link between the output of a system (in this case, bubble forms or their scans) and potentially sensitive input (the choices made by individuals completing the forms). Joe discussed a similar link between recommendations and underlying user transactions two weeks ago. As technologies advance or new functionality is added to systems, we must explicitly re-evaluate these connections. The release of scanned forms combined with advances in image analysis raises the possibility that individuals may inadvertently tie themselves to their choices merely by how they complete bubbles. Identifying such connections is a critical first step in exploiting their positive uses and mitigating negative ones.

This work will be presented at the 2011 USENIX Security Symposium in August.