Archives for June 2011

Deceptive Assurances of Privacy?

Earlier this week, Facebook expanded the roll-out of its facial recognition software to tag people in photos uploaded to the social networking site. Many observers and regulators responded with privacy concerns; EFF offered a video showing users how to opt out.

Tim O’Reilly, however, takes a different tack:

Face recognition is here to stay. My question is whether to pretend that it doesn’t exist, and leave its use to government agencies, repressive regimes, marketing data mining firms, insurance companies, and other monolithic entities, or whether to come to grips with it as a society by making it commonplace and useful, figuring out the downsides, and regulating those downsides.

…We need to move away from a Maginot-line like approach where we try to put up walls to keep information from leaking out, and instead assume that most things that used to be private are now knowable via various forms of data mining. Once we do that, we start to engage in a question of what uses are permitted, and what uses are not.

O’Reilly’s point (and face-recognition technology) is bigger than Facebook. Even if Facebook swore off the technology tomorrow, it would be out there, and likely used against us unless regulated. Yet we can’t decide on the proper scope of regulation without understanding the technology and its social implications.

By taking these latent capabilities (Riya was demonstrating them years ago; the NSA probably had them decades earlier) and making them visible, Facebook gives us more feedback on the privacy consequences of the tech. If part of that feedback is “ick, creepy” or worse, we should feed that into regulation of the technology’s use everywhere, not just in Facebook’s interface. Merely hiding the feature in the interface while leaving it active in the background would be deceptive: it would give us a false assurance of privacy. For all its blundering, Facebook seems to be blundering in the right direction now.

Compare the furor around Dropbox’s disclosure “clarification”. Dropbox had claimed that “All files stored on Dropbox servers are encrypted (AES-256) and are inaccessible without your account password,” but recently updated that to the weaker assertion: “Like most online services, we have a small number of employees who must be able to access user data for the reasons stated in our privacy policy (e.g., when legally required to do so).” “Encrypted” signaled absolutely private, when Dropbox meant only relatively private: because Dropbox, not the user, holds the encryption keys, its employees can access stored files. Users who acted on the assurance of complete secrecy were deceived; those who now know the true level of relative secrecy can update their assumptions and adapt their behavior accordingly.

Privacy-invasive technology, and the limits of privacy protection, should be visible. Visibility feeds more and better-controlled experiments that help us understand the scope of privacy, publicity, and the space in between (which Woody Hartzog and Fred Stutzman call “obscurity” in a very helpful draft). Then we should implement privacy rules uniformly to reinforce our social choices.

New Research Result: Bubble Forms Not So Anonymous

Today, Joe Calandrino, Ed Felten and I are releasing a new result regarding the anonymity of fill-in-the-bubble forms. These forms, popular for their use with standardized tests, require respondents to select answer choices by filling in a corresponding bubble. Contradicting a widespread implicit assumption, we show that individuals create distinctive marks on these forms, allowing use of the marks as a biometric. Using a sample of 92 surveys, we show that an individual’s markings enable unique re-identification within the sample set more than half of the time. The potential impact of this work is as diverse as use of the forms themselves, ranging from cheating detection on standardized tests to identifying the individuals behind “anonymous” surveys or election ballots.

If you’ve taken a standardized test or voted in a recent election, you’ve likely used a bubble form. Filling in a bubble doesn’t provide much room for inadvertent variation. As a result, the marks on these forms superficially appear to be largely identical, and minor differences may look random and not replicable. Nevertheless, our work suggests that individuals may complete bubbles in a sufficiently distinctive and consistent manner to allow re-identification. Consider the following bubbles from two different individuals:

These individuals have visibly different stroke directions, suggesting a means of distinguishing between the two. While variation between bubbles may be limited, stroke direction and other subtle features permit differentiation between respondents. If we can learn an individual’s characteristic features, we may use those features to identify that individual’s forms in the future.
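To make the idea concrete, here is a minimal sketch of one plausible way to quantify stroke direction: a gradient-orientation histogram over the scanned bubble. This is illustrative only; it is not the feature set used in the paper, and the `orientation_histogram` helper is hypothetical.

```python
# Illustrative only: one plausible "stroke direction" feature for a scanned
# bubble. NOT the exact image-processing pipeline described in the paper.
import numpy as np

def orientation_histogram(bubble, bins=12):
    """Histogram of gradient orientations over a grayscale bubble image
    (2-D numpy array), weighted by gradient magnitude so that strong pen
    strokes dominate, and normalized so bubbles of different darkness
    remain comparable."""
    gy, gx = np.gradient(bubble.astype(float))   # image gradients (rows, cols)
    magnitude = np.hypot(gx, gy)                 # stroke strength per pixel
    angle = np.arctan2(gy, gx)                   # stroke direction in [-pi, pi]
    hist, _ = np.histogram(angle, bins=bins, range=(-np.pi, np.pi),
                           weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist
```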

To test the limits of our analysis approach, we obtained a set of 92 surveys and extracted 20 bubbles from each. We set aside 8 bubbles per survey to test our identification accuracy and trained our model on the remaining 12. Using image processing techniques, we identified the distinctive characteristics of each training bubble and trained a classifier to distinguish between the surveys’ respondents. We then applied this classifier to each respondent’s held-out test bubbles; the classifier orders the candidate respondents by the perceived likelihood that they created the test markings. We repeated this test for each of the 92 respondents, recording where the correct respondent fell in the classifier’s ordered list of candidates.
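For readers who want a concrete picture of this protocol, here is a hedged sketch using an off-the-shelf classifier. The paper’s actual features and model differ; the `features` function (for example, the orientation histogram sketched above) and the choice of a random forest are stand-ins for illustration.

```python
# Hedged sketch of the evaluation protocol described above: 12 training and
# 8 test bubbles per respondent, a classifier trained to tell respondents
# apart, and a ranked list of candidates per set of test bubbles.
# A random forest is used here only as a stand-in for the paper's model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_of_true_respondent(train_bubbles, test_bubbles, features):
    """train_bubbles, test_bubbles: dicts mapping respondent id -> list of
    grayscale bubble images. features: function mapping an image to a
    fixed-length vector. Returns a dict mapping each respondent id to the
    rank (1 = most likely) the classifier assigns to the true respondent."""
    X, y = [], []
    for rid, bubbles in train_bubbles.items():   # 12 bubbles per respondent
        for bubble in bubbles:
            X.append(features(bubble))
            y.append(rid)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(np.array(X), np.array(y))

    ranks = {}
    for rid, bubbles in test_bubbles.items():    # 8 held-out bubbles each
        # Average class probabilities across the respondent's test bubbles,
        # then order candidate respondents from most to least likely.
        probs = clf.predict_proba(np.array([features(b) for b in bubbles]))
        order = np.argsort(probs.mean(axis=0))[::-1]
        candidates = clf.classes_[order]
        ranks[rid] = int(np.where(candidates == rid)[0][0]) + 1
    return ranks
```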

If bubble marking patterns were completely random, a classifier could do no better than randomly guessing a test set’s creator, with an expected accuracy of 1/92 ≈ 1%. Our classifier achieves over 51% accuracy. The classifier is rarely far off: the correct answer falls in the classifier’s top three guesses 75% of the time (vs. 3% for random guessing) and its top ten guesses more than 92% of the time (vs. 11% for random guessing). We conducted a number of additional experiments exploring the information available from marked bubbles and potential uses of that information. See our paper for details.
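Given per-respondent ranks from a loop like the one sketched above, these top-k figures reduce to a one-line computation (again illustrative, reusing the hypothetical output of `rank_of_true_respondent`):

```python
# Illustrative: top-k accuracy from the per-respondent ranks returned by
# the rank_of_true_respondent() sketch above.
def top_k_accuracy(ranks, k):
    """Fraction of respondents whose true identity appears in the top k."""
    return sum(rank <= k for rank in ranks.values()) / len(ranks)

# With 92 respondents, random guessing yields k/92: about 1% for k=1,
# 3% for k=3, and 11% for k=10, which are the baselines quoted above.
```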

Additional testing—particularly using forms completed at different times—is necessary to assess the real-world impact of this work. Nevertheless, the strength of these preliminary results suggests both positive and negative implications depending on the application. For standardized tests, the potential impact is largely positive. Imagine that a student takes a standardized test, performs poorly, and pays someone to repeat the test on his behalf. Comparing the bubble marks on both answer sheets could provide evidence of such cheating. A similar approach could detect third-party modification of certain answers on a single test.

The possible impact on elections using optical scan ballots is more mixed. One positive use is detecting ballot-box stuffing: our methods could help identify whether someone replaced a subset of the legitimate ballots with a set of fraudulent ballots that she completed herself. On the other hand, our approach could help an adversary with access to the physical ballots, or to scans of them, undermine ballot secrecy. Suppose an unscrupulous employer uses a bubble-form employment application. That employer could test the markings against ballots from an employee’s jurisdiction to locate the employee’s ballot. This threat is more realistic in jurisdictions that release scans of ballots.

Appropriate mitigation of this issue is somewhat application-specific. One option is to treat surveys and ballots as if they contain identifying information and to avoid releasing them more widely than necessary. Alternatively, modifying the forms to mask marked bubbles can remove identifying information but, among other risks, may also remove evidence of respondent intent. Any application demanding anonymity requires careful consideration of options for preventing the creation or disclosure of identifying information. Election officials in particular should carefully examine the trade-offs and mitigation techniques before releasing ballot scans.

This work provides another example in which implicit assumptions resulted in a failure to recognize a link between the output of a system (in this case, bubble forms or their scans) and potentially sensitive input (the choices made by individuals completing the forms). Joe discussed a similar link between recommendations and underlying user transactions two weeks ago. As technologies advance or new functionality is added to systems, we must explicitly re-evaluate these connections. The release of scanned forms combined with advances in image analysis raises the possibility that individuals may inadvertently tie themselves to their choices merely by how they complete bubbles. Identifying such connections is a critical first step in exploiting their positive uses and mitigating negative ones.

This work will be presented at the 2011 USENIX Security Symposium in August.

Tinkering with the IEEE and ACM copyright policies

Historically, papers published in an IEEE or ACM conference or journal have had to have their copyrights assigned to the IEEE or ACM, respectively. Most of us were happy enough with this arrangement, but the new IEEE policy seems to place more restrictions on the process. Matt Blaze blogged about this issue in particular detail.

The IEEE policy and the comparable ACM policy appear to be focused on creating revenue opportunities for these professional societies. Hypothetically, that income should result in cost savings elsewhere (e.g., lower conference registration fees) or in higher-quality member services (e.g., paying the expenses of conference program committee members to attend meetings). In practice, neither is true. Regardless, our professional societies work hard to keep a paywall between our papers and their readership. Is this sort of behavior in our best interests? Not really.

What benefits the author of an academic paper? In a word, impact. Papers that are more widely read are more influential, and widely read papers are more widely cited; citation counts are explicitly considered in hiring, promotion, and tenure cases. Anything that gets in the way of a paper’s impact damages our careers, and it’s something we need to fix.

There are three common solutions. First, we ignore the rules and post copies of our work on our personal, laboratory, and/or departmental web pages. Virtually any paper written in the past ten years can be found online, without cost, and conveniently cataloged by sites like Google Scholar. Second, some authors I’ve spoken to will significantly edit the copyright assignment forms before submitting them. Nobody apparently ever notices this. Third, some professional societies, notably the USENIX Association, have changed their rules. The USENIX policy completely inverts the relationship between author and publisher. Authors grant USENIX certain limited and reasonable rights, while the authors retain copyright over their work. USENIX then posts all the papers on its web site, free of charge; authors are free to do the same on their own web sites.

(USENIX ensures that every conference proceedings has a proper ISBN. Every USENIX paper is just as “published” as a paper in any other conference, even though printed proceedings are long gone.)

Somehow, the sky hasn’t fallen. So far as I know, the USENIX Association’s finances still work just fine. Perhaps it’s marginally more expensive to attend a USENIX conference, but then the service level is also much higher. The USENIX professional staff do things that are normally handled by volunteer labor at other conferences.

This brings me to the vote we held last week at the IEEE Symposium on Security and Privacy (the “Oakland” conference) during the business meeting. We had unusually high attendance (perhaps 150 of 400 attendees), as there were a variety of important topics under discussion. We spent maybe 15 minutes talking about the IEEE’s copyright policy, and the resolution before the room was: should we reject the IEEE copyright policy and adopt the USENIX policy? Ultimately, there were two “no” votes and everybody else voted “yes.” That’s an overwhelming statement.

The question is what happens next. I’m planning to attend ACM CCS this October in Chicago, and I expect we can have a similar vote there. I hope similar votes can happen at other IEEE and ACM conferences. Get it on the agenda of your business meetings. Vote early and vote often! I certainly hope the IEEE and ACM agree to follow the will of their membership. If the leadership doesn’t follow the membership, then we’ve got some more interesting problems to solve.

Sidebar: ACM and IEEE make money by reselling our work, particularly with institutional subscriptions to university libraries and large companies. As an ACM or IEEE member, you also get access to some, but not all, of the online library contents. If you make everything free (as in free beer), removing that revenue source, then you’ve got a budget hole to fill. While I’m no budget wizard, it would make sense for our conference registration fees to support the archival online storage of our papers. Add in some online advertising (example: startup companies, hungry to hire engineers with specialized talents, would pay serious fees for advertisements adjacent to research papers in the relevant areas), and I’ll bet everything would work out just fine.