May 2, 2024

Archives for 2009

Netflix's Impending (But Still Avoidable) Multi-Million Dollar Privacy Blunder

In my last post, I had promised to say more about my article on the limits of anonymization and the power of reidentification. Although I haven’t said anything for a few weeks, others have, and I especially appreciate posts by Susannah Fox, Seth Schoen, and Nate Anderson. Not only have these people summarized my article well, they have also added a lot of insightful commentary, and I commend these three posts to you.

Today brings news relating to one of the central examples in my paper: Netflix has announced plans to commit a privacy blunder that could cost it millions of dollars in fines and civil damages.

In my article, I focus on Netflix’s 2006 decision to release millions of records containing the movie rating preferences of “anonymized” users to the public, in order to fuel a crowd-sourcing competition called the Netflix Prize. The Netflix Prize has been a huge win for Netflix’s public relations, but it has also been a win for academics, who have used the data to improve the science of guessing human behavior from past preferences.

The Netflix Prize was also a watershed event for reidentification research because Arvind Narayanan and Vitaly Shmatikov of U. Texas revealed that they could reidentify some of the “anonymized” users with ease, proving that we are more uniquely tied to our movie rating preferences than intuition would suggest. In my paper, I argue that we should worry about this privacy breach even if we don’t think movie ratings are terribly sensitive, because it can be used to enable other, more terrifying privacy breaches.

I never argue, however, that Netflix deserves punishment or sanction for having released this data. In my opinion, Netflix acted pretty responsibly. It consulted with computer scientists in a (failed) attempt to anonymize successfully. It tried perturbing the data in order to make reidentification harder. And other experts seem to have been surprised by how easy it was for Narayanan and Shmatikov to reidentify. Even with the benefit of hindsight, I find nothing to blame in how Netflix handled the privacy implications of what it did.

Although I give Netflix a pass for its past privacy breach, I am astonished to learn from the New York Times that the company plans a second act:

The new contest is going to present the contestants with demographic and behavioral data, and they will be asked to model individuals’ “taste profiles,” the company said. The data set of more than 100 million entries will include information about renters’ ages, gender, ZIP codes, genre ratings and previously chosen movies. Unlike the first challenge, the contest will have no specific accuracy target. Instead, $500,000 will be awarded to the team in the lead after six months, and $500,000 to the leader after 18 months.

Netflix should cancel this new, irresponsible contest, which it has dubbed Netflix Prize 2. Researchers have known for more than a decade that gender plus ZIP code plus birthdate uniquely identifies a significant percentage of Americans (87% according to Latanya Sweeney’s famous study.) True, Netflix plans to release age not birthdate, but simple arithmetic shows that for many people in the country, gender plus ZIP code plus age will narrow their private movie preferences down to at most a few hundred people. Netflix needs to understand the concept of “information entropy”: even if it is not revealing information tied to a single person, it is revealing information tied to so few that we should consider this a privacy breach.

I have no doubt that researchers will be able to use the techniques of Narayanan and Shmatikov, together with databases revealing sex, zip code, and age, to tie many people directly to these supposedly anonymized new records.

Because of this, if it releases the data, Netflix might be breaking the law. The Video Privacy Protection Act (VPPA), 18 USC 2710 prohibits a “video tape service provider” (a broadly defined term) from revealing “personally identifiable information” about its customers. Aggrieved customers can sue providers under the VPPA and courts can order “not less than $2500” in damages for each violation. If somebody brings a class action lawsuit under this statute, Netflix might face millions of dollars in damages.

Additionally, the FTC might also decide to fine Netflix for violating its privacy policy as an unfair business practice.

Either a lawsuit under the VPPA or an FTC investigation would turn, in large part, on one sentence in Netflix’s privacy policy: “We may also disclose and otherwise use, on an anonymous basis, movie ratings, consumption habits, commentary, reviews and other non-personal information about customers.” If sued or investigated, Netflix will surely argue that its acts are immunized by the policy, because the data is disclosed “on an anonymous basis.” While this argument might have carried the day in 2006, before Narayanan and Shmatikov conducted their study, the argument is much weaker in 2009, now that Netflix has many reasons to know better, including in part, my paper and the publicity surrounding it. A weak argument is made even weaker if Netflix includes the kind of data–ZIP code, age, and gender–that we have known for over a decade fails to anonymize.

The good news is Netflix has time to avoid this multi-million dollar privacy blunder. As far as I can tell, the Netflix Prize 2 has not yet been launched.

Dear Netflix executives: Don’t do this to your customers, and don’t do this to your shareholders. Cancel the Netflix Prize 2, while you still have the chance.

Improving the Government's User Interface

The White House’s attempts to gather input from citizens have hit some bumps, wrote Anand Giridharadas recently in the New York Times. This administration has done far more than its predecessors to let citizens provide input directly to government via the Internet, but they haven’t always received the input they expected. Giridharadas writes:

During the transition, the administration created an online “Citizen’s Briefing Book” for people to submit ideas to the president…. They received 44,000 proposals and 1.4 million votes for those proposals. The results were quietly published, but they were embarrassing…

In the middle of two wars and an economic meltdown, the highest-ranking idea was to legalize marijuana, an idea nearly twice as popular as repealing the Bush tax cuts on the wealthy. Legalizing online poker topped the technology ideas, twice as popular as nationwide wi-fi. Revoking the Church of Scientology’s tax-exempt status garnered three times more votes than raising funding for childhood cancer.

Once in power, the White House crowdsourced again. In March, its Office of Science and Technology Policy hosted an online “brainstorm” about making government more transparent. Good ideas came; but a stunning number had no connection to transparency, with many calls for marijuana legalization and a raging (and groundless) debate about the authenticity of President Obama’s birth certificate.

It’s obvious what happened: relatively small groups of highly motivated people visited the site, and their input outweighed the discussion of more pressing national issues. This is not a new phenomenon — there’s a long history of organized groups sending letters out of proportion with their numbers.

Now, these groups obviously have the right to speak, and the fact that some groups proved to be better organized and motivated than others is useful information for policymakers to have. But if that is all that policymakers learn, we have lost an important opportunity. Government needs to hear from these groups, but it needs to hear from the rest of the public too.

It’s tempting to decide that this is inevitable, and that online harvesting of public opinion will have little value. But I think that goes too far.

What the administration’s experience teaches, I think, is that measuring public opinion online is difficult, and that the most obvious measurement methods can run into trouble. Instead of giving up, the best response is to think harder about how to gather information and how to analyze the information that is available. What works for a small, organized group, or even a political campaign, won’t necessarily work for the United States as a whole. What we need are new interfaces, new analysis methods, and experiments to reveal what tends to work.

Designing user interfaces is almost always harder than it looks. Designing the user interface of government is an enormous challenge, but getting it right can yield enormous benefits.

NY Times Should Report on NY Times Ad Malware

Yesterday morning, while reading the New York Times online, I was confronted with an attempted security attack, apparently delivered through an advertisement. A window popped up, mimicking an antivirus scanner. After “scanning” my computer, it reported finding viruses and invited me to download a free antivirus scanner. The displays implied, without quite saying so, that the messages came from my antivirus vendor and that the download would come from there too. Knowing how these things work, I recognized it right away as an attack, probably carried by an ad. So I didn’t click on anything, and I’m fairly certain my computer wasn’t infected.

I wasn’t the only person who saw this attack. The Times posted a brief note on its site yesterday, and followed up today with a longer blog post.

What is interesting about the Times’s response is that it consists of security warnings, rather than journalism. Security warnings are good as far as they go; the Times owed that much to its users, at least. But it’s also newsworthy that a major, respected news site was facilitating cybercrime, even unintentionally. Somebody should report on this story — and who better than the Times itself?

It’s probably an interesting story, involving the ugly underside of the online ad business. Most likely, ad space in the Times was sold and, presumably, resold to an actual attacker; or a legitimate ad placement service was penetrated. Either way, other people are at risk of the same attack. Even better, the story opens issues such as the difficulties of securing the web, what vendors are doing to improve matters, what the bad buys are trying to achieve, and what happens to the victims.

An enterprising technology reporter might find a fascinating story here — and it’s right under the noses of the Times staff. Let’s hope they jump on it.

UPDATE (Sept. 15): As Barry points out in the comments below, the Times wrote a good article the day after this post appeared. It turns out that the booby-trapped ad was not sold through an ad network, as one might have expected. Instead, the ad space was sold directly by the Times, to a party who was pretending to be Vonage. The perpetrators ran Vonage ads for a while, then switched over to serving the malicious ads.

Finnish Court Orders Re-Vote After E-Voting Snafu

The Supreme Administrative Court of Finland has ruled that three municipal elections, the first in Finland to use electronic voting, must be redone because of voting machine problems. (English summary; ruling in Finnish)

The troubles started with a usability problem, which caused 232 voters (about 2% of voters) to leave the voting booth without fully casting their ballots. Electronic Frontiers Finland explains what went wrong:

It seems that the system required the voter to insert a smart card to identify the voter, type in their selected candidate number, then press “ok”, check the candidate details on the screen, and then press “ok” again. Some voters did not press “ok” for the second time, but instead removed their smart card from the voting terminal prematurely, causing their ballots not to be cast.

This usability issue was exacerbated by Ministry of Justice instructions, which specifically said that in order to cancel the voting process, the user should click on “cancel” and after that, remove the smart card. Thus, some voters did not realise that their vote had not been registered.

If you want to see what this looks like for a voter, check out the online demo of the voting process, from the Finnish Ministry of Justice (in English).

Well designed voting systems tend to have a prominent, clearly labeled control or action that the voter uses to officially cast his or her vote. This might be a big red “CAST VOTE” button. The Finnish system mistakenly used the same “OK” button used previously in the process, making voter mistakes more likely. Adding to the problem, the voter’s smart card was protruding from the front of the machine, making it all too easy for a voter to grab the card and walk away.

No voting machine can stop a “fleeing voter” scenario, where a voter simply walks away during the voting process (we conventionally say “fleeing” even if the voter leaves by mistake), but some systems are much better than others in this respect. Diebold’s touchscreen voting machines, for all their faults, got this design element right, pulling the voter’s smart card all of the way into the machine and ejecting it only when the voter was supposed to leave — thus turning the voter’s desire to return the smart card into a countermeasure against premature voter departure, rather than a cause of it. (ATM machines often use this same trick of holding the card inside the machine to stop the user from grabbing the card and walking away at the wrong time.) Some older lever machines use an even simpler method against fleeing voters: the same big red handle that casts the ballot also opens the curtains so the voter can leave.

I’d be curious to know what the rules are about fleeing voters in Finland. I know that New Jersey procedures say that if a voter leaves without performing the final step of pushing the “Cast Vote” button, poll workers are supposed to push the button on the voter’s behalf (without looking at the voter’s choices). Crucially, the design of the New Jersey voting machine (for all its faults) makes it almost certain that such a non-cast ballot will be discovered promptly — the machine makes a noise when the ballot is cast, and the machine will complain if the poll worker tries to enable the next voter’s ballot before the previous voter’s ballot has been cast.

It seems likely that the Finnish machine, in addition to its usability problems that led to fleeing voters, had other design/process problems that made a non-completed ballot less noticeable to poll workers. (I don’t know this for sure; the answer isn’t in any English-language document I have seen.)

Fortunately, the damage was not as bad as it might have been, because the e-voting system was used in only three municipalities, as a pilot program, rather than nationwide. Presumably, nationwide use of the flawed system is now unlikely.

Consolidation in E-Voting Market: ES&S Buys Premier

Yesterday Diebold sold its e-voting division, known as Premier Election Systems, to ES&S, one of Premier’s competitors. The price was low: about $5 million.

ES&S is reportedly the largest e-voting company, and Premier was the second-largest, so the deal represents a substantial consolidation in the market. The odds of one major e-voting company breaking from the pack and embracing up-to-date security engineering are now even slimmer than before. Premier had seemed like the company most likely to change its ways.

The sale represents the end of an embarrassing era for Diebold. The company must have had high hopes when it first bought a small e-voting company, but the new Diebold e-voting division never approached the parent companies standards for security and product quality. Over time the small e-voting division became an embarrassment, and the parent company distanced itself by renaming the division from Diebold to Premier and publicizing the division’s independence. Now Diebold is finally rid of its e-voting division and can return to doing what it does relatively well.