December 9, 2022

A Good Day for Email Privacy: A Court Takes Back its Earlier, Bad Ruling in Rehberg v. Paulk

In March, the U.S. Court of Appeals for the Eleventh Circuit, the court that sets federal law for Alabama, Florida, and Georgia, ruled in a case called Rehberg v. Paulk that people lacked a reasonable expectation of privacy in the content of email messages stored with an email provider. This meant that the police in those three states were free to ignore the Fourth Amendment when obtaining email messages from a provider. In this case, the plaintiff alleged that the District Attorney had used a sham subpoena to trick a provider into handing over the plaintiff’s email messages. The Court ruled that the Constitution allowed the DA to do this.

I am happy to report that today, the Court vacated the opinion and replaced it with a much more carefully reasoned, nuanced opinion.

Most importantly, the Eleventh Circuit no longer holds that “A person also loses a reasonable expectation of privacy in emails, at least after the email is sent to and received by a third party,” nor that “Rehberg’s voluntary delivery of emails to third parties constituted a voluntary relinquishment of the right to privacy in that information.” These bad statements of law have effectively been erased from the court reporters.

This is a great victory for Internet privacy, although it could have been even better. The Court no longer strips email messages of protection, but it didn’t go further and affirmatively hold that email users possess a Fourth Amendment right to privacy in email. Instead, the Court ruled that even if such a right exists, it wasn’t “clearly established” at the time the District Attorney acted, which means the plaintiff can’t continue to pursue this claim.

I am personally invested in this case because I authored a brief asking the Court to reverse its earlier bad ruling. I am glad the Court agreed with us and thank all of the other law professors who signed the brief: Susan Brenner, Susan Freiwald, Stephen Henderson, Jennifer Lynch, Deirdre Mulligan, Joel Reidenberg, Jason Schultz, Chris Slobogin, and Dan Solove. Thanks also to my incredibly hard-working and talented research assistants, Nicole Freiss and Devin Looijien.

Update: The EFF (which represents the plaintiff) is much more disappointed in the amended opinion than I am. It makes a lot of good points, but I prefer to see the glass as half full.

The Gizmodo Warrant: Searching Journalists in the Terabyte Age

Last Friday night, police officers in California used a warrant to search the home of Jason Chen, the Gizmodo blogger who wrote about the iPhone prototype found in a Redwood City bar. Orin Kerr has written an interesting post assessing the legality of the search. I want to touch on an important issue he didn’t discuss: whether the search the police are conducting is unconstitutionally overbroad.

Orin discusses two laws that specifically shield journalists from being the target of a search: the California Reporter’s Shield Law, found in both California Penal Code 1524(g) and California Evidence Code 1070, and the federal Privacy Protection Act (PPA), 42 U.S.C. 2000aa. Both laws were written to limit the impact of Zurcher v. Stanford Daily, a U.S. Supreme Court case authorizing the use of a warrant to search a newspaper’s offices. The Supreme Court decided Zurcher in 1978, and Congress enacted the PPA in 1980 (and amended it in unrelated ways in 1996). I’m not sure when the California law was enacted, but I bet it’s of similar vintage. In other words, all of the rules that govern police searches of news offices were created in the age of typewriters, desks, filing cabinets, and stacks of paper.

Now, flash forward thirty years. The police who searched Jason Chen’s home seized the following: a MacBook, an HP server, two Dell desktop computers, an iPad, a ThinkPad, two MacBook Pros, an Iomega NAS, three external hard drives, and three flash drives. They also seized other storage-containing devices, including two digital cameras and two smart phones. If Jason Chen’s computing habits are anything like mine, the police likely seized many terabytes of disk space, storing hundreds of thousands (millions?) of files, containing information stretching back years. And they took all of this information to investigate an alleged crime (the sale of the iPhone prototype) that could not have happened more than 37 days before the search (the iPhone was found on March 18th), which they learned about from a blog post published four days before the search.

I’m deeply concerned about overbreadth as the police begin to search through these terabytes of information. The police now possess, intermingled with the evidence of the alleged crime they are investigating, hundreds of thousands of documents belonging to a journalist/blogger that are utterly irrelevant to their investigation. Jason Chen has been blogging for Gizmodo since 2006, and he’s probably written hundreds of stories. The police likely have thousands of email messages revealing confidential sources, detailing meetings, and trading comments with editors, and thousands of other documents bearing notes from interviews, drafts of articles, and other sensitive information. Because of Chen’s beat, some of these documents probably reveal secrets of great economic and business value in Silicon Valley. Under traditional, outmoded Fourth Amendment rules, the police can read every single document they possess, so long as they intend only to look for evidence of the crime, and under the “plain view rule,” they can use any evidence they find of other, unrelated crimes in court against Chen or anyone else.

If the California state courts share my concerns about overbreadth, they should consider embracing the very sensible rules for search warrants for computer hard drives (in any case, not just those involving journalists) adopted last year by the Ninth Circuit in United States v. Comprehensive Drug Testing. To paraphrase, in cases involving the search and seizure of computers, the Ninth Circuit requires five things: (1) the government must waive the plain view rule, meaning it must agree not to use evidence of crimes other than the one under investigation that led to the warrant; (2) the government must wall off the forensic experts who search the hard drive from the agents investigating the case; (3) the government must explain the “actual risks of destruction of information” it would face if it weren’t allowed to seize entire computers; (4) the government must use a search protocol to designate what information the experts can give to the investigating agents; and (5) the government must destroy or return non-responsive data.

These rules are especially needed when the target of a police search is a journalist (in fact, they may not go far enough). And these rules may be required under Zurcher. In justifying the search of the newspaper’s offices in Zurcher, the Supreme Court agreed that when the Fourth Amendment’s search and seizure rules collide with First Amendment values, like freedom of the press, the “Fourth Amendment must be applied with ‘scrupulous exactitude.'” The court went on to explain why ordinary search warrants for news offices (remember, back in the age of paper files) meet this heightened standard:

There is no reason to believe, for example, that magistrates cannot guard against searches of the type, scope, and intrusiveness that would actually interfere with the timely publication of a newspaper. Nor, if the requirements of specificity and reasonableness are properly applied, policed, and observed, will there be any occasion or opportunity for officers to rummage at large in newspaper files or to intrude into or to deter normal editorial and publication decisions.

When the California state courts combine this thirty-year-old statement of the law with the modern realities of terabyte storage devices, they should hold that the Fourth Amendment requires magistrate judges to play an integral and active role in the administration of the search of Jason Chen’s computers and other storage devices. At the very least, the courts should forbid the police from looking at any file timestamped before March 18, 2010, and in addition, they should force the police to comply with the Comprehensive Drug Testing rules. In the terabyte age, these rules are necessary at a minimum to prevent the police from interfering with a free press.
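To make the date-cutoff idea concrete, here is a minimal Python sketch of how a walled-off filter team might partition seized files by last-modified time before anyone reads them. It is an illustration under stated assumptions, not a description of any actual forensic protocol: the function name `partition_by_mtime`, the choice of UTC midnight on March 18, 2010 as the cutoff, and reliance on file-system modification times are all hypothetical.

```python
import os
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical cutoff based on the post: the iPhone prototype was found on March 18, 2010.
# Using UTC midnight is an assumption; a real protocol would have to fix the time zone.
CUTOFF = datetime(2010, 3, 18, tzinfo=timezone.utc).timestamp()


def partition_by_mtime(root, cutoff=CUTOFF):
    """Split every file under `root` into (eligible, excluded) lists of Paths.

    Files last modified on or after the cutoff are eligible for review.
    Older files are excluded and, under a protocol like this, would be
    returned or destroyed without being opened.
    """
    eligible, excluded = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # st_mtime is the last-modification time in seconds since the epoch.
            if path.stat().st_mtime >= cutoff:
                eligible.append(path)
            else:
                excluded.append(path)
    return eligible, excluded
```

One caution about this design: modification times are only a proxy. They can be altered, they can change when files are copied, and an old document edited after the cutoff would still pass the filter, so a real protocol would probably need a magistrate-approved combination of timestamps, file types, and keyword searches.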

Netflix Cancels the Netflix Prize 2

Today, Netflix announced it is canceling its plans for a second Netflix Prize contest, one that reportedly would have involved the release of more information than the first. As I argued earlier, the new contest would have put the supposedly private movie viewing and rating habits of Netflix customers at great risk, and I applaud Netflix for making a very responsible decision. No doubt, pressure from the private lawsuit and FTC investigation helped Netflix make up its mind, and both are reportedly going away as a result of today’s action.

Netflix's Impending (But Still Avoidable) Multi-Million Dollar Privacy Blunder

In my last post, I promised to say more about my article on the limits of anonymization and the power of reidentification. Although I haven’t said anything for a few weeks, others have, and I especially appreciate posts by Susannah Fox, Seth Schoen, and Nate Anderson. Not only have these people summarized my article well, they have also added a lot of insightful commentary, and I commend these three posts to you.

Today brings news relating to one of the central examples in my paper: Netflix has announced plans to commit a privacy blunder that could cost it millions of dollars in fines and civil damages.

In my article, I focus on Netflix’s 2006 decision to release millions of records containing the movie rating preferences of “anonymized” users to the public, in order to fuel a crowd-sourcing competition called the Netflix Prize. The Netflix Prize has been a huge win for Netflix’s public relations, but it has also been a win for academics, who have used the data to improve the science of guessing human behavior from past preferences.

The Netflix Prize was also a watershed event for reidentification research because Arvind Narayanan and Vitaly Shmatikov of the University of Texas revealed that they could reidentify some of the “anonymized” users with ease, proving that we are more uniquely tied to our movie rating preferences than intuition would suggest. In my paper, I argue that we should worry about this privacy breach even if we don’t think movie ratings are terribly sensitive, because it can be used to enable other, more terrifying privacy breaches.
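The intuition behind this kind of reidentification can be illustrated with a heavily simplified Python sketch, loosely modeled on the scoring idea Narayanan and Shmatikov describe: matches on rarely rated movies count for more than matches on popular ones, and a record is declared a match only when its score stands well above the runner-up. The function name, the one-star tolerance, the 1 / log(1 + raters) weighting, the threshold of 1.5 standard deviations, and the toy data are all assumptions for illustration; their actual algorithm differs in its details (for example, it also uses the dates of ratings).

```python
import math
import statistics


def reidentify(aux, records, threshold=1.5):
    """Return the ID of the record that best matches `aux`, or None if none stands out.

    aux     -- what an adversary already knows about one person: {movie: stars}
    records -- the "anonymized" release: {record_id: {movie: stars}}
    """
    # Count how many records rated each movie; rarely rated movies are more identifying.
    support = {}
    for ratings in records.values():
        for movie in ratings:
            support[movie] = support.get(movie, 0) + 1

    # Score each record: a rating within one star of the known rating counts as a match,
    # weighted by 1 / log(1 + support) (an assumed weighting) so rare movies count more.
    scores = {}
    for record_id, ratings in records.items():
        scores[record_id] = sum(
            1 / math.log(1 + support[movie])
            for movie, stars in aux.items()
            if movie in ratings and abs(ratings[movie] - stars) <= 1
        )

    if len(scores) < 2:
        return None  # Nothing to compare the best candidate against.
    ranked = sorted(scores.values(), reverse=True)
    spread = statistics.pstdev(ranked)
    if spread == 0:
        return None  # Every record scored the same.
    # "Eccentricity": gap between the best and second-best scores, in standard deviations.
    if (ranked[0] - ranked[1]) / spread < threshold:
        return None
    return max(scores, key=scores.get)


# Hypothetical toy release: r1 is the only record that rated the two obscure films.
records = {
    "r1": {"Popular A": 5, "Popular B": 3, "Obscure 1": 4, "Obscure 2": 2},
    "r2": {"Popular A": 4, "Popular B": 3},
    "r3": {"Popular A": 5, "Indie C": 2},
    "r4": {"Popular B": 1, "Indie C": 4},
}
print(reidentify({"Popular A": 5, "Obscure 1": 4, "Obscure 2": 1}, records))  # r1
print(reidentify({"Popular A": 5}, records))  # None: three records tie
```

Knowing a few ratings of obscure films singles out r1, while knowing only that someone liked a popular film leaves a tie, which is why the threshold check refuses to name anyone in the second call.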

I never argue, however, that Netflix deserves punishment or sanction for having released this data. In my opinion, Netflix acted pretty responsibly. It consulted with computer scientists in a (failed) attempt to anonymize successfully. It tried perturbing the data in order to make reidentification harder. And other experts seem to have been surprised by how easy it was for Narayanan and Shmatikov to reidentify. Even with the benefit of hindsight, I find nothing to blame in how Netflix handled the privacy implications of what it did.

Although I give Netflix a pass for its past privacy breach, I am astonished to learn from the New York Times that the company plans a second act:

The new contest is going to present the contestants with demographic and behavioral data, and they will be asked to model individuals’ “taste profiles,” the company said. The data set of more than 100 million entries will include information about renters’ ages, gender, ZIP codes, genre ratings and previously chosen movies. Unlike the first challenge, the contest will have no specific accuracy target. Instead, $500,000 will be awarded to the team in the lead after six months, and $500,000 to the leader after 18 months.

Netflix should cancel this new, irresponsible contest, which it has dubbed Netflix Prize 2. Researchers have known for more than a decade that gender plus ZIP code plus birthdate uniquely identifies a significant percentage of Americans (87%, according to Latanya Sweeney’s famous study). True, Netflix plans to release age, not birthdate, but simple arithmetic shows that for many people in the country, gender plus ZIP code plus age will narrow their private movie preferences down to at most a few hundred people. Netflix needs to understand the concept of “information entropy”: even if it is not revealing information tied to a single person, it is revealing information tied to so few people that we should consider this a privacy breach.
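The “simple arithmetic” can be sketched in a few lines of Python. The figures are rough assumptions chosen only for illustration, not census data: about 80 distinct ages, two genders, roughly 40,000 ZIP codes, a population of 300 million, and residents spread evenly across every combination (real populations are lumpier, which makes some people easier to single out and others harder). The function names are hypothetical.

```python
import math


def expected_bucket_size(zip_population, genders=2, ages=80):
    """Average number of residents of one ZIP code sharing a given (gender, age),
    assuming residents are spread evenly across genders and ages."""
    return zip_population / (genders * ages)


def bits_revealed(genders=2, ages=80, zip_codes=40_000):
    """Bits of identifying information in (ZIP code, gender, age), assuming each
    attribute is uniformly distributed and independent (assumed round figures)."""
    return math.log2(genders) + math.log2(ages) + math.log2(zip_codes)


# A hypothetical ZIP code with 30,000 residents leaves about 187 people per group.
print(expected_bucket_size(30_000))  # 187.5

# Spread a hypothetical population of 300 million across every combination.
population = 300_000_000
bits = bits_revealed()  # about 22.6 bits
print(round(population / 2 ** bits, 1))  # 46.9 people per combination, on average
```

Singling out one person among 300 million takes about log2(300,000,000) ≈ 28.2 bits, so under these assumptions ZIP code, gender, and age supply about 22.6 of them, leaving only a few dozen people per group on average. That is the sense in which entropy measures how close a release comes to naming someone, even when no single record carries a name.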

I have no doubt that researchers will be able to use the techniques of Narayanan and Shmatikov, together with databases revealing gender, ZIP code, and age, to tie many people directly to these supposedly anonymized new records.
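At its simplest, tying released records to outside databases is a join on the attributes both sources share. The minimal Python sketch below uses invented records: the field names, the placeholder people, and the idea that the outside source is something like a voter roll are all assumptions. A released record whose (ZIP code, gender, age) combination matches exactly one named person is reidentified; a record that matches several people is only narrowed down.

```python
from collections import defaultdict

# Hypothetical "anonymized" release: no names, but quasi-identifiers are kept.
released = [
    {"record": 101, "zip": "08540", "gender": "F", "age": 34, "favorite_genre": "Documentary"},
    {"record": 102, "zip": "08540", "gender": "M", "age": 61, "favorite_genre": "Horror"},
    {"record": 103, "zip": "94110", "gender": "F", "age": 27, "favorite_genre": "Romance"},
]

# Hypothetical outside database that pairs names with the same three attributes.
public = [
    {"name": "Person 1", "zip": "08540", "gender": "F", "age": 34},
    {"name": "Person 2", "zip": "08540", "gender": "M", "age": 61},
    {"name": "Person 3", "zip": "08540", "gender": "M", "age": 61},
    {"name": "Person 4", "zip": "10001", "gender": "F", "age": 27},
]


def quasi_id(row):
    """The (ZIP code, gender, age) triple present in both data sets."""
    return (row["zip"], row["gender"], row["age"])


def link(released, public):
    """Map each released record to every public name with the same quasi-identifiers."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[quasi_id(person)].append(person["name"])
    return {row["record"]: names_by_key.get(quasi_id(row), []) for row in released}


def reidentified(matches):
    """Keep only the records that link to exactly one name."""
    return {record: names[0] for record, names in matches.items() if len(names) == 1}


matches = link(released, public)
print(matches)                # {101: ['Person 1'], 102: ['Person 2', 'Person 3'], 103: []}
print(reidentified(matches))  # {101: 'Person 1'}
```

Record 102 is not uniquely identified, but it is narrowed to two people; an attacker could then try to break such ties with rating-matching techniques of the kind Narayanan and Shmatikov used.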

Because of this, if it releases the data, Netflix might be breaking the law. The Video Privacy Protection Act (VPPA), 18 USC 2710, prohibits a “video tape service provider” (a broadly defined term) from revealing “personally identifiable information” about its customers. Aggrieved customers can sue providers under the VPPA, and courts can order “not less than $2500” in damages for each violation. If somebody brings a class action lawsuit under this statute, Netflix might face millions of dollars in damages: at $2,500 per violation, a class of just 1,000 affected customers would already put $2.5 million at stake.

The FTC might also decide to fine Netflix, treating a violation of its own privacy policy as an unfair business practice.

Either a lawsuit under the VPPA or an FTC investigation would turn, in large part, on one sentence in Netflix’s privacy policy: “We may also disclose and otherwise use, on an anonymous basis, movie ratings, consumption habits, commentary, reviews and other non-personal information about customers.” If sued or investigated, Netflix will surely argue that its acts are immunized by the policy, because the data is disclosed “on an anonymous basis.” While this argument might have carried the day in 2006, before Narayanan and Shmatikov conducted their study, it is much weaker in 2009, now that Netflix has many reasons to know better, including, in part, my paper and the publicity surrounding it. The argument becomes weaker still if Netflix includes the kind of data (ZIP code, age, and gender) that we have known for over a decade defeats anonymization.

The good news is Netflix has time to avoid this multi-million dollar privacy blunder. As far as I can tell, the Netflix Prize 2 has not yet been launched.

Dear Netflix executives: Don’t do this to your customers, and don’t do this to your shareholders. Cancel the Netflix Prize 2, while you still have the chance.

Anonymization FAIL! Privacy Law FAIL!

I have uploaded my latest draft article, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization, to SSRN (look carefully for the download button just above the title; it’s a little buried). From the abstract:

Computer scientists have recently undermined our faith in the privacy-protecting power of anonymization, the name for techniques for protecting the privacy of individuals in large databases by deleting information like names and social security numbers. These scientists have demonstrated they can often “reidentify” or “deanonymize” individuals hidden in anonymized data with astonishing ease. By understanding this research, we will realize we have made a mistake, labored beneath a fundamental misunderstanding, which has assured us much less privacy than we have assumed. This mistake pervades nearly every information privacy law, regulation, and debate, yet regulators and legal scholars have paid it scant attention. We must respond to the surprising failure of anonymization, and this Article provides the tools to do so.

I have labored over this article for a long time, and I am very happy to finally share it publicly. Over the next week or so, I will write a few blog posts here, summarizing the article’s high points and perhaps expanding on what I couldn’t get to in a mere 28,000 words.

Thanks to Ed, David, and everybody else at Princeton’s CITP for helping me develop this article during my visit earlier this year.

Please let me know what you think, either in these comments or by direct email.