Paul Ohm's blog

A Good Day for Email Privacy: A Court Takes Back its Earlier, Bad Ruling in Rehberg v. Paulk

In March, the U.S. Court of Appeals for the Eleventh Circuit, the court that sets federal law for Alabama, Florida, and Georgia, ruled in an opinion in a case called Rehberg v. Paulk that people lacked a reasonable expectation of privacy in the content of email messages stored with an email provider. This meant that the police in those three states were free to ignore the Fourth Amendment when obtaining email messages from a provider. In this case, the plaintiff alleged that the District Attorney had used a sham subpoena to trick a provider to hand over the plaintiff's email messages. The Court ruled that the DA was allowed to do this, consistent with the Constitution.

I am happy to report that today, the Court vacated the opinion and replaced it with a much more carefully reasoned, nuanced opinion.

Most importantly, the Eleventh Circuit no longer holds that "A person also loses a reasonable expectation of privacy in emails, at least after the email is sent to and received by a third party." nor that "Rehberg's voluntary delivery of emails to third parties constituted a voluntary relinquishment of the right to privacy in that information." These bad statements of law have effectively been erased from the court reporters.

This is a great victory for Internet privacy, although it could have been even better. The Court no longer strips email messages of protection, but it didn't go further and affirmatively hold that email users possess a Fourth Amendment right to privacy in email. Instead, the Court ruled that even if such a right exists, it wasn't "clearly established," at the time the District Attorney acted, which means the plaintiff can't continue to pursue this claim.

I am personally invested in this case because I authored a brief asking the Court to reverse its earlier bad ruling. I am glad the Court agreed with us and thank all of the other law professors who signed the brief: Susan Brenner, Susan Freiwald, Stephen Henderson, Jennifer Lynch, Deirdre Mulligan, Joel Reidenberg, Jason Schultz, Chris Slobogin, and Dan Solove. Thanks also to my incredibly hard-working and talented research assistants, Nicole Freiss and Devin Looijien.

Updated: The EFF (which represents the plaintiff) is much more disappointed in the amended opinion than I. They make a lot of good points, but I prefer to see the glass half-full.

Tagged:  

The Gizmodo Warrant: Searching Journalists in the Terabyte Age

Last Friday night, police officers in California used a warrant to search the home of Jason Chen, the Gizmodo blogger who wrote about the iPhone prototype found in a Redwood City bar. Orin Kerr has written an interesting post assessing the legality of the search. I wanted to touch on an important issue he didn't discuss: Whether the search the police are conducting is unconstitutionally overbroad.

Orin discusses two laws that specifically shield journalists from being the target of a search, the California Reporter's Shield Law, found jointly at California Penal Code 1524(g) and California Evidence Code 1070, and the federal Privacy Protection Act (PPA), 42 U.S.C. 2000aa. Both laws were written to limit the impact of Zurcher v. Stanford Daily, a U.S. Supreme Court case authorizing the use of a warrant to search a newspaper's offices. The Supreme Court decided Zurcher in 1978, and Congress enacted the PPA in 1980 (and amended it in unrelated ways in 1996). I'm not sure when the California law was enacted, but I bet it's of similar vintage. In other words, all of the rules that govern police searches of news offices were created in the age of typewriters, desks, filing cabinets, and stacks of paper.

Now, flash forward thirty years. The police who searched Jason Chen's home seized the following: A macbook, HP server, two Dell desktop computers, iPad, ThinkPad, two MacBook Pros, IOmega NAS, three external hard drives, and three flash drives. They also seized other storage-containing devices, including two digital cameras and two smart phones. If Jason Chen's computing habits are anything like mine, the police likely seized many terabytes of disk space, storing hundreds of thousands (millions?) of files, containing information stretching back years. And they took all of this information to investigate an alleged crime (the sale of the iPhone prototype) that could not have happened more than 37 days before the search (the iPhone was found on March 18th), which they learned about from a blog post published four days before the search.

I'm deeply concerned about overbreadth as the police begin to search through these terabytes of information. The police now possess, intermingled with the evidence of the alleged crime they are investigating, hundreds of thousands of documents belonging to a journalist/blogger that are utterly irrelevant to their investigation. Jason Chen has been blogging for Gizmodo since 2006, and he's probably written hundreds of stories. The police likely have thousands of email messages revealing confidential sources, detailing meetings, and trading comments with editors, and thousands of other documents bearing notes from interviews, drafts of articles, and other sensitive information. Because of Chen's beat, some of these documents probably reveal secrets of great economic and business value in the Silicon Valley. Under traditional, outmoded Fourth Amendment rules, the police can read every single document they possess, so long as they intend only to look for evidence of the crime, and under the "plain view rule," they can use any evidence they find of other, unrelated crimes in court against Chen or anyone else.

If the California state courts share my concerns about overbreadth, they should consider embracing the very sensible rules for search warrants for computer hard drives (in any case, not just those involving journalists) adopted last year by the Ninth Circuit in United States v. Comprehensive Drug Testing. To paraphrase, in cases involving the search and seizure of computers, the Ninth Circuit requires five things: (1) the government must waive the plain view rule, meaning they must agree not to use evidence of crimes other than the one under investigation that led to the warrant; (2) the government must wall off the forensic experts who search the hard drive from the investigating the case; (3) the government must explain the "actual risks of destruction of information" they would face if they weren't allowed to seize entire computers; (4) the government must use a search protocol to designate what information they can give to the investigating agents; and (5) the government must destroy or return non-responsive data.

These rules are especially needed when the target of a police search is a journalist (in fact, they may not go far enough). And these rules may be required under Zurcher. In justifying the search of the newspaper's offices in Zurcher, the Supreme Court agreed that when the Fourth Amendment's search and seizure rules collide with First Amendment values, like freedom of the press, the "Fourth Amendment must be applied with 'scrupulous exactitude.'" The court went on to explain why ordinary search warrants for news offices (remember, back in the age of paper files) meet this heightened standard:

There is no reason to believe, for example, that magistrates cannot guard against searches of the type, scope, and intrusiveness that would actually interfere with the timely publication of a newspaper. Nor, if the requirements of specificity and reasonableness are properly applied, policed, and observed, will there be any occasion or opportunity for officers to rummage at large in newspaper files or to intrude into or to deter normal editorial and publication decisions.

When the California state courts combine this thirty-year-old statement of the law with the modern realities of terabyte storage devices, they should hold that the Fourth Amendment requires magistrate judges to play an integral and active role in the administration of the search of Jason Chen's computers and other storage devices. At the very least, the courts should forbid the police from looking at any file timestamped before March 18, 2010, and in addition, they should force the police to comply with the Comprehensive Drug Testing rules. In the terabyte age, these rules are necessary at a minimum to prevent the police from interfering with a free press.

Netflix Cancels the Netflix Prize 2

Today, Netflix announced it is canceling its plans for a second Netflix Prize contest, one that reportedly would have involved the release of more information than the first. As I argued earlier, I feared that the new contest would have put the supposedly private movie viewing and rating habits of Netflix customers at great risk, and I applaud Netflix for making a very responsible decision. No doubt, pressure from the private lawsuit and FTC investigation helped Netflix make up its mind, and both are reportedly going away as a result of today's action.

Tagged:  

Netflix's Impending (But Still Avoidable) Multi-Million Dollar Privacy Blunder

In my last post, I had promised to say more about my article on the limits of anonymization and the power of reidentification. Although I haven't said anything for a few weeks, others have, and I especially appreciate posts by Susannah Fox, Seth Schoen, and Nate Anderson. Not only have these people summarized my article well, they have also added a lot of insightful commentary, and I commend these three posts to you.

Today brings news relating to one of the central examples in my paper: Netflix has announced plans to commit a privacy blunder that could cost it millions of dollars in fines and civil damages.

In my article, I focus on Netflix's 2006 decision to release millions of records containing the movie rating preferences of "anonymized" users to the public, in order to fuel a crowd-sourcing competition called the Netflix Prize. The Netflix Prize has been a huge win for Netflix's public relations, but it has also been a win for academics, who have used the data to improve the science of guessing human behavior from past preferences.

The Netflix Prize was also a watershed event for reidentification research because Arvind Narayanan and Vitaly Shmatikov of U. Texas revealed that they could reidentify some of the "anonymized" users with ease, proving that we are more uniquely tied to our movie rating preferences than intuition would suggest. In my paper, I argue that we should worry about this privacy breach even if we don't think movie ratings are terribly sensitive, because it can be used to enable other, more terrifying privacy breaches.

I never argue, however, that Netflix deserves punishment or sanction for having released this data. In my opinion, Netflix acted pretty responsibly. It consulted with computer scientists in a (failed) attempt to anonymize successfully. It tried perturbing the data in order to make reidentification harder. And other experts seem to have been surprised by how easy it was for Narayanan and Shmatikov to reidentify. Even with the benefit of hindsight, I find nothing to blame in how Netflix handled the privacy implications of what it did.

Although I give Netflix a pass for its past privacy breach, I am astonished to learn from the New York Times that the company plans a second act:

The new contest is going to present the contestants with demographic and behavioral data, and they will be asked to model individuals’ “taste profiles,” the company said. The data set of more than 100 million entries will include information about renters’ ages, gender, ZIP codes, genre ratings and previously chosen movies. Unlike the first challenge, the contest will have no specific accuracy target. Instead, $500,000 will be awarded to the team in the lead after six months, and $500,000 to the leader after 18 months.

Netflix should cancel this new, irresponsible contest, which it has dubbed Netflix Prize 2. Researchers have known for more than a decade that gender plus ZIP code plus birthdate uniquely identifies a significant percentage of Americans (87% according to Latanya Sweeney's famous study.) True, Netflix plans to release age not birthdate, but simple arithmetic shows that for many people in the country, gender plus ZIP code plus age will narrow their private movie preferences down to at most a few hundred people. Netflix needs to understand the concept of "information entropy": even if it is not revealing information tied to a single person, it is revealing information tied to so few that we should consider this a privacy breach.

I have no doubt that researchers will be able to use the techniques of Narayanan and Shmatikov, together with databases revealing sex, zip code, and age, to tie many people directly to these supposedly anonymized new records.

Because of this, if it releases the data, Netflix might be breaking the law. The Video Privacy Protection Act (VPPA), 18 USC 2710 prohibits a "video tape service provider" (a broadly defined term) from revealing "personally identifiable information" about its customers. Aggrieved customers can sue providers under the VPPA and courts can order "not less than $2500" in damages for each violation. If somebody brings a class action lawsuit under this statute, Netflix might face millions of dollars in damages.

Additionally, the FTC might also decide to fine Netflix for violating its privacy policy as an unfair business practice.

Either a lawsuit under the VPPA or an FTC investigation would turn, in large part, on one sentence in Netflix's privacy policy: "We may also disclose and otherwise use, on an anonymous basis, movie ratings, consumption habits, commentary, reviews and other non-personal information about customers." If sued or investigated, Netflix will surely argue that its acts are immunized by the policy, because the data is disclosed "on an anonymous basis." While this argument might have carried the day in 2006, before Narayanan and Shmatikov conducted their study, the argument is much weaker in 2009, now that Netflix has many reasons to know better, including in part, my paper and the publicity surrounding it. A weak argument is made even weaker if Netflix includes the kind of data--ZIP code, age, and gender--that we have known for over a decade fails to anonymize.

The good news is Netflix has time to avoid this multi-million dollar privacy blunder. As far as I can tell, the Netflix Prize 2 has not yet been launched.

Dear Netflix executives: Don't do this to your customers, and don't do this to your shareholders. Cancel the Netflix Prize 2, while you still have the chance.

Tagged:  

Anonymization FAIL! Privacy Law FAIL!

I have uploaded my latest draft article entitled, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization to SSRN (look carefully for the download button, just above the title; it's a little buried). According to my abstract:

Computer scientists have recently undermined our faith in the privacy-protecting power of anonymization, the name for techniques for protecting the privacy of individuals in large databases by deleting information like names and social security numbers. These scientists have demonstrated they can often “reidentify” or “deanonymize” individuals hidden in anonymized data with astonishing ease. By understanding this research, we will realize we have made a mistake, labored beneath a fundamental misunderstanding, which has assured us much less privacy than we have assumed. This mistake pervades nearly every information privacy law, regulation, and debate, yet regulators and legal scholars have paid it scant attention. We must respond to the surprising failure of anonymization, and this Article provides the tools to do so.

I have labored over this article for a long time, and I am very happy to finally share it publicly. Over the next week, or so, I will write a few blog posts here, summarizing the article's high points and perhaps expanding on what I couldn't get to in a mere 28,000 words.

Thanks to Ed, David, and everybody else at Princeton's CITP for helping me develop this article during my visit earlier this year.

Please let me know what you think, either in these comments or by direct email.

Did the Sanford E-Mail Tipster or the Newspaper Break the Law?

Part of me doesn't want to comment on the Mark Sanford news, because it's all so tawdry and inconsistent with the respectable, family-friendly tone of Freedom to Tinker. But since everybody from the Gray Lady on down is plastering the web with stories, and because all of this reporting is leaving unanalyzed some Internet law questions, let me offer this:

On Wednesday, after Sanford's confessional press conference, The State, the largest newspaper in South Carolina, posted email messages appearing to be love letters between the Governor and his mistress. (The paper obscured the name of the mistress, calling her only "Maria.") The paper explained in a related news story that they had received these messages from an anonymous tipster back in December, but until yesterday's unexpected corroboration of their likely authenticity, they had just sat on them.

Did the anonymous tipster break the law by obtaining or disclosing the email messages? Did the paper break the law by publishing them? After the jump, I'll offer my take on these questions.

Tagged:  

FBI's Spyware Program

Note: I worked for the Department of Justice's Computer Crime and Intellectual Property Section (CCIPS) from 2001 to 2005. The documents discussed below mention a memo written by somebody at CCIPS during the time I worked there, but absolutely everything I say below reflects only my personal thoughts and impressions about the documents released to the public today.

Two years ago, Kevin Poulsen broke the news that the FBI had successfully deployed spyware to help catch a student sending death threats to his high school. The FBI calls the tool a CIPAV for "computer and internet protocol address verifier."

We learned today that Kevin filed a Freedom of Information Act request (along with EFF and CNet News) asking for other information about CIPAVs. The FBI has responded, Kevin made the 152 pages available, and I just spent the past half hour skimming them.

Here are some unorganized impressions:

  • The 152 pages don't take long to read, because they have been so heavily redacted. The vast majority of the pages have no substantive content at all.
  • Page one may be the most interesting page. Someone at CCIPS, my old unit, cautions that "While the technique is of indisputable value in certain kinds of cases, we are seeing indications that it is being used needlessly by some agencies, unnecessarily raising difficult legal questions (and a risk of suppression) without any countervailing benefit,"
  • On page 152, the FBI's Cryptographic and Electronic Analysis Unit (CEAU) "advised Pittsburgh that they could assist with a wireless hack to obtain a file tree, but not the hard drive content." This is fascinating on several levels. First, what wireless hack? The spyware techniques described in Poulsen's reporting are deployed when a target is unlocatable, and the FBI tricks him or her into clicking a link. How does wireless enter the picture? Don't you need to be physically proximate to your target to hack them wirelessly? Second, why could CEAU "assist . . . to obtain a file tree, but not the hard drive content." That smells like a legal constraint, not a technical one. Maybe some lawyer was making distinctions based on probable cause?
  • On page 86, the page summarizing the FBI's Special Technologies and Applications Office (STAO) response to the FOIA request, STAO responds that they have included an "electronic copy of 'Magic Quadrant for Information Access Technology'" on cd-rom. Is that referring to this Gartner publication, and if so, what does this have to do with the FOIA request? I'm hoping one of the uber geeks reading this blog can tie FBI spyware to this phrase.
  • Pages 64-80 contain the affidavit written to justify the use of the CIPAV in the high school threat case. I had seen these back when Kevin first wrote about them, but if you haven't seen them yet, you should read them.
  • It definitely appears that the FBI is obtaining search warrants before installing CIPAVs. Although this is probably enough to justify grabbing IP addresses and information packed in a Windows registry, it probably is not enough alone to justify tracing IP addresses in real time. The FBI probably needs a pen register/trap and trace order in addition to the warrant to do that under 18 U.S.C. 3123. Although pen registers are mentioned a few times in these documents--particularly in the affidavit mentioned above--many of the documents simply say "warrant." This is probably not of great consequence, because if FBI has probable cause to deploy one of these, they can almost certainly justify a pen register order, but why are they being so sloppy?

Two final notes: First, I twittered my present sense impressions while reading the documents, which was an interesting experiment for me, if not for those following me. If you want to follow me, visit my profile.

Second, if you see anything else in the documents that bear scrutiny, please leave them in the comments of this post.

Tagged:  

Fascinating New Blog: ComputationalLegalStudies.com

I was inspired to post the essay I discussed in the prior post by the debut of the best new law blog I have seen in a long time, Computational Legal Studies, featuring the work of Daniel Katz and Michael Bommarito, both graduate students in the University of Michigan's political science department.

Every single blog they have posted has caused me to smack my head once for not having thought of the idea first, and a second time for not having their datasets and skillz. Their visualization of who has gotten TARP funds and how they're connected to legislators deserves to be printed on posters and hung up in newsrooms across the country (not to mention in offices on Capitol Hill). They've also shown good taste by building a bridge to this blog, linking favorably back to the great CITP work led by David Robinson on government openness.

I will have more to say about Dan and Mike's new blog in the weeks and months to come, but for now it is enough to welcome them to the blogosphere.

Computer Programming and the Law: A New Research Agenda

By my best estimate, at least twenty different law professors on the tenure track at American law schools once held a job as a professional computer programmer. I am proud to say that two of us work at my law school.

Most of these hyphenate lawprof-coders rarely write any code today, and this is a shame. There are many good reasons why the world would be a better place if we began to integrate computer programming into legal scholarship (and more generally, into law and policy).

Two years ago, I wrote a blog post for a lawprof blog exploring this idea. I promised a follow-up post, but never delivered. A year later, I expanded the idea into an essay, which the good people at the Villanova Law Review agreed to publish sometime later this year. With this post, I am releasing a slightly-outdated draft of the essay for the first time to the public. You can download it at SSRN.

In the abstract, I say:

This essay proposes a new interdisciplinary research agenda called Computer Programming and the Law. By harnessing the power of computer programming, legal scholars can develop better tools, data, and insights for advancing their research interests. This essay presents the case for this new research agenda, highlights some examples of those who have begun to blaze the trail, and includes code samples to demonstrate the power and potential of developing software for legal scholarship. The code samples in this essay can be run like a piece of software—thanks to a technique known as literate programming—making this the world’s first law review article that is also a working computer program.

If you have any interest in the intersection of technology and policy (in other words, if you read this blog), please read the essay and let me know what you think. Unlike many law review articles, this one is short. And how bad could it be? It contains 350 lines of perl! (Wait, don't answer that!)

Tagged:  

Being Acquitted Versus Being Searched (YANAL)

With this post, I'm launching a new, (very) occasional series I'm calling YANAL, for "You Are Not A Lawyer." In this series, I will try to disabuse computer scientists and other technically minded people of some commonly held misconceptions about the law (and the legal system).

I start with something from criminal law. As you probably already know, in the American criminal law system, as in most others, a jury must find a defendant guilty "beyond a reasonable doubt" to convict. "Beyond a reasonable doubt" is a famously high standard, and many guilty people are free today only because the evidence against them does not meet this standard.

When techies think about criminal law, and in particular crimes committed online, they tend to fixate on this legal standard, dreaming up ways people can use technology to inject doubt into the evidence to avoid being convicted. I can't count how many conversations I have had with techies about things like the "open wireless access point defense," the "trojaned computer defense," the "NAT-ted firewall defense," and the "dynamic IP address defense." Many people have talked excitedly to me about tools like TrackMeNot or more exotic methods which promise, at least in part, to inject jail-springing reasonable doubt onto a hard drive or into a network.

People who place stock in these theories and tools are neglecting an important drawback. There are another set of legal standards--the legal standards governing search and seizure--you should worry about long before you ever get to "beyond a reasonable doubt". Omitting a lot of detail, the police, even without going to a judge first, can obtain your name, address, and credit card number from your ISP if they can show the information is relevant to a criminal investigation. They can obtain transaction logs (think apache or sendmail logs) after convincing a judge the evidence is "relevant and material to an ongoing criminal investigation." If they have probable cause--another famous, but often misunderstood standard--they can read all of your stored email, rifle through your bedroom dresser drawers, and image your hard drive. If they jump through a few other hoops, they can wiretap your telephone. Some of these standards aren't easy to meet, but all of them are well below the "beyond a reasonable doubt" standard for guilt.

So by the time you've had your Perry Mason moment in front of the jurors, somehow convincing them that the fact that you don't enable WiFi authentication means your neighbor could've sent the death threat, your life will have been turned upside down in many ways: The police will have searched your home and seized all of your computers. They will have examined all of the files on your hard drives and read all of the messages in your inboxes. (And if you have a shred of kiddie porn stored anywhere, the alleged death threat will be the least of your worries. I know, I know, the virus on your computer raises doubt that the kiddie porn is yours!) They will have arrested you and possibly incarcerated you pending trial. Guys with guns will have interviewed you and many of your friends, co-workers, and neighbors.

In addition, you will have been assigned an overworked public defender who has no time for far-fetched technological defenses and prefers you take a plea bargain, or you will have paid thousands of dollars to a private attorney who knows less than the public defender about technology, but who is "excited to learn" on your dime. Maybe, maybe, maybe after all of this, your lawyer convinces the judge or the jury. You're free! Congratulations?

The police and prosecutors run into many legal standards, many of which are much easier to satisfy than "beyond a reasonable doubt" and most of which are met long before they see an access point or notice a virus infection. By meeting any of these standards, they can seriously disrupt your life, even if they never end up putting you away.

Tagged:  
Syndicate content