Archives for August 2006

Great, Now They'll Never Give Us Data

Today’s New York Times has an interesting article by Katie Hafner on AOL’s now-infamous release of customers’ search data.

AOL’s goal in releasing the data was to help researchers by giving them realistic data to study. Today’s technologies, such as search engines, have generated huge volumes of information about what people want online and why. But most of this data is locked up in the data centers of companies like AOL, Google, and eBay, where researchers can’t use it. So researchers have been making do with a few old datasets. The lack of good data is certainly holding back progress in this important area. AOL wanted to help out by giving researchers a better dataset to work with.

Somebody at AOL apparently thought they had “anonymized” the data by replacing the usernames with meaningless numbers. That was a terrible misjudgment – if there is one thing we have learned from the AOL data, it is that people reveal a lot about themselves in their search queries. Reporters have identified at least two of the affected AOL users by name, and finding and publishing embarrassing search sequences has become a popular sport.
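To see why this kind of “anonymization” fails, here is a minimal sketch (with a made-up log – the usernames, queries, and the `pseudonymize` function are all hypothetical, not AOL’s actual process). Replacing each username with a stable number removes the name but keeps every user’s queries linkable to one another, and the queries themselves do the identifying:

```python
# Hypothetical illustration of username-to-number "anonymization".
from itertools import count

def pseudonymize(log):
    """Replace each username with a meaningless but *stable* number."""
    ids = {}
    counter = count(1)
    out = []
    for user, query in log:
        if user not in ids:
            ids[user] = next(counter)       # same user -> same number
        out.append((ids[user], query))
    return out

# A made-up search log: (username, query) pairs.
log = [
    ("thelma.a", "landscapers in lilburn ga"),
    ("thelma.a", "dog that urinates on everything"),
    ("bob42",    "cheap flights"),
    ("thelma.a", "homes sold in shadow lake subdivision"),
]

pseudo = pseudonymize(log)

# The username is gone, but all of user 1's queries remain grouped
# together -- and together they narrow the searcher down to a town,
# a subdivision, a pet. That linkage is what let reporters put names
# to "anonymous" AOL user numbers.
user1_queries = [q for uid, q in pseudo if uid == 1]
print(user1_queries)
```

The point of the sketch is that the pseudonym is stable: once any single query in the cluster gives the user away, every other query in the cluster is exposed too.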

The article quotes some prominent researchers, including Jon Kleinberg, saying they’ll refuse to work with this data on ethical grounds. I don’t quite buy that there is an ethical duty to avoid research uses of the data. If I had a valid research use for it, I’m pretty sure I could develop my own guidelines for using it without exacerbating the privacy problem. If I had had something to do with inducing the ill-fated release of the data, I might have an obligation to avoid profiting from my participation in the release. But if the data is out there due to no fault of mine, and the abuses that occur are no fault of mine, why shouldn’t I be able to use the data responsibly, for the public good?

Researchers know that this incident will make companies even more reluctant to release data, even after anonymizing it. If you’re a search-behavior expert, this AOL data may be the last useful data you see for a long time – which is all the more reason to use it.

Most of all, the AOL search data incident reminds us of the complexity of identity and anonymity online. It should have been obvious that removing usernames wasn’t enough to anonymize the data. But this is actually a common kind of mistake – simplistic distinctions between “personally identifiable information” and other information pervade the policy discussion about privacy. The same error is common in debates about big government data mining programs – it’s not as easy as you might think to enable data analysis without also compromising privacy.

In principle, it might have been possible to transform the search data further to make it safe for release. In practice we’re nowhere near understanding how to usefully depersonalize this kind of data. That’s an important research problem in itself, which needs its own datasets to work on. If only somebody had released a huge mass of poorly depersonalized data …
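To make the difficulty concrete, here is a sketch of a naive depersonalization pass – the heuristics (drop queries containing digits, drop queries containing rare words) and all names in it are my own illustration, not a method from the post or from AOL:

```python
# Naive query scrubbing: an illustration of why "transform the data
# until it's safe" is harder than it sounds.
import re
from collections import Counter

def scrub(queries, vocab_counts, min_freq=2):
    """Suppress queries that look directly identifying."""
    kept = []
    for q in queries:
        if re.search(r"\d", q):      # digits: phone numbers, zips, SSNs
            continue
        words = q.split()
        if any(vocab_counts.get(w, 0) < min_freq for w in words):
            continue                 # rare words: personal names, small towns
        kept.append(q)
    return kept

queries = [
    "weather atlanta",
    "weather atlanta",
    "john q smith 555-0199",
    "numb fingers",
    "weather",
]
counts = Counter(w for q in queries for w in q.split())
print(scrub(queries, counts))
```

Even this aggressive filter only removes the obvious cases: a user can still be fingerprinted by a distinctive *combination* of individually common queries, which is exactly why usable depersonalization remains an open research problem rather than a data-cleaning step.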

Attacks on a Plane

Last week’s arrest of a gang of would-be airplane bombers unleashed a torrent of commentary, including much of the “I told you so” variety. One question that I haven’t heard discussed is why the group wanted to attack planes.

The standard security narrative has attackers striking a system’s weak points, and defenders trying to identify and remedy weak points before the attackers hit them. Surely if you were looking for a poorly secured place with a high density of potential victims, an airplane wouldn’t be your first choice. Airplanes have to be the best-secured places that most people go, and they only hold a few hundred people. A ruthless attacker who was trying to maximize expected death and destruction would attack elsewhere.

(9/11 was an attack against office buildings, using planes as weapons. That type of attack is very unlikely to work anymore, now that passengers will resist hijackers rather than cooperating.)

So why did last week’s arrestees target planes? Perhaps they weren’t thinking too carefully – Perry Metzger argues that their apparent peroxide-bomb plan was impractical. Or perhaps they were trying to maximize something other than death and destruction. What exactly? Several candidates come to mind. Perhaps they were trying to instill maximum fear, exploiting our disproportionate fear of plane crashes. Perhaps they were trying to cause economic disruption by attacking the transportation infrastructure. Perhaps planes are symbolic targets representing modernity and globalization.

Just as interesting as the attackers’ plans is the government response of beefing up airport security. The immediate security changes made sense in the short run, on the theory that the situation was uncertain and the arrests might trigger immediate attacks by unarrested co-conspirators. But it seems likely that at least some of the new restrictions will continue indefinitely, even though they’re mostly just security theater.

Which suggests another reason the bad guys wanted to attack planes: perhaps it was because planes are so intensively secured; perhaps they wanted to send the message that nowhere is safe. Let’s assume, just for the sake of argument, that this speculation is right, and that visible security measures actually invite attacks. If this is right, then we’re playing a very unusual security game. Should we reduce airport security theater, on the theory that it may be making air travel riskier? Or should we beef it up even more, to draw attacks away from more vulnerable points? Fortunately (for me) I don’t have space here to suggest answers to these questions. (And don’t get me started on the flaws in our current airport screening system.)

The bad guys’ decision to attack planes tells us something interesting about them. And our decision to exhaustively defend planes tells us something interesting about ourselves.

PRM Wars

Today I want to wrap up the recap of my invited talk at Usenix Security. Previously (1; 2) I explained how advocates of DRM-bolstering laws are starting to switch to arguments based on price discrimination and platform lock-in, and how technology is starting to enable the use of DRM-like technologies, which I dubbed Property Rights Management or PRM, on everyday goods. Today I want to speculate on how the policy argument over PRM might unfold, and how it might differ from today’s debate over copyright-oriented DRM.

As with the DRM debate, the policy debate about PRM shouldn’t be (directly) about whether PRM is good or bad, but should instead be about whether the law should bolster PRM by requiring, subsidizing, or legally privileging it; or hinder PRM by banning or regulating it; or maintain a neutral policy that lets companies build PRM products and lets others study and use those products as they see fit.

What might a PRM-bolstering law look like? One guess is that it will extend the DMCA to PRM scenarios where no copyrighted work is at issue. Courts have generally read the DMCA as not extending to such scenarios (as in the Skylink and Static Control cases), but Congress could change that. The result would be a general ban on circumventing anti-interoperability technology or trafficking in circumvention tools. This would have side effects even worse than the DMCA’s, but Congress seemed to underestimate the DMCA’s side effects when debating it, so we can’t rule out a similar Congressional mistake again.

The most important feature of the PRM policy argument is that it won’t be about copyright. So fair use arguments are off the table, which should clarify the debate all around – arguments about DRM and fair use often devolve into legal hairsplitting or focus too much on the less interesting fair use scenarios. Instead of fair use we’ll have the simpler intuition that people should be able to use their stuff as they see fit.

We can expect the public to be more skeptical about PRM than DRM. Users who sympathize with anti-infringement efforts will not so easily accept ordinary manufacturers’ desire to price discriminate or lock in customers. People distrust DRM because of its side effects. With PRM they may also reject its stated goals.

So the advocates of PRM-bolstering laws will have to change the argument. Perhaps they’ll come up with some kind of story about trademark infringement – we want to make your fancy-brand watch reject third-party watchbands to protect you against watchband counterfeiters. Or perhaps they’ll try a safety argument – as responsible automakers, we want to protect you from the risks of unlicensed tires.

Our best hope for sane policy in this area is that policymakers will have learned from the mistakes of DRM policy. That’s not too much to ask, is it?