January 23, 2025

Great, Now They'll Never Give Us Data

Today’s New York Times has an interesting article by Katie Hafner on AOL’s now-infamous release of customers’ search data.

AOL’s goal in releasing the data was to help researchers by giving them realistic data to study. Today’s technologies, such as search engines, have generated huge volumes of information about what people want online and why. But most of this data is locked up in the data centers of companies like AOL, Google, and eBay, where researchers can’t use it. So researchers have been making do with a few old datasets. The lack of good data is certainly holding back progress in this important area. AOL wanted to help out by giving researchers a better dataset to work with.

Somebody at AOL apparently thought they had “anonymized” the data by replacing the usernames with meaningless numbers. That was a terrible misjudgment – if there is one thing we have learned from the AOL data, it is that people reveal a lot about themselves in their search queries. Reporters have identified at least two of the affected AOL users by name, and finding and publishing embarrassing search sequences has become a popular sport.
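To see why number-for-username substitution fails, here is a minimal sketch (with made-up data, not the actual AOL log format) of that style of pseudonymization. The usernames disappear, but every query a user made remains linked under a single number, and the query text itself carries identifying details:

```python
# Sketch: "anonymizing" a search log by swapping usernames for numbers.
# Hypothetical data; illustrates why this kind of pseudonymization fails.

log = [
    ("jsmith", "plumbers in anytown nj"),
    ("jsmith", "john smith anytown"),
    ("jsmith", "diabetes diet"),
    ("mdoe",   "cheap flights"),
]

ids = {}

def pseudonym(user):
    # Assign each distinct user the next number, reusing it on repeats.
    if user not in ids:
        ids[user] = len(ids) + 1
    return ids[user]

anonymized = [(pseudonym(user), query) for user, query in log]

# User 1's queries are still grouped under one number, so a name, a
# town, and a medical condition can be combined to re-identify them.
for uid, query in anonymized:
    print(uid, query)
```

The linkage across queries, not the username, is what does the damage: once all of a person's searches are tied together, the content often points back to the person.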

The article quotes some prominent researchers, including Jon Kleinberg, saying they’ll refuse to work with this data on ethical grounds. I don’t quite buy that there is an ethical duty to avoid research uses of the data. If I had a valid research use for it, I’m pretty sure I could develop my own guidelines for using it without exacerbating the privacy problem. If I had had something to do with inducing the ill-fated release of the data, I might have an obligation to avoid profiting from my participation in the release. But if the data is out there due to no fault of mine, and the abuses that occur are no fault of mine, why shouldn’t I be able to use the data responsibly, for the public good?

Researchers know that this incident will make companies even more reluctant to release data, even after anonymizing it. If you’re a search-behavior expert, this AOL data may be the last useful data you see for a long time – which is all the more reason to use it.

Most of all, the AOL search data incident reminds us of the complexity of identity and anonymity online. It should have been obvious that removing usernames wasn’t enough to anonymize the data. But this is actually a common kind of mistake – simplistic distinctions between “personally identifiable information” and other information pervade the policy discussion about privacy. The same error is common in debates about big government data mining programs – it’s not as easy as you might think to enable data analysis without also compromising privacy.

In principle, it might have been possible to transform the search data further to make it safe for release. In practice we’re nowhere near understanding how to usefully depersonalize this kind of data. That’s an important research problem in itself, which needs its own datasets to work on. If only somebody had released a huge mass of poorly depersonalized data …

Attacks on a Plane

Last week’s arrest of a gang of would-be airplane bombers unleashed a torrent of commentary, including much of the “I told you so” variety. One question that I haven’t heard discussed is why the group wanted to attack planes.

The standard security narrative has attackers striking a system’s weak points, and defenders trying to identify and remedy weak points before the attackers hit them. Surely if you were looking for a poorly secured place with a high density of potential victims, an airplane wouldn’t be your first choice. Airplanes have to be the best-secured places that most people go, and they only hold a few hundred people. A ruthless attacker who was trying to maximize expected death and destruction would attack elsewhere.

(9/11 was an attack against office buildings, using planes as weapons. That type of attack is very unlikely to work anymore, now that passengers will resist hijackers rather than cooperating.)

So why did last week’s arrestees target planes? Perhaps they weren’t thinking too carefully – Perry Metzger argues that their apparent peroxide-bomb plan was impractical. Or perhaps they were trying to maximize something other than death and destruction. What exactly? Several candidates come to mind. Perhaps they were trying to instill maximum fear, exploiting our disproportionate fear of plane crashes. Perhaps they were trying to cause economic disruption by attacking the transportation infrastructure. Perhaps planes are symbolic targets representing modernity and globalization.

Just as interesting as the attackers’ plans is the government response of beefing up airport security. The immediate security changes made sense in the short run, on the theory that the situation was uncertain and the arrests might trigger immediate attacks by unarrested co-conspirators. But it seems likely that at least some of the new restrictions will continue indefinitely, even though they’re mostly just security theater.

Which suggests another reason the bad guys wanted to attack planes: perhaps it was because planes are so intensively secured; perhaps they wanted to send the message that nowhere is safe. Let’s assume, just for the sake of argument, that this speculation is right, and that visible security measures actually invite attacks. If this is right, then we’re playing a very unusual security game. Should we reduce airport security theater, on the theory that it may be making air travel riskier? Or should we beef it up even more, to draw attacks away from more vulnerable points? Fortunately (for me) I don’t have space here to suggest answers to these questions. (And don’t get me started on the flaws in our current airport screening system.)

The bad guys’ decision to attack planes tells us something interesting about them. And our decision to exhaustively defend planes tells us something interesting about ourselves.

PRM Wars

Today I want to wrap up the recap of my invited talk at Usenix Security. Previously (1; 2) I explained how advocates of DRM-bolstering laws are starting to switch to arguments based on price discrimination and platform lock-in, and how technology is starting to enable the use of DRM-like technologies, which I dubbed Property Rights Management or PRM, on everyday goods. Today I want to speculate on how the policy argument over PRM might unfold, and how it might differ from today’s debate over copyright-oriented DRM.

As with the DRM debate, the policy debate about PRM shouldn’t be (directly) about whether PRM is good or bad, but should instead be about whether the law should bolster PRM by requiring, subsidizing, or legally privileging it; hinder PRM by banning or regulating it; or maintain a neutral policy that lets companies build PRM products and lets others study and use those products as they see fit.

What might a PRM-bolstering law look like? One guess is that it will extend the DMCA to PRM scenarios where no copyrighted work is at issue. Courts have generally read the DMCA as not extending to such scenarios (as in the Skylink and Static Control cases), but Congress could change that. The result would be a general ban on circumventing anti-interoperability technology or trafficking in circumvention tools. This would have side effects even worse than the DMCA’s, but Congress seemed to underestimate the DMCA’s side effects when debating it, so we can’t rule out a similar Congressional mistake again.

The most important feature of the PRM policy argument is that it won’t be about copyright. So fair use arguments are off the table, which should clarify the debate all around – arguments about DRM and fair use often devolve into legal hairsplitting or focus too much on the less interesting fair use scenarios. Instead of fair use we’ll have the simpler intuition that people should be able to use their stuff as they see fit.

We can expect the public to be more skeptical about PRM than about DRM. Users who sympathize with anti-infringement efforts will not so easily accept an ordinary manufacturer’s desire to price discriminate or lock in customers. People distrust DRM because of its side effects. With PRM they may also reject its stated goals.

So the advocates of PRM-bolstering laws will have to change the argument. Perhaps they’ll come up with some kind of story about trademark infringement – we want to make your fancy-brand watch reject third-party watchbands to protect you against watchband counterfeiters. Or perhaps they’ll try a safety argument – as responsible automakers, we want to protect you from the risks of unlicensed tires.

Our best hope for sane policy in this area is that policymakers will have learned from the mistakes of DRM policy. That’s not too much to ask, is it?

DRM Wars: Property Rights Management

In the first part of my invited talk at Usenix Security, I argued that as the inability of DRM technology to stop peer-to-peer infringement becomes increasingly obvious to everybody, the rationale for DRM is shifting. The new argument for DRM-bolstering laws is that DRM enables price discrimination and platform lock-in, which are almost always good for vendors, and sometimes good for society as a whole. The new arguments have no real connection to copyright enforcement so (I predict) the DRM policy debate will come unmoored from copyright.

The second trend I identified in the talk was toward the use of DRM-like technologies on traditional physical products. A good example is the use of cryptographic lockout codes in computer printers and their toner cartridges. Printer manufacturers want to sell printers at a low price and compensate by charging more for toner cartridges. To do this, they want to stop consumers from buying cheap third-party toner cartridges. So some printer makers have their printers do a cryptographic handshake with a chip in their cartridges, and they lock out third-party cartridges by programming the printers not to operate with cartridges that can’t do the secret handshake.
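A challenge-response exchange of this kind is easy to sketch. The following is a simplified illustration of how such a lockout handshake could work in principle – a hypothetical protocol, not any vendor’s actual scheme – using a shared secret and an HMAC:

```python
# Sketch of a cryptographic lockout handshake between a printer and a
# toner cartridge. Hypothetical protocol for illustration only.
import hashlib
import hmac
import os

SECRET = b"vendor-shared-secret"  # burned into printer and licensed cartridges


def cartridge_respond(challenge: bytes, key: bytes) -> bytes:
    # A cartridge's chip answers a challenge with an HMAC keyed by
    # whatever secret it holds.
    return hmac.new(key, challenge, hashlib.sha256).digest()


def printer_accepts(respond) -> bool:
    # The printer sends a fresh random challenge and only operates if
    # the cartridge's response matches the expected HMAC.
    challenge = os.urandom(16)
    expected = hmac.new(SECRET, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(respond(challenge), expected)


licensed = lambda c: cartridge_respond(c, SECRET)        # knows the secret
third_party = lambda c: cartridge_respond(c, b"wrong")   # doesn't
```

A licensed cartridge passes the check and a third-party cartridge fails it, even though both are perfectly good at holding toner. The lockout is purely about controlling interoperation, which is exactly the point.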

Doing this requires having some minimal level of computing functionality in both devices (e.g., the printer and cartridge). Moore’s Law is driving the size and price of that functionality to zero, so it will become economical to put secret-handshake functions into more and more products. Just as traditional DRM operates by limiting and controlling interoperation (i.e., compatibility) between digital products, these technologies will limit and control interoperation between ordinary products. We can call this Property Rights Management, or PRM.

(Unfortunately, I didn’t coin this term until after the talk. During the actual talk I used the awkward “DRM-like technologies”.)

Where can PRM technologies be deployed? I gave three examples where they’ll be feasible before too many more years. (1) A pen may refuse to dispense ink unless it’s being used with licensed paper. The pen would handshake with the paper by short-range RFID or through physical contact. (2) A shoe may refuse to provide some features, such as high-tech cushioning of the sole, unless used with licensed shoelaces. Again, this could be done by short-range RFID or physical contact. (3) The scratchy side of a velcro connector may refuse to stick to the fuzzy side unless the fuzzy side is licensed. The scratchy side of velcro has little hooks to grab loops on the fuzzy side; the hooks may refuse to function unless the license is in order. For example, Apple could put PRMed scratchy-velcro onto the iPod, in the hope of extracting license fees from companies that make fuzzy-velcro for the iPod to stick to.

[UPDATE (August 16): I missed an obvious PRM example: razors and blades. The razor would refuse to grip the blade unless the blade knew the secret handshake.]

Will these things actually happen? I can’t say for sure. I chose these examples to illustrate how far PRM might go. The examples will be feasible to implement, eventually. Whether PRM gets used in these particular markets depends on market conditions and business decisions by the vendors. What we can say, I think, is that as PRM becomes practical in more product areas, its use will widen and we’ll face policy decisions about how to treat it.

To sum up thus far, the arguments for DRM are disconnecting from copyright, and the mechanisms of DRM are starting to disconnect from copyright in the form of Property Rights Management. Where does this leave the public policy debates? That will be the topic of the next (and final) installment.

DRM Wars: The Next Generation

Last week at the Usenix Security Symposium, I gave an invited talk, with the same title as this post. The gist of the talk was that the debate about DRM (copy protection) technologies, which has been stalemated for years now, will soon enter a new phase. I’ll spend this post, and one or two more, explaining this.

Public policy about DRM offers a spectrum of choices. On one end of the spectrum are policies that bolster DRM, by requiring or subsidizing it, or by giving legal advantages to companies that use it. On the other end of the spectrum are policies that hinder DRM, by banning or regulating it. In the middle is the hands-off policy, where the law doesn’t mention DRM, companies are free to develop DRM if they want, and other companies and individuals are free to work around the DRM for lawful purposes. In the U.S. and most other developed countries, the move has been toward DRM-bolstering laws, such as the U.S. DMCA.

The usual argument in favor of bolstering DRM is that DRM retards peer-to-peer copyright infringement. This argument has always been bunk – every worthwhile song, movie, and TV show is available via P2P, and there is no convincing practical or theoretical evidence that DRM can stop P2P infringement. Policymakers have either believed naively that the next generation of DRM would be different, or accepted vague talk about speedbumps and keeping honest people honest.

At last, this is starting to change. Policymakers, and music and movie companies, are starting to realize that DRM won’t solve their P2P infringement problems. And so the usual argument for DRM-bolstering laws is losing its force.

You might expect the response to be a move away from DRM-bolstering laws. Instead, advocates of DRM-bolstering laws have switched to two new arguments. First, they argue that DRM enables price discrimination – business models that charge different customers different prices for a product – and that price discrimination benefits society, at least sometimes. Second, they argue that DRM helps platform developers lock in their customers, as Apple has done with its iPod/iTunes products, and that lock-in increases the incentive to develop platforms. I won’t address the merits or limitations of these arguments here – I’m just observing that they’re replacing the P2P piracy bogeyman in the rhetoric of DMCA boosters.

Interestingly, these new arguments have little or nothing to do with copyright. The maker of almost any product would like to price discriminate, or to lock customers in to its product. Accordingly, we can expect the debate over DRM policy to come unmoored from copyright, with people on both sides making arguments unrelated to copyright and its goals. The implications of this change are pretty interesting. They’ll be the topic of my next post.