November 27, 2024

Why So Many False Positives on the No-Fly List?

Yesterday I argued that Walter Murphy’s much-discussed encounter with airport security was probably just a false positive in the no-fly list matching algorithm. Today I want to talk about why false positives (ordinary citizens triggering mistaken “matches” with the list) are so common.

First, a preliminary. It’s often argued that the high false positive rate proves the system is poorly run or even useless. This is not necessarily the case. In running a system like this, we necessarily trade off false positives against false negatives. We can lower either kind of error, but doing so will increase the other kind. The optimal policy will balance the harm from false positives against the harm from false negatives, to minimize total harm. If the consequences of a false positive are relatively minor (brief inconvenience for one traveler), but the consequences of a false negative are much worse (non-negligible probability of multiple deaths), then the optimal choice is to accept many false positives in order to drive the false negative rate way down. In other words, a high false positive rate is not by itself a sign of bad policy or bad management. You can argue that the consequences of error are not really so unbalanced, or that the tradeoff is being made poorly, but your argument can’t rely only on the false positive rate.
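To make the tradeoff concrete, here is a toy calculation. It is only a sketch: the three operating points, the error counts, and the harm weights are all invented for illustration, and nothing here describes how the real system is tuned. It simply picks the operating point with the lowest total harm for a given pair of per-error costs.

```python
# Hypothetical operating points. Stricter matching yields fewer false
# positives but more false negatives. All numbers are invented.
OPERATING_POINTS = [
    # (label, false positives per million travelers, false negatives per million travelers)
    ("strict matching", 50, 0.5),
    ("moderate matching", 500, 0.1),
    ("loose matching", 5000, 0.01),
]


def total_harm(false_positives, false_negatives, cost_fp, cost_fn):
    """Total harm is each error count weighted by the harm of one such error."""
    return false_positives * cost_fp + false_negatives * cost_fn


def best_operating_point(cost_fp, cost_fn):
    """Return the label of the operating point that minimizes total harm."""
    best = min(
        OPERATING_POINTS,
        key=lambda point: total_harm(point[1], point[2], cost_fp, cost_fn),
    )
    return best[0]


if __name__ == "__main__":
    # A false negative is assumed to be a million times worse than a false positive.
    print(best_operating_point(cost_fp=1, cost_fn=1_000_000))
    # A false negative is assumed to be only a hundred times worse.
    print(best_operating_point(cost_fp=1, cost_fn=100))
```

Under these made-up numbers, the lopsided cost ratio favors loose matching with its many false positives, while the milder ratio favors strict matching. The false positive count alone tells you nothing until you also know the costs.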

Having said that, the system’s high false positive rate still needs explaining.

The fundamental reason for the false positives is that the system matches names, and names are a poor vehicle for identifying people, especially in the context of air travel. Names are not as unique as most people think, and names are frequently misspelled, especially in airline records. Because of the misspellings, you’ll have to do approximate matching, which will make the nonuniqueness problem even worse. The result is many false positives.
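A small sketch shows how tolerating misspellings widens the net. This is not the actual matching algorithm, which I am not describing here; it uses the similarity score from Python's standard difflib module as a stand-in, and the list entry, the reservation names, and the cutoffs are all hypothetical.

```python
from difflib import SequenceMatcher

LISTED_NAME = "walter murphy"  # hypothetical list entry

# Hypothetical reservation names, as an airline might have recorded them.
RESERVATIONS = [
    "walter murphy",   # exact spelling
    "walter murphey",  # light misspelling
    "waltr murfy",     # heavy misspelling
    "walter murray",   # a different person with a similar name
    "alice johnson",   # unrelated
]


def similarity(a, b):
    """Similarity score between 0.0 and 1.0 (1.0 means identical strings)."""
    return SequenceMatcher(None, a, b).ratio()


def hits(threshold):
    """Reservation names whose similarity to the listed name meets the cutoff."""
    return [
        name for name in RESERVATIONS
        if similarity(LISTED_NAME, name) >= threshold
    ]


if __name__ == "__main__":
    print(hits(0.9))
    print(hits(0.8))
```

With a cutoff of 0.9, only the exact spelling and the light misspelling are hits. Lowering the cutoff to 0.8 catches the heavy misspelling too, but it also sweeps in the different person, because "walter murray" scores slightly higher against the list entry than "waltr murfy" does.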

Why not use more information to reduce false positives? Why not, for example, use the fact that the Walter Murphy who served in the Marine Corps and used to live near Princeton is not a threat?

The reason is that using that information would have unwanted consequences. First, the airlines would have to gather much more private information about passengers, and they would probably have to verify that information by demanding documentary proof of some kind.

Second, checking that private information against the name on the no-fly list would require bringing together the passenger’s private information with the government’s secret information about the person on the no-fly list. Either the airline can tell the government what it knows about the passenger’s private life, or the government can tell the airline what it knows about the person on the no-fly list. Both options are unattractive.

A clumsy compromise – which the government is apparently making – is to provide a way for people who often trigger false positives to supply more private information, and if that information distinguishes the person from the no-fly list entry, to give the person some kind of “I’m not really on the no-fly list” certificate. This imposes a privacy cost, but only on people who often trigger false positives.

Once you’ve decided to have a no-fly list, a significant false positive rate is nearly inevitable. The bigger policy question is whether, given all of its drawbacks, we should have a no-fly list at all.

Walter Murphy Stopped at Airport: Another False Positive

Blogs are buzzing about the story of Walter Murphy, a retired Princeton professor who reported having triggered a no-fly list match on a recent trip. Prof. Murphy suspects this happened because he has given speeches criticizing the Bush Administration.

I studied the no-fly list mechanism (and the related watchlist) during my service on the TSA’s Secure Flight Working Group. Based on what I learned about the system, I am skeptical of Prof. Murphy’s claim. I think he reached, in good faith, an incorrect conclusion about why he was stopped.

Based on Prof. Murphy’s story, it appears that when his flight reservation was matched against the no-fly list, the result was a “hit”. This is why he was not allowed to check in at curbside but had to talk to an airline employee at the check-in desk. The employee eventually cleared him and gave him a boarding pass.

(Some reports say Prof. Murphy might have matched the watchlist, a list of supposedly less dangerous people, but I think this is unlikely. A watchlist hit would have caused him to be searched at the security checkpoint but would not have led to the extended conversation he had. Other reports say he was chosen at random, which also seems unlikely – I don’t think no-fly list challenges are issued randomly.)

There are two aspects to the no-fly list, one that puts names on the list and another that checks airline reservations against the list. The two parts are almost entirely separate.

Names are put on the list through a secret process; about all we know is that names are added by intelligence and/or law enforcement agencies. We know the official standard for adding a name requires that the person be a sufficiently serious threat to aviation security, but we don’t know what processes, if any, are used to ensure that this standard is followed. In short, nobody outside the intelligence community knows much about how names get on the list.

The airlines check their customers’ reservations against the list, and they deal with customers who are “hits”. Most hits are false positives (innocent people who trigger mistaken hits), who are allowed to fly after talking to an airline customer service agent. The airlines aren’t told why any particular name is on the list, nor do they have special knowledge about how names are added. An airline employee, such as the one who told Prof. Murphy that he might be on the list for political reasons, would have no special knowledge about how names get on the list. In short, the employee must have been speculating about why Prof. Murphy’s name triggered a hit.

It’s well known by now that the no-fly list has many false positives. Senator Ted Kennedy and Congressman John Lewis, among others, seem to trigger false positives. I know a man living in Princeton who triggers false positives every time he flies. Having many false positives is inevitable given that (1) the list is large, and (2) the matching algorithm requires only an approximate match (because flight reservations often have misspelled names). An ordinary false positive is by far the most likely explanation for Prof. Murphy’s experience.

Note, too, that Walter Murphy is a relatively common name, making it more likely that Prof. Murphy was being confused with somebody else. Lycos PeopleSearch finds 181 matches for Walter Murphy and 307 matches for W. Murphy in the U.S. And of course the name on the list could be somebody’s alias. Many false positive stories involve people with relatively common names.

Given all of this, the most likely story by far is that Prof. Murphy triggered an ordinary false positive in the no-fly system. These are very annoying to the affected person, and they happen much too often, but they aren’t targeted at particular people. We can’t entirely rule out the possibility that the name “Walter Murphy” was added to the no-fly list for political reasons, but it seems unlikely.

(The security implications of the false positive rate, and how the rate might be reduced, are interesting issues that will have to wait for another post.)

Judge Geeks Out, Says Cablevision DVR Infringes

In a decision that has triggered much debate, a Federal judge ruled recently that Cablevision’s Digital Video Recorder system infringes the copyrights in TV programs. It’s an unusual decision that deserves some unpacking.

First, some background. The case concerned Digital Video Recorder (DVR) technology, which lets cable TV customers record shows in digital storage and watch them later. TiVo is the best-known DVR technology, but many cable companies offer DVR-enabled set-top boxes.

Most cable-company DVRs are delivered as shiny set-top boxes which contain a computer programmed to store and replay programming, using an onboard hard disc drive for storage. The judge called this a Set-Top Storage DVR, or STS-DVR.

Cablevision’s system worked differently. Rather than putting a computer and hard drive into every consumer’s set-top box, Cablevision implemented the DVR functionality in its own data center. Everything looked the same to the user: you pushed buttons on a remote control to tell the system what to record, and to replay it later. The main difference is that rather than storing your recordings in a hard drive in your set-top box, Cablevision’s system stored them in a region allocated for you in some big storage server in Cablevision’s data center. The judge called this a Remote Storage DVR, or RS-DVR.

STS-DVRs are very similar to VCRs, which the Supreme Court found to be legal, so STS-DVRs are probably okay. Yet the judge found the RS-DVR to be infringing. How did he reach this conclusion?

For starters, the judge geeked out on the technical details. The first part of the opinion describes Cablevision’s implementation in great detail – I’m a techie, and it’s more detail than even I want to know. Only after unloading these details does the judge get around, on page 18 of the opinion, to the kind of procedural background that normally starts on page one or two of an opinion.

This matters because the judge’s ruling seems to hinge on the degree of similarity between RS-DVRs and STS-DVRs. By diving into the details, the judge finds many points of difference, which he uses to justify giving the two types of DVRs different legal treatment. Here’s an example (pp. 25-26):

In any event, Cablevision’s attempt to analogize the RS-DVR to the STS-DVR fails. The RS-DVD may have the look and feel of an STS-DVR … but “under the hood” the two types of DVRs are vastly different. For example, to effectuate the RS-DVR, Cablevision must reconfigure the linear channel programming signals received at its head-end by splitting the APS into a second stream, reformatting it through clamping, and routing it to the Arroyo servers. The STS-DVR does not require these activities. The STS-DVR can record directly to the hard drive located within the set-top box itself; it does not need the complex computer network and constant monitoring by Cablevision personnel necessary for the RS-DVR to record and store programming.

The judge sees the STS-DVR as simpler than the RS-DVR. Perhaps this is because he didn’t go “under the hood” in the STS-DVR, where he would have found a complicated computer system with its own internal stream processing, reformatting, and internal data transmission facilities, as well as complex software to control these functions. It’s not the exact same design as in the RS-DVR, but it’s closer than the judge seems to think.

All of this may have less impact than you might expect, because of the odd way the case was framed. Cablevision, for reasons known only to itself, had waived any fair use arguments, in exchange for the plaintiffs giving up any indirect liability claims (i.e., any claims that Cablevision was enabling infringement by its customers). What remained was a direct infringement claim against Cablevision – a claim that Cablevision itself (rather than its customers) was making copies of the programs – to which Cablevision was not allowed to raise a fair use defense.

The question, in other words, was who was recording the programming. Was Cablevision doing the recording, or were its customers doing the recording? The customers, by using their remote controls to navigate through on-screen menus, directed the technology to record certain programs, and controlled the playback. But the equipment that carried out those commands was owned by Cablevision and (mostly) located in Cablevision buildings. So who was doing the recording? The question doesn’t have a simple answer that I can see.

This general issue of who is responsible for the actions of complex computer systems crops up surprisingly often in law and policy disputes. There doesn’t seem to be a coherent theory about it, which is too bad, because it will only become more important as systems get more complicated and more tightly interconnected.

EMI To Sell DRM-Free Music

EMI, the world’s third largest record company, announced yesterday that it will sell its music without DRM (copy protection) on Apple’s iTunes Music Store. Songs will be available in two formats: the original DRMed format for the original $0.99 price, or a higher-fidelity DRM-free format for $1.29.

This is a huge step forward for EMI and the industry. Given the consumer demand for DRM-free music, and the inability of DRM to stop infringement, it was only a matter of time before the industry made this move. But there was considerable reluctance to take the first step, partly because a generation of industry executives had backed DRM-based strategies. The industry orthodoxy has been that DRM (a) reduces infringement a lot, and (b) doesn’t lower customer demand much. But EMI must disbelieve at least one of these two propositions; if not, its new strategy is irrational. (If removing DRM increases piracy a lot but doesn’t create many new customers, then it will cost EMI money.) Now that EMI has broken the ice, the migration to DRM-free music can proceed, to the ultimate benefit of record companies and law-abiding customers alike.

Still, it’s interesting how EMI and Apple decided to do this. The simple step would have been to sell only DRM-free music, at the familiar $0.99 price point, or perhaps at a higher price point. Instead, the companies chose to offer two versions, and to bundle DRM-freedom with higher fidelity, with a differentiated price 30% above the still-available original.

Why bundle higher fidelity with DRM-freedom? It seems unlikely that the customers who want higher fidelity are the same ones who want DRM-freedom. (Cory Doctorow argues that customers who want one are probably less likely to want the other.) Given the importance of the DRM issue to the industry, you’d think they would want good data on customer preferences, such as how many customers will pay thirty cents more to get DRM-freedom. By bundling DRM-freedom with another feature, the new offering will obscure that experiment.

Another possibility is that it’s Apple that wants to obscure the experiment. Apple has taken heat from European antitrust authorities for using DRM to lock customers in to the iTunes/iPod product line; the Euro-authorities would like Apple to open its system. If DRM-free tracks cost thirty cents extra, Apple would in effect be selling freedom from lockin for thirty cents a song – not something Apple wants to do while trying to convince the authorities that lockin isn’t a real problem. By bundling the lockin-freedom with something else (higher fidelity) Apple might obscure the fact that it is charging a premium for lockin-free music.

One effect of selling DRM-free music will be to increase the market for complementary products that make other (lawful) uses of music. Examples include non-Apple music players, jukebox software, collaborative recommendation systems, and so on. (DRM frustrates the use of such complements.) Complements will multiply and improve, which over time will make DRM-free music even more attractive to consumers. This process will take some time, so the full benefits of the new strategy to EMI won’t be evident immediately. Even if the switch to DRM-free music is only a break-even proposition for EMI in the short run, it will look better and better in the long run as complements create customer value, some of which will be capturable by EMI through higher prices or increased sales.

The growth of complements will also increase other companies’ incentives to sell DRM-free music. And each company that switches to DRM-free sales will only intensify this effect, boosting complements more and making DRM-free sales even more attractive to the remaining holdout companies. Expect a kind of tipping effect among the major record companies. This may not happen immediately, but over time it seems pretty much inevitable.

In the meantime, EMI will look like the most customer-friendly and tech-savvy major record company.

FreeConference Suit: Neutrality Fight or Regulatory Squabble?

Last week FreeConference, a company that offers “free” teleconferencing services, sued AT&T for blocking access by AT&T/Cingular customers to FreeConference’s services. FreeConference’s complaint says the blocking is anticompetitive and violates the Communications Act.

FreeConference’s service sets up conference calls that connect a group of callers. Users are given an ordinary long-distance phone number to call. When they call the assigned number, they are connected to their conference call. Users pay nothing beyond the cost of the ordinary long-distance call they’re making.

As of last week, AT&T/Cingular started blocking access to FreeConference’s long-distance numbers from AT&T/Cingular mobile phones. Instead of getting connected to their conference calls, AT&T/Cingular users are getting an error message. AT&T/Cingular has reportedly admitted doing this.

At first glance, this looks like an unfair practice, with AT&T trying to shut down a cheaper competitor that is undercutting AT&T’s lucrative conference-call business. This is the kind of thing net neutrality advocates worry about – though strictly speaking this is happening on the phone network, not the Internet.

The full story is a bit more complicated, and it starts with FreeConference’s mysterious ability to provide conference calls for free. These days many companies provide free services, but they all have some way of generating revenue. FreeConference appears to generate revenue by exploiting the structure of telecom regulation.

When you make a long-distance call, you pay your long-distance provider for the call. The long-distance provider is required to pay connection fees to the local phone companies (or mobile companies) at both ends of the call, to offset the cost of connecting the call to the endpoints. This regulatory framework is a legacy of the AT&T breakup and was justified by the desire to have a competitive long-distance market coexist with local phone carriers that were near-monopolies.

FreeConference gets revenue from these connection fees. It has apparently cut a deal with a local phone carrier under which the carrier accepts calls for FreeConference, and FreeConference gets a cut of the carrier’s connection fees from those calls. If the connection fees are large enough – and apparently they are – this can be a win-win deal for FreeConference and the local carrier.

But of course somebody has to pay the fees. When an AT&T/Cingular customer calls FreeConference, AT&T/Cingular has to pay. They can pass on these fees to their customers, but this hardly seems fair. If I were an AT&T/Cingular customer, I wouldn’t be happy about paying more to subsidize the conference calls of other users.

To add another layer of complexity, it turns out that connection fees vary widely from place to place, ranging roughly from one cent to seven cents per minute. FreeConference, predictably, has allied itself with a local carrier that gets a high connection fee. By routing its calls to this local carrier, FreeConference is able to extract more revenue from AT&T/Cingular.
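A back-of-the-envelope sketch shows why the choice of carrier matters. The call size, the call length, and FreeConference's share of the fees are hypothetical; only the one-cent and seven-cent endpoints come from the fee range above. For simplicity it counts only the fee paid to the local carrier that accepts the call, and it works in whole cents to avoid rounding problems.

```python
def connection_fees_cents(callers, minutes, fee_cents_per_minute):
    """Fees the callers' carrier pays to the receiving local carrier, in cents."""
    return callers * minutes * fee_cents_per_minute


def conference_share_cents(total_fees_cents, share_percent):
    """The conferencing company's cut of those fees (the share is an assumption)."""
    return total_fees_cents * share_percent // 100


if __name__ == "__main__":
    callers, minutes = 10, 60   # hypothetical one-hour call with ten participants
    share_percent = 50          # hypothetical revenue split with the local carrier

    for fee in (1, 7):          # low and high ends of the per-minute fee range
        total = connection_fees_cents(callers, minutes, fee)
        cut = conference_share_cents(total, share_percent)
        print(f"{fee} cent(s)/min: fees {total} cents, conferencing cut {cut} cents")
```

Under these assumptions the same call generates 600 cents in fees at the low end of the range and 4,200 cents at the high end, so routing calls to a high-fee carrier multiplies the revenue sevenfold.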

For me, this story illustrates everything that is frustrating about telecom. We start with intricately structured regulation, leading companies to adopt business models shaped by regulation rather than the needs of customers. The result is bewildering to consumers, who end up not knowing which services will work, or having to pay higher prices for mysterious reasons. This leads to a techno-legal battle between companies that would, in an ideal world, be spending their time and effort developing better, cheaper products. And ultimately we end up in court, or creating more regulation.

We know a better end state is possible. But how do we get there from here?

[Clarification (2:20 PM): Added the “To add another layer …” paragraph. Thanks to Nathan Williams for pointing out my initial failure to mention the variation in connection fees.]