May 23, 2018

Archives for April 2007

Why So Many False Positives on the No-Fly List?

Yesterday I argued that Walter Murphy’s much-discussed encounter with airport security was probably just a false positive in the no-fly list matching algorithm. Today I want to talk about why false positives (ordinary citizens triggering mistaken “matches” with the list) are so common.

First, a preliminary. It’s often argued that the high false positive rate proves the system is poorly run or even useless. This is not necessarily the case. In running a system like this, we necessarily trade off false positives against false negatives. We can lower either kind of error, but doing so will increase the other kind. The optimal policy will balance the harm from false positives against the harm from false negatives, to minimize total harm. If the consequences of a false positive are relatively minor (brief inconvenience for one traveler), but the consequences of a false negative are much worse (non-negligible probability of multiple deaths), then the optimal choice is to accept many false positives in order to drive the false negative rate way down. In other words, a high false positive rate is not by itself a sign of bad policy or bad management. You can argue that the consequences of error are not really so unbalanced, or that the tradeoff is being made poorly, but your argument can’t rely only on the false positive rate.
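To make the tradeoff concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the operating points, the cost figures, and the passenger counts are assumptions, not data from the real system. It simply picks the matching strictness that minimizes total expected harm.

```python
def expected_harm(fp_rate, fn_rate, cost_fp, cost_fn, n_innocent, n_threat):
    """Total expected harm = harm from false positives + harm from false negatives."""
    return fp_rate * n_innocent * cost_fp + fn_rate * n_threat * cost_fn


# Hypothetical operating points as (false positive rate, false negative rate).
# Moving down the list, matching gets looser: more false positives,
# fewer false negatives.
OPERATING_POINTS = [
    (0.0001, 0.50),
    (0.0010, 0.20),
    (0.0100, 0.05),
    (0.0500, 0.01),
]


def best_operating_point(points, cost_fp, cost_fn, n_innocent, n_threat):
    """Return the (fp_rate, fn_rate) pair with the lowest expected harm."""
    return min(
        points,
        key=lambda p: expected_harm(p[0], p[1], cost_fp, cost_fn, n_innocent, n_threat),
    )


# Assumed costs: a false positive costs 1 unit of harm (a delayed traveler),
# a false negative costs 10,000,000 units. One million innocent travelers,
# one genuine threat.
choice = best_operating_point(OPERATING_POINTS, 1, 10_000_000, 1_000_000, 1)
print(choice)
```

With these made-up numbers, a very large false-negative cost pushes the choice to the loosest matching, the point with the most false positives. Shrink that cost enough and the strictest matching wins instead.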

Having said that, the system’s high false positive rate still needs explaining.

The fundamental reason for the false positives is that the system matches names, and names are a poor vehicle for identifying people, especially in the context of air travel. Names are not as unique as most people think, and names are frequently misspelled, especially in airline records. Because of the misspellings, you’ll have to do approximate matching, which will make the nonuniqueness problem even worse. The result is many false positives.
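As a rough illustration of why approximate matching widens the net, consider the Python sketch below. It is not the algorithm the airlines actually use, which is not public; it uses the standard library's difflib similarity score as a stand-in, and the list entry and the thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity score between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matches(passenger_name, watch_list, threshold):
    """Return the list entries whose similarity to the passenger name meets the threshold."""
    return [entry for entry in watch_list if similarity(passenger_name, entry) >= threshold]


# Hypothetical one-entry list, for illustration only.
watch_list = ["Walter Murphy"]

# Exact matching (threshold 1.0) misses a misspelled reservation of the listed name.
print(matches("Walter Murphey", watch_list, 1.0))

# Loosening the threshold catches the misspelling...
print(matches("Walter Murphey", watch_list, 0.8))

# ...but it also flags a traveler with a different, merely similar name.
print(matches("Walter Murray", watch_list, 0.8))
```

The same looseness that tolerates a typo in the reservation also sweeps in people whose names are merely close to a listed name.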

Why not use more information to reduce false positives? Why not, for example, use the fact that the Walter Murphy who served in the Marine Corps and used to live near Princeton is not a threat?

The reason is that using that information would have unwanted consequences. First, the airlines would have to gather much more private information about passengers, and they would probably have to verify that information by demanding documentary proof of some kind.

Second, checking that private information against the name on the no-fly list would require bringing together the passenger’s private information with the government’s secret information about the person on the no-fly list. Either the airline can tell the government what it knows about the passenger’s private life, or the government can tell the airline what it knows about the person on the no-fly list. Both options are unattractive.

A clumsy compromise – which the government is apparently making – is to provide a way for people who often trigger false positives to supply more private information, and if that information distinguishes the person from the no-fly list entry, to give the person some kind of “I’m not really on the no-fly list” certificate. This imposes a privacy cost, but only on people who often trigger false positives.

Once you’ve decided to have a no-fly list, a significant false positive rate is nearly inevitable. The bigger policy question is whether, given all of its drawbacks, we should have a no-fly list at all.

Walter Murphy Stopped at Airport: Another False Positive

Blogs are buzzing about the story of Walter Murphy, a retired Princeton professor who reported having triggered a no-fly list match on a recent trip. Prof. Murphy suspects this happened because he has given speeches criticizing the Bush Administration.

I studied the no-fly list mechanism (and the related watchlist) during my service on the TSA’s Secure Flight Working Group. Based on what I learned about the system, I am skeptical of Prof. Murphy’s claim. I think he reached, in good faith, an incorrect conclusion about why he was stopped.

Based on Prof. Murphy’s story, it appears that when his flight reservation was matched against the no-fly list, the result was a “hit”. This is why he was not allowed to check in at curbside but had to talk to an airline employee at the check-in desk. The employee eventually cleared him and gave him a boarding pass.

(Some reports say Prof. Murphy might have matched the watchlist, a list of supposedly less dangerous people, but I think this is unlikely. A watchlist hit would have caused him to be searched at the security checkpoint but would not have led to the extended conversation he had. Other reports say he was chosen at random, which also seems unlikely – I don’t think no-fly list challenges are issued randomly.)

The no-fly list system has two parts: one that puts names on the list, and another that checks airline reservations against the list. The two parts are almost entirely separate.

Names are put on the list through a secret process; about all we know is that names are added by intelligence and/or law enforcement agencies. We know the official standard for adding a name requires that the person be a sufficiently serious threat to aviation security, but we don’t know what processes, if any, are used to ensure that this standard is followed. In short, nobody outside the intelligence community knows much about how names get on the list.

The airlines check their customers’ reservations against the list, and they deal with customers who are “hits”. Most hits are false positives (innocent people who trigger mistaken hits), who are allowed to fly after talking to an airline customer service agent. The airlines aren’t told why any particular name is on the list, nor do they have special knowledge about how names are added. An airline employee, such as the one who told Prof. Murphy that he might be on the list for political reasons, would have no special knowledge about how names get on the list. In short, the employee must have been speculating about why Prof. Murphy’s name triggered a hit.

It’s well known by now that the no-fly list has many false positives. Senator Ted Kennedy and Congressman John Lewis, among others, seem to trigger false positives. I know a man living in Princeton who triggers false positives every time he flies. Having many false positives is inevitable given that (1) the list is large, and (2) the matching algorithm requires only an approximate match (because flight reservations often have misspelled names). An ordinary false positive is by far the most likely explanation for Prof. Murphy’s experience.
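A back-of-the-envelope calculation shows how list size drives this. The Python sketch below assumes, purely for illustration, that an innocent passenger has the same small, independent chance of approximately matching each entry on the list. Real names are not independent in this way, and the probability used here is invented.

```python
def prob_any_hit(list_size, per_entry_match_prob):
    """Chance that a passenger matches at least one entry, assuming independent entries."""
    return 1.0 - (1.0 - per_entry_match_prob) ** list_size


# Hypothetical per-entry chance of an approximate match: one in a million.
p = 1e-6

for size in (1_000, 10_000, 100_000):
    print(size, prob_any_hit(size, p))
```

Under these assumptions the chance of a hit grows roughly in proportion to the size of the list, and loosening the match raises the per-entry probability as well, so the two effects compound.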

Note, too, that Walter Murphy is a relatively common name, making it more likely that Prof. Murphy was being confused with somebody else. Lycos PeopleSearch finds 181 matches for Walter Murphy and 307 matches for W. Murphy in the U.S. And of course the name on the list could be somebody’s alias. Many false positive stories involve people with relatively common names.

Given all of this, the most likely story by far is that Prof. Murphy triggered an ordinary false positive in the no-fly system. These are very annoying to the affected person, and they happen much too often, but they aren’t targeted at particular people. We can’t entirely rule out the possibility that the name “Walter Murphy” was added to the no-fly list for political reasons, but it seems unlikely.

(The security implications of the false positive rate, and how the rate might be reduced, are interesting issues that will have to wait for another post.)

Judge Geeks Out, Says Cablevision DVR Infringes

In a decision that has triggered much debate, a Federal judge ruled recently that Cablevision’s Digital Video Recorder system infringes the copyrights in TV programs. It’s an unusual decision that deserves some unpacking.

First, some background. The case concerned Digital Video Recorder (DVR) technology, which lets cable TV customers record shows in digital storage and watch them later. TiVo is the best-known DVR technology, but many cable companies offer DVR-enabled set-top boxes.

Most cable-company DVRs are delivered as shiny set-top boxes which contain a computer programmed to store and replay programming, using an onboard hard disc drive for storage. The judge called this a Set-Top Storage DVR, or STS-DVR.

Cablevision’s system worked differently. Rather than putting a computer and hard drive into every consumer’s set-top box, Cablevision implemented the DVR functionality in its own data center. Everything looked the same to the user: you pushed buttons on a remote control to tell the system what to record, and to replay it later. The main difference is that rather than storing your recordings in a hard drive in your set-top box, Cablevision’s system stored them in a region allocated for you in some big storage server in Cablevision’s data center. The judge called this a Remote Storage DVR, or RS-DVR.

STS-DVRs are very similar to VCRs, which the Supreme Court found to be legal, so STS-DVRs are probably okay. Yet the judge found the RS-DVR to be infringing. How did he reach this conclusion?

For starters, the judge geeked out on the technical details. The first part of the opinion describes Cablevision’s implementation in great detail – I’m a techie, and it’s more detail than even I want to know. Only after unloading these details does the judge get around, on page 18 of the opinion, to the kind of procedural background that normally starts on page one or two of an opinion.

This matters because the judge’s ruling seems to hinge on the degree of similarity between RS-DVRs and STS-DVRs. By diving into the details, the judge finds many points of difference, which he uses to justify giving the two types of DVRs different legal treatment. Here’s an example (pp. 25-26):

In any event, Cablevision’s attempt to analogize the RS-DVR to the STS-DVR fails. The RS-DVR may have the look and feel of an STS-DVR … but “under the hood” the two types of DVRs are vastly different. For example, to effectuate the RS-DVR, Cablevision must reconfigure the linear channel programming signals received at its head-end by splitting the APS into a second stream, reformatting it through clamping, and routing it to the Arroyo servers. The STS-DVR does not require these activities. The STS-DVR can record directly to the hard drive located within the set-top box itself; it does not need the complex computer network and constant monitoring by Cablevision personnel necessary for the RS-DVR to record and store programming.

The judge sees the STS-DVR as simpler than the RS-DVR. Perhaps this is because he didn’t go “under the hood” in the STS-DVR, where he would have found a complicated computer system with its own internal stream processing, reformatting, and internal data transmission facilities, as well as complex software to control these functions. It’s not the exact same design as in the RS-DVR, but it’s closer than the judge seems to think.

All of this may have less impact than you might expect, because of the odd way the case was framed. Cablevision, for reasons known only to itself, had waived any fair use arguments, in exchange for the plaintiffs giving up any indirect liability claims (i.e., any claims that Cablevision was enabling infringement by its customers). What remained was a direct infringement claim against Cablevision – a claim that Cablevision itself (rather than its customers) was making copies of the programs – to which Cablevision was not allowed to raise a fair use defense.

The question, in other words, was who was recording the programming. Was Cablevision doing the recording, or were its customers doing the recording? The customers, by using their remote controls to navigate through on-screen menus, directed the technology to record certain programs, and controlled the playback. But the equipment that carried out those commands was owned by Cablevision and (mostly) located in Cablevision buildings. So who was doing the recording? The question doesn’t have a simple answer that I can see.

This general issue of who is responsible for the actions of complex computer systems crops up surprisingly often in law and policy disputes. There doesn’t seem to be a coherent theory about it, which is too bad, because it will only become more important as systems get more complicated and more tightly interconnected.