November 24, 2024

Walter Murphy Stopped at Airport: Another False Positive

Blogs are buzzing about the story of Walter Murphy, a retired Princeton professor who reported having triggered a no-fly list match on a recent trip. Prof. Murphy suspects this happened because he has given speeches criticizing the Bush Administration.

I studied the no-fly list mechanism (and the related watchlist) during my service on the TSA’s Secure Flight Working Group. Based on what I learned about the system, I am skeptical of Prof. Murphy’s claim. I think he reached, in good faith, an incorrect conclusion about why he was stopped.

Based on Prof. Murphy’s story, it appears that when his flight reservation was matched against the no-fly list, the result was a “hit”. This is why he was not allowed to check in at curbside but had to talk to an airline employee at the check-in desk. The employee eventually cleared him and gave him a boarding pass.

(Some reports say Prof. Murphy might have matched the watchlist, a list of supposedly less dangerous people, but I think this is unlikely. A watchlist hit would have caused him to be searched at the security checkpoint but would not have led to the extended conversation he had. Other reports say he was chosen at random, which also seems unlikely – I don’t think no-fly list challenges are issued randomly.)

There are two aspects to the no-fly list, one that puts names on the list and another that checks airline reservations against the list. The two parts are almost entirely separate.

Names are put on the list through a secret process; about all we know is that names are added by intelligence and/or law enforcement agencies. We know the official standard for adding a name requires that the person be a sufficiently serious threat to aviation security, but we don’t know what processes, if any, are used to ensure that this standard is followed. In short, nobody outside the intelligence community knows much about how names get on the list.

The airlines check their customers’ reservations against the list, and they deal with customers who are “hits”. Most hits are false positives (innocent people who trigger mistaken hits), who are allowed to fly after talking to an airline customer service agent. The airlines aren’t told why any particular name is on the list, nor do they have special knowledge about how names are added. An airline employee, such as the one who told Prof. Murphy that he might be on the list for political reasons, would have no special knowledge about how names get on the list. In short, the employee must have been speculating about why Prof. Murphy’s name triggered a hit.

It’s well known by now that the no-fly list has many false positives. Senator Ted Kennedy and Congressman John Lewis, among others, seem to trigger false positives. I know a man living in Princeton who triggers false positives every time he flies. Having many false positives is inevitable given that (1) the list is large, and (2) the matching algorithm requires only an approximate match (because flight reservations often have misspelled names). An ordinary false positive is by far the most likely explanation for Prof. Murphy’s experience.

Note, too, that Walter Murphy is a relatively common name, making it more likely that Prof. Murphy was being confused with somebody else. Lycos PeopleSearch finds 181 matches for Walter Murphy and 307 matches for W. Murphy in the U.S. And of course the name on the list could be somebody’s alias. Many false positive stories involve people with relatively common names.

Given all of this, the most likely story by far is that Prof. Murphy triggered an ordinary false positive in the no-fly system. These are very annoying to the affected person, and they happen much too often, but they aren’t targeted at particular people. We can’t entirely rule out the possibility that the name “Walter Murphy” was added to the no-fly list for political reasons, but it seems unlikely.

(The security implications of the false positive rate, and how the rate might be reduced, are interesting issues that will have to wait for another post.)

OLPC: Too Much Innovation?

The One Laptop Per Child (OLPC) project is rightly getting lots of attention in the tech world. The idea – putting serious computing and communication technologies into the hands of kids all over the world – could be transformative, if it works.

Recently our security reading group at Princeton studied BitFrost, the security architecture for OLPC. After the discussion I couldn’t help thinking that BitFrost seemed too innovative.

“Too innovative?” you ask. What’s wrong with innovation? Let me explain. Though tech pundits often praise “innovation” in the abstract, the fact is that most would-be innovations fail. In engineering, most new ideas either don’t work or aren’t really an improvement over the status quo. Sometimes the same “new” idea pops up over and over, reinvented each time by someone who doesn’t know about the idea’s past failures.

In the long run, failures are weeded out and the few successes catch on, so the world gets better. But in the short run most innovations fail, which makes the urge to innovate dangerous.

Fred Brooks, in his groundbreaking The Mythical Man-Month, referred to the second-system effect:

An architect’s first work is apt to be spare and clean. He knows he doesn’t know what he’s doing, so he does it carefully and with great restraint.

As he designs the first work, frill after frill and embellishment after embellishment occur to him. These get stored away to be used “next time.” Sooner or later the first system is finished, and the architect, with firm confidence and a demonstrated mastery of that class of systems, is ready to build a second system.

This second is the most dangerous system a man ever designs. When he does his third and later ones, his prior experiences will confirm each other as to the general characteristics of such systems, and their differences will identify those parts of his experience that are particular and not generalizable.

The general tendency is to over-design the second system, using all the ideas and frills that were cautiously sidetracked on the first one. The result, as Ovid says, is a “big pile.”

The danger, in the second sytem, is the desire to reinvent everything, to replace the flawed but serviceable approaches of the past. The third-system designer, having learned his (or her – things have changed since Brooks wrote) lesson, knows to innovate only in the lab, or in a product only where innovation is necessary.

But here’s the OLPC security specification (lines 115-118):

What makes the OLPC XO laptops radically different is that they represent the first time that all these security measures have been carefully put together on a system slated to be introduced to tens or hundreds of millions of users.

OLPC needs to be innovative in some areas, but I don’t think security is one of them. Sure, it would be nice to have a better security model, but until we know that model is workable in practice, it seems risky to try it out on millions of kids.

How Computers Can Make Voting More Secure

By now there is overwhelming evidence that today’s paperless computer-based voting technologies have such serious security and reliability problems that we should not be using them. Computers can’t do the job by themselves; but what role should they play in voting?

It’s tempting to eliminate computers entirely, returning to old-fashioned paper voting, but I think this is a mistake. Paper has an important role, as I’ll describe below, but paper systems are subject to well-known problems such as ballot-box stuffing and chain voting, as well as other user-interface and logistical challenges.

Security does require some role for paper. Each vote must be recorded in a manner that is directly verified by the voter. And the system must be software-independent, meaning that its accuracy cannot rely on the correct functioning of any software system. Today’s paperless e-voting systems satisfy neither requirement, and the only practical way to meet the requirements is to use paper.

The proper role for computers, then, is to backstop the paper system, to improve it. What we want is not a computerized voting system, but a computer-augmented one.

This mindset changes how we think about the role of computers. Instead of trying to make computers do everything, we will look instead for weaknesses and gaps in the paper system, and ask how computers can plug them.

There are two main ways computers can help. The first is in helping voters cast their votes. Computers can check for errors in ballots, for example by detecting an invalid ballot while the voter is still in a position to fix it. Computers can present the ballot in audio format for the blind or illiterate, or in multiple languages. (Of course, badly designed computer interfaces can do harm, so we have to be careful.) There must be a voter-verified paper record at the end of the vote-casting process, but computers, used correctly, can help voters create and validate that record, by acting as ballot-marking devices or as scanners to help voters spot mismarked ballots.

The second way computers can help is by improving security. Usually the e-voting security debate is about how to keep computers from making security too much worse than it was before. Given the design of today’s e-voting systems, this is appropriate – just bringing these systems up to the level of security and reliability in (say) the Xbox and Wii game consoles would be nice. Even in a computer-augmented system, we’ll need to do a better job of vetting the computers’ design – if a job is worth doing with a computer, it’s worth doing correctly.

But once we adopt the mindset of augmenting a paper-based system, security looks less like a problem and more like an opportunity. We can look for the security weaknesses of paper-based systems, and ask how computers can help to address them. For example, paper-based systems are subject to ballot-box stuffing – how can computers reduce this risk?

Surprisingly, the designs of current e-voting technologies, even the ones with paper trails, don’t do all they can to compensate for the weaknesses of paper. For example, the current systems I’ve seen keep electronic records that are subject to straightforward post-election tampering. Researchers have studied approaches to this problem, but as far as I know none are used in practice.

In future posts, we’ll discuss design ideas for computer-augmented voting.

Manipulating Reputation Systems

BoingBoing points to a nice pair of articles by Annalee Newitz on how people manipulate online reputation systems like eBay’s user ratings, Digg, and so on.

There’s a myth floating around that such systems distill an uncannily accurate folk judgment from the votes submitted by millions of ordinary citizens. The wisdom of crowds, and all that. In fact, reputation systems are fraught with problems, and the most important systems survive because companies expend great effort to supplement the algorithms by investigating abuse and trying to compensate for it. eBay, for example, reportedly works very hard to fight abuse of its reputation system.

Why do people put more faith in reputation systems than the systems really deserve? One reason is the compelling but not entirely accurate analogy to the power of personal reputations in small town gossip networks. If a small-town merchant is accused of cheating a customer, everyone in town will find out quickly and – here’s where the analogy goes off the rails – individual townspeople will make nuanced judgments based on the details of the story, the character of the participants, and their own personal experiences. The reason this works is that the merchant, the customer, and the person evaluating the story are embedded in a complex, densely interconnected network.

When the network of participants gets much bigger and the interconnections much sparser, there is no guarantee that the same system will still work. Even if it does work, a large-scale system might succeed for different reasons than the small-town system. What we need is some kind of theory: some kind of explanation for why a reputation system can succeed. Our theory, whatever it is, will have to account for the desires and incentives of participants, the effect of relevant social norms, and so on.

The incentive problem is especially challenging for recommendation services like Digg. Digg assumes that users will cast votes for the sites they like. If I vote for sites that I really do like, this will mostly benefit strangers (by helping them find something cool to read). But if I sell my votes or cast them for sites run by my friends and me, I will benefit more directly. In short, my incentive is to cheat. These sorts of problems seem likely to get worse as a service grows, because the stakes will grow and the sense of community may weaken.

It seems to me that reputation systems are a fruitful area for technical, economic and social research. I know there is research going on already – and readers will probably chastise me in the comments for not citing it all – but we’re still far from understanding online reputation.

Sarasota: Could a Bug Have Lost Votes?

At this point, we still don’t know what caused the high undervote rate in Sarasota’s Congressional election. [Background: 1, 2.] There are two theories. The State-commissioned study released last week argues that for the theory that a badly designed ballot caused many voters to not see that race and therefore not cast a vote.

Today I want to make the case for the other theory: that a malfunction or bug in the voting machines caused votes to be not recorded. The case sits on four pillars: (1) The postulated behavior is consistent with a common type of computer bug. (2) Similar bugs have been found in voting machines before. (3) The state-commissioned study would have been unlikely to find such a bug. (4) Studies of voting data show patterns that point to the bug theory.

(1) The postulated behavior is consistent with a common type of computer bug.

Programmers know the kind of bug I’m talking about: an error in memory management, or a buffer overrun, or a race condition, which causes subtle corruption in a program’s data structures. Such bugs are maddeningly hard to find, because the problem isn’t evident immediately but the corrupted data causes the program to go wrong in subtle ways later. These bugs often seem to be intermittent or “random”, striking sometimes but lying dormant at other times, and seeming to strike more or less frequently depending on the time of day or other seemingly irrelevant factors. Every experienced programmer tells horror stories about such bugs.

Such a bug is consistent with the patterns we saw in the election. Undervotes didn’t happen to every voter, but they did happen in every precinct, though with different frequency in different places.

(2) Similar bugs have been found in voting machines before.

We know of at least two examples of similar bugs in voting machines that were used in real elections. After problems in Maryland voting machines caused intermittent “freezing” behavior, the vendor recalled the motherboards of 4700 voting machines to remedy a hardware design error.

Another example, this time caused by a software bug, was described by David Jefferson:

In the volume testing of 96 Diebold TSx machines … in the summer of 2005, we had an enormously high crash rate: over 20% of the machines crashed during the course of one election day’s worth of votes. These crashes always occurred either at the end of one voting transaction when the voter touched the CAST button, or right at the beginning of the next voter’s session when the voter SmartCard was inserted.

It turned out that, after a huge effort on Diebold’s part, a [Graphical User Interface] bug was discovered. If a voter touched the CAST button a sloppily, and dragged his/her finger from the button across a line into another nearby window (something that apparently happened with only one of every 400 or 500 voters) an exception would be signaled. But the exception was not handled properly, leading to stack corruption or heap corruption (it was never clear to us which), which apparently invariably lead to the crash. Whether it caused other problems also, such as vote corruption, or audit log corruption, was never determined, at least to my knowledge. Diebold fixed this bug, and at least TSx machines are free of it now.

These are the two examples we know about, but note that neither of these examples was made known to the public right away.

(3) The State-commissioned study would have been unlikely to find such a bug.

The State of Florida study team included some excellent computer scientists, but they had only a short time to do their study, and the scope of their study was limited. They did not perform the kind of time-consuming dynamic testing that one would use in an all-out hunt for such a bug. To their credit, they did the best they could given the limited time and tools they had, but they would have had to get lucky to find such a bug if it existed. Their failure to find such a bug is not strong evidence that a bug does not exist.

(4) Studies of voting data show patterns that point to the bug theory.

Several groups have studied detailed data on the Sarasota election results, looking for patterns that might help explain what happened.

One of the key questions is whether there are systematic differences in undervote rate between individual voting machines. The reason this matters is that if the ballot design theory is correct, then the likelihood that a particular voter undervoted would be independent of which specific machine the voter used – all voting machines displayed the same ballot. But an intermittent bug might well manifest itself differently depending on the details of how each voting machine was set up and used. So if undervote rates depend on attributes of the machines, rather than attributes of the voters, this tends to point toward the bug theory.

Of course, one has to be careful to disentangle the possible causes. For example, if two voting machines sit in different precincts, they will see different voter populations, so their undervote rate might differ even if the machines are exactly identical. Good data analysis must control for such factors or at least explain why they are not corrupting the results.

There are two serious studies that point to machine-dependent results. First, Mebane and Dill found that machines that had a certain error message in their logs had a higher undervote rate. According to the State study, this error message was caused by a particular method used by poll workers to wake the machines up in the morning; so the use of this method correlated with higher undervote rate.

Second, Charles Stewart, an MIT political scientist testifying for the Jennings campaign in the litigation, looked at how the undervote rate depended on when the voting machine was “cleared and tested”, an operation used to prepare the machine for use. Stewart found that machines that were cleared and tested later (closer to Election Day) had a higher undervote rate, and that machines that were cleared and tested on the same day as many other machines also had a higher undervote rate. One possibility is that clearing and testing a machine in a hurry, as the election deadline approached or just on a busy day, contributed to the undervote rate somehow.

Both studies indicate a link between the details of a how a machine was set up and used, and the undervote rate on that machine. That’s the kind of thing we’d expect to see with an intermittent bug, but not if undervotes were caused strictly by ballot design and user confusion.

Conclusion

What conclusion can we draw? Certainly we cannot say that a bug definitely caused undervotes. But we can say with confidence that the bug theory is still in the running, and needs to be considered alongside the ballot design theory as a possible cause of the Sarasota undervotes. If we want to get to the bottom of this, we need to investigate further, by looking more deeply into undervote patterns, and by examining the voting machine hardware and software.

[Correction (Feb. 28): I changed part (3) to say that the team “had” only a short time to do their sstudy. I originally wrote that they “were given” only a short time, which left the impression that the state had set a time limit for the study. As I understand it, the state did not impose such a time limit. I apologize for the error.]