At this point, we still don’t know what caused the high undervote rate in Sarasota’s Congressional election. [Background: 1, 2.] There are two theories. The State-commissioned study released last week argues for the theory that a badly designed ballot caused many voters not to see that race and therefore not cast a vote.
Today I want to make the case for the other theory: that a malfunction or bug in the voting machines caused votes not to be recorded. The case rests on four pillars: (1) The postulated behavior is consistent with a common type of computer bug. (2) Similar bugs have been found in voting machines before. (3) The state-commissioned study would have been unlikely to find such a bug. (4) Studies of voting data show patterns that point to the bug theory.
(1) The postulated behavior is consistent with a common type of computer bug.
Programmers know the kind of bug I’m talking about: an error in memory management, or a buffer overrun, or a race condition, which causes subtle corruption in a program’s data structures. Such bugs are maddeningly hard to find, because the problem isn’t evident immediately but the corrupted data causes the program to go wrong in subtle ways later. These bugs often seem to be intermittent or “random”, striking sometimes but lying dormant at other times, and seeming to strike more or less frequently depending on the time of day or other seemingly irrelevant factors. Every experienced programmer tells horror stories about such bugs.
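To make the race-condition flavor of this concrete, here is a deterministic re-enactment in Python of the classic lost-update interleaving. This is a toy sketch invented for illustration, not anything from the actual voting machine code:

```python
# Two voting sessions interleave their read-modify-write on a shared
# tally. This is the textbook race condition, replayed by hand so the
# "unlucky" timing happens every run.
tally = 0

session_a_read = tally        # session A reads 0
session_b_read = tally        # session B reads 0, before A writes back
tally = session_a_read + 1    # A records its vote: tally is 1
tally = session_b_read + 1    # B records its vote: tally is STILL 1

# Two votes were cast, but only one was counted.
lost_votes = 2 - tally
```

The bug only bites when both reads happen before either write, which depends on timing, exactly the kind of "seemingly irrelevant factor" that makes such bugs look random.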
Such a bug is consistent with the patterns we saw in the election. Undervotes didn’t happen to every voter, but they did happen in every precinct, though with different frequency in different places.
(2) Similar bugs have been found in voting machines before.
We know of at least two examples of similar bugs in voting machines that were used in real elections. After problems in Maryland voting machines caused intermittent “freezing” behavior, the vendor recalled the motherboards of 4700 voting machines to remedy a hardware design error.
Another example, this time caused by a software bug, was described by David Jefferson:
In the volume testing of 96 Diebold TSx machines … in the summer of 2005, we had an enormously high crash rate: over 20% of the machines crashed during the course of one election day’s worth of votes. These crashes always occurred either at the end of one voting transaction when the voter touched the CAST button, or right at the beginning of the next voter’s session when the voter SmartCard was inserted.
It turned out that, after a huge effort on Diebold’s part, a [Graphical User Interface] bug was discovered. If a voter touched the CAST button sloppily, and dragged his/her finger from the button across a line into another nearby window (something that apparently happened with only one of every 400 or 500 voters) an exception would be signaled. But the exception was not handled properly, leading to stack corruption or heap corruption (it was never clear to us which), which apparently invariably led to the crash. Whether it caused other problems also, such as vote corruption, or audit log corruption, was never determined, at least to my knowledge. Diebold fixed this bug, and at least TSx machines are free of it now.
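Jefferson’s description, an unhandled exception leaving program state half-updated, maps onto a simple pattern. Here is an illustrative Python sketch; the tally class and vote flow are invented for the example, not taken from the iVotronic:

```python
class Tally:
    """Toy vote recorder: a transaction that dies mid-update corrupts state."""
    def __init__(self):
        self.ballots_cast = 0
        self.votes = {"CD13": 0}

    def cast(self, selections, glitch=False):
        self.ballots_cast += 1
        if glitch:
            # An unexpected event (a stray touch, say) raises here, after
            # the ballot counter was bumped but before the race totals were
            # updated: the classic half-committed update.
            raise RuntimeError("unhandled GUI event")
        for race in selections:
            self.votes[race] += 1

t = Tally()
t.cast(["CD13"])
try:
    t.cast(["CD13"], glitch=True)   # one voter in N triggers the fault
except RuntimeError:
    pass  # the real bug: the error is swallowed and voting continues
t.cast(["CD13"])

# ballots_cast is 3, but votes["CD13"] is 2: the middle ballot reads as
# an undervote even though that voter made a selection.
```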
These are the two examples we know about, but note that neither of these examples was made known to the public right away.
(3) The State-commissioned study would have been unlikely to find such a bug.
The State of Florida study team included some excellent computer scientists, but they had only a short time to do their study, and the scope of their study was limited. They did not perform the kind of time-consuming dynamic testing that one would use in an all-out hunt for such a bug. To their credit, they did the best they could given the limited time and tools they had, but they would have had to get lucky to find such a bug if it existed. Their failure to find such a bug is not strong evidence that a bug does not exist.
(4) Studies of voting data show patterns that point to the bug theory.
Several groups have studied detailed data on the Sarasota election results, looking for patterns that might help explain what happened.
One of the key questions is whether there are systematic differences in undervote rate between individual voting machines. The reason this matters is that if the ballot design theory is correct, then the likelihood that a particular voter undervoted would be independent of which specific machine the voter used – all voting machines displayed the same ballot. But an intermittent bug might well manifest itself differently depending on the details of how each voting machine was set up and used. So if undervote rates depend on attributes of the machines, rather than attributes of the voters, this tends to point toward the bug theory.
Of course, one has to be careful to disentangle the possible causes. For example, if two voting machines sit in different precincts, they will see different voter populations, so their undervote rate might differ even if the machines are exactly identical. Good data analysis must control for such factors or at least explain why they are not corrupting the results.
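One way to make that test concrete is to pool the machines within a single precinct (holding the voter population roughly constant) and ask whether the per-machine undervote counts are consistent with one shared rate. The counts below are made up purely for illustration; a real analysis would use the Sarasota machine-level data:

```python
# Hypothetical (undervotes, total ballots) per machine in ONE precinct.
machines = {"m1": (18, 300), "m2": (22, 310), "m3": (61, 295)}

under = sum(u for u, n in machines.values())
total = sum(n for u, n in machines.values())
pooled = under / total  # the rate every machine should share under the
                        # ballot-design theory

# Pearson chi-square: do the per-machine counts look like draws from the
# pooled rate, or do rates differ from machine to machine?
chi2 = 0.0
for u, n in machines.values():
    exp_u = pooled * n          # expected undervotes on this machine
    exp_v = (1 - pooled) * n    # expected completed votes
    chi2 += (u - exp_u) ** 2 / exp_u + ((n - u) - exp_v) ** 2 / exp_v

# With 2 degrees of freedom, chi2 above ~13.8 is significant at the 0.1%
# level; a large value points toward machine-dependent behavior.
print(round(chi2, 1))
```

The same idea, controlled for precinct and voter demographics, is essentially what the machine-level studies described below the fold are doing at scale.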
There are two serious studies that point to machine-dependent results. First, Mebane and Dill found that machines that had a certain error message in their logs had a higher undervote rate. According to the State study, this error message was caused by a particular method used by poll workers to wake the machines up in the morning; so the use of this method correlated with higher undervote rate.
Second, Charles Stewart, an MIT political scientist testifying for the Jennings campaign in the litigation, looked at how the undervote rate depended on when the voting machine was “cleared and tested”, an operation used to prepare the machine for use. Stewart found that machines that were cleared and tested later (closer to Election Day) had a higher undervote rate, and that machines that were cleared and tested on the same day as many other machines also had a higher undervote rate. One possibility is that clearing and testing a machine in a hurry, as the election deadline approached or just on a busy day, contributed to the undervote rate somehow.
Both studies indicate a link between the details of how a machine was set up and used, and the undervote rate on that machine. That’s the kind of thing we’d expect to see with an intermittent bug, but not if undervotes were caused strictly by ballot design and user confusion.
Conclusion
What conclusion can we draw? Certainly we cannot say that a bug definitely caused undervotes. But we can say with confidence that the bug theory is still in the running, and needs to be considered alongside the ballot design theory as a possible cause of the Sarasota undervotes. If we want to get to the bottom of this, we need to investigate further, by looking more deeply into undervote patterns, and by examining the voting machine hardware and software.
[Correction (Feb. 28): I changed part (3) to say that the team “had” only a short time to do their study. I originally wrote that they “were given” only a short time, which left the impression that the state had set a time limit for the study. As I understand it, the state did not impose such a time limit. I apologize for the error.]
What does this say about the use of electronic voting machines that lack voter verification of the recorded ballot?
[Excuse the repost. The blog software choked on a less-than sign. I suggest more parallel testing ;-)]
That they’re inappropriate for use in elections. But that’s true of VVPT machines, too. Even if voters effectively “verify” a VVPT (Selker’s study suggests less than 3% error detection rate), VVPT machines can simply cancel voters’ ballots after they leave their machines, switch their votes, and reprint matching VVPTs. See http://vote.nist.gov/threats/PaperTrailManipulationIII1.pdf for more details. Then there are presentation attacks (e.g., omitting candidates from the ballot, rearranging the candidates, modulating touch-screen sensitivity to make it more difficult to select certain candidates…), delay- or denial-of-service attacks, and the plain old “flip the vote and bet the voter won’t notice, or if she notices, nothing will come of it” attack.
One might plausibly argue that the risks of electronic ballot presentation and recording are acceptable for voters who cannot otherwise vote independently. But it’s plain as a pikestaff (apologies to Mr. Gamgee) that non-disabled voters receive very little benefit in exchange for those risks. Indeed, it would appear (at least in Sarasota) that voters didn’t receive even much protection from accidental undervoting.
So, the precinct clerks were told to warn the voters and a bright red warning appeared on the summary screen, but we are still supposed to conclude that the voters themselves were the cause of this incredibly large undervote in a hotly contested race. This seems to lead to more questions, not answers.
Whether you agree with Ed or not, there certainly seems to be a need for more investigation.
And my original question remains. What does this say about the use of electronic voting machines that lack voter verification of the recorded ballot?
T&S, To correct the record, here are two quotes from the report that go beyond “no evidence”. Hope you will correct your corresponding web posting.
“The team’s unanimous opinion is that the iVotronic firmware, including faults that we identified, did not cause or contribute to the CD13 undervote.” par 1.3
“We are confident that no iVotronic firmware bug contributed to the CD13 undervote. ” par 9.1
If there was a bug, it would have been seen in the parallel testing. Even Jennings’ expert has stated that he’s never seen a bug that was not revealed in parallel testing. And recall that the Jennings team selected the machines that were tested.
As a non-USian, I’m unclear as to exactly what statistics on each election are released.
But is enough released that some of the new work on applying Benford’s Law to election statistics could be performed by the wider community?
This wouldn’t be foolproof for a single election, but if these machines are being widely used then it should provide a useful indicator as to whether the errors are consistently present.
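For what it’s worth, the first-digit version of that check is only a few lines. The vote totals below are made up to show the mechanics, not drawn from the Sarasota data; serious election-forensics work typically uses the second digit, since precinct sizes constrain first digits:

```python
import math
from collections import Counter

def benford_expected(d):
    # Benford's Law: P(leading digit = d) = log10(1 + 1/d)
    return math.log10(1 + 1 / d)

def first_digit_freqs(counts):
    """Observed frequency of each leading digit 1-9 in a list of counts."""
    digits = [int(str(abs(c))[0]) for c in counts if c > 0]
    tally = Counter(digits)
    return {d: tally.get(d, 0) / len(digits) for d in range(1, 10)}

# Made-up precinct vote totals, purely illustrative.
totals = [112, 131, 97, 205, 148, 312, 119, 178, 254, 101, 163, 142]
observed = first_digit_freqs(totals)
for d in range(1, 10):
    print(d, round(observed[d], 2), round(benford_expected(d), 2))
```

With real data one would then run a goodness-of-fit test across many precincts or machines rather than eyeball a dozen numbers.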
Bob, There is a warning to the voter when he/she does not vote in a race. The last screen a voter sees is a summary screen which says in bright red that the voter did not cast a vote in whichever race.
Why wasn’t there a warning to the voters […]?
Bob,
Fri, Nov 3, 2007 email from KDent@[redacted], subject: CRITICAL
Kathy Dent is the Sarasota County Supervisor of Elections. This warning was apparently in response to anomalies detected in the early voting process.
Whether poor ballot design or software bug, isn’t this type of disenfranchisement of voters a terrible indictment of DREs, especially those without a voter-verified paper record? Why wasn’t there a warning to the voters, as an optical scanner would have done if a voter missed a race like this? The administrator of elections in Maryland uses the poor ballot design theory as an argument for sticking with paperless DREs, but I think it is just the opposite.
Bud,
According to the report, the code is stored in EPROM, and the system uses several different types of memory which might be affected differently. The hardware problems in the Diebold motherboards (mentioned in the main post) seemed to relate to the Flash memories used to store votes. Given all of this, I don’t think it’s safe to assume that hardware memory problems would be distributed evenly through the address space.
Ed, My points one and three are essentially related. The existence of statistically significant patterns related to partisanship negates the random-bug possibility.
I would also note the finer point of the ballot design flaw theories. One is that people simply missed the race. While this probably had some effect, the summary screen limits this impact. It also does not explain why there was an even higher undervote in the attorney general race in the neighboring two counties using the same equipment, particularly because the AG race was below the governor’s race on the same page and therefore harder to miss.
What is more striking is that there were only three counties that placed the governor and one other candidate on a single touch screen page: Sarasota with CD 13, and the two other CD 13 counties where the governor’s race was placed on the same page with the AG’s race. Those are the three highest undervoted races in those counties and throughout the state. No one else is even close. I do not think the explanation for this result is a bug but instead a human factors explanation, which is likely what the audit was referring to.
Go ahead, blame the hardware. I’m used to it! 🙂
Seriously, this doesn’t exhibit behavior consistent with a random hardware bug that would corrupt memory. If a bit or byte or two gets corrupted by a hardware process, it’s just as likely to happen in a portion of the memory which is “program” as it is in data, unless, of course, these machines are built using a microcontroller that I’m not aware of that is a true Harvard machine. If the error occurs in “program” space, it will either completely hose the machine, or (for most modern machines with vast memory spaces) direct it to an unused section of memory space, where it will sit until the watchdog timer resets it. Neither of these behaviors (crazy results or resets) was reported. Note that the two examples you cite as support both caused resets.
Alec,
The report doesn’t just ask us to take the word of the study members, but instead (to its credit) lays out the evidence that was gathered along with arguments for the report’s conclusions. This post is part of my attempt to evaluate those arguments, based on the evidence.
Given my professional respect for members of the team, it certainly does matter to me that the team members believe the bug theory can be eliminated. But in light of the evidence, including evidence not mentioned in the report, such as Stewart’s analysis, I respectfully disagree with that part of the report’s conclusions.
(Regarding the time constraint issue, I have corrected the original post.)
Sarasota,
Regarding your three points:
(1) I disagree. Buffer overloads are far from the only cause of this kind of problem. Type-checking errors, off-by-one errors, timing errors, and so on can all be causes. As I wrote in the original post, most experienced programmers have seen bugs that manifest occasionally and unpredictably. These are the hardest sorts of bugs to track down.
(2) Good point. That was sloppy writing on my part. I have corrected the main post.
(3) These patterns are interesting and possibly instructive. But I don’t think they’re very helpful in distinguishing the ballot-design theory from the bug theory.
Ed, Your commentary is wrong for a variety of reasons. First, you are essentially suggesting a buffer overload theory, which results in the machines crashing or freezing. There is no evidence that the machines froze or crashed. Even Jennings’ experts disagree with this theory.
Second, the report of the eight independent experts stresses that there were no external time limits imposed on their review of the data. They were not “given only a short time to do their study.”
Third, studies of the undervotes show patterns based on strength of party affiliation. For example, voters who voted strongly republican or democrat were more likely to undervote.
[W]e can say with confidence that the bug theory is still in the running, and needs to be considered alongside the ballot design theory as a possible cause of the Sarasota undervotes.
Patrick Whittle writes in the Sarasota Herald Tribune, under the headline, “New vote machines may be delayed” (28 Feb 2007):
As Alec points out, there is consensus among all eight computer scientists that the problems reported by voters should be marked:
UNABLE TO REPRODUCE: STATUS CLOSED.
ES&S sent a “software bug” memo to Florida SOEs in August 2006, warning of a problem with a “smoothing filter” that could delay the recording of the voter’s selections. The delay could be longer than expected, and the voter might move on before the vote was recorded.
ES&S recommended putting signs in the voting booths to warn voters, and also recommended a “software patch” prior to the November election.
I do not know whether all or any Florida machines ever received that patch, or, if the patch was distributed, whether it was put on every single voting machine.
Further, if the patch was applied, was it tested? Did it work uniformly on all machines, including those that were ADA enabled?
See that memo here: http://www.ncvoter.net/downloads/ESS_Aug_2006_iVotronic_FL_memo.pdf
If ballot style was the sole cause of FL 13 undervote, then we in North Carolina should have had far worse problems in our iVotronic counties.
Here is Sarasota FL 13’s ballot http://www.ncvoter.net/downloads/sarasota_ballot_style.pdf
Now take a look at what appears to be a more confusing ballot style
for Mecklenburg County NC, the NC 08 ballot:
http://www.ncvoter.net/downloads/Mecklenburg_2006_ballot.pdf
(notice the nearly hidden placement of the US congressional race?)
Mecklenburg had a 4% undervote rate for that contest.
Here is a memo from the NC State Board of Elections explaining the differences in NC iVotronics and the FL iVotronics, as well as a ballot comparison:
http://www.ncvoter.net/downloads/Sarasota_NC_Ballot_Comparison_06.pdf
Ed, I disagree with you on “strong evidence”. Consensus among all eight computer scientists who conducted the review is strong evidence.
Also to correct the record, we were not under any time constraint. The team completed our work on our own schedule.
Frankly, I’m not convinced the report told us anything relevant. The bulk of the report seemed to be based on a source code review of the voting machine’s software. For the most part the report seemed to find the source code in fairly good shape, and on that basis found the software essentially reliable and not a likely cause of the observed undervotes. But how do we know that the object code that was running in the machine on voting day came from the source code that was reviewed? How would an investigator be able to conclusively determine that a second set of object code, from a second source codebase, wasn’t used instead on election day? Was there an intact chain of custody for the machines for the interval from the close of polls to the opening of the investigation? Classic “two sets of books” situation.
I’m going to disagree with you on this, Ed, for one simple reason: when I was examining the photos of the voting screens in the report’s appendix, I, too, failed to notice the ballot question of interest at first. Yes, that’s anecdotal, and I did eventually see the ballot question as I continued looking at the screen, but I can easily see how some people would speed right through without voting at all.
Three things made the ballot question slip under the radar: one, there were two ballot questions on that screen. Two, the area on the second page of the ballot where the tricky question was located was partially occupied by a superfluous header (the “official ballot blah blah”) on the first page. And three, there was no boldface type in the header for the ballot question in, er, question, while there *was* boldface type in the other two questions shown in the screenshots; the boldface type and generally large/dense header for the Governor question (bottom of page 2) draws the eye down away from the House race just above it.
Now, that doesn’t mean that there *wasn’t* a bug at work here, and voting systems are second only to life-sustaining/protecting systems when it comes to a demand for correctness, but a judicious shave with Occam’s razor leads me to believe that in this specific case, worrying about a bug is like swatting a fly on a dead horse.
A bad pointer in a linked-list or B-tree style structure causes chunks of the list or tree to be chopped away. Variables may keep values from last cycle because something doesn’t get cleared properly at the start of a cycle. These things are easy to detect with code analysis tools especially when running modular tests in some sort of simulation environment. Once the code gets wrapped up into a finished product you can never properly flush out such bugs.
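The “value left over from last cycle” failure is easy to reproduce in any language; in Python the classic instance is a mutable default argument, which silently carries state from one call into the next. A toy sketch, not the voting code itself:

```python
def record_session(selection, cart=[]):  # BUG: the default list is created
    """Meant to start each voter session with an empty cart."""
    cart.append(selection)               # once and shared across calls
    return list(cart)

first = record_session("CD13")       # looks fine: ["CD13"]
second = record_session("Governor")  # previous voter's choice leaks in:
                                     # ["CD13", "Governor"]

def record_session_fixed(selection, cart=None):
    cart = [] if cart is None else cart  # state cleared every cycle
    cart.append(selection)
    return list(cart)
```

As the comment says, static analyzers and modular tests catch this class of bug easily; it is much harder to flush out once the code is sealed inside a finished product.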
I remember I had a Netcomm ADSL modem that now and then needed rebooting. It would work perfectly for days, sometimes weeks, and then for no reason it would fail line sync until it got power-cycled. A full commercial product with at least a million users, and it still had bugs that struck at random times, though rarely enough that you could put up with the problem.
Everyone knows that the only way to be sure of what you get is to have fully transparent source code subject to independent analysis, but when you come down to it, paper ballots and human counters do a perfectly good job and enjoy universal acceptance. Voting machines are an example of vendor push technology with neither cost advantage nor accuracy advantage for the user and really no genuine demand. Go back to paper ballots… there are plenty of useful jobs for the computers elsewhere.
So I’m not familiar with the physical design, but I’m reminded of Hugh Thompson’s airline hack. If there is a buffer overrun, can you invoke it by holding down a key?