August 18, 2017

Archives for October 2006

Diebold's Motherboard Flaw: Implications

Yesterday I explained the design error that led Diebold in 2005 to recall and replace the motherboards in thousands of voting machines, most of which had been used in the November 2004 election. Today I’ll talk about how the motherboard flaws might have affected the accuracy of elections.

Machines with flawed boards were normally identified when they “froze” on election day. When personal computers crash, they often manage to reboot themselves, but the Diebold machines don’t reboot themselves on a crash, so any kind of general system crash makes the machine freeze. Mystery crashes like these typically don’t happen at random times. They are concentrated at certain stages of the machine’s use, because the detailed technical conditions that trigger a crash are more likely to occur at some times than at others.

When did the flawed Diebold machines crash? Here’s the Montgomery County (Maryland) Lessons Learned report from the 2004 election (page 11):

Election judges and technical staff reported that many of these units froze when the voter pressed the Cast Ballot button. This leads to great confusion for judges and voters. The voter leaves the polling place with little or no confidence that their vote was counted. In many cases, the election judges are unable to provide substantial confirmation that the vote was, in fact, counted.

You’d be hard pressed to pick a worse time for a voting machine to crash. The voter has made his selections, confirmed them on the ballot review screen, and now wants them to be recorded. When the Cast Vote button is pressed, the machine reads the intended votes out of its temporary RAM memory and copies them into the official ballot record file, which lives in the machine’s flash memory. If the machine crashes just before the vote is copied, the vote is lost. If it crashes just after the vote is copied, the vote is recorded. It won’t be immediately obvious which case you’re in – hence the confused voters and poll workers.
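To make the two cases concrete, here is a minimal Python sketch of what happens when Cast Vote is pressed. It is not Diebold's code. The function `cast_vote`, the `CrashPoint` values, and the on-screen messages are hypothetical. It models the voter's choices in RAM as a dict and the ballot record file in flash as a list, and shows that a crash just before the copy and a crash just after it look identical to the voter.

```python
from enum import Enum

class CrashPoint(Enum):
    NONE = "no crash"
    BEFORE_COPY = "crash before copy"
    AFTER_COPY = "crash after copy"

def cast_vote(selections, ballot_record, crash_point):
    """Hypothetical model of pressing Cast Vote.

    selections: the voter's choices, held only in volatile RAM.
    ballot_record: a list standing in for the ballot record file in flash.
    Returns what the voter sees on the screen afterward.
    """
    if crash_point is CrashPoint.BEFORE_COPY:
        # The machine freezes; the choices in RAM are lost and nothing is recorded.
        return "frozen screen"
    ballot_record.append(dict(selections))  # copy the vote into the official record
    if crash_point is CrashPoint.AFTER_COPY:
        # The vote is recorded, but the machine freezes before confirming it.
        return "frozen screen"
    return "vote recorded"

for point in CrashPoint:
    record = []
    screen = cast_vote({"President": "Candidate A"}, record, point)
    print(point.value, "|", screen, "| recorded:", len(record) == 1)
```

The last two lines of output both show `frozen screen`, but only the "crash after copy" line reports `recorded: True`. Nothing visible to the voter or the poll worker distinguishes them.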

The kind of design mistake Diebold made – timing errors in the use of RAM chips – crops up in other (non-voting) systems, so we know what kinds of problems it tends to cause. Sometimes it causes system crashes, but sometimes it causes data to be corrupted as the data is copied from one place to another. This is particularly worrisome, because the Diebold flaw tends to show up just when the vote is being copied into the official record.

And that’s not all. Some other machines failed with Ballot Exception Errors, which occur when the machine’s log file is corrupted. The log file is stored alongside the vote record file and is also updated when the Cast Vote button is pressed. So we know that some of the records kept by the voting machine (either internally or on removable memory cards) were getting corrupted.

Were votes ever actually corrupted? We’ll never know. If we had a voter-verified paper audit trail, we could compare it to the records kept by the crashed machines. But with only the electronic records to go on, it’s probably impossible to tell.
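As a rough sketch of the comparison such a paper trail would allow, the hypothetical `audit` function below tallies choices from paper ballots and from electronic records and reports every contest and candidate where the counts disagree. It assumes each ballot is a simple mapping from contest to choice; a real post-election audit procedure would be more involved, and the machines discussed here had no paper trail to feed into it.

```python
from collections import Counter

def audit(paper_ballots, electronic_records):
    """Compare per-choice tallies from paper ballots and electronic records.

    Each argument is a list of ballots; each ballot is a dict mapping a contest
    name to the chosen candidate. Returns {(contest, candidate): (paper, electronic)}
    for every pair whose two counts disagree.
    """
    def tally(ballots):
        return Counter((contest, choice)
                       for ballot in ballots
                       for contest, choice in ballot.items())

    paper, electronic = tally(paper_ballots), tally(electronic_records)
    # A Counter returns 0 for missing keys, so a choice that appears in only
    # one of the two sources is still compared.
    return {key: (paper[key], electronic[key])
            for key in paper.keys() | electronic.keys()
            if paper[key] != electronic[key]}

# Hypothetical example: one paper ballot for A has no electronic counterpart,
# as would happen after a crash just before the vote was copied.
paper = [{"President": "A"}, {"President": "B"}, {"President": "A"}]
electronic = [{"President": "A"}, {"President": "B"}]
print(audit(paper, electronic))  # {('President', 'A'): (2, 1)}
```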

The good news is that all of the affected motherboards have now been replaced. The bad news is that Diebold knew about these problems in March 2004, and yet they allowed thousands of affected machines to be used in the November 2004 election.

Diebold Quietly Recalled Voting Machine Motherboards

Diebold replaced the motherboard (i.e., the main electronic component) on about 4700 of Maryland’s AccuVote-TS voting machines in 2005, according to Cameron Barr’s story in Thursday’s Washington Post. The company and state officials kept the recall quiet – even some members of the state’s Board of Elections were unaware of it until contacted by the Post. (“If they had asked, we would have told them,” an official said.)

The original motherboards had a design error that caused the machines to become unresponsive, or “freeze”, sometimes during elections. In the 2004 general election, about four percent of Montgomery County’s machines had this problem, according to the county’s 2004 Presidential General Election Review: Lessons Learned report (page 11).

In March 2004, Diebold had sent the state a memo describing the problem in the original motherboards. The memo says that “stack-up of component tolerances” led to timing errors in accessing RAM memory.

Let’s decode that for non-engineer readers. A circuitboard uses many chips or components. The technical specifications for each chip give a set of tolerances, which might say something like this: “If the temperature is between 40 and 140 degrees, and the supply voltage is between 2.9 and 3.1 volts, and a stable signal is delivered on pin 13 for at least 30 nanoseconds, then the chip will respond by sending a signal on pin 19, between 30 and 70 nanoseconds after receiving the pin-13 signal.” This is a promise from the chip’s manufacturer to the system designer. Designers rely on promises like this to make sure their systems will work.

When the designer connects different chips together – when a signal produced by one chip is fed into another one – the designer has to make sure that the signal provided by the first chip falls within the tolerances accepted by the second chip. Otherwise the second chip might not work as advertised, and the overall system might be flaky or simply fail.
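Here is a minimal Python sketch of the check the designer has to make, and of how tolerances “stack up” along a chain of components. It uses a deliberately simplified timing model: each part adds a delay somewhere in its guaranteed range, and the receiving chip accepts a signal only inside a fixed window. The names `Delay`, `path_window`, and `fits`, and all of the numbers, are hypothetical and are not taken from Diebold's memo.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Delay:
    """A component's guaranteed delay range in nanoseconds, as a datasheet might state it."""
    name: str
    min_ns: float
    max_ns: float

def path_window(stages):
    """Earliest and latest possible arrival time through a chain of components.

    The tolerances stack up: the earliest arrival is the sum of all minimum
    delays, and the latest arrival is the sum of all maximum delays.
    """
    return (sum(s.min_ns for s in stages), sum(s.max_ns for s in stages))

def fits(stages, accept_from_ns, accept_until_ns):
    """True if every possible arrival time lies inside the receiver's accepted window."""
    earliest, latest = path_window(stages)
    return accept_from_ns <= earliest and latest <= accept_until_ns

# Hypothetical path: the 30-70 ns chip from the example above, plus two more parts.
path = [Delay("driver chip", 30, 70), Delay("buffer", 5, 15), Delay("board trace", 1, 3)]
print(path_window(path))   # (36, 88)
print(fits(path, 20, 85))  # False
```

Every part here is within its own specification, yet the worst-case latest arrival (88 ns) falls outside the receiver's 85 ns limit. That is a tolerance stack-up in miniature.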

But sometimes design errors like these turn out not to cause trouble. If tolerances are just a little bit out of whack, you might just get lucky. Maybe a chip that is guaranteed only for voltages over 2.9 volts will still work at 2.88 volts. Maybe a delay guaranteed between 30 and 70 nanoseconds tends to come out on the low end of that range in the batch of chips you got. Or maybe everything works fine, except when something unusual happens – a hot day, or a glitch in the building’s power supply, or an unusual sequence of button presses on the screen. A designer might choose to risk such problems to save money, in an application where reliability isn’t critical. But it shouldn’t happen in a voting machine.
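Why such a design can work most of the time, and still fail on some boards, can be illustrated with a Monte Carlo sketch using made-up numbers. The function `failure_rate` is hypothetical, and it assumes each part's actual delay is independent and uniformly distributed within its guaranteed range. Real parts from one batch tend to be correlated, and temperature and supply voltage shift delays too, so this shows only the shape of the argument.

```python
import random

def failure_rate(ranges_ns, deadline_ns, trials=100_000, seed=1):
    """Estimate the fraction of boards whose total path delay misses a deadline.

    ranges_ns: (min, max) guaranteed delay for each component, in nanoseconds.
    Hypothetical assumption: each part's actual delay is independent and
    uniformly distributed within its guaranteed range.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    failures = 0
    for _ in range(trials):
        # "Build" one board by drawing an actual delay for every part on the path.
        total = sum(rng.uniform(lo, hi) for lo, hi in ranges_ns)
        if total > deadline_ns:
            failures += 1
    return failures / trials

parts = [(30, 70), (5, 15), (1, 3)]  # hypothetical datasheet ranges, in ns
print(failure_rate(parts, deadline_ns=85))
```

With these made-up numbers the worst case (88 ns) misses the 85 ns deadline, yet only a small fraction of simulated boards, on the order of half a percent, actually do. A design like that can pass testing and still fail in the field.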

Diebold’s March 2004 memo explains their design problem and says that they redesigned the motherboard to fix the problem. Newly manufactured machines were getting the redesigned motherboards, and any old machines that exhibited problems would have their motherboards replaced. But at that time old machines that hadn’t been seen malfunctioning were left in the field. Diebold estimated that fewer than one percent of the old machines would have problems.

In the November 2004 election, about four percent of Montgomery County machines had screen freezes. Afterward, Diebold decided to recall the old motherboards, replacing them all with new redesigned boards. Today, every Maryland voting machine has one of the new motherboards. Will we see further problems with Diebold’s motherboard design? Only time will tell.

(You may be wondering how these design problems might have affected the accuracy of vote-counting in the 2004 election. I’ll consider that question in the next post.)

Why So Little Attention to Botnets?

Our collective battle against botnets is going badly, according to Ryan Naraine’s recent article in eWeek.

What’s that? You didn’t know we were battling botnets? You’re not alone. Though botnets are a major cause of Internet insecurity problems, few netizens know what they are or how they work.

In this context, a “bot” is a malicious software agent that gets installed on an unsuspecting user’s computer. Bots get onto computers by exploiting security flaws. Once there, they set up camp and wait unobtrusively for instructions. Bots work in groups, called “botnets”, in which many thousands of bots (hundreds of thousands, sometimes) all over the Net work together at the instruction of a remote badguy.

Botnets can send spam or carry out coordinated security attacks on targets elsewhere on the Net. Attacks launched by botnets are very hard to stop because they come from so many places all at once, and tracking down the sources just leads to innocent users with infected computers. There is an active marketplace in which botnets are sold and leased.

Estimates vary, but a reasonable guess is that between one and five percent of the computers on the net are infected with bots. Some computers have more than one bot, although bots nowadays often try to kill each other.

Bots exploit the classic economic externality of network security. A well-designed bot on your computer tries to stay out of your way, only attacking other people. An infection on your computer causes harm to others but not to you, so you have little incentive to prevent the harm.

Nowadays, bots often fight over territory, killing other bots that have infected the same machine, or beefing up the machine’s defenses against new bot infections. For example, Brian Krebs reports that some bots install legitimate antivirus programs to defend their turf.

If bots fight each other, a rationally selfish computer owner might want his computer to be infected by bots that direct their attacks outward. Such bots would help to defend the computer against other bots that might harm the computer owner, e.g. by spying on him. They’d be the online equivalent of the pilot fish that swim into sharks’ mouths with impunity, to clean the sharks’ teeth.

Botnets live today on millions of ordinary users’ computers, leading to nasty attacks. Some experts think we’re losing the war against botnets. Yet there isn’t much public discussion of the problem among nonexperts. Why not?