May 18, 2008

License for an open-source voting system?

Back when we were putting together the grant proposal for ACCURATE, one of the questions that we asked ourselves, and which the NSF people asked us as well, was whether we would produce a “bright shiny object,” which is to say whether or not we would produce a functional voting machine that could ostensibly be put to use in a real election.  Our decision at the time, and it was certainly the correct decision, was that we would focus on innovating in the technology under the covers of a voting system, and we might produce, at most, “research prototypes”.  The differences between a research prototype and a genuine, commercial system are typically quite substantial, and it would be no different here with voting system prototypes.

At Rice we built a fairly substantial prototype that we call “VoteBox”; you can read more about it in a paper that will appear on Friday at Usenix Security.  To grossly summarize, our prototype feels a lot like a normal DRE voting system, but uses some nice cryptographic machinery to ensure that you don’t have to trust that the code is correct.  You can verify the correctness of a machine, on the fly, while the election is ongoing.  Our prototype is missing a couple features that you’d want from a commercial system, like write-in voting, but it’s far enough along that it’s been used in several human-factors experiments (CHI’08, Everett’07).

This summer, our mission is to get this thing shipped as some sort of “open source” project.  Now we have several goals in this:

  • Allow other researchers to benefit from our infrastructure as a platform to do their own research.
  • Inspire commercial voting system vendors to build better products (i.e., solving the hard design problems for them, to reduce their cost for adopting innovative techniques).
  • Allow commercial voting system vendors to build on our source code, itself.

All well and good.  Now the question is how we should actually license the thing.  There are many, many different models under which we could proceed:

  • Closed source + patents + licenses.  This may or may not yield revenues for our university, and may or may not be attractive to vendors.  It’s clearly unattractive to other researchers, and would limit uptake of our system in places where we might not even think to look, such as outside the U.S.
  • Open source + a “not for commercial use” license.  This makes it a little easier for other researchers to pick up and modify the software, although ownership issues could get tricky.
  • Open source with a “BSD”-style license. A BSD-style license effectively says “do whatever you want, just give us credit for our work and you’re on your own if it doesn’t work.”   This sort of license tends to maximize the ease with which companies can adopt your work.
  • Open source with a “GPL”-style license.  The GPL has an interesting property for the voting system world: it makes any derivatives of the source code as open as the original code (unless a vendor reimplements it from scratch).  This sort of openness is something we want to encourage in voting system vendors, but it might reduce vendor willingness to use the codebase.
  • Open source with a “publication required” license.  (Joe Hall suggested this as another option.)  Like a BSD license, anybody can use it in a product, but the company would be compelled to publish the source code, like a book.  Unlike the GPL, they would not be required to allow any downstream use of their code.

I did a quick survey of several open source voting systems.  Most are distributed under the GPL:

  • Adder
  • eVACS (old version is GPL; new version is proprietary)
  • Helios (code not yet released; most likely GPL according to Ben Adida)
  • OVC (GPL with extensions to require change histories be maintained)
  • Pvote

Civitas is distributed under a non-commercial-use only license.  VoteHere, at one point, opened its code for independent evaluation (but not reuse), but I can’t find it any more.  It was probably a variant on the non-commercial-use only theme.  Punchscan is distributed under a BSD-style license.

My question to the peanut gallery: what sort of license would you select for a bright, shiny new voting system project and why?

[Extra food for thought: The GPLv3 would have some interesting interactions with voting systems.  For starters, there’s a question of who, exactly, a “user” might be.  Is that the county clerk whose office bought it, or the person who ultimately votes on it?  Also, you have section 3, which forbids any attempt to limit reverse-engineering or “circumvention” of the product.  I suppose that means that garden-variety tampering with a voting machine would still violate various laws, but the vendor couldn’t sue you for it.  Perhaps more interesting is section 6, which talks about how source code must be made available when you ship compiled software.  A vendor could perhaps give the source code only to its “customers” without giving it to everybody (again, depending on who a “user” is).  Of course, any such customer is free under the GPL to redistribute the code to anybody else.  Voting vendors may be most scared away by section 11, which talks about compulsory patent licensing (depending, of course, on the particulars of their patent portfolios).]

Vendor misinformation in the e-voting world

Last week, I testified before the Texas House Committee on Elections (you can read my testimony).  I’ve done this many times before, but I figured this time would be different.  This time, I was armed with the research from the California “Top to Bottom” reports and the Ohio EVEREST reports.  I was part of the Hart InterCivic source code team for California’s analysis.  I knew the problems.  I was prepared to discuss them at length.

Wow, was I disappointed.  Here’s a quote from Peter Lichtenheld, speaking on behalf of Hart InterCivic:

Security reviews of the Hart system as tested in California, Colorado, and Ohio were conducted by people who were given unfettered access to code, equipment, tools and time and they had no threat model.  While this may provide some information about system architecture in a way that casts light on questions of security, it should not be mistaken for a realistic approximation of what happens in an election environment.  In a realistic election environment, the technology is enhanced by elections professionals and procedures, and those professionals safeguard equipment and passwords, and physical barriers are there to inhibit tampering.  Additionally, jurisdiction ballot count, audit, and reconciliation processes safeguard against voter fraud.

You can find the whole hearing online (via RealAudio streaming), where you will hear the Diebold/Premier representative, as well as David Beirne, the director of their trade organization, saying essentially the same thing.  Since this seems to be the voting system vendors’ party line, let’s spend some time analyzing it.

Did our work cast light on questions of security? Our work found a wide variety of flaws, most notably the possibility of “viral” attacks, where a single corrupted voting machine could spread that corruption, as part of regular processes and procedures, to every other voting system.  In effect, one attacker, corrupting one machine, could arrange for every voting system in the county to be corrupt in the subsequent election.  That’s a big deal.

At this point, the scientific evidence is in, it’s overwhelming, and it’s indisputable.  The current generation of DRE voting systems has a wide variety of dangerous security flaws.  There’s simply no justification for the vendors to be making excuses or otherwise downplaying the clear scientific consensus on the quality of their products.

Were we given unfettered access? The big difference between what we had and what an attacker might have is that we had some (but not nearly all) source code to the system.  An attacker who arranged for some equipment to “fall off the back of a truck” would be able to extract all of the software, in binary form, and then would need to go through a tedious process of reverse engineering before reaching parity with the access we had. The lack of source code has demonstrably failed to do much to slow down attackers who find holes in other commercial software products.  Debugging and decompilation tools are really quite sophisticated these days.  All this means is that an attacker would need additional time to do the same work that we did.

Did we have a threat model? Absolutely!  See chapter three of our report, conveniently titled “Threat Model.”  The different teams working on the top to bottom report collaborated to draft this chapter. It talks about attackers’ goals, levels of access, and different variations on how sophisticated an attacker might be.  It is hard to accept that the vendors can get away with claiming that the reports did not have a threat model, when a simple check of the table of contents of the reports disproves their claim.

Was our work a “realistic approximation” of what happens in a real election? When the vendors call our work “unrealistic”, they usually mean one of two things:

  1. Real attackers couldn’t discover these vulnerabilities.
  2. The vulnerabilities can’t be exploited in the real world.

Both of these arguments are wrong. In real elections, individual voting machines are not terribly well safeguarded.  In a studio where I take swing dance lessons, I found a rack of eSlates two weeks after the election in which they were used.  They were in their normal cases.  There were no security seals.  (I didn’t touch them, but I did have a very good look around.) That’s more than sufficient access for an attacker wanting to tamper with a voting machine.  Likewise, Ed Felten has a series of Tinker posts about unguarded voting machines in Princeton.

Can an attacker learn enough about these machines to construct the attacks we described in our report? This sort of thing would need to be done in private, where a team of smart attackers could carefully reverse engineer the machine and piece together the attack.  I’ll estimate that it would take a group of four talented people, working full time, two to three months of effort to do it.  Once.  After that, you’ve got your evil attack software, ready to go; it takes only minutes to boot a single eSlate and install the malicious software patch, and then it’s off to the races.  The attack would only need to be installed on a single eSlate per county in order to spread to every other eSlate.  The election professionals and procedures would be helpless to prevent it.  (Hart has a “hash code testing” mechanism that’s meant to determine if an eSlate is running authentic software, but it’s trivial to defeat.  See issues 9 through 12 in our report.)

What about auditing, reconciliation, “logic and accuracy” testing, and other related procedures? Again, all easily defeated by a sophisticated attacker.  Generally speaking, there are several different kinds of tests that DRE systems support.

“Self-tests” are trivial for malicious software to detect, allowing the malicious software to either disable the test and fake the results, or simply behave correctly.  Most “logic and accuracy” tests boil down to casting a handful of votes for each candidate and then doing a tally.  Malicious software might simply behave correctly until more than a handful of votes have been received.  Likewise, malicious software might just look at the clock and behave correctly unless it’s the proper election day.

Parallel testing is about pulling machines out of service and casting what appear to be completely normal votes on them while the real election is ongoing.  This may or may not detect malicious software, but nobody in Texas does parallel testing.

Auditing and reconciliation are all about comparing different records of the same event.  If you’ve got a voter-verified paper audit trail (VVPAT) attachment to a DRE, then you could compare it with the electronic records.  Texas has not yet certified any VVPAT printers, so those won’t help here.  (The VVPAT printers sold by current DRE vendors have other problems, but that’s a topic for another day.) The “redundant” memories in the DREs are all that you’ve got left to audit or reconcile.  Our work shows how this redundancy is unhelpful against security threats; malicious code will simply modify all of the copies in synchrony.
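
To make this concrete, here is a minimal, purely illustrative sketch (in Python) of the kind of trigger logic described above.  Nothing here comes from any actual voting system; the date, the threshold, and all of the names are invented.

    from datetime import date

    # Hypothetical trigger logic, illustrating why self-tests and
    # "logic and accuracy" tests are easy for malicious code to evade.
    # Every name and threshold here is invented for illustration.

    ELECTION_DAY = date(2008, 11, 4)  # the attacker targets one known date
    HANDFUL = 25                      # L&A tests cast only a few votes

    votes_seen = 0

    def record_vote(honest_record, dishonest_record):
        """Return the record to store: honest-looking during any test."""
        global votes_seen
        votes_seen += 1
        # Behave correctly whenever it looks like a test: wrong date,
        # or too few votes cast for this to be a real election.
        if date.today() != ELECTION_DAY or votes_seen <= HANDFUL:
            return honest_record
        return dishonest_record  # election day, past the testing threshold

The point is that the conditions a test can create are exactly the conditions the malicious code checks for, so the test only ever sees honest behavior.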

Later, the Hart representative remarked:

The Hart system is the only system approved as-is for the November 2007 general election after the top to bottom review in California.

This line of argument depends on the fact that most of Hart’s customers will never bother to read our actual report.  As it turns out, the claim was largely true under the initial rules from the CA Secretary of State, but you need to read the current rules, which were released several months later.  The new rules, in light of the viral threat against Hart systems, require the back-end system (“SERVO”) to be rebooted after each and every eSlate is connected to it.  That’s hardly “as-is”.  If you have thousands of eSlates, properly managing an election with them will be exceptionally painful.  If you only have one eSlate per precinct, as California required for the other vendors, with most votes cast on optical-scanned paper ballots, you would have a much more manageable election.

What’s it all mean? Unsurprisingly, the vendors and their trade organization are spinning the results of these studies, as best they can, in an attempt to downplay their significance.  Hopefully, legislators and election administrators are smart enough to grasp the vendors’ behavior for what it actually is and take appropriate steps to bolster our election integrity.

Until then, the bottom line is that many jurisdictions in Texas and elsewhere in the country will be using e-voting equipment this November with known security vulnerabilities, and the procedures and controls they are using will not be sufficient to either prevent or detect sophisticated attacks on their e-voting equipment. While there are procedures with the capability to detect many of these attacks (e.g., post-election auditing of voter-verified paper records), Texas has not certified such equipment for use in the state.  Texas’s DREs are simply vulnerable to and undefended against attacks.

CORRECTION: In the comments, Tom points out that Travis County (Austin) does perform parallel tests.  Other Texas counties don’t.  This means that some classes of malicious machine behavior could potentially be discovered in Travis County.

Counterfeits, Trojan Horses, and shady distributors

Last Friday, the New York Times published an article about counterfeit Cisco products that have been sold as if they were genuine and are widely used throughout the U.S. government.  The article also raised the concern that these counterfeits could well have been engineered with malicious intent, but noted that this appears not to have been the case. There was an immediate Slashdot thread as well, but a number of issues are still worth commenting on.

First things first: the facts, as best we understand them.  The New York Times reports that approximately 3500 counterfeit Cisco components (worth $3.5M) have been discovered as a result of a two-year FBI investigation.  A Cisco spokesman is quoted saying that they found “no evidence of re-engineering.”  In other words, we’re talking about faithful knock-offs of legitimate products.

If you go to the FBI’s unclassified PowerPoint presentation (dated January 11, 2008), you’ll see all the actual information.  This is a fascinating read.  For starters, let’s talk about the cost.  The slides claim you can get a counterfeit router for approximately 1/6 the cost of a genuine router.  (You can do similarly well buying used gear on eBay.)  The counterfeit gear looks an awful lot like the genuine article.  Detecting differences here is as difficult as detecting counterfeit money, counterfeit Rolex watches, or counterfeit signatures from sports stars.  Given the apparent discrepancy between component cost and street value, we should be no more surprised to find knock-off Cisco gear than we are to find knock-off everything else.

[Image: counterfeit vs. original Cisco line card]

It’s claimed that these counterfeits are built to lower manufacturing standards than the original equipment, causing higher failure rates. One even caught fire due to a faulty power supply.  Likewise, the fakers are making stupid errors, like building multiple components with the same MAC address.  (MAC addresses, by design, are meant to be unique – no two ever the same.)

The really interesting story is all about the supply chain. Consider how you might buy yourself a new Mac.  You could go to your local Apple store.  Or you could get it from any of a variety of other stores, who in turn may have gotten it from Apple directly or may have gone through a distributor.  Apparently, for Cisco gear, it’s much more complicated than that.  The U.S. government buys from “approved” vendors, who might then buy from multiple tiers of sub-contractors.  In one case, one person bought shady gear from eBay and resold it to the government, moving a total of $1M in gear before he was caught.  In a more complicated case, Lockheed Martin won a bid for a U.S. Navy project.  They contracted with an unauthorized Cisco reseller who in turn contracted with somebody else, who used a sub-contractor, who then directly shipped the counterfeit gear to the Navy. (The slides say that $250K worth of counterfeit gear was sold; duplicate serial numbers were discovered.)

Why is this happening?  The government wants to save money, so it looks for contractors who can give it the best price, and its contracts allow for subcontracts, direct third-party shipping, and so forth.  There is no serious vetting of this supply chain by either Cisco or the government. Apparently, Cisco doesn’t do direct sales except for high-end, specialized gear.  You’d think Cisco would follow the lead of the airline industry, among others, and cut out the distributors to keep the profit for itself.

Okay, on to the speculation.  Both the New York Times and the FBI presentation concern themselves with Trojan Horses.  Even though there’s no evidence that any of this counterfeit gear was actually malicious, the weak controls in the supply chain make it awfully easy for such compromised gear to be sold into sensitive parts of the government, raising all the obvious concerns.

Consider a recent paper by U. Illinois’s Sam King et al. where they built a “malicious processor”.  The idea is pretty clever.  You send along a “secret knock” (e.g., a network packet with a particular header) which triggers a sensor that enables “shadow code” to start running alongside the real operating system.  The Illinois team built shadow code that compromised the Linux login program, adding a backdoor password.  After the backdoor was tripped, it would disable the shadow code, thus going back to “normal” operation.
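
As a rough software analogy (the real attack lives in the processor hardware, and everything below, including the constants, is invented for illustration), the mechanism amounts to a trigger like this:

    # Hypothetical software analogy to the "secret knock" design: a magic
    # byte pattern in an otherwise ordinary packet toggles a latch that
    # enables hidden "shadow" behavior. All constants are made up.

    SECRET_KNOCK = b"\xde\xad\xbe\xef"
    shadow_enabled = False

    def inspect_packet(packet: bytes) -> None:
        """Runs on every packet; an ordinary header scan provides cover."""
        global shadow_enabled
        if SECRET_KNOCK in packet:
            shadow_enabled = not shadow_enabled  # knock again to disable

    def check_login(password: str, real_check) -> bool:
        """Shadow code adds a backdoor password alongside the real check."""
        if shadow_enabled and password == "backdoor-password":
            return True
        return real_check(password)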

The military is awfully worried about this sort of threat, as well they should be.  For that matter, so are voting machine critics. It’s awfully easy for “stealth” malicious behavior to exist in legitimate systems, regardless of how carefully you might analyze or test it. Ken Thompson’s classic paper, Reflections on Trusting Trust, shows how he designed a clever Trojan Horse for Unix.  [Edit: it’s unclear that it ever got released into the wild.]

Okay everybody, let’s put on our evil hats.  If your goal was to get a Trojan Horse router into a sensitive military environment, how would you do it and how would it behave?  Clearly, the weak supply chain is an excellent vector for getting the gear into place.  Given the resources of a nation-state intelligence agency, you could afford to buy genuine Cisco parts and modify them, rather than using low-cost, counterfeit gear.  Nobody would detect you; you wouldn’t screw up and ship multiple boxes with the same serial number.

How will you implement your Trojan Horse logic?  Pretty much any gear you’ll ever find of any modest complexity will have software running inside it.  Even line cards have embedded processors of some sort.  For all that hardware, there’s software, and the software is where you’d install your logic bomb.  The increasing use of FPGAs in industrial designs means you could also “rewire” those parts to behave arbitrarily, much like the Illinois hack; you’d really want to get hold of the original VHDL “source code”, leveraging your aforementioned spying prowess, to simplify the design and implementation of your malicious behavior.  Hacking the raw netlists (the FPGA equivalent of machine code) would be possible, but would be far more painful. [See Sidebar.]

What sort of behavior would you build in?  The New York Times raises the idea of a kill switch.  I send your router a magic packet and it dies.  That’s too easy.  How about I send your router a magic packet, it then forwards it on to all of its peers, repeatedly, and then they all die a few seconds later?  That’s a pretty good denial of service attack (never mind a plot device that was the basis of a popular science fiction television series). Alternatively, following the Illinois idea, we could imagine that the magic packet turns on a monitoring feature, allowing our intelligence agency to gather all kinds of information, reconfigure the router, and so forth.  If they don’t want to generate extra traffic, which might be detected, they could instead weaken the encryption of a VPN tunnel, perhaps publishing the session key through a subliminal channel of some sort, acquiring the ciphertext through “other” means.

In summary, it’s probably a good thing, from the perspective of the U.S. military, to discover that their supply chain is allowing counterfeit gear into production.  This will help them clean up the supply chain, and will also provide an extra push to consider just how much they trust the sources of their equipment to ship clean software and hardware.

[Sidebar: Xilinx supports a notion of “encrypting” a netlist.  Broadly speaking, the idea behind the technology is to encrypt the description of your FPGA configuration with a crypto key, such that anybody who reads the file out of your board gets encrypted garbage.  However, the FPGA has the key material to decrypt the configuration and then initialize itself normally.  This sort of technology is meant to serve an anti-piracy / anti-reverse-engineering purpose.  It could ostensibly also serve an anti-Trojan Horse purpose, although at that point it’s really no more or less secure, semantically, than Microsoft’s Authenticode.  This technology, more broadly, is also an active research area (see, for example, Roy et al.’s EPIC: Ending Piracy of Integrated Circuits).  Again, if we’ve got a nation-state intelligence service tampering with the system, none of this is going to provide meaningful protection for the end-user against Trojan Horses.]

spammers gone wild

I’m sure this sort of behavior is old news, but it’s still really annoying.  Starting last night and continuing as I’m writing this, some annoying spammer has been forging my email address as the “From” line of a variety of spams.  This is causing a staggering volume of backscatter, mostly of the “Delivery Status Notification (failure)” variety.  Sampling these messages, I’m seeing several interesting things.

  1. The spammer is using my proper email address (dwallach@…) on each message, but a different “real” name on each one.  The name “Dan Wallach” does not appear anywhere.
  2. I forward everything to Gmail.  Gmail considers all of this backscatter to be spam.  That’s probably the correct answer, but I’m not sure I want to train my own DSPAM to do the same thing.  (DSPAM runs locally, and then I save a local copy and forward to Gmail.)  If I send a real message and it legitimately bounces, I want to know about it.  If I train DSPAM that all of these delivery status notifications are spam, it will inevitably throw away anything from “mailer-daemon”.  I’m unclear on whether that’s good or bad.
  3. You could easily build a bounce-message validator.  Every backscatter seems to have the original message ID in it, somewhere.  If the backscatter mentions a message ID that my system actually generated, then the backscatter is allowed.  Otherwise it’s dropped.  (This idea appears to be a variation of VERP; I’d make the message ID be a keyed MAC of a sequence number.  A sketch appears after this list.)
  4. A large number of these spams have a message body consisting entirely of “Take a look at yourself :)”  and linking to “video.exe” on a variety of different web sites.  Gmail helpfully rewrites those links such that Google can track whether I clicked on them.  This would also seem to give them an opportunity to give me an anti-virus warning, but they don’t do any such thing.  (“video.exe” is one of the common names used by the Storm worm.)
  5. Many spams include links that redirect through Google’s PageAd server to yet another server.  I clicked on one of them.  It appears that the PageAd redirector worked, but then Firefox’s “badware” detector caught the destination as being bad, ultimately taking me to stopbadware.org.  Go Firefox!
  6. Some legit antispam firewall products (including Barracuda) are helpfully telling me my message “was blocked by our Spam Firewall. The email you sent with the following subject has NOT BEEN DELIVERED”.  This is clearly broken behavior.  Just drop it and move on!
  7. Several of the backscatter messages are actually validation messages (sender address verification).  This has been largely discredited due to a variety of practical problems, never mind common-case annoyance to normal users.
  8. One of the spammers seems to be quite keen to sell replicas of expensive wristwatches, and those links take you to some kind of seemingly real online store, albeit with a funky DNS name.  Somehow, even if I did want a fake expensive watch, I’m not sure I’d be comfortable typing my credit card number into a web site whose name is a list of random characters and who (clearly) is closely related to the underworld of lecherous spammers.
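
Here’s a minimal sketch of the bounce validator from item 3, under my own assumptions about the message-ID format (Python; real key management and mail parsing are omitted, and all of the names are mine):

    import hmac
    import hashlib
    import re

    # Sketch of the bounce validator from item 3: a message ID is a
    # sequence number plus a keyed MAC of that number, so backscatter
    # is kept only if it quotes an ID this server actually generated.

    SECRET_KEY = b"replace-with-a-real-secret"

    def make_message_id(seq: int, domain: str = "example.com") -> str:
        tag = hmac.new(SECRET_KEY, str(seq).encode(), hashlib.sha256)
        return f"<{seq}.{tag.hexdigest()[:16]}@{domain}>"

    def is_our_message_id(msgid: str) -> bool:
        m = re.match(r"<(\d+)\.([0-9a-f]{16})@", msgid)
        if not m:
            return False
        seq, tag = m.groups()
        expected = hmac.new(SECRET_KEY, seq.encode(),
                            hashlib.sha256).hexdigest()[:16]
        return hmac.compare_digest(tag, expected)

    def accept_bounce(bounce_body: str) -> bool:
        """Keep a bounce only if it quotes a message ID we generated."""
        candidates = re.findall(r"<\d+\.[0-9a-f]{16}@[^>]+>", bounce_body)
        return any(is_our_message_id(c) for c in candidates)

One nice property of the keyed-MAC approach is that the check is stateless: the MAC itself proves the ID came from my mail server, so there’s no need to log every message ID ever issued.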

EDIT: fixed post that had gone out before it was done.

voting ID requirements and the Supreme Court

Last week, I posted here about voter ID requirements.  There was a case pending before the U.S. Supreme Court on the same topic.  It seems Indiana was trying to require voters to present ID in order to vote.  Lawsuit.  In the end, the court found that the requirement wasn’t particularly onerous (the New York Times’s article is as good as any for a basic summary, or go straight to the ruling).

Unsurprisingly, there has been a lot of hand-wringing about this (see, for example, this New York Times unsigned editorial).  We can expect similar legislation elsewhere now that the Court has made it pretty difficult to challenge these sorts of laws (see, for example, the ongoing battle to pass this sort of legislation in Texas).

As I wrote last time, I’m not particularly opposed to voters being required to present ID.  However, ID needs to be easy to get for anybody who is eligible to vote.  For most people, this is easy.  The big question we’d all like to answer is the size of the population for which it’s not easy.  Consider, as a hypothetical example, an elderly Texas woman who never drove a car.  If she’s over 75 years old, the state’s centralized birth certificate registry won’t (officially) have her records.  It could well require detective work to produce sufficient documentation to get her a state ID card.  Who’s going to pay for that?

The big technical question, of course, is whether the root desires behind the voter ID requirement can be addressed in some more effective fashion than an ID requirement.  What are those root desires?

  1. Prevent legitimate citizens from registering to vote and voting in more than one locale
  2. Prevent registered voters from casting multiple votes in their own name
  3. Prevent registered voters from impersonating other registered voters
  4. Prevent anyone, including malicious poll workers, from casting votes on behalf of registered voters who have chosen not to vote
  5. Prevent non-eligible people (non-citizens, felons, etc.) from registering to vote
  6. Detect changes in registered voters’ eligibility status, quickly and accurately

Which problems can be solved by purple ink on a voter’s thumb?  #1 and #2 are readily solved, since a second attempt to vote will be forbidden.  #3 is disincentivized, because the impersonator will be unable to vote under his or her own name.  #4-6 will require other technologies.

Okay, which problems can be solved by requiring voter ID?  Let’s assume, for the sake of discussion, we have a centralized state database keyed off the voter’s ID card number, but individual polling places do not have real-time access to this database.  Also, let’s assume that voter ID cards do not have any computational power: no smart cards, no crypto, etc.  #1 is ostensibly solved by the central database.  #2 cannot be prevented (at least, in a world with early voting or voting centers, where a voter has multiple places where he or she can legitimately vote), but it can be detected, and is thus disincentivized.  #3 is solved.  #4 is largely unsolved: if malicious poll workers want to forge signatures in the poll book, they may or may not be detected.  (In a recount situation, written signatures should be verified, but it’s unclear what the accuracy of that checking process might be.)

You could try to solve #4 with smartcards that issue digital signatures, but that’s a whole different can of worms.  Since the smartcard doesn’t really know what it’s being asked to sign, this could be exploited by an attacker.  (Example: you need to present your ID in a variety of different circumstances, such as proving your age to enter a bar.  The bouncer could “swipe” your card and use that as a way of getting a forged signature on an election record.)

What about #5 and #6?  These are really back-end database problems.  Requiring voters to present ID doesn’t have any impact.  However, having a database that is keyed off the voters’ ID cards significantly helps with #5 and #6 and could ostensibly reduce a variety of errors in the process.

Curiously, it seems that most of the benefit of requiring ID occurs in the back-end database, rather than on the day of the election.  The only real benefit of presenting ID, on election day, occurs in vote centers, early voting locations, and so forth.  When there may be millions of eligible voters who could use a vote center, traditional paper poll books are unworkable.  With a database keyed from ID card numbers, a voter’s records can be efficiently looked up and verified.  While this isn’t a security problem, improving the efficiency of the voting process is still a worthwhile goal.