March 7, 2021

Voting Machine Hashcode Testing: Unsurprisingly insecure, and surprisingly insecure

By Andrew Appel and Susan Greenhalgh

The accuracy of a voting machine is dependent on the software that runs it. If that software is corrupted or hacked, it can misreport the votes.  There is a common assumption that we can check the legitimacy of the software that is installed by checking a “hash code” and comparing it to the hash code of the authorized software.  In practice the scheme is supposed to work like this:  Software provided by the voting-machine vendor examines all the installed software in the voting machine, to make sure it’s the right stuff.

There are some flaws in this concept:  it’s hard to find “all the installed software in the voting machine,” because modern computers have many layers underneath what you examine.  But mainly, if a hacker can corrupt the vote-tallying software, perhaps they can corrupt the hash-generating function as well, so that whenever you ask the checker “does the voting machine have the right software installed,” it will say, “Yes, boss.”  Or, if the hasher is designed not to say “yes” or “no,” but to report the hash of what’s installed, it can simply report the hash of what’s supposed to be there, not what’s actually there. For that reason, election security experts never put much reliance in this hash-code idea; instead they insist that you can’t fully trust what software is installed, so you must achieve election integrity by doing recounts or risk-limiting audits of the paper ballots.

But you might have thought that the hash-code could at least help protect against accidental, nonmalicious errors in configuration.  You would be wrong.  It turns out that ES&S has bugs in their hash-code checker:  if the “reference hashcode” is completely missing, then it’ll say “yes, boss, everything is fine” instead of reporting an error.  It’s simultaneously shocking and unsurprising that ES&S’s hashcode checker could contain such a blunder and that it would go unnoticed by the U.S. Election Assistance Commission’s federal certification process. It’s unsurprising because testing naturally tends to focus on “does the system work right when used as intended?”  Using the system in unintended ways (which is what hackers would do) is not something anyone will notice.

Until somebody does notice.  In this case, it was the State of Texas’s voting-machine examiner, Brian Mechler.  In his report dated September 2020, he documented this bug in the hash-checking script supplied with the ES&S EVS election system (for the ExpressVote touch-screen BMD, the DS200 in-precinct optical scanner, the DS450 and DS850 high-speed optical scanners, and other related voting machines).  (Read Section 7.2 of Mr. Mechler’s report for details.)

We can’t know whether that bug was intentional or not.  Either way, it’s certainly convenient for ES&S, because it’s one less hassle when installing firmware upgrades.  (Of course, it’s one less hassle for potential hackers, too.)
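To make the failure mode concrete, here is a minimal sketch in Python of the difference between a checker that "fails open" and one that "fails closed" when the reference hash is missing. This is not ES&S's script, whose internals are not reproduced here; the function names, the exception, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from typing import Optional


class MissingReferenceError(Exception):
    """Raised when there is no reference hash to compare against (hypothetical)."""


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def verify_fail_open(installed: bytes, reference_hash: Optional[str]) -> bool:
    """Illustrative buggy pattern: only a detected mismatch counts as failure.

    With no reference hash there is nothing to mismatch, so the check
    reports success even though nothing was actually compared.
    """
    if reference_hash and sha256_hex(installed) != reference_hash.strip().lower():
        return False
    return True


def verify_fail_closed(installed: bytes, reference_hash: Optional[str]) -> bool:
    """Safer pattern: a missing or blank reference is an error, not a match."""
    if reference_hash is None or not reference_hash.strip():
        raise MissingReferenceError("no reference hash supplied")
    return sha256_hex(installed) == reference_hash.strip().lower()


if __name__ == "__main__":
    firmware = b"whatever happens to be installed"
    print(verify_fail_open(firmware, None))  # reports a "match" against nothing
    try:
        verify_fail_closed(firmware, None)
    except MissingReferenceError as err:
        print("refused:", err)
```

Even the fail-closed version only guards against honest configuration mistakes. As discussed above, a checker running on a compromised machine can simply lie.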

Another gem in Mr. Mechler’s report is in Section 7.1, in which he reveals that acceptance testing of voting systems is done by the vendor, not by the customer.  Acceptance testing is the process by which a customer checks a delivered product to make sure it satisfies requirements.  To have the vendor do acceptance testing pretty much defeats the purpose.  

When the Texas Secretary of State learned that their vendor was doing the acceptance testing themselves, the SoS’s Election Division took an action “to work with ES&S and their Texas customers to better define their roles and responsibilities with respect to acceptance testing,” according to the report. They may encounter a problem, though: the ES&S sales contract specifies that ES&S must perform the acceptance testing, or they will void your warranty (see clause 7b).

There’s another item in Mr. Mechler’s report, Section 7.3.  The U.S. Election Assistance Commission requires that “The vendor shall have a process to verify that the correct software is loaded, that there is no unauthorized software, and that voting system software on voting equipment has not been modified, using the reference information from the [National Software Reference Library] or from a State designated repository. The process used to verify software should be possible to perform without using software installed on the voting system.”  This requirement is usually interpreted to mean, “check the hash code of the installed software against the reference hash code held by the EAC or the State.”

But ES&S’s hash-checker doesn’t do that at all.  Instead, ES&S instructs its techs to create some “golden” hashes from the first installation, then subsequently check the hash code against these.  So whatever software was first installed gets to be “golden”, regardless of whether it’s been approved by the EAC or by the State of Texas. This design decision was probably a convenient shortcut by engineers at ES&S, but it directly violates the EAC’s rules for how hash-checking is supposed to work.
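The difference between the two approaches can be sketched in a few lines of Python. This is a toy model, not the vendor's procedure: the function names, the dictionary standing in for the stored "golden" hash, and the use of SHA-256 are assumptions for illustration only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def check_against_golden(installed: bytes, golden_store: dict) -> bool:
    """Trust-on-first-use: the first thing ever checked becomes 'golden'."""
    digest = sha256_hex(installed)
    if "golden" not in golden_store:
        golden_store["golden"] = digest  # whatever was installed first wins
    return digest == golden_store["golden"]


def check_against_reference(installed: bytes, authorized_hash: str) -> bool:
    """Compare against a hash obtained from outside the machine,
    for example from a state-designated repository."""
    return sha256_hex(installed) == authorized_hash


if __name__ == "__main__":
    certified = b"certified build"      # stand-in for the approved software
    uncertified = b"uncertified build"  # stand-in for anything else
    store = {}
    # The uncertified build was installed first, so it is declared golden.
    print(check_against_golden(uncertified, store))
    # The same build fails when compared with the authorized reference.
    print(check_against_reference(uncertified, sha256_hex(certified)))
```

In the toy run, the golden check accepts the uncertified build because it was there first, while the comparison against an external reference rejects it.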

So, what have we learned?

We already knew that hash codes can’t protect against hackers who install vote-stealing software, because the hackers can also install software that lies about the hash code.  But now we’ve learned that hash codes are even more useless than we might have thought.  This voting-machine manufacturer

  • has a hash-code checker that erroneously reports a match, even when you forget to tell it what to match against;
  • checks the hash against what was first installed, not against the authorized reference that they’re supposed to use;
  • and insists on running this check itself, rather than letting the customer do it, or else the warranty is voided.

As a bonus we learned that the EAC certifies voting systems without checking if the validation software functions properly. 

Are we surprised?  You know: fool me once, shame on you; fool me twice, shame on me.  Every time that we imagine that a voting-machine manufacturer might have sound cybersecurity practices, it turns out that they’ve taken shortcuts and they’ve made mistakes.  In this, voting-machine manufacturers are no different from any other makers of software.  There’s lots of insecure software out there made by software engineers who cut corners and don’t pay attention to security, and why should we think that voting machines are any different?

So if we want to trust our elections, we should vote on hand-marked paper ballots, counted by optical scanners, and recountable by hand.  Those optical scanners are pretty accurate when they haven’t been hacked — even the ES&S DS200 — and it’s impractical to count all the ballots without them.  But we should always check up on the machines by doing random audits of the paper ballots.  And those audits should be “strong” enough — that is, use good statistical methods and check enough of the ballots — to catch the mistakes that the machines might make, if the machines make mistakes (or are hacked).  The technical term for those “strong enough” audits is Risk-Limiting Audit.

Andrew W. Appel is Professor of Computer Science at Princeton University.

Susan Greenhalgh is Senior Advisor on Election Security at Free Speech For People.

Georgia’s election certification avoided an even worse nightmare that’s just waiting to happen next time

Voters in Georgia polling places, 2020, used Ballot-Marking Devices (BMDs), touchscreen computers that print out paper ballots; then voters fed those ballots into Precinct-Count Optical Scan (PCOS) voting machines for tabulation. There were many allegations about hacking of Georgia’s Presidential election. Based on the statewide audit, we can know that the PCOS machines were not cheating (in any way that changed the outcome). But can we know that the touchscreen BMDs were not cheating? And what about next time? There’s a nightmare scenario waiting to happen if Georgia (or other states) continue to use touchscreen BMDs on a large scale.

Dominion ICX ballot-marking device used in Georgia polling places 2020. Voters use the touchscreen to select candidates, then a paper ballot is printed out, which the voter then feeds into the scanner for tabulation and for retention in a ballot box.
Dominion ICP optical-scanner used in Georgia polling places 2020.
25% of Georgia voters in 2020 voted by mail; they marked their optical-scan ballot by hand, so they didn’t need to worry about whether the computer that marked their ballot was hacked–no computer marked their ballot! This is a high-speed central-count scanner that counts mail-in ballots; the screen on the right is not a touch-screen for the voter, it’s a control computer for the election administrators. It’s legitimate to worry about whether the optical scanners are hacked—but the hand audits of the paper ballots (by people, not computers) resolved that question in Georgia 2020.

Part 1: What happened in November 2020

There were many allegations about hacking of Georgia’s voting-machine computers in the November 2020 election—accusations about who owned the company that made the voting machines, accusations about who might have hacked into the computers. An important principle of election integrity is “software independence,” which I’ll paraphrase as saying that we should be able to verify the outcome of the election without having to know who wrote the software in the voting machines.

Indeed, the State of Georgia did a manual audit of all the paper ballots in the November 2020 Presidential election. The audit agreed with the outcome claimed by the optical-scan voting machines. This means,

  • The software in Georgia’s PCOS scanners is now irrelevant to the outcome of the 2020 Presidential election in Georgia, which has been confirmed by the audit.
  • Georgia’s PCOS scanners were not cheating in the 2020 Presidential election (certainly not by enough to change the outcome), which we know because the hand-count audits closely agreed with the PCOS counts.
  • The audit gave election officials the opportunity to notice that several batches of ballots hadn’t even been counted the first time; properly counting those ballots changed the vote totals but not the outcome. I’ll discuss that in a future post.

Suppose the polling-place optical scanners had been hacked (enough to change the outcome). Then this would have been detected in the audit, and (in principle) Georgia would have been able to recover by doing a full recount. That’s what we mean when we say optical-scan voting machines have “strong software independence”—you can obtain a trustworthy result even if you’re not sure about the software in the machine on election day.

If Georgia had still been using the paperless touchscreen DRE voting machines that they used from 2003 to 2019, then there would have been no paper ballots to recount, and no way to disprove the allegations that the election was hacked. That would have been a nightmare scenario. I’ll bet that Secretary of State Raffensperger now appreciates why the Federal Court forced him to stop using those DRE machines (Curling v. Raffensperger, Case 1:17-cv-02989-AT Document 579).

But optical scanners are not the only voting machines in Georgia’s polling places. Every in-person Georgia voter uses two machines: first, voters select candidates on a touch-screen ballot-marking device (BMD) that prints out a ballot paper; then, they feed that ballot paper into a precinct-count optical scanner (PCOS). The software independence of BMDs is much more problematic.

The audit confirmed that the PCOS was not cheating. How do we know that the BMD was not cheating, printing different votes onto the ballot paper than what the voter selected on the touch screen? This is a much more difficult question, and it can’t be answered by any audit or recount of the ballot papers.

You might think, “the voter would notice if the ballot paper differs from what they indicated on the touch screen.” But two different scientific studies have shown that most voters don’t notice. Only about 7% of voters speak up if a touchscreen BMD fraudulently prints a wrong vote. And that’s just one estimate from one study—it might actually be overoptimistic.***

Biden got about 50.125% of the votes in Georgia, and Trump got 49.875%. Suppose, hypothetically, that 50.125% of the voters chose Trump, but (hypothetically) hacked BMDs were changing votes on 0.25% of the ballots, in favor of Biden. Then the result we’d see would be Biden 50.125%, and the recount would confirm that—because that’s what’s printed on the paper.

In this scenario, if 7% (1 out of 15) of voters carefully review their paper ballot, and 0.25% (1 out of 400) of paper ballots had votes for Biden when the voter had really chosen Trump, then we might expect 1 out of 6000 (15×400) voters to complain to the pollworkers. And the pollworkers would supposedly tell those voters, “no problem, don’t put that ballot into the PCOS, we’ll void that for you and you can mark a fresh ballot.” But all those other voters who didn’t carefully check the printout would still be voting for a candidate they didn’t intend to, and the hack would be successful.

You might think (in this hypothetical scenario), “at least some voters caught the BMDs cheating”. But even if a voter catches the machine cheating, so what? Election officials can’t void an entire election, or “correct” the vote totals, based on the say-so of 0.017% (that is, 1/6000) of the voters.

Did the touchscreen BMDs cheat in the Georgia 2020 Presidential Election? We can guess that they did not cheat this time, and here’s a weak basis for that guess: If the BMDs had been shifting enough votes from Trump to Biden to make a difference, then at least 0.017% of voters would have noticed. There were 5 million votes cast, so that’s about 833 voters statewide**. If those voters complained, then presumably the local news media would have carried contemporaneous reports of such “BMD vote flipping.” But we didn’t hear any such reports.**** So probably the BMDs weren’t flipping any votes.

That’s a pretty weak basis to assert that the BMDs weren’t cheating. But it could be a lot worse . . .
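The arithmetic in this part is easy to check. Below is a minimal sketch in Python using exact fractions. It follows the simplifications made above (all 5 million voters are treated as BMD voters, 1 in 15 voters reviews the printout, and 1 in 400 ballots is flipped); the function and variable names are mine, chosen for illustration.

```python
from fractions import Fraction


def expected_complaints(votes_cast: int, flip_rate: Fraction, notice_rate: Fraction) -> Fraction:
    """Expected number of voters who both receive a flipped ballot and notice it."""
    return votes_cast * flip_rate * notice_rate


# Hypothetical scenario from the text, in percentage points.
true_trump = Fraction("50.125")   # share of voters who chose Trump
flipped = Fraction("0.25")        # share of all ballots flipped toward Biden
reported_trump = true_trump - flipped
reported_biden = (100 - true_trump) + flipped

notice_rate = Fraction(1, 15)     # roughly 7% of voters review the printout
flip_rate = Fraction(1, 400)      # 0.25% of ballots
per_voter = notice_rate * flip_rate
complaints = expected_complaints(5_000_000, flip_rate, notice_rate)

if __name__ == "__main__":
    print(float(reported_trump), float(reported_biden))
    print(per_voter, f"{float(per_voter):.3%}")
    print(round(complaints))
```

The paper trail would show Biden at 50.125% and Trump at 49.875%, while only about 1 voter in 6000, roughly 833 people statewide, would have any reason to complain.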

Part 2: The nightmare scenario just waiting to happen next time.

But what about the next election? Suppose in Georgia’s 2022 Senate election between Raphael Warnock and his Republican challenger (whoever that will be), one of those candidates wins with 50.125% of the vote. And suppose 100 voters statewide claim that the BMDs flipped their vote. What should Secretary of State Raffensperger do? He cannot change the election results based on the say-so of 100 voters—those voters might be mistaken (or lying) about what they indicated on the touch screen. He cannot fix it by a recount, because (if the BMDs were really cheating) the paper ballots are fraudulent. He will be in a bind, and there will be no way out. And no way out for the people of Georgia, either.

You might argue, “More than 7% of voters would notice that their paper ballot was incorrectly marked.” Even if that were true (there’s no evidence for it), it just means 2000 or 3000 voters statewide (10 or 20 per county) would have noticed, instead of just 833. The problem is the same: even if they notice, there’s no way to correct the election.

The solution is simple.  Voters should mark their optical-scan bubble ballots with a pen.  That way, you know the recount is counting the ballots that the voter actually marked. Touchscreen BMDs (which also have audio interfaces for blind voters) should be reserved for those voters with disabilities who cannot mark a paper ballot by hand.

Georgia should continue using their PCOS (optical scan) voting machines, which will readily count hand-marked optical-scan “bubble” ballots. No major investment in new equipment is needed. This change can easily be implemented before the next election.

And other states and counties that are considering BMDs-for-all-voters—some counties in Pennsylvania and New Jersey have bought those, New York is considering them—should consider the nightmare scenario, and stick with hand-marked paper ballots.

Everything I’ve described here is consistent with the peer-reviewed scientific paper,  Ballot-Marking Devices Cannot Assure the Will of the Voters, by Andrew W. Appel, Richard A. DeMillo, and Philip B. Stark, in Election Law Journal, vol. 19 no. 3, pp. 432-450, September 2020. [non-paywall version here]

Georgia’s law doesn’t actually say what’s required if the audit detects a problem. The law doesn’t specify that audit results are binding on official results. This year that didn’t matter, because the audit agreed with the official outcome.

*Georgia’s audit was done by examining the ballots with human eyes. Later, at the request of the Trump campaign, Georgia also did a recount using their central-count optical scanners. If those optical scanners had been hacked to cheat consistently with (hypothetically) cheating precinct-count optical scanners, then the machine recount wouldn’t catch the fraud. For that reason, a hand-count is more effective protection than a machine recount. In any case, all three counts (the polling-place count using PCOS, the audit, and the machine recount) showed a Biden victory, although their actual numbers of votes differed.

**Actually, this year a large proportion of Georgians voted by mail, on hand-marked paper ballots, so they didn’t use BMDs at all. Those votes are safe from BMD hacks. But it doesn’t change the “833 voters statewide” result of my analysis.

***That statistic (“7% of voters will notice if the BMD prints the wrong candidate on their ballot”) comes from a single study in Michigan. Here’s why it might be overoptimistic, as applied to this voting machine and these voters. First, look at the BMD ballot and how hard it is to read.***** In November, one observer watched a constant stream of voters during about 20 minutes in Cobb County: they voted without a glance at their paper ballots, but then they told the poll workers that they had checked them. It is just too much trouble to try to read and check them.  In the January 2021 Senate runoffs, another observer saw that only 6 of 46 voters even glanced at the paper—which is not the same as checking it carefully.

****We would like to think “there was no local news reporting of BMD-flipped votes” means that “BMDs didn’t flip votes”. But so much of Georgia is quite rural with very little local reporting, and certainly without the experience to know how to even report something like that. And (in other elections) it often happens that there are verified stories of discrepancies months after the election that never made it to any newspaper.

*****I mean, really! It’s not easy to decode the paper printout. In the Senate race, this is what the ballot says:

For United States Senate (Loeffler) -
Special (Vote for One) (NP)
   Vote for Annette Davis Jackson

Is that a vote for Kelly Loeffler, whose name appears on the first line? Apparently not, I’d guess it’s a vote for Annette Davis Jackson. And what does (NP) mean? And what does (I) mean attached to votes for many other candidates? Certainly (I) does not mean Independent. This ballot is a masterpiece of bad design, and it’s no wonder that real-life voters are discouraged from looking at it very carefully.

Edited 8 February 2021 to correct 83 to 833.

Using an Old Model for New Questions on Influence Operations

Alicia Wanless, Kristen DeCaires Gall, and Jacob N. Shapiro
Freedom to Tinker:

Expanding the knowledge base around influence operations has proven challenging, despite known threats to elections, COVID-related misinformation circulating worldwide, and recent tragic events at the U.S. Capitol fueled in part by political misinformation and conspiracy theories. Credible, replicable evidence from highly sensitive data can be difficult to obtain. The bridge between industry and academia remains riddled with red tape. Intentional and systemic obstructions continue to hinder research on a range of important questions about how influence operations spread, their effects, and the efficacy of countermeasures.

A key part of the challenge lies in the basic motivations for both industry and academic sectors. Tech companies have little incentive to share sensitive data or allocate resources to an effort that does not end in a commercial product, and may even jeopardize their existing one. As a result, cross-platform advances to manage the spread of influence operations have been limited, with the notable exception of successful counter-terrorism data sharing. Researchers who seek to build relationships with specific companies encounter well-documented obstacles in accessing and sharing information, and subtler ones in the time-consuming process of learning how to navigate internal politics. Companies face difficulties recruiting in-house experts from academia as well, as many scholars worry about publication limitations and lack of autonomy when moving to industry.  

The combination of these factors leaves a gap in research on non-commercial issues, at least in relation to the volume of consumer data tech companies ingest. And, unfortunately, studying influence in a purely academic setting presents all the challenges of normal research—inconsistent funding streams, access to quality data, and retaining motivated research staff—as well as the security and confidentiality issues that accompany any mass transfer of data. 

We are left with a lack of high-quality, long-term research on influence operations. 

Fortunately, a way forward exists. The U.S. government long ago recognized that neither market nor academic incentives can motivate all the research large organizations need. Following World War II, it created a range of independent research institutions. Among them, the Federally Funded Research and Development Centers (FFRDCs) were created explicitly to “provide federal agencies with R&D capabilities that cannot be effectively met by the federal government or the private sector alone”. FFRDCs – IDA, MITRE, and RAND for example – are non-profit organizations funded by Congress for longer periods of time (typically five years) to pursue specific limited research agendas. They are prohibited from competing for other contracts, which enables for-profit firms to share sensitive data with them, even outside of the protections of the national security classification system, and they can invest in staffing choices and projects that span short government budget cycles. These organizations bridge the divide between university research centers and for-profit contractors, allowing them to fill critical analytical gaps for important research questions. 

The FFRDC model is far from perfect. Like many government contractors, some have historically had cost inefficiency and security issues. But by solving a range of execution challenges, they enable important, but not always market-driven, research on topics ranging from space exploration, to renewable energy, to cancer treatment. 

Adopting a similar model of a multi-stakeholder research and development center (MRDC) funded by industry and civil society could lay a foundation for collaboration on issues pertaining to misinformation and influence operations by accomplishing five essential tasks:

  • Facilitate funding for long-term projects.
  • Provide infrastructure for developing shared research agendas and a mechanism for executing studies.
  • Create conditions that help build trusted, long-term relationships between sectors.
  • Offer career opportunities for talented researchers wishing to do basic research with practical application.
  • Guard against inappropriate disclosures while enabling high-credibility studies with sensitive information that cannot be made public.

The MRDC model fills a very practical need for flexibility and speed on the front end of addressing immediate problems, such as understanding what, if any, role foreign nations played in the discussions which led up to January 6. Such an organization would provide a bridge for academics and practitioners to come together quickly and collaborate for a sustained period, months or years, on real-world operational issues. A research project at a university can take six months to a year to set up funding and fully staff. Furthermore, most universities, and even organizations like the Stanford Internet Observatory fully dedicated to these issues, cannot do “work for hire”: if there’s no unique intellectual product or no true research question at hand, their ability to work on a given problem is limited or non-existent. An established contract organization that clearly owns a topic, fully staffed with experts in house, minimizes these hindrances.

Because an MRDC focused on influence operations does not fit neatly into existing organizational structures, its initial setup should be an iterative process. It should start with two or more tech companies joining with a cluster of academic organizations on a discrete set of deliverables, all with firm security agreements in place. Once the initial set of projects proves the model’s value, and plans for budgets and researcher time are solidified, the organization could be expanded. The negative impact of internet platforms on society did not grow overnight, and we certainly do not expect the solution to appear overnight either. And, tempting as it is to think the U.S. government could simply fund such an institution, it likely needs to remain independent of government funding in order to avoid collusion concerns from the international community. 

Steps toward bridging the gap between academia and the social media firms have already taken place. Facebook’s recent provision of academic access to CrowdTangle, meant in part to provide increased transparency on influence operations and disinformation, is a good step, as is its data-sharing partnership with several universities to look at election-related content. Such efforts will enable some work currently stymied by obstacles to data sharing, but they do not address the deeper incentive-related issues. 

Establishing a long-term MRDC around the study of influence operations and misinformation is more crucial than ever. It is a logical way forward to address these questions at the scale they deserve.