September 18, 2018

Serious design flaw in ESS ExpressVote touchscreen: “permission to cheat”

Kansas, Delaware, and New Jersey are in the process of purchasing voting machines with a serious design flaw, and they should reconsider while there is still time!

Over the past 15 years, almost all the states have moved away from paperless touchscreen voting systems (DREs) to optical-scan paper ballots.  They’ve done so because if a paperless touchscreen is hacked to give fraudulent results, there’s no way to know and no way to correct; but if an optical scanner were hacked to give fraudulent results, the fraud could be detected by a random audit of the paper ballots that the voters actually marked, and corrected by a recount of those paper ballots.

Optical-scan ballots marked by the voters are the most straightforward way to make sure that the computers are not manipulating the vote.  Second-best, in my opinion, is the use of a ballot-marking device (BMD), where the voter uses a touchscreen to choose candidates, then the touchscreen prints out an optical-scan ballot that the voter can then deposit in a ballot box or into an optical scanner.  Why is this second-best?  Because (1) most voters are not very good at inspecting their computer-marked ballot carefully, so hacked BMDs could change some choices and the voter might not notice, or might notice and think it’s the voter’s own error; and (2) the dispute-resolution mechanism is unclear; pollworkers can’t tell if it’s the machine’s fault or your fault; at best you raise your hand and get a new ballot, try again, and this time the machine “knows” not to cheat.

Third best is “DRE with paper trail”, where the paper ballot prints out behind glass; the voter can inspect it, but it can be difficult and discouraging to read a long ballot behind glass, and there’s pressure just to press the “accept” button and get on with it.  With hand-marked optical-scan ballots there’s much less pressure to hurry:  you’re not holding up the line at the voting machine, you’re sitting at one of the many cheap cardboard privacy screens with a pen and a piece of paper, and you don’t approach the optical scanner until you’re satisfied with your ballot.  That’s why states (such as North Carolina) that had previously permitted  “DRE with paper trail” moved last year to all optical-scan.

Now there’s an even worse option than “DRE with paper trail”; I call it the “press this button if it’s OK for the machine to cheat” option.  The country’s biggest vendor of voting machines, ES&S, has a line of voting machines called ExpressVote.  Some of these are optical scanners (which are fine), and others are “combination” machines, basically a ballot-marking device and an optical scanner all rolled into one.

This video shows a demonstration of ExpressVote all-in-one touchscreens purchased by Johnson County, Kansas.  The voter brings a blank ballot to the machine, inserts it into a slot, and chooses candidates on the touchscreen.  Then the machine prints those choices onto the blank ballot and spits it out for the voter to inspect.  If the voter is satisfied, she inserts it back into the slot, where it is counted (and dropped into a sealed ballot box for possible recount or audit).

So far this seems OK, except that the process is a bit cumbersome and not completely intuitive (watch the video for yourself).  It still suffers from the problems I describe above: the voter may not carefully review all the choices, especially in down-ballot races; and counties need to buy a lot more voting machines, because voters occupy the machine for a long time (in contrast to op-scan ballots, where they occupy a cheap cardboard privacy screen).

But here’s the amazingly bad feature:  “The version that we have has an option for both ways,” [Johnson County Election Commissioner Ronnie] Metsker said. “We instruct the voters to print their ballots so that they can review their paper ballots, but they’re not required to do so. If they want to press the button ‘cast ballot,’ it will cast the ballot, but if they do so they are doing so with full knowledge that they will not see their ballot card, it will instead be cast, scanned, tabulated and dropped in the secure ballot container at the backside of the machine.”  [TYT Investigates, article by Jennifer Cohn, September 6, 2018]

Now it’s easy for a hacked machine to cheat undetectably!  All the fraudulent vote-counting program has to do is wait until the voter chooses between “cast ballot without inspecting” and “inspect ballot before casting”.  If the latter, then don’t cheat on this ballot.  If the former, then change votes however it likes, and print those fraudulent votes on the paper ballot, knowing that the voter has already given up the right to look at it.
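To make the attack concrete, here is a minimal sketch of the decision logic such a hacked machine could follow.  It is purely illustrative: the function and variable names are hypothetical and have nothing to do with any actual ES&S code.

```python
def print_on_ballot(choices):
    # Stand-in for the machine's printer: just show what would appear on paper.
    for race, candidate in choices.items():
        print(f"  {race}: {candidate}")

def cast_ballot(voter_choices, voter_will_inspect, attacker_preferences):
    # Decision logic of the hypothetical hacked machine described above.
    if voter_will_inspect:
        # The voter asked to review the printed ballot, so cheating here
        # risks detection: print the voter's actual choices.
        print_on_ballot(voter_choices)
        return voter_choices
    # The voter pressed "cast ballot" without inspecting, so no one will ever
    # compare the paper to the screen: print and tabulate altered votes.
    altered = {race: attacker_preferences.get(race, choice)
               for race, choice in voter_choices.items()}
    print_on_ballot(altered)
    return altered

# One voter skips inspection (votes get changed); one inspects (votes stay honest).
choices = {"Governor": "Candidate A", "Senator": "Candidate B"}
fraud = {"Governor": "Candidate C"}
cast_ballot(choices, voter_will_inspect=False, attacker_preferences=fraud)
cast_ballot(choices, voter_will_inspect=True, attacker_preferences=fraud)
```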

Johnson County should not have bought these machines; if they’re going to use them, they must insist that ES&S disable this “permission to cheat” feature.

Union County, New Jersey, and the entire state of Delaware are (to the best of my knowledge) in the process of purchasing ExpressVote XL machines, which are like the touchscreens shown in the video but with a much larger screen that can show the whole ballot at once.  New Jersey and Delaware should not buy these machines.  If they insist on buying them, they must disable the “permission to cheat” feature.

Of course, if the permission-to-cheat feature is disabled, voting reverts to the cumbersome process shown in the video: (1) receive your bar-code card and blank ballot from the election worker; (2) insert the blank ballot card into the machine; (3) insert the bar-code card into the machine; (4) make choices on the screen; (5) press the “done” button; (6) wait for the paper ballot to be ejected; (7) compare the choices listed on the ballot with the ones you made on the screen; (8) put the ballot back into the machine.

Wouldn’t it be better to use conventional optical-scan balloting, as most states do?  (1) receive your optical-scan ballot from the election worker;  (2) fill in the ovals with a pen, behind a privacy screen; (3) bring your ballot to the optical scanner; (4) feed your ballot into the optical scanner.

I thank Professor Philip Stark (interviewed in the TYT article cited above) for bringing this to my attention.

 

Privacy, ethics, and data access: A case study of the Fragile Families Challenge

This blog post summarizes a paper describing the privacy and ethics process by which we organized the Fragile Families Challenge. The paper will appear in a special issue of the journal Socius.

Academic researchers, companies, and governments holding data face a fundamental tension between risk to respondents and benefits to science. On one hand, these data custodians might like to share data with a wide and diverse set of researchers in order to maximize possible benefits to science. On the other hand, the data custodians might like to keep data locked away in order to protect the privacy of those whose information is in the data. Our paper is about the process we used to handle this fundamental tension in one particular setting: the Fragile Families Challenge, a scientific mass collaboration designed to yield insights that could improve the lives of disadvantaged children in the United States. We wrote this paper not because we believe we eliminated privacy risk, but because others might benefit from our process and improve upon it.

One scientific objective of the Fragile Families Challenge was to maximize predictive performance of adolescent outcomes (e.g., high school GPA) measured at approximately age 15, given a set of background variables measured from birth through age 9. We aimed to do so using the Common Task Framework (see Donoho 2017, Section 6): we would share data with researchers who would build predictive models using observed outcomes for half of the cases (the training set). These researchers would receive instant feedback on out-of-sample performance in ⅛ of the cases (the leaderboard set) and ultimately be evaluated by performance in the ⅜ of the cases that we would keep hidden until the end of the Challenge (the holdout set). If scientific benefit were the only goal, the optimal design might be to include every possible variable in the background set and share with anyone who wanted access, with no restrictions.
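For readers unfamiliar with the Common Task Framework, the sketch below shows the kind of split we are describing. It is an illustration only, with an arbitrary case count and random seed; it is not the code actually used in the Challenge.

```python
import random

def split_cases(case_ids, seed=0):
    # Illustrative Common Task Framework split: 1/2 training, 1/8 leaderboard,
    # 3/8 holdout (the holdout outcomes stay hidden until the end).
    rng = random.Random(seed)
    ids = list(case_ids)
    rng.shuffle(ids)
    n = len(ids)
    n_train, n_leader = n // 2, n // 8
    return {
        "training": ids[:n_train],                       # outcomes shared with participants
        "leaderboard": ids[n_train:n_train + n_leader],  # instant out-of-sample feedback
        "holdout": ids[n_train + n_leader:],             # used only for final evaluation
    }

splits = split_cases(range(4000))   # the case count here is arbitrary
print({name: len(s) for name, s in splits.items()})
```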

Scientific benefit may be maximized by sharing data widely, but risk to respondents is also maximized by doing so. Although methods of data sharing with provable privacy guarantees are an active area of research, we believed that solutions that could offer provable guarantees were not possible in our setting without a substantial loss of scientific benefit (see Section 2.4 of our paper). Instead, we engaged in a privacy and ethics process that involved threat modeling, threat mitigation, and third-party guidance, all undertaken within an ethical framework.

 

Threat modeling

Our primary concern was the risk of re-identification. Although our data did not contain obvious identifiers, we worried that an adversary could find an auxiliary dataset containing identifiers as well as key variables also present in our data. If so, the adversary could link our dataset to the identifiers (either perfectly or probabilistically) to re-identify at least some rows in the data. For instance, Sweeney was able to re-identify Massachusetts medical records data by linking them to identified voter records using the shared variables zip code, date of birth, and sex. Given the vast number of auxiliary datasets that exist now or may exist in the future, it is likely that some research datasets could be re-identified. It is difficult to know in advance which key variables may aid the adversary in this task.
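As a toy illustration of how such a linkage attack works (every row below is invented), joining an identified auxiliary file to de-identified records on a few shared key variables is enough to attach names to sensitive rows:

```python
import pandas as pd

# Toy illustration of a linkage (re-identification) attack; all data are invented.
research = pd.DataFrame({            # "de-identified" research data
    "zip": ["02138", "08540"],
    "dob": ["1970-05-01", "1985-11-23"],
    "sex": ["F", "M"],
    "sensitive_outcome": ["outcome_x", "outcome_y"],
})
auxiliary = pd.DataFrame({           # identified auxiliary data, e.g. a voter file
    "name": ["Alice Example", "Bob Example"],
    "zip": ["02138", "08540"],
    "dob": ["1970-05-01", "1985-11-23"],
    "sex": ["F", "M"],
})

# Joining on the shared key variables attaches names to the sensitive rows.
reidentified = auxiliary.merge(research, on=["zip", "dob", "sex"])
print(reidentified[["name", "sensitive_outcome"]])
```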

To make our worries concrete, we engaged in threat modeling: we reasoned about who might have both (a) the capability to conduct such an attack and (b) the incentive to do so. We even tried to attack our own data.  Through this process we identified five main threats (the rows in the figure below). A privacy researcher, for instance, would likely have the skills to re-identify the data if they could find auxiliary data to do so, and might also have an incentive to re-identify, perhaps to publish a paper arguing that we had been too cavalier about privacy concerns. A nosy neighbor who knew someone in the data might be able to find that individual’s case in order to learn information about their friend which they did not already know. We also worried about other threats that are detailed in the full paper.

 

Threat mitigation

To mitigate threats, we took several steps to (a) reduce the likelihood of re-identification and to (b) reduce the risk of harm in the event of re-identification. While some of these defenses were statistical (i.e. modifications to data designed to support aims [a] and [b]), many instead focused on social norms and other aspects of the project that are more difficult to quantify. For instance, we structured the Challenge with no monetary prize, to reduce an incentive to re-identify the data in order to produce remarkably good predictions. We used careful language and avoided making extreme claims to have “anonymized” the data, thereby reducing the incentive for a privacy researcher to correct us. We used an application process to only share the data with those likely to contribute to the scientific goals of the project, and we included an ethical appeal in which potential participants learned about the importance of respecting the privacy of respondents and agreed to use the data ethically. None of these mitigations eliminated the risk, but they all helped to shift the balance of risks and benefits of data sharing in a way consistent with ethical use of the data. The figure below lists our main mitigations (columns), with check marks to indicate the threats (rows) against which they might be effective.  The circled check mark indicates the mitigation that we thought would be most effective against that particular adversary.

 

Third-party guidance

A small group of researchers highly committed to a project can easily convince themselves that they are behaving ethically, even if an outsider would recognize flaws in their logic. To avoid groupthink, we conducted the Challenge under the guidance of third parties. The entire process was conducted under the oversight and approval of the Institutional Review Board of Princeton University, a requirement for social science research involving human subjects. To go beyond what was required, we additionally formed a Board of Advisers to review our plan and offer advice. This Board included experts from a wide range of fields.

Beyond the Board, we solicited informal outside advice from a diverse set of people, essentially anyone we could talk to who might have thoughts about the process, and this proved valuable. For example, on the advice of someone with experience planning high-risk operations in the military, we developed a response plan in case something went wrong. Having this plan in place meant that we could respond quickly and forcefully if something unexpected occurred.

 

Ethics

After the process outlined above, we still faced an ethical question: should we share the data and proceed with the Fragile Families Challenge? This was a deep and complex question to which a fully satisfactory answer was likely to be elusive. Much of our final decision drew on the principles of the Belmont Report, a standard set of principles used in social science research ethics. While not perfect, the Belmont Report serves as a reasonable benchmark because it is the standard that has developed in the scientific community regarding human subjects research. The first principle in the Belmont Report is respect for persons. Because families in the Fragile Families Study had consented for their data to be used for research, sharing the data with researchers in a scientific project agreed with this principle. The second principle is beneficence, which requires that the risks of research be balanced against potential benefits. The threat mitigations we carried out were designed with beneficence in mind. The third principle is justice: that the benefits of research should flow to a similar population that bears the risks. Our sample included many disadvantaged urban American families, and the scientific benefits of the research might ultimately inform better policies to help those in similar situations. It would be wrong to exclude this population from the potential to benefit from research, so we reasoned that the project was in line with the principle of justice. For all of these reasons, we decided with our Board of Advisers that proceeding with the project would be ethical.

 

Conclusion

To unlock the power of data while also respecting respondent privacy, those providing access to data must navigate the fundamental tension between scientific benefits and risk to respondents. Our process did not offer provable privacy guarantees, and it was not perfect. Nevertheless, our process to address this tension may be useful to others in similar situations as data stewards. We believe the core principles of threat modeling, threat mitigation, and third-party guidance within an ethical framework will be essential to such a task, and we look forward to learning from others in the future who build on what we have done to improve the process of navigating this tension.

You can read more about our process in our pre-print: Lundberg, Narayanan, Levy, and Salganik (2018), “Privacy, ethics, and data access: A case study of the Fragile Families Challenge.”

Securing the Vote — National Academies report

In this November’s election, could a computer hacker, foreign or domestic, alter votes (in the voting machine) or prevent people from voting (by altering voter registrations)?  What should we do to protect ourselves?

The National Academies of Sciences, Engineering, and Medicine have released a report, Securing the Vote: Protecting American Democracy, about the cybervulnerabilities in U.S. election systems and how to defend them.  The committee was chaired by the presidents of Indiana University and Columbia University, and the members included five computer scientists, a mathematician, two social scientists, a law professor, and three state and local election administrators.  I served on this committee, and I am confident that the report presents the clear consensus of the scientific community, as represented not only by the members of the committee but also by the 14 external reviewers—election officials, computer scientists, experts on elections—who were part of the National Academies’ process.

The 124-page report, available for free download, lays out the scientific basis for our conclusions and our 55 recommendations.  We studied primarily the voting process; we did not address voter-ID laws, gerrymandering, social-media disinformation, or campaign financing.

There is no national election system in the U.S.; each state or county runs its own elections.  But in the 21st century, state and local election administrators face new kinds of threats.  In the 19th and 20th centuries, elections did not face the threat of vote manipulation (and voter-registration tampering) by highly sophisticated adversaries from anywhere in the world.  Most state and local election administrators know they must improve their cybersecurity and adopt best practices, and the federal government can (and should) offer assistance.  But it’s impossible to completely prevent all attacks; we must be able to run elections even if the computers might be hacked, and we must be able to detect and correct errors in the computer tabulation.

Therefore, our key recommendations are:

4.11.  Elections should be conducted with human-readable paper ballots.  These may be marked by hand or by machine (using a ballot-marking device); they may be counted by hand or by machine (using an optical scanner).  Recounts and audits should be conducted by human inspection of the human-readable portion of the paper ballots.  Voting machines that do not provide the capacity for independent auditing (e.g., machines that do not produce a voter-verifiable paper audit trail) should be removed from service as soon as possible.

In our report, we explain why: voting machines can never be completely hack-proof, but with paper ballots we can, if we have to, count the votes independently of possibly hacked computers.

4.12.  Every effort should be made to use human-readable paper ballots in the 2018 federal election.  All local, state, and federal elections should be conducted using human-readable paper ballots by the 2020 presidential election.

5.8.  States should mandate risk-limiting audits prior to the certification of election results.  With current technology, this requires the use of paper ballots.  States and local jurisdictions should implement risk-limiting audits within a decade.  They should begin with pilot programs and work toward full implementation.  Risk-limiting audits should be conducted for all federal and state election contests, and for local contests where feasible. 

In our report, we explain why: examining a small random sample of the paper ballots, and comparing it with the results claimed by the computers, can assure us with high confidence that the computers haven’t been hacked to produce an incorrect outcome, or else can provide clear evidence that a recount is needed.
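To give a flavor of how such an audit can work, here is a minimal sketch of one well-known method, a ballot-polling audit in the style of the BRAVO method of Lindeman, Stark, and Yates. It is a simplified illustration for a two-candidate contest, not the procedure any particular state would adopt.

```python
import random

def ballot_polling_audit(paper_ballots, reported_winner_share, risk_limit=0.05):
    # BRAVO-style sequential test: accept the reported outcome only if the
    # sampled paper ballots give strong enough evidence that the reported
    # winner really received more than half of the valid votes.
    s = reported_winner_share            # winner's share claimed by the machines
    t = 1.0                              # Wald sequential likelihood ratio
    order = random.sample(range(len(paper_ballots)), len(paper_ballots))
    for examined, i in enumerate(order, start=1):
        ballot = paper_ballots[i]
        if ballot == "winner":
            t *= 2 * s                   # evidence supporting the reported outcome
        elif ballot == "loser":
            t *= 2 * (1 - s)             # evidence against it
        # invalid/other ballots leave t unchanged
        if t >= 1 / risk_limit:
            return examined              # outcome confirmed at this risk limit
    return None                          # audit escalates to a full hand count

# Example: a 10,000-ballot contest the machines report as 55% to 45%.
ballots = ["winner"] * 5500 + ["loser"] * 4500
print(ballot_polling_audit(ballots, reported_winner_share=0.55))
```

If the paper ballots truly support the reported outcome, the audit typically stops after examining only a few hundred ballots; if the machines lied about the outcome, the evidence fails to accumulate and the audit escalates toward a full hand count of the paper.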

5.11.  At the present time, the Internet (or any network connected to the Internet)  should not be used for the return of marked ballots.  Further, Internet voting should not be used in the future until and unless very robust guarantees of security and verifiability are developed and in place, as no known technology guarantees the secrecy, security, and verifiability of a marked ballot transmitted over the Internet.

4.1.  Election administrators should routinely assess the integrity of voter registration databases and the integrity of voter registration databases connected to other applications.  They should develop plans that detail security procedures for assessing voter registration database integrity and put in place systems that detect efforts to probe, tamper with, or interfere with voter registration systems.  States should require election administrators to report any detected compromises or vulnerabilities in voter registration systems to the U.S. Department of Homeland Security, the U.S. Election Assistance Commission, and state officials.

Many of these recommendations are not controversial, in most states.  Almost all the states use paper ballots, counted by machine;  the few remaining states that use paperless touchscreens are taking steps to move to paper ballots; the states have not adopted internet voting (except for scattered ill-advised experiments); and many, many election administrators nationwide are professionals who are working hard to come up to speed on cybersecurity.

But many election administrators are not sure about risk-limiting audits (RLAs).  They ask, “can’t we just audit the digital ballot images that the machines provide?”  No, that won’t work: if the machine is hacked to lie about the vote totals, it can easily be hacked to provide fake digital pictures of the ballots themselves.  The good news is that well-designed risk-limiting audits, added to well-designed administrative processes for keeping track of batches of ballots, can be efficient and practical.  But it will take some time and effort to get things going: the design of those processes, the design of the audits themselves, training of staff, and state legislation where necessary.  And it can’t be a one-size-fits-all design: different states vote in different ways, and the risk-limiting audit must be designed to fit each state’s election systems and methods.  That’s why we recommend pilots of RLAs as soon as possible, but a 10-year period for full adoption.

Many other findings and recommendations are in the report itself.  For example: Congress should fully fund the Election Assistance Commission to perform its mission and should authorize the EAC to set standards for voter-registration systems and e-pollbooks (not just voting machines); and the President should nominate, and Congress should confirm, EAC commissioners.

But the real bottom line is:  there are specific things we can do, at the state level and at the national level; and we must do these things to secure our elections so that we are confident that they reflect the will of the voters.