May 26, 2020

Ballot-level comparison audits: BMD

In my previous posts, I’ve been discussing ballot-level comparison audits, a form of risk-limiting audit. Ballots are imprinted with serial numbers (after they leave the voter’s hands); during the audit, a person must find a particular numbered ballot in a batch of a thousand (more or less).

With CCOS (central-count optical scan) this works fine: the CCOS prints the serial numbers consecutively, and the human auditor can easily find the right ballot in a minute or two. With PCOS (precinct-count optical scan), we are reluctant to print the serial numbers consecutively, because the order in which people insert their ballots at the polling place is visible to the public, and (in theory) someone could learn how you voted by correlating with the CVR file.

What about ballot-marking devices (BMDs)? How do the serial numbers work for use in ballot-level comparison audits?

First of all, let’s remember that RLAs of BMD-marked ballots are not very meaningful, because the RLA can only assure that what’s marked on the paper is correctly tabulated. Because most voters don’t inspect what’s marked on the paper, the RLA cannot assure that what the voter indicated to the BMD (on the touchscreen) has been correctly tabulated, if the BMD had been hacked to make it cheat.

But suppose we set that concern aside. And indeed, some jurisdictions are conducting “RLAs” on BMD-marked ballots. So let’s examine how such “RLAs” should work.

If the BMD prints a serial number onto the marked ballot before presenting the ballot for the voter to examine, then the voter can see the serial number and make a note of it. The voter can then sell their vote by telling the criminal vote-buyer the serial number, or can be coerced into doing so. You may think this is a far-fetched scenario, but voter coercion and vote selling were common in the 19th-century and early 20th-century United States, and they occur now in some other countries.

Some “all-in-one” BMDs incorporate a scanning function, and don’t require a separate PCOS scanner. Suppose such a BMD prints a serial number onto the marked ballot after presenting the ballot for the voter to examine? That helps address the “voter-sees-the-number” problem. But it’s unpleasant to contemplate voting machines that can mark your ballot after the last time you see it. Any voting machine whose physical hardware can print votes onto the ballot after the last time the voter sees the paper is not a voter-verified paper ballot system, and is not acceptable. Even so, if we were to permit this, we would be in a similar situation to PCOS ballots. That is, the serial numbers should be in random order, not consecutive order, because otherwise observers in the polling place could calculate what serial number you’ll get.

And therefore, ballot-comparison audits of BMD-marked ballots run into just the same problem as audits of PCOS-scanned ballots, and maybe the same solutions would apply.

Because of this problem, some manufacturers of BMDs have done the same as manufacturers of PCOS: omit serial numbers entirely. For example, the ExpressVote and ExpressVote XL do not print serial numbers on the ballot*, and therefore their ballots (like PCOS ballots) cannot be easily audited by ballot-level comparison audits (except by a cumbersome “transitive audit”).

*Based on information about the ExpressVote and ExpressVote XL as configured in 2019 and deployed in more than one state, including New Jersey.

Finding a randomly numbered ballot

If the ballot papers are numbered consecutively, that’s not too difficult. But if the serial numbers are in random order, it’s very time-consuming.

An answer to the second puzzle.

So here’s my next idea. Likely I’m not the first to think of it, so I can’t claim much credit. And this idea may or may not be practical; it would need to be tested in practice.

Problem: You have a batch of serial-numbered ballots, like this, and you need to find the one numbered 0236000482.

Take the pile of ballots, and feed them through a high-volume scanner. Scanners that can do 140 pages per minute cost about $6000. The computer attached to the scanner can use OCR (optical character recognition) software just on the corners of the page, to find and recognize the serial number. When it finds the right number, the computer commands the scanner to stop.

Then the human auditor can pick up the last-scanned page, and examine it to make sure it’s the right number.

If the OCR software produces a false positive, no harm is done: the human sees that it’s the wrong number, and resumes the scanner. False negatives are more annoying, but still recoverable: the human would have to search through the entire pile. Because we don’t rely on the scanner to work perfectly, and because the scanner is not counting or tabulating votes, there’s no need to put this equipment through an EAC certification process.
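The stop-and-check loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `find_ballot` and `noisy_ocr` are hypothetical, the "scanner" is just a list of serial numbers, and the OCR failure mode is invented for the example. No real scanner or OCR library is being modeled.

```python
from typing import Callable, Iterable, Optional


def find_ballot(
    pages: Iterable[str],
    target: str,
    ocr: Callable[[str], str],
) -> Optional[int]:
    """Return the 0-based position of the page whose printed serial is `target`.

    `pages` stands in for the stack fed through the scanner; each item is the
    serial number actually printed on that page. `ocr` is a hypothetical and
    possibly unreliable reader. Returns None when the scanner pass never stops
    on the right page (a false negative), in which case a human would have to
    search the pile by hand.
    """
    for position, printed_serial in enumerate(pages):
        if ocr(printed_serial) != target:
            continue  # no match reported; the scanner keeps running
        # The scanner stops and the human auditor reads the page.
        if printed_serial == target:
            return position  # confirmed by the human
        # False positive: the human sees the wrong number and resumes.
    return None


def noisy_ocr(serial: str) -> str:
    """Toy OCR that misreads a trailing 7 as 2 (an invented failure mode)."""
    return serial[:-1] + "2" if serial.endswith("7") else serial


batch = ["0236000917", "0236000487", "0236000482", "0236000105"]
found_at = find_ballot(batch, "0236000482", noisy_ocr)
```

In this toy batch the second page triggers a false positive, the human rejects it, and the search stops for good on the third page. The human check is what makes the scanner's errors harmless, which is the point of the argument above.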

As you’ll notice, the serial number is printed in fairly low-quality, hard-to-read print. This might pose problems for the OCR software. Better-quality printing would help the OCR, but it would help the humans too, and might be worth doing in any case.

Another variant of this solution is to print the serial number as a barcode in addition to human-readable digits. That would be easier for the scanner to recognize. If the PCOS tries to cheat in some way by making the barcode mismatch the human-readable number, this will be detected immediately by the human auditor.

Puzzle number 3: The solution I propose in this article might work, but surely a creative person can find even better ways to support ballot-level comparison audits of PCOS machines.

Why we can’t do random selection the other way round in PCOS RLAs

In my last article, I posed this puzzle for the reader. We want to do ballot-level comparison audits, a form of RLA (risk-limiting audit) on a precinct-count optical-scan (PCOS) voting system. This requires a serial number printed on every ballot, linked with an entry in the cast-vote-record (CVR) file. The standard method is to pick a random entry in the CVR, and find the corresponding paper ballot in the appropriate batch of ballots. If the ballots in each batch are consecutively numbered, this only takes a minute or two.

But if the ballots in a batch have randomly ordered serial numbers, in order to preserve the secret ballot in a PCOS context, then it takes much longer to find the right ballot in a large batch of ballots. This slows down the RLA considerably.
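To make the difference concrete, here is a rough Python sketch comparing the two cases. It assumes the auditor searches a randomly ordered batch from front to back, which is my simplification; the function names and the batch size of 1000 are illustrative only.

```python
import random


def position_if_consecutive(first_serial: int, target: int) -> int:
    """With consecutive numbering, the 0-based position is simple arithmetic."""
    return target - first_serial


def looks_if_random(batch: list[int], target: int) -> int:
    """With shuffled numbering, count ballots examined in a front-to-back search."""
    for looks, serial in enumerate(batch, start=1):
        if serial == target:
            return looks
    raise ValueError("target serial not in batch")


BATCH_SIZE = 1000
rng = random.Random(2020)  # fixed seed so the sketch is repeatable
serials = list(range(1, BATCH_SIZE + 1))
shuffled = serials[:]
rng.shuffle(shuffled)

# Averaged over every possible target, a front-to-back search examines
# (n + 1) / 2 ballots, whatever the shuffle happens to be.
average_looks = sum(looks_if_random(shuffled, t) for t in serials) / BATCH_SIZE
```

With consecutive numbers the auditor can go almost directly to the right spot in the pile; with random numbers the expected effort is about half the batch for every ballot sampled.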

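I thought of a brilliant solution to this problem, and only after conversation with Ron Rivest did I understand why it doesn’t work. The idea is to do the random selection the other way round: pick a paper ballot at random from the batch, then look up its serial number in the CVR file. When I discussed this problem with other smart people I know, several of them came up with the same brilliant solution, and it still doesn’t work.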

Answer to the puzzle. What’s wrong with this solution?

Here’s why it doesn’t work. Suppose the voting machines are hacked, that is, clever vote-stealing software is installed. The machines must still produce a CVR file containing a list of ballot summaries, sorted by serial number. Here’s an example:

00001   Benedict Arnold
00002   George Washington
00003   Benedict Arnold 
00004   Benedict Arnold  
00005   George Washington 
00006   Benedict Arnold 
00007   Benedict Arnold  
00008   George Washington  
00009   George Washington 
00010   Benedict Arnold 

So it looks like Benedict Arnold won this election, 6 to 4. In the RLA, we pick randomly from among the sheets of paper in the ballot box, and it reads,

00005 [X] George Washington    [ ] Benedict Arnold

That’s consistent with the CVR file, so this election passes the audit.
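For readers who like to see it spelled out, the check just described can be written as a short Python sketch. The CVR entries are copied from the example above; the names `cvr` and `paper_matches_cvr` are hypothetical. The sketch only reproduces the check as described, and does not by itself show whether the check is sufficient.

```python
from collections import Counter

# The CVR file from the example above, keyed by serial number.
cvr = {
    "00001": "Benedict Arnold",
    "00002": "George Washington",
    "00003": "Benedict Arnold",
    "00004": "Benedict Arnold",
    "00005": "George Washington",
    "00006": "Benedict Arnold",
    "00007": "Benedict Arnold",
    "00008": "George Washington",
    "00009": "George Washington",
    "00010": "Benedict Arnold",
}

# Reported outcome according to the CVR alone.
tally = Counter(cvr.values())


def paper_matches_cvr(serial: str, paper_vote: str) -> bool:
    """Look up a sampled paper ballot's serial in the CVR and compare votes."""
    return cvr.get(serial) == paper_vote


# The paper ballot drawn from the box: serial 00005, marked for Washington.
sample_ok = paper_matches_cvr("00005", "George Washington")
```

Here `tally` gives Arnold 6 and Washington 4, and `sample_ok` is true, matching the narrative: the sampled ballot agrees with its CVR entry.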

Now, pause to reflect before we open the ballot box again and look inside.