In my last article, I posed this puzzle for the reader. We want to do ballot-level comparison audits, a form of RLA (risk-limiting audit) on a precinct-count optical-scan (PCOS) voting system. This requires a serial number printed on every ballot, linked with an entry in the cast-vote-record (CVR) file. The standard method is to pick a random entry in the CVR, and find the corresponding paper ballot in the appropriate batch of ballots. If the ballots in each batch are consecutively numbered, this only takes a minute or two.
But if the ballots in a batch have randomly ordered serial numbers, in order to preserve the secret ballot in a PCOS context, then it takes much longer to find the right ballot in a large batch of ballots. This slows down the RLA considerably.
I thought of a brilliant solution to this problem, and only after conversation with Ron Rivest did I understand why it doesn’t work. Then, when I discussed this problem with other smart people I know, several of them came up with the same brilliant solution–and it still doesn’t work.
Answer to the puzzle. What’s wrong with this solution?
Here’s why it doesn’t work. Suppose the voting machines are hacked, that is, clever vote-stealing software is installed. The machines must still produce a CVR file containing a list of ballot summaries, sorted by serial number. Here’s an example:
00001 Benedict Arnold 00002 George Washington 00003 Benedict Arnold 00004 Benedict Arnold 00005 George Washington 00006 Benedict Arnold 00007 Benedict Arnold 00008 George Washington 00009 George Washington 00010 Benedict Arnold
So it looks like Benedict Arnold won this election, 6 to 4. In the RLA, we pick randomly from among the sheets of paper in the ballot box, and it reads,
00005 [X] George Washington [ ] Benedict Arnold
That’s consistent with the CVR file, so this election passes the audit.
Now, pause to reflect before we open the ballot box again (click on “read more”) and look inside.
This next step would never be performed in an actual audit, but suppose we inspect every ballot in the box:
00005 [X] George Washington [ ] Benedict Arnold 00002 [X] George Washington [ ] Benedict Arnold 00005 [X] George Washington [ ] Benedict Arnold 00002 [X] George Washington [ ] Benedict Arnold 00004 [ ] George Washington [X] Benedict Arnold 00005 [X] George Washington [ ] Benedict Arnold 00002 [X] George Washington [ ] Benedict Arnold 00002 [X] George Washington [ ] Benedict Arnold 00005 [X] George Washington [ ] Benedict Arnold 00002 [X] George Washington [ ] Benedict Arnold
What happened? Nine voters marked their ballot for George, and only one for Benedict. But the cheating PCOS printed serial numbers 5 and 2 a total of 9 times, printed serial number 4 once, and didn’t print the other serial numbers at all!
In a real situation, if there are 1000 randomly ordered ballots in the bin, and the cheating PCOS lies about only 10% of the votes, it will not be easy to notice that there are duplicate serial numbers.
On the other hand, suppose this election were audited by the standard method for ballot-level comparison audits: pick a random line from the CVR file, then look for the ballot. Let’s pick line 7 for example:
00007 Benedict Arnold
When the auditor looks in the bin for ballot number 00007, it’s not there! This is evidence that something is seriously wrong, and the audit will successfully detect the fraud. Furthermore, the recount of ballots in the bin, that the voters marked themselves, will give the true result: George Washington, 9 to 1.
Ballot Polling Audits. This analysis applies to ballot-level comparison audits, but it does not apply to ballot polling audits. Drawing a random paper ballot from the paper-ballots pile (if it can really be done randomly enough) can be a valid method for ballot-polling audits, because the BPAs don’t need serial numbers; so “duplicate serial numbers” isn’t applicable to them.