January 14, 2025

Report on the Sequoia AVC Advantage

Today I am releasing an in-depth study of the Sequoia AVC Advantage direct-recording electronic (DRE) voting machine, available at citp.princeton.edu/voting/advantage. I led a team of six computer scientists in a monthlong examination of the source code and hardware of these voting computers, which are used in New Jersey, Pennsylvania, and other states.

The Rutgers Law School Constitutional Litigation Clinic filed a lawsuit seeking to decommission all of New Jersey’s voting computers, and asked me to serve as an expert witness. This year the Court ordered the State of New Jersey and Sequoia Voting Systems to provide voting machines and their source code for me to examine. By Court Order, I can release the report no sooner than October 17th, 2008.

Accompanying the report is a video and a FAQ.

Executive Summary

I. The AVC Advantage 9.00 is easily “hacked” by the installation of fraudulent firmware. This is done by prying just one ROM chip from its socket and pushing a new one in, or by replacement of the Z80 processor chip. We have demonstrated that this “hack” takes just 7 minutes to perform.

The fraudulent firmware can steal votes during an election, just as its criminal designer programs it to do. The fraud cannot practically be detected. There is no paper audit trail on this machine; all electronic records of the votes are under control of the firmware, which can manipulate them all simultaneously.

II. Without even touching a single AVC Advantage, an attacker can install fraudulent firmware into many AVC Advantage machines by viral propagation through audio-ballot cartridges. The virus can steal the votes of blind voters, can cause AVC Advantages in targeted precincts to fail to operate, or can cause WinEDS software to tally votes inaccurately. (WinEDS is the program, sold by Sequoia, that each County’s Board of Elections uses to add up votes from all the different precincts.)

III. Design flaws in the user interface of the AVC Advantage disenfranchise voters, or violate voter privacy, by causing votes not to be counted, and by allowing pollworkers to commit fraud.

IV. AVC Advantage Results Cartridges can be easily manipulated to change votes, after the polls are closed but before results from different precincts are cumulated together.

V. Sequoia’s sloppy software practices can lead to error and insecurity. Wyle’s Independent Testing Authority (ITA) reports are not rigorous, and are inadequate to detect security vulnerabilities. Programming errors that slip through these processes can miscount votes and permit fraud.

VI. Anomalies noticed by County Clerks in the New Jersey 2008 Presidential Primary were caused by two different programming errors on the part of Sequoia, and had the effect of disenfranchising voters.

VII. The AVC Advantage has been produced in many versions. The fact that one version may have been examined for certification does not give grounds for confidence in the security and accuracy of a different version. New Jersey should not use any version of the AVC Advantage that it has not actually examined with the assistance of skilled computer-security experts.

VIII. The AVC Advantage is too insecure to use in New Jersey. New Jersey should immediately implement the 2005 law passed by the Legislature, requiring an individual voter-verified record of each vote cast, by adopting precinct-count optical-scan voting equipment.

Life after Driving

I’m working on a three-part series on self-driving automobile technology for Ars Technica. In part one I covered the state of existing self-driving technology and highlighted the dramatic progress that has been made in recent years. In part two, I assume that the remaining technical hurdles can be surmounted and examine what the world might look like when self-driving cars become ubiquitous. The potential benefits are enormous: autonomous vehicles could save thousands of lives, billions of person-hours, and billions of dollars of energy costs.

The article has sparked interesting discussion around the blogosphere. Matt Yglesias has a long-standing interest in urban planning issues, so he did a post about the urban planning implications of self-driving technologies. I argue that by making taxis cheaper, self-driving cars would shift a lot of people from owning cars to renting them. And that, in turn, would dramatically reduce demand for parking lots, which would allow for more pleasant, higher-density cities. It’s hard to overstate the extent to which the need for parking exacerbates sprawl and congestion problems. Parking lots consume vast amounts of land in suburban areas. This, in turn, means that stuff is farther apart, which forces people to rely even more on their cars to get from place to place.

Matt’s post prompted a number of interesting responses. Ryan Avent chimed in with some thoughts about how self-driving technologies would make urban living more attractive. On the other hand, Tom Lee offers a counterpoint: making car travel cheaper and more convenient will, on the margin, cause people to drive (or “ride,” anyway) more. This is a good point, and it’s not clear how these factors would balance out. But even if Tom is right, this wouldn’t be an entirely bad thing. Increased mobility is a virtue in its own right.

I think Atrios and Kevin Drum are on less firm ground when they argue that this technology is so far in the future that it’s not worth thinking about. Drum compares self-driving technologies to cold fusion and human-level AI, while Atrios compares them to flying cars and jet packs. I can only assume they didn’t read the first installment of my series, in which I discuss the state of the technology in some detail. The basic technology for self-driving is already here. There are cars in university laboratories that can navigate for hundreds of miles without human supervision, and can interact safely with other cars on urban streets. Of course, there’s still a lot of work to do to enable these vehicles to safely handle the multiplicity of obstacles they would encounter in real urban environments. And after that the technology will need to be made reliable and affordable enough for commercial use. But these problems are nowhere close to the difficulty of human-level AI. Your car doesn’t have to understand why you want to go to the store in order to find a safe path from here to there. If you’re skeptical that this technology can be made to work, I encourage you to read my first article and watch PBS’s excellent documentary on the 2005 DARPA Grand Challenge. There’s a lot of uncertainty about how long it will be until this technology is mature enough to be let loose on our streets, but I think it’s pretty clearly a matter of “when,” not “if.”

Cloud(s), Hype, and Freedom

Richard Stallman’s recent description of ‘the cloud’ as ‘hype’ and a ‘trap’ seems to have stirred up a lot of commentary, but not a lot of clear discussion of the problems Stallman raised. This isn’t surprising: the term ‘the cloud’ has always been vague. (It was hard to resist saying ‘cloudy.’ 😉) When people say ‘the cloud’ they are really lumping at least four ‘cloud types’ together.

traditional applications, hosted elsewhere

Probably the most common type of ‘cloud’ is a service that takes traditional software functionality and moves it to remotely hosted, (typically) web-delivered servers. Gmail and salesforce.com are like this: fairly traditional email and CRM applications, ‘just’ moved to the web.

If Stallman’s ‘hype’ claim is valid anywhere, it is here. Administration and maintenance costs are definitely lower when an expert like Google funds and runs the server, and reliability may improve as well. But the core functionality of these apps, and the ability to access data over a network, have been present since the dawn of networked computing. On average, this is undoubtedly a significant change in quality, but only rarely a change in type, which makes the buzz much harder to justify.

Stallman’s ‘trap’ charge is more complex. Computer users have long compromised on personal control by storing data remotely but accessing it via standardized protocols. This introduced risks (you had to trust the data host and couldn’t tinker with the server) but kept some controls (you could switch clients, and typically you could export the data). Some web apps still strike that balance; for example, most Gmail features are accessible via good old POP and IMAP. But others don’t.

Getting your data out of a service like Salesforce can be a ‘hidden cost’ of an apparently free service, and even with a relatively standards-based service like Gmail you have no freedom to make changes to the server. These risks are what Stallman means when he talks about a ‘trap’, and regardless of your conclusion about them, understanding them is important.
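To make the ‘standardized protocols’ point concrete, here is a minimal sketch, in Python with the standard-library imaplib module, of pulling messages out of a Gmail account over IMAP. The address and password are placeholders, and the account has to have IMAP access enabled; the point is simply that the data comes out via a protocol any client can speak, not through anything Gmail-specific.

    # Minimal sketch: exporting mail from Gmail over plain IMAP.
    # The credentials below are placeholders; IMAP access must be enabled
    # in the account settings for this to work.
    import email
    import imaplib

    conn = imaplib.IMAP4_SSL("imap.gmail.com")       # standard IMAP-over-SSL endpoint
    conn.login("you@example.com", "app-password")    # placeholder credentials
    conn.select("INBOX", readonly=True)

    status, data = conn.search(None, "ALL")          # IDs of every message in the folder
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")  # the full raw message, a standard format
        msg = email.message_from_bytes(parts[0][1])
        print(num.decode(), msg["Subject"])          # yours to save wherever you like

    conn.logout()

Whether a service leaves an exit like this open is exactly the kind of control Stallman is worried about losing.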

services involving data that can’t (yet) be managed locally

Google Maps and Google Search are the canonical examples of this type of cloud service: heaps of data so large that you would need a large data center to host your own copy and a very, very fat pipe to keep it up to date.

Hype-wise, these are a mixed bag. These services definitely bring radical new functionality that couldn’t previously exist: I can’t store all of Google Maps on my phone. That hype is justified. At the same time, our personal ability to store and process data is still growing quickly, so claims that this type of cloud service will always ‘require’ remote servers may be overblown.

‘Trap’-wise? Dependence on these services reminds me of ‘dependence’ on a library before the internet: you can work to make sure your library respects your privacy, prefer public libraries to private ones, or establish a personal library if your reading interests are narrow, but in the end eschewing large libraries is likely to be a case of cutting off your nose to spite your face. We’re in the same state with this type of cloud service. You can avoid them, but those concerned with freedom might be better off understanding and fixing them than condemning them altogether.

services that make creation of new data technically or economically feasible

Facebook and Wikipedia are the canonical examples here. Unlike the first two types of cloud, where data was available but inconvenient before it ended up in the cloud, this class of cloud applications creates information that wasn’t previously feasible to collect at all.

There may well not be enough hype around this type of cloud. Replicating web-scale collaborative facilities like these will be very difficult to do in a p2p fashion, and the impact of the creation of new information (even when it is as mundane as Facebook’s data often is) is hard to overstate.

Like the previous type of cloud, it is hard to call these a trap per se: they do make it hard to leave, but they do so by providing new functionality that is very hard to get with any traditional software model.

services offering computing and storage, rather than data

The most recent type of cloud service is remotely provisioned computing and storage, like Amazon’s EC2/S3 and Google’s App Engine. This is perhaps the most purely generative type of cloud, allowing individuals to create new services and scale them out to millions of people without having to invest in their own physical infrastructure. It is hard to see any way in which this can reasonably be called ‘hype,’ given that it offers individuals and small or transient groups a reach that would otherwise cost them many thousands of dollars.
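As a rough illustration of what ‘remotely provisioned storage’ means in practice, here is a minimal sketch using the boto3 Python client for S3 (a present-day library, not anything from the services’ original releases); the bucket name, key, and contents are placeholders, and AWS credentials are assumed to already be configured in the environment.

    # Minimal sketch: renting storage instead of running it (Amazon S3 via boto3).
    # Bucket name, key, and contents are placeholders; AWS credentials are assumed
    # to be configured in the environment or in ~/.aws/credentials.
    import boto3

    s3 = boto3.client("s3")

    # Store a small object without provisioning any server of your own.
    s3.put_object(Bucket="example-bucket",
                  Key="notes/hello.txt",
                  Body=b"Hello from somebody else's computer")

    # Read it back; the same calls work whether the bucket holds one object or millions.
    obj = s3.get_object(Bucket="example-bucket", Key="notes/hello.txt")
    print(obj["Body"].read().decode())

The generative part is that this same handful of calls scales from a weekend experiment to a service with millions of users, which is also, of course, where the lock-in discussed below comes from.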

From a freedom perspective, these can be both the best and worst of the cloud types. On the plus side, these services can be incredibly transparent: developers who use them directly have access to their own source code, and end users may not know they are using them at all. On the down side, especially for proprietary platforms like App Engine, these can create very deep lock-in: it is complicated, expensive, and risky to switch deployment platforms after achieving success. And they replace traditional, very open platforms, a tradeoff that isn’t always appreciated.

takeaways

‘The cloud’ isn’t going away, but hopefully we can clarify our thinking about it by talking about the different types of clouds. Hopefully this post is a useful step in that direction.

[This post is an extension of some ideas I’ve been playing around with on my own blog and at the autonomo.us group blog; readers curious about these issues may want to read further in those places. I also recommend reading this piece, which set me on the (very long) road to this particular post.]

Why is printing so hard?

Recently I bought a mildly used laser printer and wanted to set it up on my home network. In a better world, this would be a trivial exercise — just connect the printer to the network and let the computers discover it. In the actual world, it was a forty-five minute project that only a reasonably handy network jockey could have hoped to complete. (If you care about what exactly I had to do, see below.)

John Hartman says, “Printing is the hardest problem in computer science.” It often seems that way. But why?

Plug-and-play printing seems pretty simple, compared to many of the things that computers do routinely without trouble. Granted, it’s not trivial to get the full variety of printers to work with the full variety of computers, but our collective failure to do so is — or should be — surprising.

There must be some lesson here about engineering, or human nature, or something. Lately I’ve gone around asking people why printing is so hard. I’ve gotten some interesting answers, but I don’t think I really understand the issue yet.

What do you think? Why is printing so hard?

[For the record, here’s what I had to do to get our newly acquired HP LaserJet 2200DN printer working on our home network: I plugged the printer into our network, but the Windows PCs couldn’t auto-discover the printer. I Googled the printer’s user manual, which said the printer had a built-in webserver. But I didn’t know the printer’s IP address, so I had to log in to our router and look at its DHCP tables. Knowing the IP address, I could connect to the printer’s webserver, which had a page telling me what URL to use for IPP printing. (I had to know what IPP was.) After that, I assigned the printer a static IP address, so the IPP URL (containing an IP address) would keep working across reboots. Now that I had a stable IPP URL, I could set up the PCs for printing. Finally, I had to guess which driver to use on Windows — two drivers were offered, with no advice about which one to use, but only one of them supports duplex printing. Total elapsed time: about 45 minutes.]
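For what it’s worth, the one step in that list a script can shortcut is hunting down the printer’s IP address. The rough Python sketch below probes a home subnet for anything answering on the standard IPP port (631); the 192.168.1.x prefix is an assumption about the local network, and an open port is only a hint, not proof, that the device is a printer.

    # Rough sketch: look for devices on the local subnet that answer on the
    # standard IPP port (631). The subnet prefix is an assumption; adjust it
    # to match your own network.
    import socket

    SUBNET = "192.168.1."   # assumed home-network prefix
    IPP_PORT = 631          # Internet Printing Protocol

    for host in range(1, 255):
        address = (SUBNET + str(host), IPP_PORT)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(0.2)                       # keep the scan quick on a LAN
        try:
            if sock.connect_ex(address) == 0:      # 0 means the port accepted a connection
                print("Possible IPP printer at", address[0])
        finally:
            sock.close()

Even with the address in hand, you are still left guessing at the exact IPP URL and the right driver, which is rather the point.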

California Issues Emergency Election Audit Regulations

The Office of the California Secretary of State has issued a set of proposed emergency regulations for post-election manual tallying of paper election records. In this post, my first at FTT, I’ll try to explain and contextualize this development.

Since her election to office, California Secretary of State (CA SoS) Debra Bowen has methodically studied the shortcomings in California’s election equipment. She first initiated a Top-to-Bottom Review (TTBR) of California’s voting systems that found them to be of poor technical quality, riddled with security vulnerabilities, accessibility flaws, and reliability issues, and saddled with inadequate documentation and testing (a number of FTT regulars participated in the TTBR). For this year’s presidential primary in California, Bowen worked to mitigate these problems by decertifying this equipment and then recertifying it subject to a list of about 40 different conditions. One such condition is that the usual 1% manual tally under California law — counties must randomly choose and hand-tally ballots cast in 1% of precincts — would be modified to include escalation mandating increased tallying for close races (where even small amounts of possible fraud and/or error could make a difference in the outcome of a contest).

Bowen issued these additional requirements (the “PEMT Requirements”) under her authority as CA SoS to regulate election technologies (here are the original PEMT Requirements). Unfortunately, the Registrar in San Diego County sued Bowen, arguing 1) that she didn’t have such broad authority and 2) that, even if she did, she could only issue the PEMT Requirements through the California regulatory procedure (specified by the CA Administrative Procedure Act). A state Superior Court found in favor of the CA SoS, but a Court of Appeal found that the PEMT Requirements did indeed betray characteristics of regulations and should therefore have gone through the regulatory procedure (for the legal eagles out there, see: County of San Diego v. Debra Bowen (2008) 166 Cal.App.4th 501).

By the time the Court of Appeal had made its decision on August 29, there was no time to follow the normal regulatory process, which takes about four months. Instead, the CA SoS had to follow the process for adopting an emergency regulation which applies when a regulation “is necessary for the immediate preservation of the public peace, health and safety, or general welfare.”

What is so special about these emergency manual tally provisions? First, they represent the increasing relevance and importance of adversarial considerations in the design of an election audit process. As we describe in the NYU Brennan Center / UC Berkeley Samuelson Clinic report on post-election audits (“Post-Election Audits: Restoring Trust In Elections”), fixed-percentage audits of election records are mainly useful for detecting wide-ranging anomalies in vote counts. Methods that “tune” the amount of records audited to the margin in contests on the ballot do a much better job of ensuring that they’ll find evidence of possible error or fraud. Per the emergency PEMT Regulations, any contest with a margin (the difference between the winning and losing choice in a contest) of 0.5% or lower is subject to a 10% manual tally, an order of magnitude more scrutiny than the statutory default.
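To make the escalation rule concrete, here is a small illustrative sketch in Python. It encodes only the single threshold described above (a margin of 0.5% or less triggers a 10% tally instead of the statutory 1%), not the full text of the PEMT Regulations, and the vote totals in the example are invented.

    # Illustrative sketch of margin-based escalation: contests with a margin of
    # 0.5% or less get a 10% manual tally; everything else gets the statutory 1%.
    # This reflects only the threshold described in this post, not the full regulations.
    def tally_fraction(winner_votes, runner_up_votes, total_votes):
        margin = (winner_votes - runner_up_votes) / total_votes
        if margin <= 0.005:     # 0.5% margin or less: escalate
            return 0.10         # hand-tally 10% of precincts
        return 0.01             # statutory default: 1% of precincts

    # A contest decided by 0.4% (invented numbers) triggers the escalated tally.
    print(tally_fraction(50_200, 49_800, 100_000))   # prints 0.1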

Second, the CA SoS’ emergency PEMT Regulations reflect many best practices from audit theory and research: precincts to audit must be chosen randomly; the precincts to audit are only chosen after the semi-official vote tallies are arrived at; tally activities must be announced publicly and available for public observation; tallies must be conducted under “blind count” rules where the talliers do not know the totals in the precincts they’re tallying; differences between machine and hand counts must be explained or investigated.
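As a sketch of how the random-selection requirement can be made publicly verifiable, consider the Python fragment below. The precinct list and seed are placeholders; in practice the seed would often come from a public ceremony such as dice rolls, held only after the semi-official tallies are published, so that observers can re-run the draw and confirm which precincts were chosen. A real audit would also pin down the pseudorandom procedure itself more carefully than Python’s default generator does.

    # Sketch of publicly repeatable precinct selection. The seed and precinct
    # list are placeholders; the seed should be generated publicly (e.g. by
    # dice rolls) after the semi-official results are in.
    import math
    import random

    def choose_precincts(precinct_ids, fraction, public_seed):
        rng = random.Random(public_seed)                 # anyone with the seed gets the same draw
        count = max(1, math.ceil(fraction * len(precinct_ids)))
        return sorted(rng.sample(precinct_ids, count))

    precincts = ["P%04d" % n for n in range(1, 501)]     # placeholder: 500 precincts
    print(choose_precincts(precincts, 0.01, "placeholder public seed"))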

The elephant in the room is always Los Angeles County; LA is so amazingly enormous for an election jurisdiction that some things simply aren’t possible. (For example, they frequently pick up ballot materials from precincts in helicopters; that is, traffic in LA is so bad and there are so many polling places (~5,000 or so) that the most reliable form of ballot transmission is via helicopter.) These rules are going to be exceedingly difficult for LA to comply with. I expect they will hire an army of tally managers and talliers to perform their tally and that it will be a race against the clock, counting 24 hours a day, seven days a week, to try to get it all done in the 28-calendar-day canvass period.