As Ed pointed out in October, Sequoia Voting Systems, Inc. (“Sequoia”) announced at that time that it intended to publish the source code of its voting system software, called “Frontier”, which is currently under development. (Also see EKR’s post: “Contrarianism on Sequoia’s Disclosed Source Voting System”.)
Yesterday, Sequoia made good on this promise, and you can now pull the source code it has made available from its Subversion repository here:
http://sequoiadev.svn.beanstalkapp.com/projects/
Sequoia refers to this move in its release as “the first public disclosure of source code from a voting systems manufacturer”. Carefully parsed, that’s probably correct: there have been unintentional disclosures of source code (e.g., Diebold in 2003), and I know of two other voting industry companies that have disclosed source code (VoteHere, now out of business, and Everyone Counts), but these were either not “voting systems manufacturers” or their disclosures were not publicly available. Of course, almost all of the research systems (like VoteBox and Helios) have been truly open source. Groups like OSDV and OVC have released or will soon release voting system source code under open source licenses.
I wrote a paper ages ago (2006) on the use of open and disclosed source code for voting systems, and I’m surprised at how well that analysis and set of recommendations have held up (the original paper is here; an updated version is in pages 11–41 of my PhD thesis).
The purpose of my post here is to highlight one point of that paper in a bit of detail: disclosed source software licenses need to have a few specific features to be useful to potential voting system evaluators. I’ll start by describing three examples of disclosed source software licenses and then talk about what I’d like to see, as a tinkerer, in these agreements.
The definition of an open source software product is relatively simple: for all practical purposes, anything released under an OSI-approved software license is open source, especially in the sense that one who downloads the source code will have wide latitude to copy, distribute, modify, perform, etc. the source code. What we refer to as disclosed source software is publicly released under a more restrictive license.
Three Disclosed Source Licenses
We have at least three examples of these kinds of licenses from voting systems applications:
- Sequoia: Sequoia’s license is a good place to start, due to its relative simplicity. It grants the user a limited copyright and patent license for “reference use”, which is defined as (emphasis added):
“Reference use” means use of the software within your company as a reference, in read only form, for the sole purposes of reviewing and inspecting the code. In addition, this license allows inspection of the code to enhance, or develop ancillary software or hardware products to enhance interoperability of your products with the software. Distribution of this source is allowed provided that this license, and all copyright notices remain in-tact.
(If you’re a developer and you suspect that you’ve seen this license before, you might have. It appears to be identical to Microsoft’s Reference Source License (Ms-RSL), which is used to distribute things like the .NET Framework under Microsoft’s shared source program. That makes a good deal of sense, since Frontier is written in .NET!)
- Everyone Counts¹: Everyone Counts’ (E1C) license is curious. (Unfortunately, I don’t see a copy of this license posted publicly, so I’ll just quote from it.) I say curious mostly because of the flowery language it includes, such as (emphasis added):
The sources are provided for scrutiny in the same spirit as any member of the public may apply and obtain escorted access to a public election, and is entitled free and open access to election processes to satisfy his or herself of the veracity of the claims of electoral officers and that the process upholds democratic principles. In the same way, the elections observer is not permitted to capture and publish or otherwise make public the processes of the physical count at an election, the same applies for these sources.
I’m not sure what that last part is about; it’s pretty well accepted—in the U.S.—that “making public the processes of the physical count” is a basic requirement of democratic elections.
Anyway, on to the substance: It’s a pretty simple license, although more complex than the Sequoia license. The core of this license allows the user to “examine, compile and execute the resulting bytecodes” from the Java source code. It specifies that the user is allowed these rights for the purpose of “analysis forming electoral scrutiny”, which is a difficult phrase to parse. The license suffers from a lot of such wording problems, which make it pretty hard to understand.
- VoteHere²: The VoteHere license agreement is considerably more complex and looks more like a commercial software license agreement. My favorite part, of course, is:
TO AVOID ANY DOUBT, THIS SOFTWARE IS NOT BEING LICENSED ON AN OPEN SOURCE BASIS.
The central component is that VoteHere restricts all your rights, other than copying and modifying the source code for evaluation purposes, and owns any derivative works you create from the source code. It has some other quirks; for example, the license, despite being a click-wrap license, has a hard term of 60 days after which all copies and such must be destroyed. (Presumably, you could click through again after 60 days, and restart the term.)
What Does an Evaluator Need in a License?
Each of these licenses has its strengths and weaknesses. The Sequoia license doesn’t seem to permit modification of the source code, but it is relatively simple and allows distribution of Sequoia’s unmodified code. The VoteHere and E1C licenses, however, recognize that modification may be necessary to evaluate the software (VoteHere even includes a license to the system’s documentation, an essential but often overlooked part of evaluating source code). The VoteHere license is extremely onerous in that it is very strict and places heavy burdens on the evaluator. The E1C license is flowery and hard to understand, but it seems simple at the core and seems to understand what evaluators might need in terms of modifying the code during evaluation.
This raises a good question: What rights, exactly, do evaluators need to examine source code? Practically, it depends on what they want to do. If all they want to do is human line-by-line source code analysis, then a “read-only” license like Sequoia’s is probably fine. However, what about compiling the source code with debugging flags set? What about modifying the software to see how another piece of it performs?
Listed (sort of) in terms of the rights granted by U.S. copyright law, here are some thoughts:
- Examining: If it doesn’t require making a copy, simply looking at the source code with your eyeballs is not covered by copyright law. So licenses that allow you to “examine” the source code aren’t really granting much, unless they define “examine” to mean something that implicates exclusive rights (which are listed in 17 USC 106).
- Copying: Of course, downloading source code and loading it into an IDE or text editor will make a number of copies, so at the most basic level, evaluators will need to be able to do this. Backup copies are also a necessity these days, and VoteHere’s license contemplates this by allowing “reproduc[tion] for temporary archive purposes”.
- Modification: Evaluators will need some ability to modify the code. At a minimum, this means compiling it in order to execute it; the next logical step is compiling it with “debugging flags” set (which embeds special flags and metadata in the compiled code that allow debugging tools to step through the program, set breakpoints, etc.). Some types of evaluation require minor modifications to integrate the code into an analysis method; a simple example is changing pieces of the code to see how other pieces respond, or inserting code that prints to the screen (a very primitive but useful form of debugging!). Each of these actions creates a derivative work, so that should be explicitly allowed in these licenses.
- Distribution: At first, you might not think that evaluators would need much in the way of rights to distribute the source code. However, distributing modified works, such as a patch that fixes a bug, could be very useful. Also, being able to share the code if the official means of getting the code goes dark is often useful; for example, a portal of voting system source code would provide this “mirroring” capability and could also serve as a “one-stop shop” for tinkerers and researchers who would point research tools at these code repositories for analysis.
- Performance: Both reading the source code out loud and showing it publicly are also implicated by copyright law. Why would an evaluator want to do this? Well, imagine that you’ve completed an analysis of a disclosed source code product and you want to write up your findings. It’s often useful to include snippets of code. Small snippets would likely be covered by fair use, but it’s always nice not to have to worry about that and to have explicit permission to at least “display” and possibly “read aloud” source code in these contexts (think accessible or podcast versions of a report!).
- Executing: There is a line of legal cases holding that “executing” a program is covered by copyright law, because a copy of the object code is loaded into memory at run time. Hopefully, permission to make copies, the first and most basic need of evaluators highlighted above, would also cover this interpretation of “executing” the code.
Outside of these types of exclusive rights, there’s also something to be said for simplicity. The BSD license is a great example: it’s widely used and understood to be very generous and easy to understand. The Sequoia license (being the Ms-RSL) is very simple and easy to understand. The E1C license is not particularly complex, but its substance is hard to understand (again, apologies that I cannot post the text of that license). The VoteHere license is easy to understand but very complex and extremely onerous in terms of the burden it places on evaluators.
As I finally finish writing this, I’m told Sequoia might be interested in modifying its license. That would be a wonderful idea, and I hope these thoughts are useful for modifying it. I do wonder how they’ll be able to modify the license and still distribute parts of the .NET Framework under a new license. Perhaps they’ll specify that the .NET parts are under the Ms-RSL and any Sequoia-sourced code is under Sequoia’s new license. We’ll see!
¹ Everyone Counts sells internet voting solutions, which are scary as hell to a lot of us.
² VoteHere was a company, now seemingly out of business, that made a number of products, including a cryptographic add-on module, Sentinel, for the Diebold/Premier AccuVote-TS voting system, and an absentee ballot tracking system.
Any good voting system should make it possible for anyone to construct a bit-for-bit build of the code actually run in a machine. The hardware should also be constructed so that (1) it is possible to physically write-protect the code store and physically seal it in such a way that it cannot be written without breaking the physical seal; (2) with the code store write-protected, it is possible for an external device to read out the code and verify its correctness without any code having to execute anywhere except in the external device. If someone from each party connects his own reader, all parties can confirm that the code is as it should be.
I had a quick look through the svn repository and I see a few good things. There are solution files for building the application. There’s a list of all the libraries and tools needed to build the application. The code is well commented.
There may be problems – in fact I’m sure there will be some found – but at first glance they have provided enough for the code to be reviewed. It’s a good start.
I think a necessary requirement for evaluating the open source of voting machine software should be assurance that the shipped, installed code in actual machines matches the supposed source code.
The best way to ensure this is to have the build process also released, as well as specifications for all tools used in the build. If the evaluator can build a bit-for-bit identical program image, that is fairly good evidence that the source matches the executable.
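The final comparison step of such a rebuild can be sketched as below. This is a minimal illustration, not any vendor's or certification program's procedure: it assumes the evaluator already has two program images on disk (a reference image and an independently rebuilt one), and the helper names `sha256_of` and `images_match` are hypothetical. It only says anything useful if the machine and hashing tool doing the comparison are themselves trusted.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks
    so that large program images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(reference: Path, rebuilt: Path) -> bool:
    """True if the independently rebuilt image is bit-for-bit identical
    to the reference image, judged by comparing SHA-256 digests.
    (Hypothetical helper; paths are supplied by the evaluator.)"""
    return sha256_of(reference) == sha256_of(rebuilt)
```

Publishing the reference digest lets several independent parties run the same check against their own rebuilds without exchanging the images themselves.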
Of course, if the compiler is hacked (as Ken Thompson demonstrated), then you can’t assure this, and all you can do is perhaps decompile the object code and try to match it against the source.
It seems that many things can conspire to thwart making a bit-for-bit build from the source, right? Even just variations in build platforms can result in different object code (it would seem). (Incidentally, I’d love to see instructions/procedures/methods for making sure one can create robust bit-for-bit copy builds.) Not to mention that there are a number of ways to cheat this; compiler hacks are one but also bad comparison/hashing programs/platforms, etc.
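When two builds from the same source do not match, a practical first step is to locate where they diverge; differences confined to a few small regions often point to embedded timestamps, file paths, or similar build-environment details rather than to different code. The sketch below is a hypothetical helper, assuming both images fit in memory. It reports the differing byte ranges, while interpreting them still requires knowledge of the object file format.

```python
from pathlib import Path


def differing_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open [start, end) byte ranges where a and b differ.

    If the inputs have different lengths, the bytes past the end of the
    shorter one are reported as (part of) a final differing range.
    """
    ranges: list[tuple[int, int]] = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # entering a run of differing bytes
        elif start is not None:
            ranges.append((start, i))  # leaving a run of differing bytes
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        longest = max(len(a), len(b))
        if ranges and ranges[-1][1] == common:
            # Extend a run that already reaches the end of the shorter input.
            ranges[-1] = (ranges[-1][0], longest)
        else:
            ranges.append((common, longest))
    return ranges


def report(path_a: Path, path_b: Path) -> None:
    """Print each differing region of two build outputs as hex offsets."""
    a, b = path_a.read_bytes(), path_b.read_bytes()
    for start, end in differing_ranges(a, b):
        print(f"differ at 0x{start:08x}..0x{end:08x} ({end - start} bytes)")
```

An empty result means the two builds are bit-for-bit identical. A handful of short ranges at the same offsets across repeated builds is a hint about which part of the toolchain is nondeterministic.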
The way this is done in voting systems (and a number of other places) is to do a “trusted build” or “witnessed build”, where the media on which the inspected code was delivered is used on a wiped platform, etc., to build a reference copy that is then what gets installed on the machines. The EAC’s VSTCP manual goes into some of the procedures for doing this, and even it isn’t foolproof!
Being buildable is also important because one of the best ways to learn or test a program is to modify and run it. You might need to build the program to turn on compile-time debugging options, add profiling, etc.