November 24, 2024

A review of the FVAP UOCAVA workshop

The US Federal Voting Assistance Program (FVAP) is the Department of Defense agency charged with assisting military and overseas voters with all aspects of voting, including registering to vote, obtaining ballots, and returning ballots. FVAP’s interpretation of Federal law (*) says that they must perform a demonstration of electronic return of marked ballots by overseas military voters (**) at the first Federal election that occurs one year after the adoption of guidelines by the US Election Assistance Commission (EAC). Since the EAC hasn’t adopted such guidelines yet (and isn’t expected to for at least another year or two), the clock hasn’t started ticking, so a 2012 demonstration is impossible and a 2014 demonstration looks highly unlikely. Hence, this isn’t a matter of imminent urgency; however, such systems are complex and FVAP is trying to get the ball rolling on what such a system would look like.

As has been discussed previously on this blog, nearly all computer security experts are very concerned about the prospect of marked ballot return over the internet (which we will henceforth refer to as “internet voting”). Concerns include the vulnerability of client computers, limited auditability, usability, and voter coercion. On the flip side, many states and localities are marching full steam ahead on their own internet voting systems, generally ignoring the concerns of computer scientists, and focusing on the perceived greater convenience and hoped-for increased turnout. Many of these systems include email return of marked ballots, which computer scientists generally consider to be even riskier than web-based voting.

FVAP has been caught between the legal mandates and the technical experts. In an effort to break this logjam, they’ve organized a series of open fora – first in August 2010 just before USENIX Security in Washington DC, then in March 2011 just before the Election Verification Network workshop in Chicago IL, and last weekend just before USENIX Security in San Francisco CA. All three brought together representatives from FVAP, voting system vendors, election officials, computer scientists, and voting activists to discuss the issues. Several of the Freedom To Tinker bloggers have been present at all three meetings, and have been frustrated that the first two ended at an impasse – computer scientists saying “it doesn’t work” and FVAP (and others) saying “we need a solution anyway”.

Fortunately, the third meeting concluded in a far more constructive way. While all agree there are significant impediments, a consensus was reached that the best solution is a multi-stage competition, much like the one the National Institute of Standards and Technology (NIST) ran for the Advanced Encryption Standard (AES) and is now running for the Secure Hash Algorithm 3 (SHA-3).

The competition is structured as a series of phases that are completely open, which everyone expects to be at least somewhat controversial, as some organizations (such as vendors) will want to protect their intellectual property. All submissions will be shared with the public, and competing teams will be encouraged to critique each other’s submissions. In earlier phases this will be focused on the paper requirements and designs; in later phases this may include finding vulnerabilities in architectures and implementations. Submitters may claim patent and/or copyright on their submissions, but they must grant the public (including competitors) rights to use the submissions for analysis, including compiling, testing, and modifying software, for testing purposes. (However, submitters may preclude such use for production or resale purposes.) Thus, trade secrets will be precluded in the competitive process.

The competition will have three phases, each of which may include one or more iterations.

  • In the first phase (which, as computer scientists, we named “round 0”), submissions will focus on requirements for internet voting systems. Submitters will define characteristics that must be met in following phases. Submissions may also include use cases for which the requirements are applicable – for example, requirements that could apply in environments where all voters have smart cards, such as the US military. As described above, submissions will be open to the public, and anyone (especially submitters) will be encouraged to critique submissions to find the best aspects. At the conclusion of this round, FVAP will (possibly with the assistance of government experts) consolidate the requirements into a single set that will govern the following phase.
  • In the second phase (“round 1”), submissions will provide high level designs and detailed hardware and software architectures, along with procedures necessary for secure operation. The submissions for this round need to be detailed enough that a reasonably skilled person could implement a realization of the system, although many details such as user interfaces and database layouts will be undefined. As with the first phase, submissions will be open for critique. In this phase critiques will focus on identifying areas where designs do not meet the requirements defined in the first phase. The result may be modification of architectures to incorporate ideas from several teams. At the conclusion of this phase, FVAP will (again with assistance from government experts) narrow down the set of acceptable architectures. Or perhaps not – if no architecture is good enough to satisfy the requirements, FVAP may conclude that the experiment should not be run (and cancel the third phase).
  • In the third phase (“round 2”) submitters will create implementations of one or more of the architectures (perhaps even adopting architectures from other teams, if licensing terms permit). During the critique period, teams will seek to find security vulnerabilities in other implementations, and fix problems identified in their own implementation. Usability testing should be part of this phase, as systems too complex for voters to use effectively (even if secure) need to be identified and improved. At the conclusion of this phase, FVAP will identify one or more implementations that are adequate for meeting their demonstration project requirements. Or perhaps not – if no implementation is good enough, FVAP may conclude that the experiment should not be run.

What happens if there is no acceptable solution at the conclusion of the second or third phase? That’s possible – and if it happens, that may be cause for FVAP to request that Congress modify its charter to eliminate the requirement for online marked ballot return. If the best minds in the country conclude that internet voting is a perpetual motion machine, no amount of laws and regulations will make it possible.

How long will all this take? We estimate the entire process will take three or four years, allowing time for FVAP to publish a solicitation, organizations to create submissions, the public critique period, FVAP’s consolidation and decision making, and transition to the next phase.

In the meantime, there’s little doubt that some states will continue to move forward on the existing insecure solutions. We believe, and expect that most other computer scientists will agree, that this is a case to let science take its course before moving into implementation. We hope that FVAP will speak out publicly against such ill-advised experiments.

For now, we look forward to working with FVAP in realizing the first ever national internet voting competition.

(*) While there is some disagreement on interpretation of the law, since I’m not a lawyer and hence not competent to determine the accuracy of that interpretation, this blog entry presumes that the FVAP interpretation is correct.

(**) The term “military and overseas voters” means both military voters stationed away from their legal home (e.g., at a base in another state or overseas) and civilians living overseas (whether on a temporary basis such as contractors or on a permanent basis). Thus this includes people working for organizations like Peace Corps and embassies as well as expatriates. However, the FVAP mandate for internet voting only applies to overseas military voters, and not domestic military voters or overseas civilians.

Edited Aug 13 @ 12:17pmET: Changed first footnote to explain that I’m not a lawyer and hence not interpreting the law.

Edited Aug 15 @ 1:08pmET: Corrected name of EVN workshop.

The End of Gnutella?

Almost exactly 2 years ago, I wrote an essay that examined the case of Arista Records et al v. Lime Group et al. It was presented on Freedom-to-Tinker in a series of three posts (1, 2, 3). These articles presented an analysis which showed that any open filesharing network, such as Gnutella, is vulnerable to spamming. Lime Wire, without advertising as much, was acting as a spam cop for Gnutella, keeping the network safe for infringers. It was my view that the decision in the case could be made to turn on the actions that Lime Wire was taking to control spammers on the Gnutella network, and if the case were examined in that light, Lime Wire could be found liable for contributory infringement while still respecting the First Amendment rights of software publishers.

Since that time, a great deal has occurred in the world of filesharing. It is worthwhile to examine the current state of affairs, which is predictable in some ways and yet quite surprising in others.


Retiring FedThread

Nearly two years ago, the Federal Register was published in a structured XML format for the first time. This was a big deal in the open government world: the Federal Register, often called the daily newspaper of our federal government, is one of our government’s most widely read publications. And while it could previously be read in paper and PDF forms, it wasn’t easy to digitally manipulate. The XML release changed all this.

When we heard this was happening, four of us here at CITP—Ari Feldman, Bill Zeller, Joe Calandrino, and myself—decided to see how we might be able to improve how citizens could interact with the Federal Register. Our big idea was to make it easy for anyone to comment paragraph-by-paragraph on any of its documents, like a proposed regulation. The site, which we called FedThread, would provide an informal public forum for annotating these documents, and we hoped it would lead to useful online discussions about the merits and weaknesses of all kinds of federal regulatory activity. We also added other useful features, like a full-text search engine and custom RSS feeds. Building these features for the Federal Register only became a straightforward task because of the new XML version. We built the site in just eight days from conception to release.

Another trio of developers in SF also saw opportunities in this free machine-readable resource and developed their own project called GovPulse, which had already won the Sunlight Foundation’s Apps for America 2 contest. They were then approached by the staff of the Federal Register last summer to expand their site to create what would become the new online face of the publication, Federal Register 2.0. Their approach to user comments actually guided users into participating in the formal regulatory comment process—a great idea. Federal Register 2.0 included several features present in FedThread, and many more. Everything was done using open source tools, and made available to the public as open source.

This has left little reason for us to continue operating FedThread. It has continued to reliably provide the features we developed two years ago, but our regular users will find it straightforward to transition to the similar (and often superior) search and subscription features on Federal Register 2.0. So, we’re retiring FedThread. However, the code that we developed will continue to be available, and we hope that enterprising developers will find components to re-use in their own projects that benefit society. For instance, the general purpose paragraph-commenting code that we developed can be useful in a variety of projects. Of course, that code itself was an adaptation of the code supporting another open source project—the Django Book, a free set of documentation about the web framework that we were using to build FedThread (but this is what developers would call a “meta” observation).

Ideally, this is how hacking open government should work. Free machine-readable data sets beget useful new ways for citizens to explore those data and make them useful to other citizens. Along the way, they experiment with different ideas, some of which catch on and others of which serve as fodder for the next great idea. This happens faster than standard government contracting, and often produces more innovative results.

Finally, a big thanks to the GPO, NARA and the White House Open Government Initiative for making FedThread possible and for helping to demonstrate that this approach can work, and congratulations on the fantastic Federal Register 2.0.

What Gets Redacted in Pacer?

In my research on privacy problems in PACER, I spent a lot of time examining PACER documents. In addition to researching the problem of “bad” redactions, I was also interested in learning about the pattern of redactions generally. To this end, my software looked for two redaction styles. One is the “black rectangle” redaction method I described in my previous post. This method sometimes fails, but most of these redactions were done successfully. The more common method (around two-thirds of all redactions) involves replacing sensitive information with strings of XXs.

Out of the 1.8 million documents it scanned, my software identified around 11,000 documents that appeared to have redactions. Many of them could be classified automatically (for example “123-45-xxxx” is clearly a redacted Social Security number, and “Exxon” is a false positive) but I examined several thousand by hand.
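The automatic classification step can be illustrated with a minimal sketch. This is not the software used for the study; the function name `classify`, the patterns, and the category labels are all hypothetical, chosen only to show how a run of Xs might be sorted into "redacted SSN", "ordinary word", or "needs a human look".

```python
import re

# Hypothetical patterns for illustration; not the study's actual rules.
X_RUN = re.compile(r"[xX]{2,}")                            # two or more X's in a row
SSN_SHAPE = re.compile(r"\b[\dxX]{3}-[\dxX]{2}-[\dxX]{4}\b")  # NNN-NN-NNNN with X's allowed

def classify(token: str) -> str:
    """Label a single whitespace-delimited token."""
    if not X_RUN.search(token):
        return "no_redaction"
    if SSN_SHAPE.search(token):
        # e.g. "123-45-xxxx": SSN layout with some digits replaced by X's
        return "ssn"
    if token.isalpha() and not X_RUN.fullmatch(token):
        # X's embedded in other letters, e.g. "Exxon", are likely a real word
        return "false_positive"
    # Anything else (e.g. a bare "XXXXXX") needs context to categorize
    return "needs_manual_review"

for t in ["123-45-xxxx", "Exxon", "XXXXXX", "hello"]:
    print(t, "->", classify(t))
```

Tokens that fall through to the last branch are the ones that would still need to be examined by hand, since a bare string of Xs says nothing about what it replaced.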

Here is the distribution of the redacted documents I found.

Type of Sensitive Information        No. of Documents
Social Security number               4315
Bank or other account number         675
Address                              449
Trade secret                         419
Date of birth                        290
Unique identifier other than SSN     216
Name of person                       129
Phone, email, IP address             60
National security related            26
Health information                   24
Miscellaneous                        68
Total                                6208

To reiterate the point I made in my last post, I didn’t have access to a random sample of the PACER corpus, so we should be cautious about drawing any precise conclusions about the distribution of redacted information in the entire PACER corpus.

Still, I think we can draw some interesting conclusions from these statistics. It’s reasonable to assume that the distribution of redacted sensitive information is similar to the distribution of sensitive information in general. That is, assuming that parties who redact documents do a decent job, this list gives us a (very rough) idea of what kinds of sensitive information can be found in PACER documents.

The most obvious lesson from these statistics is that Social Security numbers are by far the most common type of redacted information in PACER. This is good news, since it’s relatively easy to build software to automatically detect and redact Social Security numbers.
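To give a sense of why this is relatively easy, here is a minimal sketch of an SSN detector working on text that has already been extracted from a document. The name `redact_ssns` and the replacement mask are assumptions for illustration, and the pattern covers only the common hyphenated NNN-NN-NNNN form.

```python
import re

# Matches only the hyphenated NNN-NN-NNNN form; unhyphenated or
# space-separated numbers would need additional patterns.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_ssns(text: str) -> tuple[str, int]:
    """Return the text with SSN-like strings masked, plus the match count."""
    return SSN_PATTERN.subn("XXX-XX-XXXX", text)

masked, count = redact_ssns("Debtor SSN 123-45-6789, phone 555-1234.")
print(masked)  # Debtor SSN XXX-XX-XXXX, phone 555-1234.
print(count)   # 1
```

A real tool would also have to remove the number from the underlying file rather than merely cover it, since a redaction that leaves the original text in place can fail in the same way the "black rectangle" method sometimes does.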

Another interesting case is the “address” category. Almost all of the redacted items in this category—393 out of 449—appear in the District of Columbia District. Many of the documents relate to search warrants and police reports, often in connection with drug cases. I don’t know if the high rate of redaction reflects the different mix of cases in the DC District, or an idiosyncratic redaction policy voluntarily pursued by the courts and/or the DC police but not by officials in other districts. It’s worth noting that the redaction of addresses doesn’t appear to be required by the federal redaction rules.

Finally, there’s the category of “trade secrets,” which is a catch-all term I used for documents whose redactions appear to be confidential business information. Private businesses may have a strong interest in keeping this information confidential, but the public interest in such secrecy here is less clear.

To summarize, out of 6208 redacted documents, 4315 contain Social Security numbers that can be redacted automatically by machine, 449 contain addresses whose redaction doesn’t seem to be required by the rules of procedure, and 419 contain “trade secrets” whose release will typically only harm the party who fails to redact them.

That leaves around 1000 documents that would expose risky confidential information if not properly redacted, or about 0.05 percent of the 1.8 million documents I started with. A thousand documents is worth taking seriously (especially given that there are likely to be tens of thousands in the full PACER corpus). The courts should take additional steps to monitor compliance with the redaction rules and sanction parties who fail to comply with them, and they should explore techniques to automate the detection of redaction failures in these categories.
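The arithmetic behind these figures can be checked directly from the table. The variable names below are just labels for the counts quoted above.

```python
redacted_total = 6208   # documents with redactions
ssn = 4315              # Social Security numbers
address = 449           # addresses
trade_secret = 419      # "trade secrets"
scanned = 1_800_000     # documents scanned

remaining = redacted_total - ssn - address - trade_secret
share = remaining / scanned * 100   # as a percentage of all scanned documents

print(remaining)         # 1025
print(round(share, 3))   # 0.057
```

So "around 1000" is 1025 documents, and the share is a little under 0.06 percent of the documents scanned.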

But at the same time, a sense of perspective is important. This tiny fraction of PACER documents with confidential information in them is a cause for concern, but it probably isn’t a good reason to limit public access to the roughly 99.9 percent of documents that contain no sensitive information and may be of significant benefit to the public.

Thanks again to Carl Malamud and Public.Resource.Org for their support of my research.

Universities in Brazil are too closed to the world, and that's bad for innovation

When Brazilian president Dilma Rousseff visited China in the beginning of May, she came back with some good news (maybe too good to be entirely true). Among the items was the announcement that Foxconn, the largest maker of electronic components, will invest US$12 billion to open a large industrial plant in the country. The goal is to produce iPads and other key electronic components locally.

The announcement was praised, and quickly made it to the headlines of all major newspapers. There is certainly reason for excitement. Brazil missed important waves of economic development, including industrialization (which only really happened in the 1940s) and the semiconductor wave, an industry that has shown only a few signs of development in the country until now.