Joseph Lorenzo Hall

Joseph Lorenzo Hall recently graduated with his Ph.D. from the University of California, Berkeley School of Information, working under information law professors Pamela Samuelson and Deirdre Mulligan. Hall will start a postdoctoral research position at the Center for Information Technology Policy (CITP) at Princeton University in Fall 2008. Hall's academic focus is on mechanisms that promote transparency as core functions of our government become digital. His Ph.D. thesis uses electronic voting as a critical case study in digital transparency. Mr. Hall holds master's degrees in astrophysics and information systems from UC Berkeley and is a founding member of the National Science Foundation CyberTrust ACCURATE Center (A Center for Correct, Usable, Reliable, Auditable and Transparent Elections). He served as a voting technology, policy and law analyst on the teams that conducted the California Secretary of State's Top-to-Bottom Review of voting systems and Project EVEREST, Ohio's review of its voting systems.

Web Tracking and User Privacy Workshop: Test Cases for Privacy on the Web

This guest post is from Nick Doty, of the W3C and UC Berkeley School of Information. As a companion to my summary of the position papers submitted for last month’s W3C Do-Not-Track Workshop, hosted by CITP, Nick goes deeper into the substance of the workshop and the interactions during it.

The level of interest and participation in last month’s Workshop on Web Tracking and User Privacy — about a hundred attendees spanning multiple countries, dozens of companies, a wide variety of backgrounds — confirms the broad interest in Do Not Track. The relatively straightforward technical approach with a catchy name has led, in the US, to proposed legislation at both the state and federal levels, a specific mention by the Federal Trade Commission (it was nice to have Ed Felten back from DC representing his new employer at the workshop), and comparatively rapid deployment of competing proposals by browser vendors. Still, one might be surprised that so many players are devoting such engineering resources to a relatively narrow goal: building technical means that allow users to avoid tracking across the Web for the purpose of compiling behavioral profiles for targeted advertising.

In fact, Do Not Track (in all its variations and competing proposals) is the latest test case for how new online technologies will address privacy issues. What mix of minimization techniques (where one might classify Microsoft’s Tracking Protection block lists) versus preference expression and use limitation (like a Do Not Track header) will best protect privacy and allow for innovation? Can parties agree on a machine-readable expression of privacy preferences (as has been heavily debated in P3P, GeoPriv and other standards work), and if so, how will terms be defined and compliance monitored and enforced? Many attendees were at the workshop not just to address this particular privacy problem — ubiquitous invisible tracking of Web requests to build behavioral profiles — but to grab a seat at the table where the future of how privacy is handled on the Web may be decided. The W3C, for its part, expects to start an Interest Group to monitor privacy on the Web and spin out specific work as new privacy issues inevitably arise, in addition to considering a Working Group to address this particular topic (more below). The Internet Engineering Task Force (IETF) is exploring a Privacy Directorate to provide guidance on privacy considerations across specs.

At a higher level, this debate presents a test case for the process of building consensus and developing standards around technologies like tracking protection or Do Not Track that have inspired controversy. What body (or rather, combination of bodies) can legitimately define preference expressions that must operate at multiple levels in the Web stack, not to mention serve the diverse needs of individuals and entities across the globe? Can the same organization that defines the technical design also negotiate semantic agreement between very diverse groups on the meaning of “tracking”? Is this an appropriate role for technical standards bodies to assume? To what extent can technical groups work with policymakers to build solutions that can be enforced by self-regulatory or governmental players?

Discussion at the recent workshop confirmed many of these complexities: though the agenda was organized to roughly separate user experience, technical granularity, enforcement and standardization, overlap was common and inevitable. Proposals for an “ack” or response header brought up questions of whether the opportunity to disclaim following the preference would prevent legal enforcement; whether not having such a response would leave users confused about when they had opted back in; and how granular such header responses should be. In defining first vs. third party tracking, user expectations, current Web business models and even the same-origin security policy could point the group in different directions.

We did see some moments of consensus. There was general agreement that while user interface issues were key to privacy, trying to standardize those elements was probably counterproductive, though providing guidance could help significantly. Regarding the scope of “tracking”, the group was roughly evenly divided on what they would most prefer: a broad definition (any logging), a narrow definition (online behavioral advertising profiling only) or something in between (where tracking is more than OBA but excludes things like analytics or fraud protection, as in the proposal from the Center for Democracy and Technology). But in a “hum” to see which proposals workshop attendees opposed (“non-starters”), no one objected to starting with a CDT-style middle ground — a rather shocking level of agreement to end two days chock full of debate.

For tech policy nerds, then, this intimate workshop about a couple of narrow technical proposals was heady stuff. And the points of agreement suggest that real interoperable progress on tracking protection — the kind that will help the average end user’s privacy — is on the way. For the W3C, this will certainly be a topic of discussion at the ongoing meeting in Bilbao, and we’re beginning detailed conversations about the scope and milestones for a Working Group to undertake technical standards work.

Thanks again to Princeton/CITP for hosting the event, and to Thomas and Lorrie for organizing it: bringing together this diverse group of people on short notice was a real challenge, and it paid off for all of us. If you’d like to see more primary materials: minutes from the workshop (including presentations and discussions) are available, as are the position papers and slides. And the W3C will post a workshop report with a more detailed summary very soon.

Summary of W3C DNT Workshop Submissions

Last week, we hosted the W3C “Web Tracking and User Privacy” Workshop here at CITP (sponsored by Adobe, Yahoo!, Google, Mozilla and Microsoft). If you were not able to join us for this event, I hope to summarize some of the discussion embodied in the roughly 60 position papers submitted.

The workshop attracted a wide range of participants; the agenda included advocates, academics, government, start-ups and established industry players from various sectors. Despite the broad name of the workshop, the discussion centered around “Do Not Track” (DNT) technologies and policy, essentially ways of ensuring that people have control, to some degree, over web profiling and tracking.

Unfortunately, I’m going to have to assume that you are familiar with the various proposals before going much further, as the workshop position papers are necessarily short and assume familiarity. (If you are new to this area, the CDT’s Alissa Cooper has a brief blog post from this past March, “Digging in on ‘Do Not Track’”, that mentions many of the discussion points. Technically, much of the discussion involved the mechanisms of the Mayer, Narayanan and Stamm IETF Internet-Draft from March and the Microsoft W3C member submission from February.)

Read on for more…

Technical Implementation: First, some quick background and updates: A number of papers point out that analogizing to a Do-Not-Call-like registry (I suppose where netizens would sign up not to be tracked) would not work well for online tracking, so we should be careful not to shape the technology and policy too closely to Do-Not-Call. Having recognized that, the current technical proposals center around the Microsoft W3C submission and the Mayer et al. IETF submission, including some mix of a DNT HTTP header, a DNT DOM flag, and Tracking Protection Lists (TPLs). While the IETF submission focuses exclusively on the DNT HTTP header, the W3C submission includes all three of these technologies. Browsers are moving pretty quickly here: Mozilla’s Firefox 4.0 includes the DNT header, Microsoft’s IE9 includes all three of these capabilities, Google’s Chrome browser now allows extensions to send the DNT header through the WebRequest API, and Apple has announced that the next version of its Safari browser will support the DNT header.
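
To make the header mechanism concrete, here is a minimal sketch (mine, not from any of the submissions) of what expressing the preference looks like from the client side, using Python’s requests library; the httpbin.org URL is just an echo service used for illustration.

    import requests

    # Express the Do Not Track preference as a single HTTP request header ("DNT: 1").
    # httpbin.org simply echoes the request headers back, so we can confirm it was sent;
    # a real site would be free to honor or ignore the preference.
    resp = requests.get("https://httpbin.org/headers", headers={"DNT": "1"})
    print(resp.json())  # the echoed headers should include the DNT preference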

Some of the papers critique certain aspects of the three implementation options while some suggest other mechanisms entirely. CITP’s Harlan Yu includes an interesting discussion of the problems with DOM flag granularity and access control problems when third-party code included in a first-party site runs as if it were first-party code. Toubiana and Nissenbaum talk about a number of problems with the persistence of DNT exceptions (where a user opts back in) when a resource changes content or ownership and then go on to suggest profile-based opting-back-in based on a “topic” or grouping of websites. Avaya’s submission has a fascinating discussion of the problems with implementation of DNT within enterprise environments, where tracking-like mechanisms are used to make sure people are doing their jobs across disparate enterprise web-services; Avaya proposes a clever solution where the browser first checks to see if it can reach a resource only available internally to the enterprise (virtual) network, in which case it ignores DNT preferences for enterprise software tracking mechanisms. A slew of submissions from Aquin et al., Azigo and PDECC favor a culture of “self-tracking”, allowing and teaching people to know more about the digital traces they leave and giving them (or their agents) control over the use and release of their personal information. CASRO-ESOMAR and Apple have interesting discussions of gaming TPLs: CASRO-ESOMAR points out that a competitor could require a user to accept a TPL that blocks traffic from their competitors and Apple talks about spam-like DNS cycling as an example of an “arms race” response against TPLs.
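
For readers who haven’t seen one, a Tracking Protection List is essentially a plain-text file of block and allow rules keyed on domains, which makes the gaming scenarios above easier to picture. The fragment below is a hand-written illustration in the rough style of IE9’s msFilterList format (the domains are made up); it is not an excerpt from any real list.

    msFilterList
    : Expires=30
    # Block requests to a (hypothetical) third-party tracking domain
    -d tracker.example
    # Always allow requests to a (hypothetical) measurement domain
    +d metrics.example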

Definitions: Many of the papers addressed definitions definitions definitions… mostly about what “tracking” means and what terms like “third-party” should mean. Many industry submissions, such as PayPal, Adobe, SIIA, and Google, urge caution so that good types of “tracking”, such as analytics and forensics, are not swept under the rug, and further argue that clear definitions of the terms involved in DNT are crucial to avoid disrupting user expectations, innovation and the online ecosystem. PayPal points out, as have others, that domain names are not good indicators of third-party status (e.g., metrics.apple.com is the Adobe Omniture service for apple.com and fb.com is equivalent to facebook.com). Ashkan Soltani’s submission distinguishes definitions for DNT that are a “do not use” conception vs. a “do not collect” conception and argues for a solution that “does not identify”, requiring the removal of any unique identifiers associated with the data. Soltani points out how this has interesting measurement/enforcement properties: if a user sees a unique ID in the DNI case, the site is doing it wrong.

Enforcement: Some raised the issue of enforcement; Mozilla, for example, wants to make sure that there are reasonable enforcement mechanisms to deal with entities that ignore DNT mechanisms. On the other side, so to speak, is the split between those calling for self-regulation, such as Comcast and SIIA, and those advocating for explicit regulation. The opinion polling research groups, CASRO-ESOMAR, call explicitly for regulation no matter what DNT mechanism is ultimately adopted, such that DNT header requests are clearly enforced or that TPLs are regulated tightly so as to not over-block legitimate research activities. Abine wants a cooperative market mechanism that results in a “healthy market system that is responsive to consumer outcome metrics” and that incentivizes advertising companies to work with privacy solution providers to increase consumer awareness and transparency around online tracking. Many of the industry players worried about definitions are also worried about over-prescription from a regulatory perspective; e.g., Datran Media is concerned about over-prescription via regulation that might stifle innovation in new business models. Hoofnagle et al. are evaluating the effectiveness of self-regulation, and find that the self-regulation programs currently in existence are greatly tilted in favor of industry and do not adequately embody consumer conceptions of privacy and tracking.

Research: There were a number of submissions addressing research that is ongoing and/or further needed to gauge various aspects of the DNT puzzle. The submissions from McDonald and Wang et al. describe user studies focusing, respectively, on what consumers expect from DNT (spoiler: they expect no collection of their data) and on gauging the usability and effectiveness of current opt-out tools. Both of these lines of work argue for usable mechanisms that communicate how developers implement/envision DNT and how users can best express their preferences via these tools. NIST’s submission argues for empirical studies to set objective and usable standards for tracking protection and describes a current study of single sign-on (SSO) implementations. Thaw et al. discuss a proposal for incentivizing developers to communicate and design the various levels of rich data they need to perform certain kinds of ad targeting, and then use a multi-armed bandit model to illustrate game-theoretic ad targeting that can be tweaked based on how much data they are allowed to collect. Finally, CASRO-ESOMAR makes a plea for exempting legitimate research purposes from DNT, so that opinion polling and academic research can avoid bias.

Transparency: A particularly fascinating thread of commentary to me was the extent to which submissions touched on or entirely focused on issues of transparency in tracking. Grossklags argues that DNT efforts will spark increased transparency, but he’s not sure that will overcome some common consumer privacy barriers seen in research. Seltzer talks about the intimate relationship between transparency and privacy and concludes that a DNT header is not very transparent (in operation, not use), while TPLs are more transparent in that they are a user-side mechanism that users can inspect, change and verify for correct operation. Google argues that there is a need for transparency in “what data is collected and how it is used”, leaving out the ability for users to affect or control these things. In contrast, BlueKai also advocates for transparency in the sense of both access to a user’s profile and user “control” over the data it collects, but it doesn’t and probably cannot extend this transparency to an understanding of how BlueKai’s clients use this data. Datran Media describes their PreferenceCentral tool, which allows opting out of brands the user doesn’t want targeting them (instead of ad networks, with which people are not familiar); they argue this is granular enough to avoid the “creepy” targeting feeling that users get from behavioral ads while still allowing high-value targeted advertising. Evidon analogizes to physical-world shopping transactions and concludes, smartly, that “Anytime data that was not explicitly provided is explicitly used, there is a reflexive notion of privacy violation” and that “A permanently affixed ‘Not Me’ sign is not a representation of an engaged, meaningful choice.”

W3C vs. IETF: Mozilla seems to be the only submission that wrestles a bit with the “which standards body?” question: W3C, IETF or some mix of both? They point out that the DNT header is a broader issue than just web browsing, so it should properly be tackled by the IETF, where HTTP resides, while the W3C effort could be focused on TPLs, with a subcommittee for the DNT DOM element.

Finally, here are a bunch of submissions that caught my eye but don’t fit into the above categories:

  • Soghoian talks about how the quantity and quality of information needed for security, law enforcement and fraud prevention is usually so great as to risk making it the exception that swallows the rule. Soghoian further recommends a total kibosh on certain nefarious technologies such as browser fingerprinting.

  • Lowenthal makes the very good point that browser vendors need to get more serious about managing security and privacy vulnerabilities, as that kind of risk is best dealt with at the choke point of the browsers users choose, rather than at the myriad of possible web entities. This would also allow browsers to compete on how privacy-preserving they can be.

  • Mayer argues for a “generative” approach to a privacy choice signaling technology, highlighting that language preferences (via short codes) and browsing platform (via user-agent strings) are now sent as preferences in web requests and web sites are free to respond as they see fit. A DNT signaling mechanism like this would allow for great flexibility in how a web service responded to a DNT request, for example serving a DNT version of the site/resource, prompting the user for their preferences or asking for a payment before serving. (A toy sketch of this kind of server-side branching follows this list.)

  • Yahoo points out that DNT will take a while to make it into the majority of browsers that users are using. They suggest a hybrid approach using the DAA CLEAR ad notice for backwards compatibility for browsers that don’t support DNT mechanisms and the DNT header for an opt-out that is persistent and enforceable.
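
To illustrate the “generative” idea from Mayer’s submission mentioned above, here is a minimal sketch (my own, using only Python’s standard library) of a server that branches on the DNT request header the way sites already branch on Accept-Language or User-Agent; the responses are placeholders.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class DNTAwareHandler(BaseHTTPRequestHandler):
        """Toy handler that chooses a response based on the DNT request header."""

        def do_GET(self):
            if self.headers.get("DNT") == "1":
                body = b"No-tracking version of this page.\n"
            else:
                body = b"Ordinary (possibly tracked) version of this page.\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), DNTAwareHandler).serve_forever()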

Whew; I likely left out a lot of good stuff across the remaining submissions, but I hope that readers get an idea of some of the issues in play and can consult the submissions they find particularly interesting as this develops. We hope to have someone pen a “part 2” to this entry describing the discussion during the workshop and what the next steps in DNT will be.

Internet Voting in Union Elections?

The U.S. Department of Labor (DOL) recently asked for public comment on a fascinating issue: what kind of guidelines should they give unions that want to use “electronic voting” to elect their officers? (Curiously, they defined electronic voting broadly to include computerized (DRE) voting systems, vote-by-phone systems and internet voting systems.)

As a technology policy researcher with the NSF ACCURATE e-voting center, I figured we should have good advice for DOL.

(If you need a quick primer on security issues in e-voting, GMU’s Jerry Brito has just posted an episode of his Surprisingly Free podcast where he and I work through a number of basic issues in e-voting and security. I’d suggest you check out Jerry’s podcast regularly as he gets great guests (like a recent episode with CITP’s own Tim Lee) and really digs deep into the issues while keeping it at an understandable level.)

The DOL issued a Request for Information (PDF) that asked a series of questions, beginning with the very basic, “Should we issue e-voting guidelines at all?” The questions go on to ask about the necessity of voter-verified paper audit trails (VVPATs), observability, meaningful recounts, ballot secrecy, preventing flawed and/or malicious software, logging, insider threats, voter intimidation, phishing, spoofing, denial-of-service and recovering from malfunctions.

Whew. The DOL clearly wanted a “brain dump” from computer security and the voting technology communities!

It turns out that labor elections and government elections aren’t as different as I originally thought. The controlling statute for union elections (the LMRDA) and caselaw* that has developed over the years require strict ballot secrecy–such that any technology that could link a voter and their ballot is not allowed–both during voting and in any post-election process. The one major difference is that there isn’t a body of election law and regulation on top of which unions and the DOL can run their elections; for example, election laws frequently disallow campaigning or photography within a certain distance of an official polling place while that would be hard to prohibit in union elections.

After a considerable amount of wrangling and writing, ACCURATE submitted a comment; you can find it here in PDF. The essential points we make are pretty straightforward: 1) don’t allow internet voting from unsupervised, uncontrolled computing devices for any election that requires high integrity; and 2) only elections that use voter-verified paper records (VVPRs), subject to an audit process that uses those records to check the reported election outcome, can avoid the various types of threats that DOL is concerned with. The idea is simple: VVPRs are independent of the software and hardware of the voting system, so it doesn’t matter how bad those aspects are as long as there is a robust parallel process that can check the result. Of course, VVPRs are no panacea: they must be carefully stored, secured and transported, and ACCURATE’s HCI researchers have shown that it’s very hard to get voters to consistently check them for accuracy. However, those problems are much more tractable than, say, removing all the malware and spyware from hundreds of thousands of voter PCs and mobile devices.
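
To make the “robust parallel process” concrete, here is a minimal sketch of the simplest form of such an audit (my illustration, not the procedure in ACCURATE’s comment): hand-count the voter-verified paper records in a randomly selected sample of precincts and compare those tallies to the totals the software reported, escalating if they disagree. The precinct data, sample size and zero-tolerance threshold are all invented for the example.

    import random

    # Hypothetical machine-reported totals, keyed by precinct and candidate.
    reported = {"P1": {"A": 410, "B": 390}, "P2": {"A": 220, "B": 305},
                "P3": {"A": 512, "B": 488}, "P4": {"A": 130, "B": 125}}

    def audit(reported_totals, hand_count, sample_size, tolerance=0):
        """Hand-count paper records in a random sample of precincts and flag mismatches."""
        sample = random.sample(sorted(reported_totals), sample_size)
        mismatches = []
        for precinct in sample:
            paper = hand_count(precinct)  # humans tally the VVPRs for this precinct
            for candidate, machine_total in reported_totals[precinct].items():
                if abs(machine_total - paper[candidate]) > tolerance:
                    mismatches.append((precinct, candidate))
        return mismatches  # non-empty -> escalate to a larger sample or a full hand count

    # Illustrative run: pretend the hand counts match the reported totals exactly.
    print(audit(reported, lambda p: reported[p], sample_size=2))  # -> []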

I must say I was a bit surprised to see the other sets of comments submitted, mostly by voting system vendors and union organizations, but also the Electronic Privacy Information Center (EPIC). ACCURATE and EPIC seem to be lone voices in this process “porting” what we’ve learned about the difficulties of running secure civic elections to the labor sphere. Many of the unions talked about how they must have forms of electronic, phone and internet voting as their constituencies are spread far and wide, can’t make it to polling places and are concerned with environmental impacts of paper and more traditional voting methods. Of course, we would counter that accommodations can be made for most of these concerns and still not fundamentally undermine the integrity of union elections.

Both unions and vendors used an unfortunate rhetorical tactic when talking about security properties of these systems: “We’ve run x hundreds of elections using this kind of technology and have never had a problem/no one has ever complained about fraud.” Unfortunately, that’s not how security works. As with adversarial processes like financial audits, security isn’t something where past results let you predict future performance. That is, the SEC doesn’t tell companies that because their past 10 years of financials have been in order, they can take a few years off. No, security requires careful design, affirmative effort and active auditing to assure that a system does not violate the properties it claims.

There’s a lot more in our comment, and I’d be more than happy to respond to comments if you have questions.

* Check out the “Court Cases” section of the Federal Register notice linked to above.

Domain Names Can't Defend Themselves

Today, the Kentucky Supreme Court handed down an opinion in the saga of Kentucky vs. 141 Domain Names (described a while back here on this blog). Here’s the opinion.

This case is fascinating. A quick recap: Kentucky attempted a property seizure of 141 domain names allegedly involved in gambling on the theory that the domain names themselves constituted “gambling devices” under Kentucky law and were therefore illegal. The state held a forfeiture hearing where anyone with an interest in the “property” could show up to defend their interest in the property; otherwise, the State would order the registrars to transfer “ownership” of the domain names to Kentucky. No individual claiming that they own one of the domain names showed up. Litigation began when two industry associations (iMEGA and IGC) claimed to represent unnamed persons who owned these domain names (and another lawyer showed up during litigation claiming representation of one specific domain name).

The subsequent litigation gets a bit complicated; suffice it to say that the issue of standing was what got to the KY Supreme Court: could an association that claimed to represent an owner of a domain name affected in this action properly represent this owner in court without identifying that owner and establishing that the owner did indeed own an affected domain name?

The Kentucky Supreme Court said no: there needs to be at least one identified individual owner who will suffer harm before the association can stand in that owner’s stead. The court ruled,

Due to the incapacity of domain names to contest their own seizure and the inability of iMEGA and IGC to litigate on behalf of anonymous registrants, the Court of Appeals is reversed and its writ is vacated.

And on the issue of whether a piece of property can represent itself:

“An Internet domain name does not have an interest in itself any more than a piece of land is interested in its own use.”

Anyway, it would seem that the options for next steps include: 1) identifying at least one owner that would suffer harm and then motioning back up to the Supreme Court (given that the merits had been argued at the Appeals level), or 2) deciding that the anonymity of domain name ownership in this case is more important than the fight over this very weird seizure of domain names.

As a non-lawyer, I wonder if it’s possible to represent an owner as a John Doe with an affidavit of ownership of an affected domain name submitted.

UPDATE (2010-03-19T00:07:07 EDT): Check the comments for why a John Doe strategy won’t work when the interest in anonymity is to avoid personal liability rather than free expression.

A weird bonus for people who have read this far: if I open the PDF of the opinion on my Mac in Preview.app or Skim.app (two PDF readers), the “SPORTSBOOK.COM” entry in the listing of the parties on the first page is hyperlinked. However, I don’t see this in Adobe Acrobat Pro or Reader. It seems like the KY Supreme Court is, likely inadvertently, linking to one of the 141 domain names. Of course, Preview.app and Skim.app might share the same library that causes this one URL to be linked… I’m not enough of a PDF sleuth to figure it out.
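
For the curious, one way to settle it would be to check whether the first page of the PDF actually contains a link annotation with a /URI action: if it does, the link is really embedded in the court’s file; if not, Preview and Skim are probably just auto-detecting URL-looking text. Here is a rough sketch assuming the pypdf library and a local copy of the opinion saved as opinion.pdf (both assumptions, not anything from the court’s site).

    from pypdf import PdfReader

    # "opinion.pdf" is a hypothetical local filename for a downloaded copy of the opinion.
    reader = PdfReader("opinion.pdf")
    annots = reader.pages[0].get("/Annots")
    for ref in (annots.get_object() if annots is not None else []):
        annot = ref.get_object()
        action = annot.get("/A")
        if action is not None:
            action = action.get_object()
            if "/URI" in action:
                print("Embedded link annotation:", action["/URI"])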

Open Government Workshop at CITP

Here at Princeton’s CITP, we have a healthy interest in issues of open government and government transparency. With the release last week of the Open Government Directive by the Obama Administration, our normally gloomy winter may prove to be considerably brighter.

In addition to creating tools like RECAP and FedThread, we’ve also been thinking deeply about the nature of open and transparent government, how system designers and architects can better create transparent systems and how to achieve sustainability in open government. Related to these questions are those of the law.gov effort (providing open access to primary legal materials) and how best to facilitate the tinkerers who work on projects of open government.

These are deep issues, so we thought it best to organize a workshop and gather people from a variety of perspectives to dig in.

If you’re interested, come to our workshop next month! While we didn’t consciously plan it this way, the last day of this workshop corresponds to the first 45-day deadline under the OGD.

Open Government: Defining, Designing, and Sustaining Transparency
January 21–22, 2010
http://citp.princeton.edu/open-government-workshop/

Despite increasing interest in issues of open government and governmental transparency, the values of “openness” and “transparency” have been under-theorized. This workshop will bring together academics, government, advocates and tinkerers to examine a few critical issues in open and transparent government. How can we better conceptualize openness and transparency for government? Are there specific design and architectural needs and requirements placed upon systems by openness and transparency? How can openness and transparency best be sustained? How should we change the provision and access of primary legal materials? Finally, how do we best coordinate the supply of open government projects with the demand from tinkerers?

Anil Dash, Director of the AAAS’ new Expert Labs, will deliver the keynote. We are thrilled with the diverse list of speakers, and are looking forward to a robust conversation.

The workshop is free and open to the public, although we ask that you RSVP so that we can be sure to have a name tag and lunch for you.

Tinkering with Disclosed Source Voting Systems

As Ed pointed out in October, Sequoia Voting Systems, Inc. (“Sequoia”) announced then that it intended to publish the source code of their voting system software, called “Frontier”, currently under development. (Also see EKR‘s post: “Contrarianism on Sequoia’s Disclosed Source Voting System”.)

Yesterday, Sequoia made good on this promise and you can now pull the source code they’ve made available from their Subversion repository here:
http://sequoiadev.svn.beanstalkapp.com/projects/

Sequoia refers to this move in its release as “the first public disclosure of source code from a voting systems manufacturer”. Carefully parsed, that’s probably correct: there have been unintentional disclosures of source code (e.g., Diebold in 2003) and I know of two other voting industry companies that have disclosed source code (VoteHere, now out of business, and Everyone Counts), but these were either not “voting systems manufacturers” or the disclosures were not available publicly. Of course, almost all of the research systems (like VoteBox and Helios) have been truly open source. Groups like OSDV and OVC have released or will soon release voting system source code under open source licenses.

I wrote a paper ages ago (2006) on the use of open and disclosed source code for voting systems and I’m surprised at how well that analysis and set of recommendations has held up (the original paper is here, an updated version is in pages 11–41 of my PhD thesis).

The purpose of my post here is to highlight one point of that paper in a bit of detail: disclosed source software licenses need to have a few specific features to be useful to potential voting system evaluators. I’ll start by describing three examples of disclosed source software licenses and then talk about what I’d like to see, as a tinkerer, in these agreements.

The definition of an open source software product is relatively simple: for all practical purposes, anything released under an OSI-approved software license is open source, especially in the sense that one who downloads the source code will have wide latitude to copy, distribute, modify, perform, etc. the source code. What we refer to as disclosed source software is publicly released under a more restrictive license.

Three Disclosed Source Licenses

We have at least three examples of these kinds of licenses from voting systems applications:

  • Sequoia: Sequoia’s license is a good place to start, due to its relative simplicity. It grants the user a limited copyright and patent license for “reference use”, which is defined as (emphasis added):

    “Reference use” means use of the software within your company as a reference, in read only form, for the sole purposes of reviewing and inspecting the code. In addition, this license allows inspection of the code to enhance, or develop ancillary software or hardware products to enhance interoperability of your products with the software. Distribution of this source is allowed provided that this license, and all copyright notices remain in-tact.

    (If you’re a developer and you suspect that you’ve seen this license before, you might have. It appears identical to Microsoft’s Reference Source License (Ms-RSL), which is used to distribute things like the .NET Framework under their shared source program. That makes a good deal of sense since Frontier is written in .NET!)

  • Everyone Counts [1]: Everyone Counts’ (E1C) license is curious. (Unfortunately, I don’t see a copy of this license posted publicly, so I’ll just quote from it.) I say curious mostly because of the flowery language it includes, such as (emphasis added):

    The sources are provided for scrutiny in the same spirit as any member of the public may apply and obtain escorted access to a public election, and is entitled free and open access to election processes to satisfy his or herself of the veracity of the claims of electoral officers and that the process upholds democratic principles. In the same way, the elections observer is not permitted to capture and publish or otherwise make public the processes of the physical count at an election, the same applies for these sources.

    I’m not sure what that last part is about; it’s pretty well accepted—in the U.S.—that “making public the processes of the physical count” is a basic requirement of democratic elections.

    Anyway, on to the substance: It’s a pretty simple license, although more complex than the Sequoia license. The core of this license allows the user to “examine, compile and execute the resulting bytecodes” from the Java source code. It specifies that the user is allowed these rights for the purpose of “analysis forming electoral scrutiny”, which is a difficult phrase to parse. The license suffers from a lot of such wording problems, which make it pretty hard to understand.

  • VoteHere [2]: The VoteHere license agreement is considerably more complex and looks more like a commercial software license agreement. My favorite part, of course, is:

    TO AVOID ANY DOUBT, THIS SOFTWARE IS NOT BEING LICENSED ON AN OPEN SOURCE BASIS.

    The central component is that VoteHere restricts all your rights, other than copying and modifying the source code for evaluation purposes, and owns any derivative works you create from the source code. It has some other quirks; for example, the license, despite being a click-wrap license, has a hard term of 60 days after which all copies and such must be destroyed. (Presumably, you could click through again after 60 days, and restart the term.)

What Does an Evaluator Need in a License?

Each of these licenses has its strengths and weaknesses. The Sequoia license doesn’t seem to permit modification of the source code, but it is relatively simple and allows distribution of Sequoia’s unmodified code. The VoteHere and E1C licenses, however, understand that modification may be necessary to evaluate the software (VoteHere even includes a license to the system’s documentation, an essential but often overlooked part of evaluating source code). The VoteHere license is extremely onerous, placing heavy burdens on the evaluator. The E1C license is flowery and hard to understand, but at its core it is simple and seems to understand what evaluators might need in terms of modifying the code during evaluation.

This raises a good question: What rights, exactly, do evaluators need to examine source code? Practically, it depends on what they want to do. If all they want to do is human line-by-line source code analysis, then a “read-only” license like Sequoia’s is probably fine. However, what about compiling the source code with debugging flags set? What about modifying the software to see how another piece of it performs?

Listed (sort of) in terms of the rights granted by U.S. copyright law, here are some thoughts:

  • Examining: If it doesn’t require making a copy, simply looking at the source code with your eyeballs is not covered by copyright law. So licenses that allow you to “examine” the source code aren’t really granting much, unless they define “examine” to mean something that implicates exclusive rights (which are listed in 17 USC 106).

  • Copying: Of course, downloading and loading source code in an IDE or text editor will make a number of copies, so at the most basic level, evaluators will need to be able to do this. Backup copies are also a necessity these days and VoteHere’s license contemplates this by allowing “reproduc[tion] for temporary archive purposes”.

  • Modification: Evaluators will need some ability to modify the code, whether simply compiling it in order to execute it or, as the next logical step, compiling it with “debugging flags” set (this embeds special flags and metadata in the compiled code that allow debugging tools to step through the program, set breakpoints, etc.). Some types of evaluation require minor modifications to integrate the code into an analysis method; a simple example is changing pieces of the code to see how other pieces will respond, or inserting code that prints to the screen (which is a very primitive but useful form of debugging!). Each of these actions creates a derivative work, so that should be explicitly allowed in these licenses.

  • Distribution: At first, you might not think that evaluators would need much in the way of rights to distribute the source code. However, distributing modified works, such as a patch that fixes a bug, could be very useful. Also, being able to share the code if the official means of getting the code goes dark is often useful; for example, having a portal of voting system source code would provide this “mirroring” capability and could also allow creating a “one-stop shop” for tinkerers and researchers who would point research tools at these code repositories for analysis.

  • Performance: Both reading the source code aloud and showing it publicly are also implicated by copyright law. Why would an evaluator want to do this? Well, imagine that you’ve completed an analysis of a disclosed source code product and you want to write up your findings. It’s often useful to include snippets of code. Small snippets would likely be covered by fair use, but it’s always nice to not have to worry about that and to have explicit permission to at least “display” and possibly “read aloud” source code in these contexts (think accessible or podcast versions of a report!).

  • Executing: There is a line of legal cases saying that “executing” a program is covered by copyright law, because a copy of the object code is loaded into memory at run time. Hopefully, permission to make copies, the first and most basic need of evaluators highlighted above, would also cover this interpretation of “executing” the code.

Outside of these types of exclusive rights, there’s also something to be said for simplicity. The simplicity of the BSD license is a great example: it’s widely used and understood to be very generous and easy to understand. The Sequoia license (being the Ms-RSL license) is very simple and easy to understand. The E1C license is not particularly complex, but its substance is hard to understand (again, apologies that I cannot post the text of that license). The VoteHere license is easy to understand but very complex and extremely onerous in terms of the burden it places on evaluators.

As I finally finish writing this, I’m told Sequoia might be interested in modifying their license. That would be a wonderful idea and I hope these thoughts are useful for modifying it. I do wonder how they’ll be able to modify the license and still distribute parts of the .NET Framework under a new license. Perhaps they’ll specify that the .NET parts are under the Ms-RSL and any Sequoia-sourced source code is under Sequoia’s new license. We’ll see!

[1] Everyone Counts sells internet voting solutions, which are scary as hell to a lot of us.

[2] VoteHere was a company, now seemingly out of business, that made a number of products including a cryptographic add-on module, Sentinel, for the Diebold/Premier AccuVote-TS voting system and an absentee ballot tracking system.

Sunlight on NASED ITA Reports

Short version: we now have gobs of voting system ITA reports, publicly available and hosted by the NSF ACCURATE e-voting center. As I explain below, ITAs were the Independent Testing Authority laboratories that tested voting systems for many years.

Long version: Before the Election Assistance Commission (EAC) took over the testing and certification of voting systems under the Help America Vote Act (HAVA), this critical function was performed by volunteers. The National Association of State Election Directors (NASED) recognized a need for voting system testing and partnered with the Federal Election Commission (FEC) to establish a qualification program that would test systems as having met or exceeded the requirements of the 1990 and 2002 Voting System Standards.*

However, as I’ve lamented many, many times over the years, the input, output and intermediate work product of the NASED testing regime were completely secret, due to proprietary concerns on behalf of the manufacturers. Once a system completed testing, members of the public could see that an entry was made in a publicly-available spreadsheet listing the tested components and a NASED qualification number for the system. But the public was permitted no other insight into the NASED qualification regime.

Researchers were convinced, from what evidence was available, that the quality of the testing was highly inadequate and that the expertise to perform adequate testing didn’t exist within the testing laboratories (called Independent Testing Authorities, or ITAs), nor did the NASED technical committee have the expertise to competently review the test reports the laboratories submitted. Naturally, when reports of problems started to crop up, like the various Hursti vulnerabilities with Diebold memory cards, the NASED system scrambled to figure out what went wrong.

I now have more moderate views with respect to the NASED regime: sure, it was pretty bad and a lot of serious vulnerabilities slipped through the cracks, but I’m not yet convinced that just having the right people or a different process in place would have resulted in fewer problems in the field. To have fixed the NASED system would have required improvements on all fronts: the technology, the testing paradigms, the people involved and the testing and certification process.

The EAC has since taken over testing and certification. Their process is notable in its much higher level of openness and accountability; the test plans are published (previously claimed as proprietary by the testing labs), the test reports are published (previously claimed as proprietary by the vendors) and the process is specified in detail with a program manual, a laboratory manual, notices of clarification, etc.

This is all great and it helps to increase the transparency of the EAC certification program. But, what about the past? What about the testing that NASED did? Well, we don’t know much about it for a number of reasons, chief among them that we never saw any of the materials mentioned above that are now available in the new EAC system.

Through a fortunate FOIA request made of the EAC on behalf of election sleuth Susan Greenhalgh, we now have available a slew of ITA reports from one of the ITAs, Ciber.

The reports are available at the following location (hosted by our NSF ACCURATE e-voting center):

http://accurate-voting.org/docs/ita-reports/

These reports cover the software ITA testing performed by Ciber for the following voting systems:

  • Automark AIMS 1.0.9
  • Diebold GEMS 1.18.19
  • Diebold GEMS 1.18.22
  • Diebold GEMS 1.18.24
  • Diebold AccuVote-TSx Model D
  • Diebold AccuVote-TSx Model D w/ AccuView Printer
  • Diebold Assure 1.0
  • Diebold Assure 1.1
  • Diebold Election Media Processor 4.6.2
  • Diebold Optical Scan Accumulator Adapter
  • Hart System 4.0
  • Hart System 4.1
  • Hart System 6.0
  • Hart System 6.2
  • Hart System 6.2.1

I’ll be looking at these at my leisure over the coming weeks and pointing out interesting features of these reports and the associated correspondence included in the FOIA production.

*The distinction between certification and qualification, although vague, appears to be that under the NASED system, states did the ultimate certification of a voting system for fitness in future elections.

Obama's CTO: two positions?

Paul Blumenthal over at the Sunlight Foundation Blog points to a new report from the Congressional Research Service: “A Federal Chief Technology Officer in the Obama Administration: Options and Issues for Consideration”.

This report does a good job of analyzing both existing positions in federal government that have roles that overlap with some of the potential responsibilities of an “Obama CTO” and the questions that Congress would want to consider if such a position is established by statute rather than an executive order.

The crux of the current issue, for me, is summed up well by this quote from the CRS report’s conclusion:

Although the campaign position paper and transition website provide explicit information on at least some of the duties of a CTO, they do not provide information on a CTO’s organizational placement, structure, or relationship to existing offices. In addition, neither the paper nor website states whether the president intends to establish this position/office by executive order or whether he would seek legislation to create a statutory foundation for its duties and authorities.

The various issues in the mix here lead me to one conclusion: an “Obama CTO” position will be very different from the responsibilities of a traditional chief technology officer. There seem to be at least two positions involved: one visionary and one fixer. That is, one person to push the envelope in a grounded-but-futurist style in terms of what is possible and then one person to negotiate the myriad of agencies and bureaucratic parameters to get things done.

As for the first position, I’d like to say a futurist would be a good idea. However, futurists don’t like to be tethered so much to current reality. A better idea is, I think, a senior academic with broad connections and deep interest and understanding in emerging technologies. The culture of academia, when it works well, can produce individuals who make connections quickly, know how to evaluate complex ideas and are good at filling gaps between what is known and not known for a particular proposal. I’m thinking a Felten, Lessig, etc. here.

As for the fixer, this desperately needs to be someone with experience negotiating complex endeavors between conflicting government fiefdoms. Vivek Kundra, the CTO for the District of Columbia, struck me as exactly this kind of person when he came to visit last semester here at Princeton’s CITP. When Kundra’s name came up as one of two shortlisted candidates for “Obama CTO”, I was a bit skeptical as I wasn’t convinced he had the appropriate visionary qualities. However, as part of a team, I think he’d be invaluable.

It could be possible that the other shortlisted candidate, Cisco’s Padmasree Warrior, would have enough of the visionary element to make up the other side of the team; I doubt she has (what I consider to be) the requisite governmental fixer qualities.

So, why not two positions? Does anyone have both these qualities? Do people agree that these are the right qualities?

As to how it would be structured, it’s almost as if it should be a spider position — a reference to a position in soccer that isn’t tethered by role. That is, they should be free from some of the encumbrances that make government information technology innovation so difficult.

CA SoS Bowen sends proposals to EAC

California Secretary of State Debra Bowen has sent a letter to Chair Gineen Beach of the US Election Assistance Commission (EAC) outlining three proposals that she thinks will markedly improve the integrity of voting systems in the country.

I’ve put a copy of Bowen’s letter here (87kB PDF).

Bowen’s three proposals are:

  • Vulnerability Reporting — The EAC should require that vendors disclose vulnerabilities, flaws, problems, etc. to the EAC as the system certification authority and to all the state election directors that use the affected equipment.
  • Uniform Incident Reporting — The EAC should create and adopt procedures that jurisdictions can follow to collect and report data about incidents they experience with their voting systems.
  • Voting System Performance Measurement — As part of the Election Day Survey, the EAC should systematically collect data from election officials about how voting systems perform during general elections.

In my opinion, each of these would be a welcome move for the EAC.

These proposals would put into place a number of essential missing elements of administering computerized election equipment. First, for the users of these systems, election officials, it can be extremely frustrating and debilitating if they suspect that some voting system flaw is responsible for problems they’re experiencing. Often, when errors arise, contingency planning requires detailed knowledge of the specifics of a voting system flaw. Without knowing as much as possible about the problem they’re facing, election officials can exacerbate the problem. At best, not knowing about a potential flaw can do what Bowen describes: doom the election official, and others with the same equipment, to repeatedly encounter the flaw in subsequent elections. Of course, vendors are the most likely to have useful information on a given flaw, and they should be required to report this information to both the EAC and election officials.

Often the most information we have about voting system incidents comes from reports by local journalists. These reporters don’t tend to cover high technology too often; their reports are often incomplete and in many cases simply and obviously incorrect. Having a standardized set of elements that an election official can collect and report about voting system incidents will help to ensure that the data comes directly from those experiencing a given problem. The EAC should design such procedures and then a system for collecting and reporting these issues to other election officials and the public.

Finally, many of us were disappointed to learn that the 2008 Election Day survey would not include questions about voting system performance. Election Day is a unique and hard-to-replicate event where very little systematic data is collected about voting machine performance. The OurVoteLive and MyVote1 efforts go a long way towards actionable, qualitative data that can help to increase enfranchisement. However, self-reported data from the operators of the machinery of our democracy would be a gold mine in terms of identifying and examining trends in how this machinery performs, both good and bad.

I know a number of people, including Susannah Goodman at Common Cause as well as John Gideon and Ellen Theisen of VotersUnite!, who have been championing one or another of these proposals in their advocacy. The fact that Debra Bowen has penned this letter is a testament to the reason behind their efforts.

Total Election Awareness

Ed recently made a number of predictions about election day (“Election 2008: What Might Go Wrong”). In terms of long lines and voting machine problems, his predictions were pretty spot on.

On election day, I was one of a number of volunteers for the Election Protection Coalition at one of 25 call centers around the nation. Kim Zetter describes the OurVoteLive project, involving 100 non-profit organizations and ten thousand volunteers who answered 86,000 calls through a 750-line call-center operation (“U.S. Elections — It Takes a Village”):

The Election Protection Coalition, a network of more than 100 legal, voting rights and civil liberties groups was the force behind the 1-866-OUR-VOTE hotline, which provided legal experts to answer nearly 87,000 calls that came in over 750 phone lines on Election Day and dispatched experts to address problems in the field as they arose.

Pam Smith of the Verified Voting Foundation made sure each call center had a voting technologist responsible for responding to voting machine reports and advising mobile legal volunteers how to respond on the ground. It was simply a massive operation. Matt Zimmerman and Tim Jones of the Electronic Frontier Foundation and their team get serious props as developers and designers of their Total Election Awareness (TEA) software behind OurVoteLive.

As Kim describes in the Wired article, the call data is all available in CSV, maps, tables, etc.: http://www.ourvotelive.org/. I just completed a preliminary qualitative analysis of the 1800 or so voting equipment incident reports: “A Preliminary Analysis of OVL Voting Equipment Reports”. Quite a bit of data in there with which to inform future efforts.