Archives for August 2016

The workshop on Data and Algorithmic Transparency

From online advertising to Uber to predictive policing, algorithmic systems powered by personal data affect more and more of our lives. As our society begins to grapple with the consequences of this shift, empirical investigation of these systems has proved vital to understand the potential for discrimination, privacy breaches, and vulnerability to manipulation.

This emerging field of research, which we’re calling Data and Algorithmic Transparency, seems poised to grow dramatically. But it faces a number of methodological challenges which can only be solved by bringing together expertise from a variety of disciplines. That is why Alan Mislove and I are organizing the first workshop on Data and Algorithmic Transparency at Columbia University on Nov 19, 2016.

Here are three reasons you should participate in this workshop.

  1. Start of a new, interdisciplinary community. The set of disciplines represented on the Program Committee is strikingly diverse: Internet measurement, information privacy/security, computer systems, human-computer interaction, law, and media studies. Industrial research and government are also represented. We expect the workshop itself to have a similar mix of participants, and that is exactly what is needed to make transparency research a success. Alan and I (and others including Nikolaos Laoutaris) are committed to growing and nurturing this community over the next several years.
  2. Co-located with two other exciting events: the Data Transparency Lab conference (DTL ‘16) and the Fairness, Accountability, and Transparency in Machine Learning workshop (FAT-ML ‘16). DTL shares many of the goals of the DAT workshop, but is non-academic. FAT-ML has a complementary relationship with the goals of DAT: it seeks to develop machine learning techniques for developers of algorithmic systems to improve fairness and accountability, whereas DAT seeks to analyze existing systems, typically “from the outside”. The events are consecutive and non-overlapping, and participants of each event are encouraged to attend the others.
  3. A format that makes the most of everyone’s time. At most computer science conferences, each speaker mumbles through their slides while the audience is a sea of laptops, awaiting their turn. DAT will be the opposite. We plan to have paper discussions instead of paper presentations, with commenters and participants, rather than authors, doing most of the speaking about each paper. This first edition of DAT will be non-archival (but peer-reviewed), and one goal of the discussions is to help authors improve their papers for later publication. We are also soliciting talk proposals about already published work; groups of accepted talks will be organized into panels.

See you in New York City!

A response to the National Association of Secretaries of State

Election administration in the United States is largely managed state-by-state, with a small amount of Federal involvement. This generally means that each state’s chief election official is that state’s Secretary of State. Their umbrella organization, the National Association of Secretaries of State, consequently has a lot of involvement in voting issues, and recently issued a press release concerning voting system security that was remarkably erroneous. What follows is a point-by-point commentary on their press release.

> To date, there has been no indication from national security agencies to states that any specific or credible threat exists when it comes to cyber security and the November 2016 general election.

Unfortunately, it now appears that Russia broke into the DNC’s computers and leaked emails with the clear intent of influencing the U.S. presidential election (see, e.g., the New York Times’s July 26 article: “Why Security Experts Think Russia was Behind the DNC Breach”). It’s entirely reasonable to extrapolate that Russia may be willing to conduct further operations with the same goals, which means we need to take appropriate steps to mitigate such attacks, regardless of how specific the available intelligence is.

> However, as a routine part of any election cycle, Secretaries of State and their local government counterparts work with federal partners, such as the U.S. Election Assistance Commission (EAC) and the National Institute of Standards and Technology (NIST), to maintain rigorous testing and certification standards for voting systems. Risk management practices and controls, including the physical handling and storage of voting equipment, are important elements of this work.

Expert analyses of current election systems (largely conducted ten years ago in California, Ohio, and Florida) found a wide variety of security problems. While some states have responded to these issues by replacing the worst paperless electronic voting systems, other states, including several “battleground” states, continue to use unacceptably insecure systems.

> State election offices also proactively utilize election IT professionals and security experts to regularly review, identify and address any vulnerabilities with systems, including voter registration databases and election night reporting systems (which display the unofficial tallies that are ultimately verified via statewide canvassing).

The implication here is that all state election officials have addressed known vulnerabilities. This is incorrect. While some states have been quite proactive, other states have done nothing of the sort.

> A national hacking of the election is highly improbable due to our unique, decentralized process.

Security vulnerabilities have nothing to do with probabilities. They instead have to do with a cost/benefit analysis on the part of the attacker. An adversary doesn’t have to attack all 50 states. All they have to do is tamper with the “battleground” states where small shifts in the vote can change the outcome for the whole state.

> Each state and locality conducts its own system of voting, complete with standards and security requirements for equipment and software. Most states publicly conduct logic and accuracy testing of their machines prior to the election to ensure that they are working and tabulating properly, then they are sealed until Election Day to prevent tampering.

So-called “logic and accuracy testing” varies from location to location, but most versions boil down to casting a small number of votes for each candidate, on a handful of machines, and making sure they all show up in a mock tally. Similarly, local election officials will have procedures in place to make sure machines are properly “zeroed”. Computer scientists refer to these as “sanity tests”: if such a test fails, something is obviously broken. If these tests pass, however, they say nothing about the sort of tampering that a sophisticated nation-state adversary might conduct.
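To make that limitation concrete, here is a minimal, hypothetical sketch (in Python, with made-up names; not any real voting system’s code) of what a logic and accuracy test amounts to. A machine programmed to misbehave only under realistic election conditions, such as a full-sized ballot count, passes the test without difficulty.

```python
# Hypothetical sketch only: illustrates why a passing "logic and accuracy"
# test says little about deliberate tampering. No real system works this way.

def logic_and_accuracy_test(machine, candidates, votes_each=5):
    """Cast a small, known pattern of test votes and check the mock tally."""
    machine.zero()
    expected = {c: votes_each for c in candidates}
    for candidate in candidates:
        for _ in range(votes_each):
            machine.cast_vote(candidate)
    return machine.tally() == expected


class CheatsOnlyAtScaleMachine:
    """Toy model of a machine that reports honestly for small, test-sized
    tallies but would alter results once ballot volume looks like a real
    election (the altered-tally logic is deliberately omitted)."""

    def __init__(self):
        self.counts = {}

    def zero(self):
        self.counts = {}

    def cast_vote(self, candidate):
        self.counts[candidate] = self.counts.get(candidate, 0) + 1

    def tally(self):
        if sum(self.counts.values()) < 100:   # small tallies look like an L&A test
            return dict(self.counts)
        raise NotImplementedError("tampered tally omitted from this sketch")


# The malicious toy machine still passes the sanity test:
print(logic_and_accuracy_test(CheatsOnlyAtScaleMachine(), ["A", "B"]))  # True
```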

Some election officials conduct more sophisticated “parallel testing”, where some voting equipment is pulled out of general service and is instead set up in a mock precinct, on election day, where mock voters cast seemingly real ballots. These machines would have a harder time distinguishing whether they were in “test” versus “production” conditions. But what happens if the machines fail the parallel test? By then, the election is over, the voters are gone, and there’s potentially no way to reconstruct the intent of the voters.

> Furthermore, electronic voting machines are not Internet-based and do not connect to each other online.

This is partially true. Electronic voting systems do connect to one another through in-precinct local networks or through the motion of memory cards of various sorts. They similarly connect to election management systems before the start of the election (when they’re loaded with ballot definitions) and after the end of the election (for backups, recounts, inventory control, and/or being cleared prior to subsequent elections). All of these “touch points” represent opportunities for malware to cross the “air gap” boundaries. We built attacks like these a decade ago as part of the California Top to Bottom Review, showing how malware could spread “virally” to an entire county’s fleet of voting equipment. Attacks like these require a non-trivial up-front engineering effort, plus additional effort for deployment, but these efforts are well within the capabilities of a nation-state adversary.

> Following the election, state and local jurisdictions conduct a canvass to review vote counting, ultimately producing the election results that are officially certified. Post-election audits help to further guard against deliberate manipulation of the election, as well as unintentional software, hardware or programming problems.

Post-election audits aren’t conducted at all in some jurisdictions, and would likely be meaningless against the sort of adversary we’re talking about. If a paperless electronic voting system were hacked, there might well be forensic evidence that the attackers left behind, but such evidence would be challenging to identify quickly, particularly in the charged atmosphere of a disputed election result.

> We look forward to continued information-sharing with federal partners in order to evaluate cyber risks, and respond to them accordingly, as part of ongoing state election emergency preparedness planning for November.

“Emergency preparedness” is definitely the proper way to consider the problem. Just as we must have contingency plans for all sorts of natural phenomena, like hurricanes, we must also be prepared for man-made phenomena, where we might be unable to reconstruct an election tally that accurately represents the will of the people.

The correct time to make such plans is right now, before the election. Since it’s far too late to decommission and replace our insecure equipment, we must instead plan for rapid responses, such as quickly printing single-issue paper ballots, bringing voters back to the polls, and doing it all over again. If such plans are made now, their very existence changes the cost/benefit equation for our adversaries, and will hopefully dissuade these adversaries from acting.

Supplement for Revealing Algorithmic Rankers (Table 1)

Table 1: A ranking of Computer Science departments per csrankings.org, with additional attributes from the NRC assessment dataset. Here, average count is the geometric mean of the adjusted number of publications in each area by institution, faculty is the number of faculty members in the department, pubs is the average number of publications per faculty member (2000-2006), and GRE is the average GRE score (2004-2006). Departments are ranked by average count.
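As a concrete illustration of the “average count” column, here is a minimal sketch of a geometric mean over per-area adjusted publication counts. The area names and numbers are made up for illustration; this is not csrankings.org’s actual code or data.

```python
from math import prod

def average_count(per_area_counts):
    """Geometric mean of the adjusted per-area publication counts."""
    values = list(per_area_counts.values())
    return prod(values) ** (1.0 / len(values))

# Hypothetical adjusted counts for one department, for illustration only:
example = {"AI": 12.0, "Systems": 9.5, "Theory": 4.0}
print(round(average_count(example), 1))  # ~7.7
```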

| Rank (CSR) | Name | Average Count (CSR) | Faculty (CSR) | Pubs (NRC) | GRE (NRC) |
|---|---|---|---|---|---|
| 1 | Carnegie Mellon University | 18.3 | 122 | 2 | 791 |
| 2 | Massachusetts Institute of Technology | 15 | 64 | 3 | 772 |
| 3 | Stanford University | 14.3 | 55 | 5 | 800 |
| 4 | University of California–Berkeley | 11.4 | 50 | 3 | 789 |
| 5 | University of Illinois–Urbana-Champaign | 10.5 | 55 | 3 | 772 |
| 6 | University of Washington | 10.3 | 50 | 2 | 796 |
| 7 | Georgia Institute of Technology | 8.9 | 81 | 2 | 797 |
| 8 | University of California–San Diego | 7.8 | 49 | 3 | 797 |
| 9 | Cornell University | 6.9 | 45 | 2 | 800 |
| 10 | University of Michigan | 6.8 | 63 | 3 | 800 |
| 11 | University of Texas–Austin | 6.6 | 43 | 3 | 789 |
| 12 | Columbia University | 6.3 | 49 | 3 | 788 |
| 13 | University of Massachusetts–Amherst | 6.2 | 47 | 2 | 796 |
| 14 | University of Maryland–College Park | 5.5 | 42 | 2 | 791 |
| 15 | University of Wisconsin–Madison | 5.1 | 35 | 2 | 793 |
| 16 | University of Southern California | 4.4 | 47 | 3 | 793 |
| 17 | University of California–Los Angeles | 4.3 | 32 | 3 | 797 |
| 18 | Northeastern University | 4 | 46 | 2 | 797 |
| 19 | Purdue University–West Lafayette | 3.6 | 42 | 2 | 772 |
| 20 | Harvard University | 3.4 | 29 | 3 | 794 |
| 20 | University of Pennsylvania | 3.4 | 32 | 3 | 800 |
| 22 | University of California–Santa Barbara | 3.2 | 28 | 4 | 793 |
| 22 | Princeton University | 3.2 | 27 | 2 | 796 |
| 24 | New York University | 3 | 29 | 2 | 796 |
| 24 | Ohio State University | 3 | 39 | 3 | 798 |
| 26 | University of California–Davis | 2.9 | 27 | 2 | 771 |
| 27 | Rutgers The State University of New Jersey–New Brunswick | 2.8 | 33 | 2 | 758 |
| 27 | University of Minnesota–Twin Cities | 2.8 | 37 | 2 | 777 |
| 29 | Brown University | 2.5 | 24 | 2 | 768 |
| 30 | Northwestern University | 2.4 | 35 | 1 | 787 |
| 31 | Pennsylvania State University | 2.3 | 28 | 3 | 790 |
| 31 | Texas A & M University–College Station | 2.3 | 36 | 1 | 775 |
| 33 | State University of New York–Stony Brook | 2.2 | 33 | 3 | 796 |
| 33 | Indiana University–Bloomington | 2.2 | 35 | 1 | 765 |
| 33 | Duke University | 2.2 | 22 | 3 | 800 |
| 33 | Rice University | 2.2 | 18 | 2 | 800 |
| 37 | University of Utah | 2.1 | 29 | 2 | 776 |
| 37 | Johns Hopkins University | 2.1 | 24 | 2 | 766 |
| 39 | University of Chicago | 2 | 28 | 2 | 779 |
| 40 | University of California–Irvine | 1.9 | 28 | 2 | 787 |
| 41 | Boston University | 1.6 | 15 | 2 | 783 |
| 41 | University of Colorado–Boulder | 1.6 | 32 | 1 | 761 |
| 41 | University of North Carolina–Chapel Hill | 1.6 | 22 | 2 | 794 |
| 41 | Dartmouth College | 1.6 | 18 | 2 | 794 |
| 45 | Yale University | 1.5 | 18 | 2 | 800 |
| 45 | University of Virginia | 1.5 | 18 | 2 | 789 |
| 45 | University of Rochester | 1.5 | 18 | 3 | 786 |
| 48 | Arizona State University | 1.4 | 14 | 2 | 787 |
| 48 | University of Arizona | 1.4 | 18 | 2 | 784 |
| 48 | Virginia Polytechnic Institute and State University | 1.4 | 32 | 1 | 780 |
| 48 | Washington University in St. Louis | 1.4 | 17 | 2 | 790 |