October 11, 2024

What We Lose if We Lose Data.gov

In its latest 2011 budget proposal, Congress makes deep cuts to the Electronic Government Fund. This fund supports the continued development and upkeep of several key open government websites, including Data.gov, USASpending.gov and the IT Dashboard. An earlier proposal would have cut the funding from $34 million to $2 million this year, although the current proposal would allocate $17 million to the fund.

Reports say that major cuts to the e-government fund would force OMB to shut down these transparency sites. This would strike a significant blow to the open government movement, and I think it’s important to emphasize exactly why shuttering a site like Data.gov would be so detrimental to transparency.

On its face, Data.gov is a useful catalog. It helps people find the datasets that government has made available to the public. But the catalog is really a convenience that doesn’t necessarily need to be provided by the government itself. Since the vast majority of datasets are hosted on individual agency servers—not directly by Data.gov—private developers could potentially replicate the catalog with only a small amount of effort. So even if Data.gov goes offline, nearly all of the data still exist online, and a private developer could go rebuild a version of the catalog, maybe with even better features and interfaces.

But Data.gov also plays a crucial behind-the-scenes role, setting standards for open data and helping individual departments and agencies live up to those standards. Data.gov establishes a standard, cross-agency process for publishing raw datasets. The program gives agencies clear guidance on the mechanics and requirements for releasing each new dataset online.

There’s a Data.gov manual that formally documents and teaches this process. Each agency has a lead Data.gov point-of-contact, who’s responsible for identifying publishable datasets and for ensuring that when data is published, it meets information quality guidelines. Each dataset needs to be published with a well-defined set of common metadata fields, so that it can be organized and searched. Moreover, thanks to Data.gov, all the data is funneled through at least five stages of intermediate review—including national security and privacy reviews—before final approval and publication. That process isn’t quick, but it does help ensure that key goals are satisfied.
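
For illustration, here is a rough sketch of what a single catalog entry with common metadata fields might look like. The field names and values are hypothetical, not the actual Data.gov schema, but they show why a shared metadata standard makes the catalog searchable at all:

    # Hypothetical catalog entry; the field names are illustrative, not the
    # actual Data.gov metadata schema.
    dataset_entry = {
        "title": "Airline On-Time Performance, 2010",
        "agency": "Department of Transportation",
        "description": "Monthly on-time arrival statistics for domestic flights.",
        "keywords": ["aviation", "delays", "transportation"],
        "access_url": "https://data.example.gov/on-time-2010.csv",
        "format": "CSV",
        "last_updated": "2011-01-15",
        "contact": "data-steward@example.gov",
    }

    # Because every entry shares these fields, the catalog (or a third-party
    # replica of it) can be filtered and searched uniformly.
    def search(catalog, keyword):
        return [d for d in catalog if keyword in d["keywords"]]

    print(search([dataset_entry], "aviation"))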

When agency staff have data they want to publish, they use a special part of the Data.gov website, which outside users never see, called the Data Management System (DMS). This back-end administrative interface allows agency points-of-contact to efficiently coordinate publishing activities agency-wide, and it gives individual data stewards a way to easily upload, view and maintain their own datasets.

My main concern is that this invaluable but underappreciated infrastructure will be lost when the IT systems are de-funded. The individual roles and responsibilities, the informal norms and pressures, and perhaps even the tacit authority to put new datasets online would likely also disappear. Without that structure, far less data would probably be put online in the future, and the datasets that do get published in an ad hoc way would likely lack the uniformity and quality that the current process creates.

Releasing a new dataset online is already a difficult task for many agencies. While the current standards and processes may be far from perfect, Data.gov provides agencies with a firm footing on which they can base their transparency efforts. I don’t know how much funding is necessary to maintain these critical back-end processes, but whatever Congress decides, it should budget sufficient funds—and direct that they be used—to preserve these critically important tools.

Federating the "big four" computer security conferences

Last year, I wrote a report about rebooting the CS publication process (Tinker post, full tech report; an abbreviated version has been accepted to appear as a Communications of the ACM viewpoint article). I talked about how we might handle four different classes of research papers (“top papers” which get in without incident, “bubble papers” which could well have been published if only there were capacity, “second tier” papers which are only of interest to limited communities, and “noncompetitive” papers that have no chance) and I suggested that we need to redesign how we handle our publication process, primarily by adopting something akin to arXiv.org on a massive scale. My essay goes into detail on the benefits and challenges of making this happen.

Of all the related ideas out there, the one I find most attractive is what the database community has done with Proceedings of the VLDB Endowment (see also, their FAQ). In short, if you want to publish a paper in VLDB, one of the top conferences in databases, you must submit your manuscript to the PVLDB. Submissions then go through a journal-like two-round reviewing process. You can submit a paper at any time and you’re promised a response within two months. Accepted papers are published immediately online and are also presented at the next VLDB conference.

I would love to extend the PVLDB idea to the field of computer security scholarship, but this is troublesome when our “big four” security conferences — ISOC NDSS, IEEE Security & Privacy (the “Oakland” conference), USENIX Security, and ACM CCS — are governed by four separate professional societies. Back in the old days (ten years ago?), NDSS and USENIX Security were the places you sent “systems” security work, while Oakland and CCS were where you sent “theoretical” security work. Today, that dichotomy doesn’t really exist any more; you just send your paper to the conference with the next deadline. Largely the same community of people serves on each program committee, and the same sorts of papers appear at every one of these conferences. (Although USENIX Security and NDSS may well still have a preference for “systems” work, the “theory” bias at Oakland and CCS is gone.)

My new idea: Imagine that we set up the “Federated Proceedings of Computer Security” (representing a federation of the four professional societies in question). It’s a virtual conference, publishing exclusively online, so it has no effective limits on the number of papers it might publish. Manuscripts could be submitted to FPCS with rolling deadlines (let’s say one every three months, just like we have now) and conference-like program committees would be assembled for each deadline. (PVLDB has continuous submissions and publications. We could do that just as well.) Each committee would operate like a conference PC: top papers would be accepted rapidly and “published” with the speed of a normal conference process. The “bubble” papers that would otherwise have been rejected by our traditional conference process would now have a chance to be edited and go through a second round of review with the same reviewers. Noncompetitive papers would continue to be rejected, as always.

How would we connect FPCS back to the big four security conferences? Simple: once a paper is accepted for FPCS publication, it would appear at the next of the “big four” conferences. Initially, FPCS would operate concurrently with the regular conference submission process, but it could quickly replace it as well, just as PVLDB quickly became the exclusive mechanism for submitting a paper to VLDB.

One more idea: there’s no reason that FPCS submissions need to be guaranteed a slot in one of the big four security conferences. It’s entirely reasonable that we could increase the acceptance rate at FPCS and have a second round of winnowing to decide which papers are presented at our conferences. This could either be designed as a “pull” process, where separate conference program committees pick and choose from the FPCS accepted papers, or it could be designed as a “push” process, where conferences give a number of slots to FPCS, which then decides which papers to “award” with a conference presentation. Either way, any paper that’s not immediately given a conference slot is still published, and any such paper that turns out to be a big hit can always be awarded a conference presentation, even years after the fact.

This sort of two-tier structure has some nice benefits. Good-but-not-stellar papers get properly published, better papers get recognized as such, and the whole process operates with lower latency than our current system. Furthermore, we get far fewer papers going around the submit/reject/revise/resubmit treadmill, thus lowering the workload on successive program committees. It’s full of win.

Of course, there are many complications that would get in the way of making this happen:

  • We need a critical mass to get this off the ground. We could initially roll it out with a subset of the big four, and/or with more widely spaced deadlines, but it would be great if the whole big four bought into the idea all at once.
  • We would need to harmonize things like page length and other formatting requirements, as well as have a unified policy on single vs. double-blind submissions.
  • We would need a suitable copyright policy, perhaps adopting something like the USENIX model where authors retain their copyright while agreeing to allow FPCS (and its constituent conferences) the right to republish their work. ACM and IEEE would require arm-twisting to go along with this.
  • We would need a governance structure for FPCS. That would include a steering committee for selecting the editor/program chairs, but who watches the watchers?
  • What do we do with our journals? FPCS changes our conference process around, but doesn’t touch our journals at all. Of course, the journals could also reinvent themselves, but that’s a separate topic.

In summary, my proposed Federated Proceedings of Computer Security adapts many of the good ideas developed by the database community with their PVLDB. We could adopt it incrementally for only one of the big four conferences or we could go whole hog and try to change all four at once.

Thoughts?

Why seals can't secure elections

Over the last few weeks, I’ve described the chaotic attempts of the State of New Jersey to come up with tamper-indicating seals and a seal use protocol to secure its voting machines.

A seal use protocol can allow the seal user to gain some assurance that the sealed material has not been tampered with. But here is the critical problem with using seals in elections: Who is the seal user that needs this assurance? It is not just election officials: it is the citizenry.

Democratic elections present a uniquely difficult set of problems to be solved by a security protocol. In particular, the ballot box or voting machine contains votes that may throw the government out of office. Therefore, it’s not just the government—that is, election officials—that need evidence that no tampering has occurred, it’s the public and the candidates. The election officials (representing the government) have a conflict of interest; corrupt election officials may hire corrupt seal inspectors, or deliberately hire incompetent inspectors, or deliberately fail to train them. Even if the public officials who run the elections are not at all corrupt, the democratic process requires sufficient transparency that the public (and the losing candidates) can be convinced that the process was fair.

In the late 19th century, after widespread, pervasive, and long-lasting fraud by election officials, democracies such as Australia and the United States implemented election protocols in an attempt to solve this problem. The struggle to achieve fair elections lasted for decades and was hard-fought.

A typical 1890s solution works as follows: At the beginning of election day, in the polling place, the ballot box is opened so that representatives of all political parties can see for themselves that it is empty (and does not contain hidden compartments). Then the ballot box is closed, and voting begins. The witnesses from all parties remain near the ballot box all day, so they can see that no one opens it and no one stuffs it. The box has a mechanism that rings a bell whenever a ballot is inserted, to alert the witnesses. At the close of the polls, the ballot box is opened, and the ballots are counted in the presence of witnesses.

[Drawing of an 1890s polling place, from Elements of Civil Government by Alexander L. Peterman, 1891]

In principle, then, there is no single person or entity that needs to be trusted: the parties watch each other. And this protocol needs no seals at all!

Democratic elections pose difficult problems not just for security protocols in general, but for seal use protocols in particular. Consider the use of tamper-evident security seals in an election where a ballot box is to be protected by seals while it is transported and stored by election officials out of the sight of witnesses. A good protocol for the use of seals requires that seals be chosen with care and deliberation, and that inspectors have substantial and lengthy training on each kind of seal they are supposed to inspect. Without trained inspectors, it is all too easy for an attacker to remove and replace the seal without likelihood of detection.

Consider an audit or recount of a ballot box, days or weeks after an election. The box, which has been out of the sight of witnesses and in the custody of election officials, reappears before witnesses from the political parties. The tamper-evident seals are inspected and removed, but by whom?

If elections are to be conducted by the same principles of transparency established over a century ago, the rationale for the selection of particular security seals must be made transparent to the public, to the candidates, and to the political parties. Witnesses from the parties and from the public must be able to receive training on detection of tampering of those particular seals. There must be (the possibility of) public debate and discussion over the effectiveness of these physical security protocols.

It is not clear that this is practical. To my knowledge, such transparency in seal use protocols has never been attempted.


Bibliographic citation for the research paper behind this whole series of posts:
Andrew W. Appel, “Security Seals on Voting Machines: A Case Study,” ACM Transactions on Information and System Security (TISSEC), accepted for publication, 2011.

The Next Step towards an Open Internet

Now that the FCC has finally acted to safeguard network neutrality, the time has come to take the next step toward creating a level playing field on the rest of the Information Superhighway. Network neutrality rules are designed to ensure that large telecommunications companies do not squelch free speech and online innovation. However, it is increasingly evident that broadband companies are not the only threat to the open Internet. In short, federal regulators need to act now to safeguard social network neutrality.

There could be no better time to examine this issue. Facebook is the dominant social network in every country other than Brazil, where everybody uses Friendster or something. Facebook has achieved near-monopoly status in the social networking market. It now dominates the web, permeating all aspects of the information landscape. More than 2.5 million websites have integrated with Facebook. Indeed, there is evidence that people are turning to social networks instead of faceless search engines for many types of queries.

Social networks will soon be the primary gatekeepers standing between average Internet users and the web’s promise of information utopia. But can we trust them with this new-found power? Friends are unlikely to be an unbiased or complete source of information on most topics, creating silos of ignorance among the disparate components of the social graph. Meanwhile, social networks will have the power to make or break Internet businesses built atop the enormous quantity of referral traffic they will be able to generate. What will become of these businesses when friendships and tastes change? For example, there is recent evidence that social networks are hastening the decline of the music industry by promoting unknown artists who provide their music and streaming videos for free.

Social network usage patterns reflect deep divisions of race and class. Unregulated social networks could rapidly become virtual gated communities, with users cut off from others who could provide them with a diversity of perspectives. Right now, there’s no regulation of the immense decision-influencing power that friends have, and there are no measures in place to ensure that friends provide a neutral and balanced set of viewpoints. Fortunately, policy-makers have a rare opportunity to preempt the dangerous consequences of leaving this new technology to develop unchecked.

The time has come to create a Federal Friendship Commission to ensure that the immense power of social networks is not abused. For example, social network users who have their friend requests denied currently have no legal recourse. Users should have the option to appeal friend rejections to the FFC to verify that they don’t violate social network neutrality. Unregulated social networks will give many users a distorted view of the world dominated by the partisan, religious, and cultural prejudices of their immediate neighbors in the social graph. The FFC can correct this by requiring social networks to give equal time to any biased wall post.

However, others have suggested lighter-touch regulation, simply requiring each person to have friends of many races, religions, and political persuasions. Still others have suggested allowing information harms to be remedied through direct litigation—perhaps via tort reform that recognizes a new private right of action against violations of the “duty to friend.” As social networking software will soon be found throughout all aspects of society, urgent intervention is needed to forestall “The Tyranny of The Farmville.”

Of course, social network neutrality is just one of the policy tools regulators should use to ensure a level playing field. For example, the Department of Justice may need to more aggressively employ its antitrust powers to combat the recent dangerous concentration of social networking market share on popular micro-blogging services. But enacting formal social network neutrality rules is an important first step towards a more open web.

Building a better CA infrastructure

As several Tor project authors, Ben Adida, and many others have written, our certificate authority infrastructure has the flaw that any one CA, anywhere on the planet, can issue a certificate for any web site, anywhere else on the planet. This was tolerable when the only game in town was VeriSign, but now that’s just untenable. So what solutions are available?

First, some non-solutions: Extended validation certs do nothing useful. Will users ever be trained well enough to notice the extra cues in browser behavior, much less to raise an alarm when a site with a normal cert lacks them? Fat chance. Similarly, certificate revocation lists buy you nothing if you can’t actually download them (a notable issue if you’re stuck behind the firewall of somebody who wants to attack you).

A straightforward idea is to track the certs you see over time and generate a prominent warning if you see something anomalous. This is already available as a fully functional Firefox extension, Certificate Patrol. This should be built into every browser.
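
To make the idea concrete, here is a minimal sketch, in Python, of this kind of certificate tracking: remember the fingerprint of each site’s certificate the first time you see it, and complain loudly if it later changes. This illustrates the general technique only; it is not how Certificate Patrol is actually implemented.

    # Sketch of first-use certificate tracking: remember each host's certificate
    # fingerprint and warn if it ever changes. Illustrative only.
    import hashlib
    import json
    import socket
    import ssl

    STORE = "seen_certs.json"  # local history of observed fingerprints

    def fetch_fingerprint(host, port=443):
        """Return the SHA-256 fingerprint of the certificate the host presents."""
        ctx = ssl.create_default_context()
        ctx.check_hostname = False       # we only want the raw certificate here,
        ctx.verify_mode = ssl.CERT_NONE  # not the CA-based validity judgment
        with socket.create_connection((host, port), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                der = tls.getpeercert(binary_form=True)
        return hashlib.sha256(der).hexdigest()

    def check(host):
        try:
            with open(STORE) as f:
                seen = json.load(f)
        except FileNotFoundError:
            seen = {}
        fp = fetch_fingerprint(host)
        if host not in seen:
            print(f"{host}: first sighting, remembering {fp[:16]}...")
        elif seen[host] != fp:
            print(f"WARNING: certificate for {host} has changed. Possible MITM.")
        else:
            print(f"{host}: certificate unchanged.")
        # Record the latest fingerprint either way; a real tool would ask the
        # user whether a change looks like a routine renewal or an attack.
        seen[host] = fp
        with open(STORE, "w") as f:
            json.dump(seen, f)

    check("www.example.com")

A real implementation would also have to cope with legitimate certificate rotation, load balancers that serve several certificates for one host, and similar noise, which is exactly why the user-visible policy around these warnings matters so much.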

In addition to your first-hand personal observations, why not leverage other resources on the network to make their own observations? For example, while Google is crawling the web, it can easily save SSL/TLS certificates when it sees them, and browsers could use a real-time API much like Google SafeBrowsing. A research group at CMU has already built something like this, which they call a network notary. In essence, you can have multiple network services, running from different vantage points in the network, all telling you whether the cryptographic credentials you got match what others are seeing. Of course, if you’re stuck behind an attacker’s firewall, the attacker will similarly filter out all these sites.
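
The client-side comparison logic is simple; the hard part is running trustworthy vantage points. Here is a hedged sketch of how a client might cross-check its own observation against several notaries. The notary names and fingerprints are made up, and this is not a description of the CMU project’s actual protocol or API:

    # Sketch of notary-style cross-checking: compare the certificate fingerprint
    # we saw with what independent vantage points report for the same host.
    # All names and values here are hypothetical.
    from collections import Counter

    def cross_check(our_fingerprint, notary_reports, quorum=0.75):
        """notary_reports maps a notary name to the fingerprint it observed."""
        if not notary_reports:
            return ("no-data", "no notary reachable; perhaps a filtering firewall?")
        counts = Counter(notary_reports.values())
        consensus_fp, votes = counts.most_common(1)[0]
        if votes / len(notary_reports) < quorum:
            return ("inconclusive", "the notaries disagree among themselves")
        if consensus_fp == our_fingerprint:
            return ("ok", f"{votes}/{len(notary_reports)} notaries saw the same certificate we did")
        return ("suspicious", "notaries see a different certificate than we do; possible local MITM")

    # Hypothetical example: three vantage points agree with each other but not with us.
    reports = {
        "notary-us.example.net": "ab12cd34",
        "notary-eu.example.net": "ab12cd34",
        "notary-asia.example.net": "ab12cd34",
    }
    print(cross_check("ffee0011", reports))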

UPDATE: Google is now doing almost exactly what I suggested.

There are a variety of other proposals out there, notably ones that leverage DNSSEC to supplement or even supplant SSL/TLS certificates. Since DNSSEC provides more control over your DNS records, it also provides more control over who can issue SSL/TLS certificates for your web site. If and when DNSSEC becomes universally supported, this would be a bit harder for attacker firewalls to filter without breaking everything, so I certainly hope this takes off.
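
One concrete mechanism being developed along these lines is DANE, which lets a domain publish a hash of its own certificate in a DNSSEC-signed TLSA record. As a rough sketch, and assuming the record has already been fetched and DNSSEC-validated, the client-side comparison could look like this (the record contents below are hypothetical):

    # Sketch of a DANE/TLSA-style check: the domain owner publishes a hash of
    # the certificate in DNS (protected by DNSSEC), and the client compares it
    # against the certificate the server actually presents. Illustrative only.
    import hashlib

    def tlsa_matches(server_cert_der, tlsa_record):
        """tlsa_record is (usage, selector, matching_type, hex_digest)."""
        usage, selector, matching_type, hex_digest = tlsa_record
        # This sketch handles only usage 3 ("match this exact end-entity cert"),
        # selector 0 (full certificate), matching type 1 (SHA-256).
        if (usage, selector, matching_type) != (3, 0, 1):
            raise NotImplementedError("sketch handles only the 3/0/1 combination")
        return hashlib.sha256(server_cert_der).hexdigest() == hex_digest.lower()

    # Hypothetical demo: pretend these bytes are the server's DER certificate,
    # and that the domain owner published its SHA-256 hash in a TLSA record
    # for _443._tcp.example.com.
    cert_from_server = b"hypothetical DER-encoded certificate bytes"
    record = (3, 0, 1, hashlib.sha256(cert_from_server).hexdigest())
    print(tlsa_matches(cert_from_server, record))  # -> True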

Let’s say that future browsers properly use all of these tricks and can unquestionably determine for you, with perfect accuracy, when you’re getting a bogus connection. Your browser would then display an impressive error dialog and refuse to load the web site. Is that sufficient? This will certainly break all the hotel WiFi systems that want to redirect you to an internal site where they can charge you to access the network. (Arguably, this sort of functionality belongs elsewhere in the software stack, such as through IEEE 802.21, notably used to connect AT&T iPhones to the WiFi service at Starbucks.) Beyond that, though, should the browser just steadfastly refuse to allow the connection? I’ve been to at least one organization whose internal WiFi network insists that it proxy all of your https sessions and, in fact, issues fabricated certificates that you’re expected to configure your browser to trust. We need to support that sort of thing when it’s required, but again, it would perhaps best be supported by some kind of side-channel protocol extension, not by doing a deliberate MITM attack on the crypto protocol.

Corner cases aside, what if you’re truly in a hostile environment and your browser has genuinely detected a network adversary? Should the browser refuse the connection, or should there be some other option? And if so, what would that be? Should the browser perhaps allow the connection (with much gnashing of teeth and throbbing red borders on the window)? Should previous cookies and saved state be hidden away? Should web sites like Gmail and Facebook allow users to have two separate passwords, one for “genuine” login and a separate one for “Yes, I’m in a hostile location, but I need to send and receive email in a limited but still useful fashion”?

[Editor’s note: you may also be interested in the many prior posts on this topic by Freedom to Tinker contributors: 1, 2, 3, 4, 5, 6, 7, 8 — as well as the “Emerging Threats to Online Trust: The Role of Public Policy and Browser Certificates” event that CITP hosted in DC last year with policymakers, industry, and activists.]