April 24, 2014


What We Lose if We Lose Data.gov

In its latest 2011 budget proposal, Congress makes deep cuts to the Electronic Government Fund. This fund supports the continued development and upkeep of several key open government websites, including Data.gov, USASpending.gov and the IT Dashboard. An earlier proposal would have cut the fund from $34 million to $2 million this year; the current proposal would allocate $17 million.

Reports say that major cuts to the e-government fund would force the Office of Management and Budget (OMB) to shut down these transparency sites. That would be a significant blow to the open government movement, and I think it’s important to explain exactly why shuttering a site like Data.gov would be so detrimental to transparency.

On its face, Data.gov is a useful catalog. It helps people find the datasets that government has made available to the public. But the catalog is really a convenience that doesn’t necessarily need to be provided by the government itself. Since the vast majority of datasets are hosted on individual agency servers, not directly by Data.gov, private developers could probably replicate the catalog with relatively little effort. So even if Data.gov goes offline, nearly all of the data would still exist online, and a private developer could rebuild a version of the catalog, perhaps with better features and interfaces.

But Data.gov also plays a crucial behind-the-scenes role: it sets standards for open data and helps individual departments and agencies live up to them. Data.gov establishes a standard, cross-agency process for publishing raw datasets, and the program gives agencies clear guidance on the mechanics and requirements for releasing each new dataset online.

There’s a Data.gov manual that formally documents and teaches this process. Each agency has a lead Data.gov point of contact, who is responsible for identifying publishable datasets and for ensuring that published data meets information quality guidelines. Each dataset must be published with a well-defined set of common metadata fields so that it can be organized and searched. Moreover, thanks to Data.gov, all the data is funneled through at least five stages of intermediate review, including national security and privacy reviews, before final approval and publication. That process isn’t quick, but it does help ensure that goals like quality, security and privacy are met.

When agency staff have data they want to publish, they use a back-end part of the Data.gov website that outside users never see, called the Data Management System (DMS). This administrative interface lets agency points of contact coordinate publishing activities agency-wide, and it gives individual data stewards an easy way to upload, view and maintain their own datasets.

My main concern is that this invaluable but underappreciated infrastructure will be lost if these IT systems are defunded. The individual roles and responsibilities, the informal norms and pressures, and perhaps even the tacit authority to put new datasets online would likely disappear as well. Without that structure, far less data would probably be put online in the future, and the datasets that do get published in an ad hoc way would likely lack the uniformity and quality that the current process creates.

Releasing a new dataset online is already a difficult task for many agencies. While the current standards and processes may be far from perfect, Data.gov gives agencies a firm footing on which to base their transparency efforts. I don’t know how much funding is needed to maintain these critical back-end processes, but whatever Congress decides, it should budget sufficient funds, and direct that they be used, to preserve these essential tools.

Comments

  1. ESV says:

    This is scare-tactic, straw-man cost cutting. Federal government services that people feel strongly about, regardless of their actual budgetary “mass”, are likely to be cut precisely because the threat will raise impassioned cries from the many who feel strongly about their favorite gov’t function. E.g., national parks, museums, public broadcasting, and Data.gov.

    The Data.gov budget is a meaningless drop in the $3.8 trillion bucket, but it will be cut long before any substantial budget cuts happen (e.g., to Medicare, Medicaid, Social Security, or overseas military deployments). In my opinion, these insubstantial cuts will be proposed so that politicians can reasonably claim they tried to make cuts but faced such overwhelming resistance from their constituents that they dare not cut deeper.

  2. JD says:

    It’s more than that. Before you steal the Mona Lisa, you cut off the feed to the security cameras and alarms, right? So first you reduce people’s ability to see what’s going on in their government, then you go after bigger things.

  3. Daniel Schuman says:

    Data.gov does host some data for agencies that don’t publish it on their own websites. By and large, most agencies do publish their data, but there is definitely a subset that publishes only through services provided by the Data.gov project.

    • sjs says:

      In addition to the practical hosting that data.gov provides, my impression is that it also serves as a motivation for publication of government data. If people at agencies can say, “there is an executive mandate to publish data via data.gov,” they are more likely to think about that mandate as part of their everyday practices… whether or not they also publish it on their own agency sites.

      If instead the perception is that basic data publishing is not valued, they are less likely to do it… on data.gov or their own sites.

  4. Kevin Curry says:

    Harlan,

    The clear point I read from you, here and in other posts, is that the Web sites are only the object of our near-sighted attention, while governance, standards, processes, practices, training and the like are what matter most to us proponents of open government data. No one thinks about these things, which cost money to develop, when all the focus is on glitzy Web sites that may or may not cater to niche audiences.

    So I’m wondering: what do the RFPs and contracts say? What are the stated requirements? I’ve not seen any references to how the money is being used, just assumptions that all of it goes into Web pages. As long as people read $34M as a few Web sites, the case is not just weak but based on an inappropriate model. Measuring the value of the $34M that goes into open government data at the federal level with metrics like page hits is like comparing apples to rocks. Data.gov is a poster and only one channel of access to the real object of our investment: an infrastructure for opening government data.

    I know I’m preaching to the choir here. While there is focus only on what people can see and virtually touch in the form of Web sites, the enabling infrastructure will remain invisible and therefore at risk of being ignored in strategic planning, i.e., de-funded.

    I agree that if government doesn’t curate data then private businesses will. Naturally, private businesses will create their own vocabularies, data structures, and methods for handling open government data. In such cases government fits well into the role of convener and arbitrator and can facilitate development of standards among differing parties.

    This is just one example of the infrastructure we get for $34M. Another example is all the data migration and transformation that is taking place to make government data more visible, accessible, and understandable on the Web. Here again, we seem to talk about the enabling infrastructure only in terms of what shows up on a Web site like Data.gov and who goes there to see it. Even if Data.gov goes away, however, the principles that enable Data.gov to exist are the same as those used to publish data on the Web, generally. These principles don’t go away. Only the financial support goes away. That is a detrimental step backward. The way we are approaching this situation, people think we are killing Data.gov, a Web site. In effect we are not only neglecting the infrastructure, we are undermining and undoing it.

    Bottom line message to gov: Don’t cut the funding. Take a look at how it’s being used. Spend less on Web sites and features. Spend more on infrastructure.

    Best,
    Kevin
