April 25, 2014


Government Data and the Invisible Hand

David Robinson, Harlan Yu, Bill Zeller, and I have a new paper about how to use infotech to make government more transparent. We make specific suggestions, some of them counter-intuitive, about how to make this happen. The final version of our paper will appear in the Fall issue of the Yale Journal of Law and Technology. The best way to summarize it is to quote the introduction:

If the next Presidential administration really wants to embrace the potential of Internet-enabled government transparency, it should follow a counter-intuitive but ultimately compelling strategy: reduce the federal role in presenting important government information to citizens. Today, government bodies consider their own websites to be a higher priority than technical infrastructures that open up their data for others to use. We argue that this understanding is a mistake. It would be preferable for government to understand providing reusable data, rather than providing websites, as the core of its online publishing responsibility.

In the current Presidential cycle, all three candidates have indicated that they think the federal government could make better use of the Internet. Barack Obama’s platform explicitly endorses “making government data available online in universally accessible formats.” Hillary Clinton, meanwhile, remarked that she wants to see much more government information online. John McCain, although expressing excitement about the Internet, has allowed that he would like to delegate the issue, possibly to a vice-president.

But the situation to which these candidates are responding – the wide gap between the exciting uses of Internet technology by private parties, on the one hand, and the government’s lagging technical infrastructure on the other – is not new. The federal government has shown itself consistently unable to keep pace with the fast-evolving power of the Internet.

In order for public data to benefit from the same innovation and dynamism that characterize private parties’ use of the Internet, the federal government must reimagine its role as an information provider. Rather than struggling, as it currently does, to design sites that meet each end-user need, it should focus on creating a simple, reliable and publicly accessible infrastructure that “exposes” the underlying data. Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens and can constantly create and reshape the tools individuals use to find and leverage public data. The best way to ensure that the government allows private parties to compete on equal terms in the provision of government data is to require that federal websites themselves use the same open systems for accessing the underlying data as they make available to the public at large.

Our approach follows the engineering principle of separating data from interaction, which is commonly used in constructing websites. Government must provide data, but we argue that websites that provide interactive access for the public can best be built by private parties. This approach is especially important given recent advances in interaction, which go far beyond merely offering data for viewing, to offer services such as advanced search, automated content analysis, cross-indexing with other data sources, and data visualization tools. These tools are promising but it is far from obvious how best to combine them to maximize the public value of government data. Given this uncertainty, the best policy is not to hope government will choose the one best way, but to rely on private parties with their vibrant marketplace of engineering ideas to discover what works.

To read more, see our preprint on SSRN.


  1. Michael Donnelly says:

    This will be great when implemented behind a contract saying that the content cannot be copied, used to create derivative works (including adding styles and links), displayed in a public forum, etc.

    All seriousness aside, it’s got my vote. There are more than enough interested eyes to handle the icky part of dissemination to the masses… and they’ll do a better job.

  2. matt z says:

    Too bad the link requires registration to download. Can you just post a direct link to the PDF?

  3. Ed Felten says:


    You can download the paper without registering. Look for the “Download Anonymously” tab.

  4. Adrian McCarthy says:

    The government seems to be caught up in the same trends that the software industry in general has been following. Once upon a time, we wrote programs that were general tools. More and more, we build for particular “user scenarios”. As a result, we have applications that offer greatly improved usability for particular tasks but are much less flexible if you want to do something unanticipated with the data.

  5. matt z says:

    Didn’t see the “Download Anonymously” tab, but found a preprint at http://www.yjolt.org/files/robinson-11-YJOLT-draft.pdf

  6. HaeB says:

    Hans Rosling (whose “Gapminder” software is surely among the finest examples of such data visualization tools by private parties) made a similar point with respect to publicly funded statistical data from the UN and national agencies, in this now legendary talk:


    (starting from about 15:00, note the nice metaphor of plants growing out of the raw data once barriers like clumsy web interfaces are removed and the sunlight of the public can shine)

  7. anon says:

    Look for the “Download Anonymously” tab.


    What “Download Anonymously” tab?

    Searching the source of http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1138083 finds no strings containing “anon”.

    I’m using NoScript v 1.6.8, Firefox (w/ spoofed UA) on Linux/X11, and browsing via Tor.

    I’d like to read your paper.

  8. Ed Felten says:


    You have to click a few times, as if to download, before the tab appears. The easiest thing is probably to follow the advice of Matt Z’s comment, above.

  9. Mike W. says:

    One major concern raised by the paper is how the proposed shift
    of public service to the private sector can be accompanied by the required
    neutrality of access to the government for all of the citizens and

    A potential outcome and risk is that *not* all interests will be
    treated equally. This is especially to be expected in the environment
    where the primary economic revenue model of the commercial Internet
    services is based on targeted advertising, as it is now. Communities
    not belonging to a profitable target group are likely to be
    marginalized, as not enough profit can be extracted from them.

    Today, on-line businesses often rely on the ability to profile users
    based on their on-line footprint. With the continuing advances in data
    mining and network monitoring, further details will become accessible
    in the profiles of on-line users, letting the service operators
    direct their economic interests to targeted audience groups
    potentially differentiated by race, gender, economic status, or
    medical conditions.

    In this context, is it reasonable to count on private industry to
    provide neutral public service? For example, is it reasonable to
    assume that the industry will provide equal service to an unemployed
    single mother attempting to find government unemployment policies as
    it will to a middle-class citizen researching passport laws? It is
    a significant concern that, left to the choice of the industry, the
    quality of service, and in turn the quality of access to the
    government, will be significantly lower for the unemployed person,
    potentially leading to further isolation of people in need of the
    government’s assistance.

    Furthermore, this isolation may manifest itself in ways that are not
    trivial to detect. For example, access to particular government data
    may be denied to a citizen simply because providing it is not an
    economically profitable proposition; this may be easy to
    detect. A more difficult case to detect or quantify, however, is
    presentation bias. Even if all of the resources are available to
    all, it does not follow that access to them is not manipulated in a way
    that is aligned with extracting maximum profit and in conflict with
    the neutrality of providing equal service to all. For example, search
    results can be ranked alphabetically or, perhaps more usefully,
    ordered according to relevance, but alternatively they can be ranked
    according to profitability.

    New regulations for the process by which the private
    sector proxies for the government’s responsibility may be very
    difficult to define and even more difficult to enforce. Further, they may be
    counterproductive, as they could restrict the creative forces of the
    market that the authors of the above paper wish to enable.

    Relying on NGOs to pick up where the private sector fails
    to provide only avoids the problem. NGOs, although
    also susceptible to targeted agendas and interests, tend to help
    tremendously in the areas where the government fails. However, this
    should not mean that the government should further design policies
    that force NGO action to compensate for its failures.

    The Internet provides a new and unique opportunity for the government
    to reorient its relation to the citizens, to reach a new social
    contract, and not to mimic its failures from the past. Analogies to
    the physical world, showing that people already accept the role of the
    private sector as a membrane surrounding their government, should not
    be taken only as a justification or a model for the new IT
    policies. They should also be a reminder of how willing people are
    to accept and forget the socially unjust status quo, and they should
    amplify our responsibility to avoid the easy temptation to mirror the
    policies of the past in the new environment.

    –Mike W.

  10. paul says:

    I’m with Mike W. Done right, this kind of thing would be wonderful, but if the data were captured by the usual information-industry suspects, the result would be really ugly. Get the right set of consultants to help agencies design their “transparent” data formats and access techniques, and Westlaw will look like a bunch of information-should-be-free hippies in comparison. (Does anyone else remember the attempt to zero out NOAA’s barebones forecast pages?)

    In addition, it’s not just the data formats, it’s the structures for finding the right data, and that’s something that I would hate to put in private hands. I’m thinking in particular of sites like the BLS, where there are hundreds or thousands of different data series, and “free” access to the numbers themselves would be utterly useless without the layers that help you figure out which data series you need.

  11. Spudz says:


  12. Silona says:


    http://www.slideshare.net/silona/social-networks-and-government-application/ A presentation I made in Dec 07 to the ec3.org about doing exactly that…

    It is CC, so have fun with it, and let’s all keep spreading the word of open data formats!


  13. Spudz says:

    I’m thinking in particular of sites like the BLS