Senators Obama, Coburn, McCain, and Carper have introduced the Strengthening Transparency and Accountability in Federal Spending Act of 2008 (S. 3077), which would modify their 2006 transparency act. That first bill created USASpending.gov, a searchable web site of government outlays. USASpending.gov—which was based on software developed by OMB Watch and the Sunlight Foundation—allows end users to search across a variety of criteria. It has begun offering an API, an interface that lets developers query the data and display the results on their own sites. This allows a kind of reuse, but differs significantly from the approach suggested in our recent “Invisible Hand” paper. We urge that all the data be published in open formats. An API delivers search results, but that makes the search interface itself very important: having to work through an interface sometimes limits developers from making innovative, unforeseen uses of the data.
The new bill would expand the scope of information available via USASpending.gov, adding information about federal contracts, leases, and audit disputes, among other areas. But it would also elevate the API itself to a matter of statutory mandate. I’m all in favor of mandates that make data available and reusable, but the wording here is already a prime example of why technical standards are often better left to expert regulatory bodies than etched in statute:
” (E) programmatically search and access all data in a serialized machine readable format (such as XML) via a web-services application programming interface”
A technical expert body would (I hope) recognize that there is added value in allowing the data itself to be published so that all of it can be accessed at once. This is significantly different from the site’s current attitude; addressing the list of top contractors by dollar volume, the site’s FAQ says it “does not allow the results of these tables to be downloaded in delimited or XML format because they are not standard search results.” I would argue that standardizers of search results, whomever they may be, should not be able to disallow any data from being downloaded. There doesn’t necessarily need to be a downloadable table of top contractors, but it should be possible for citizens to download all the data so that they can compose such a table themselves if they so desire. The API approach, if it substitutes for making all the data available for download, takes us away from the most vibrant possible ecosystem of data reuse, since whenever government web sites design an interface (whether it’s a regular web interface for end users, or a code-level interface for web developers), they import assumptions about how the data will be used.
All that said, it’s easy to make the data available for download, and a straightforward additional requirement that could be added to the bill. And in any cause we owe a debt of gratitude to Senators Coburn, Obama, McCain and Carper for their pioneering, successful efforts in this area.
==
Update, June 12: Amended the list of cosponsors to include Sens. Carper and (notably) McCain. With both major presidential candidates as cosponsors, the bill seems to reflect a political consensus. The original bill back in 2006 had 48 cosponsors and passed unanimously.
Another agency that publishes open-format data, which can be
massaged into any desired form with Perl, is the Census Bureau.
http://www.census.gov/geo/www/gazetteer/places2k.html
A good (though somewhat dated) example of this sort of transparency is the FCC. The have, for a long time now, published on an FTP site, a collection of files that contain exports from their database. It is possible (and even relatively easy) to take these data, scrub them up a little, and toss them into the database of your choice (I have historically used PostgreSQL for this, and do the scrubbing with some Perl scripts) to have your own private copy of the data with which to do what you wish.
Granted, an XML-formatted version would probably be superior, but this setup works, and works well.
I see a similar thing with geo data, where often it is possible to view the data in a map server (sometimes google maps or just released google earth API), but difficult or impossible to get to the actual data so you could download it for use in your own GIS software. In some cases the data have to be purchased.