The next piece of proposed bailout legislation is called the American Recovery and Reinvestment Act of 2009. Chris Soghoian, who is covering the issue on his Surveillance State blog at CNET, brought the bill to my attention, particularly a provision requiring that a new web site called Recovery.gov “provide data on relevant economic, financial, grant, and contract information in user-friendly visual presentations to enhance public awareness of the use of funds made available in this Act.” As a group of colleagues and I suggested last year in Government Data and the Invisible Hand, there’s an easy way to make rules like this one a great deal more effective.
Ultimately, we all want information about bailout spending to be available in the most user-friendly way to the broadest range of citizens. But is a government monopoly on “presentations” of the data the best way to achieve that goal? Probably not. If Congress orders the federal bureaucracy to provide a web site for end users, then we will all have to live with the one web site they cook up. Regular citizens would have more and better options for learning about the bailout if Congress told the executive branch to provide the relevant data in a structured machine-readable format such as XML, so that many different sites could be built to analyze the data. (A government site aimed at end users would also be fine. But we’re only apt to get machine-readable data if Congress makes it a requirement.)
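To make the idea concrete, here is a sketch of what machine-readable spending data might look like, and how little code a third-party site would need to consume it. The element names, IDs, and figures below are all hypothetical; nothing in the Act specifies a schema.

```python
# Parse a hypothetical XML feed of bailout grants and total the amounts.
# All names and numbers here are invented for illustration.
import xml.etree.ElementTree as ET

SAMPLE = """
<grants>
  <grant id="2009-0001">
    <recipient>Example City Transit Authority</recipient>
    <amount currency="USD">2500000</amount>
    <program>Transportation Infrastructure</program>
  </grant>
  <grant id="2009-0002">
    <recipient>Example State University</recipient>
    <amount currency="USD">750000</amount>
    <program>Energy Research</program>
  </grant>
</grants>
"""

root = ET.fromstring(SAMPLE)
total = sum(int(g.findtext("amount")) for g in root.findall("grant"))
print(total)  # 3250000
```

With data in this shape, a graph, a map, or a searchable table is a few dozen lines of code away for any volunteer; without it, each of those projects starts by scraping presentations.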
Why does this matter? Because without the underlying data, anyone who wants to provide a useful new tool for analysis must first try to reconstruct the underlying numbers from the “user-friendly visual presentations” or “printable reports” that the government publishes. Imagine trying to convert a nice-looking graph back into a list of figures, or trying to turn a printed transcript of a congressional debate into a searchable database of who said what and when. It’s not easy.
Once the computer-readable data is out there—whether straightforwardly published by the government officials who have it in the first place, or painstakingly recreated by volunteers who don’t—we know that a small army of volunteers and nonprofits stands ready to create tools that regular citizens, even those with no technical background at all, will find useful. This group of volunteers is itself a small constituency, but the things they make, like GovTrack, Open Congress, and Washington Watch, are used by a much broader population of interested citizens. The federal government might decide to put together a system for making maps or graphs. But what about an interactive one? What about three-dimensional animated visualizations over time? What about an interface that’s specially designed for blind users, who still want to organize and analyze the data but may be unable to benefit from visualizations as most of us can? There might be an interface in Spanish, the second most common American language, but what about one in Tagalog, the sixth most common?
There’s a deep and important irony here: The best way for government data to reach the broadest possible population is probably to release it in a form that nobody wants to read. XML files are called “machine-readable” because they make sense to a computer, rather than to human eyes. Releasing the data that way—so a variety of “user-friendly presentations,” to match the variety of possible users, can emerge—is what will give regular citizens the greatest power to understand and react to the bailout. It would be a travesty to make government the only source for interaction with bailout data—the transparency equivalent of central planning. It would be better for everyone, and easier, to let a thousand mashups bloom.
XBRL, an XML-based markup language for business reporting, is the supplemental filing standard adopted by the SEC for businesses to report their results. See http://en.wikipedia.org/wiki/XBRL for a short description and additional links.
You can check out http://www.budget.gov.au/ to get some idea of what the result will look like. Nothing is machine readable; there is no clear breakdown of either revenue or expenses and no system of account numbers; incremental figures (the change from year to year) are subtly mixed with absolute figures (the real money that is spent) and spanning figures (future plans aggregated over several years); and each year has a completely different layout from every other year.
However, at least you can dig up the data with enough effort (some years are harder than others).
I would be much happier if every end-account in the entire system were allocated a number from a sensibly structured tree of departmental accounts and sub-accounts, and two tables were then published: one of all the transactions to and from those accounts, and another of the account numbers and their plain-English descriptions (CSV would be best for both; XML is tedious and bloated). On top of that, an account number should not change from year to year unless the entire department is abolished (perhaps with a further rule that a dead account number may never be reused).
They might be listening.
json FTW
What??
I believe the poster is saying
JavaScript Object Notation For The Win
Coming from a background familiar with biker terms, I thought at first the poster wanted something to do with “The World.” /me shrugs
see http://en.wikipedia.org/wiki/Json
for further elucidation on why the poster thinks this might be a more accessible format for data that really should be readily and easily accessible.
I myself am bamboozled by both formats, XML and JSON, but am familiar enough with some popular web frameworks to understand that either would make this data much more accessible for a wider variety of uses and display purposes.
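For readers unfamiliar with either format, here is the same kind of spending record expressed as JSON, which the “json FTW” comment above prefers. The field names are hypothetical; the point is how compact both the data and the parsing code are.

```python
# Parse a hypothetical JSON record of a single grant.
# Field names and values are invented for illustration.
import json

record = json.loads(
    '{"recipient": "Example City Transit Authority",'
    ' "amount": 2500000,'
    ' "program": "Transportation Infrastructure"}'
)
print(record["recipient"], record["amount"])
```

JSON tends to be lighter-weight than XML for the same data, and nearly every web framework can consume it directly, which is likely what the commenter is getting at.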
This general problem is one commonly facing GIS users. As an amateur geographer, I am constantly on the lookout for free geo-data. Often government websites provide online mapping interfaces which allow the user to display the data, but for more effective uses it is necessary to obtain the actual data sets. The US federal gov’t (in particular USGS, NOAA and Census) is pretty good at making the data sets reasonably available for download. The states and local governments are not nearly so good. In many cases this is because of business models that fund (at least partially) the relevant agency through the sale of data. This limitation also seems to be generally the case outside of the US.
In many cases US Gov’t financial data is available in Excel format, but the problem here is the reliability of the underlying financial/accounting systems. I’m not sure how much faith I would put into a site that purports to be able to track the financial doings of any particular government program. If you look at the annual financial report from the US Treasury FMS (and the accompanying GAO audit findings) you will see how GAO caveats their report.
The phrase “to provide the relevant data in a structured machine-readable format such as XML” should be further qualified with “non-proprietary” or something else that makes it clear that anyone is free to read the data.