November 25, 2024

Basic Data Format Lessons

[This is the second post in a series on best practices for government datasets by Harlan Yu and me. (previous post)]

When creating a dataset, the preferences of developers may not be obvious to those producing the dataset. Seemingly innocuous choices by data providers can lead to major headaches for developers. In this post, we discuss some of the more basic challenges that developers encounter when working with a dataset. These lessons may seem trivial to our more technical readers, but they’re often learned through experience. Our hope is to reduce this learning curve by explaining how various practices affect developers. We’ll focus on XML datasets, but many of the topics apply to CSV and other data formats.

One of the hardest parts of working with a dataset can be figuring out what’s in it and how it’s organized. What data comes inside an “<FL47>” tag? Can a “<TEXT>” element ever contain a “<PARAGRAPH>” element? Developers rely heavily on documentation to explain the structure and contents of a dataset. When working with XML, one particularly relevant item is known as a schema. An XML schema is a separate file with an extension such as “.dtd” or “.xsd,” and it provides a blueprint of the permitted structure for corresponding XML files. XML schema files tell developers where they can recover the information that they need from a dataset. These schema files and other documentation are often a necessity for developers, and they should be treated as such by data providers. Any XML file supplied by an agency should contain a complete URL address at which its schema can be found. Further, any link to an XML document on a government site should have prominent links near it for the corresponding schema file and reasonable documentation describing the contents of the dataset.

XML schema files can be seen as an informal contract between data providers and developers, effectively promising that a dataset will match the specified structure. Unfortunately, sometimes datasets contain flaws causing them not to match that structure. Although experienced developers produce software that detects the existence of structural errors, these errors can be difficult or impossible for them to isolate and correct. The people in the best position to catch and fix structural errors are the people producing a dataset. Numerous validation tools exist for ensuring that an XML document is well-formed and valid—that is, the document is structurally sound and matches its XML schema. Prior to releasing a dataset, an agency should run a validator on it to check for structural flaws. This sanity check can take just a few moments for an agency but save hours of developer time.

When deciding on the structure of a dataset, an agency should strive for simplicity while logically representing the underlying data. The addition of elements, attributes, or children in a schema can improve the quality and clarity of the dataset, but it can also add unnecessary complexity. When designing schemas, there’s a tendency to include elements or other structure that will almost certainly go unused in practice. Schema designers may assume that extraneous items do no harm, but developers must cautiously account for them if allowed by schema. The result can be wasted developer time and increased software complexity. The true cost of various structural choices is not just the time necessary to encode these choices in a schema but also the burden these choices impose on developers. Additional structural complexity must provide a justifiable benefit.

In some cases, however, the addition of elements or attributes is not only justifiable but highly desirable for developers: logically distinct pieces of data should appear in separate XML elements or attributes. Suppose that a developer wishes to access a piece of data in a dataset. If the data is combined with other information, the developer will need to figure out how to extract it from the combined field. This extraction can be difficult, time-consuming, and prone to errors. For example, assume that a data provider includes the following element:

<DOCINFO>Doc No. 2001345--Released 01-01-2001</DOCINFO>

To extract the document number, a developer might look for all characters following “No.” but before a dash. While this is straightforward enough, other parts of the same or future datasets might instead use the document number format “2001-345” or separate the document number and release date with a space rather than a double-dash. Neither case would lead to invalid XML, but both would break the developer’s extraction tool. Now consider this alternative:

<DOCINFO>
  <DOC_NO>2001345</DOC_NO>
  <RELEASE_DATE>01-01-2001</RELEASE_DATE>
</DOCINFO>

Using extra elements to separate logically distinct data can prevent extraction errors. This lesson often applies even when the combined data is related. For example, the version number 5.3.2 could be broken into major version 5, minor version 3, and revision 2. In general, agencies should separate such items themselves when they can do so more easily than developers.

Even when the basic structure of a dataset is ideal, choices about how to provide data inside this structure can affect developers. Developers thrive on consistency. Suppose that a dataset details various costs. Consider all possible ways of writing cost: $4,300, 5938.37, 74 dollars and 63 cents, etc. Unless an agency decides on, documents, and adheres to a standard format, developers’ software must handle a large number of possibilities to avoid unexpected surprises. Consistency in a dataset can make a developer’s life far easier, and it reduces the possibility that surprises will break an application. Note that a schema can be helpful for enforcing consistency for certain fields—for example, cost might be defined as a decimal field with a constraint on the number of fractional digits.

Redundant information is another source of difficulty for developers. Redundancy can appear in numerous ways. Suppose that a dataset contains the element “<VERSION>Version 5</VERSION>.” The word “Version” is unnecessary, and developers must go through additional trouble to extract the version number. In so doing, developers must consider the possibility that “Version” could be misspelled, abbreviated, or omitted. Supplying a version number alone (“<VERSION>5</VERSION>”) would avoid this issue altogether. More subtly, suppose that a dataset contains all bills introduced in Congress on a certain date:

<INTRODUCED_BILLS>
  <DATE>11-12-2014</DATE>
  <HOUSE_BILLS DATE="NOV 12, 2014">
    [...]
  </HOUSE_BILLS>
  <SENATE_BILLS DATE="NOV 12, 2014">
    [...]
  </SENATE_BILLS>
</INTRODUCED_BILLS>

Date information appears three times even though it must be the same in all cases. The more often a piece of information appears in a dataset, the more likely that inconsistencies will occur. These inconsistencies can lead to software errors requiring manual resolution. While redundancy can serve as a sanity check for errors, agencies typically should perform this check themselves if possible before releasing the data. After all, the agency is in the best position to fix inconsistencies. Unless well-justified, agencies should avoid redundancy.

Processing datasets often requires a significant amount of developer time, so adherence to even basic rules can dramatically increase innovation. What other low-level recommendations do FTT readers have for non-developers producing datasets?

Tomorrow, we’ll discuss how labeling elements in a dataset can help developers.

Government Datasets That Facilitate Innovation

[This is the first post in a series on best practices for government datasets by Harlan Yu and me.]

There’s a growing consensus that the government can increase its openness and transparency by publishing its raw data in bulk online. As several Freedom to Tinker contributors argued in Government Data and the Invisible Hand, publishing data empowers third party software developers to produce innovative new technologies that engage citizens and illuminate government’s inner workings. With the establishment of Data.gov and the federal Open Government Initiative, federal agencies are quickly embracing a culture of machine-readable data release, and many states and municipalities are now following their lead.

But how usable are these datasets for developers? The answer lies primarily in the structure and contents of the datasets themselves. While all data in digital form is technically machine-readable in some sense, the ease of use for machine-readable datasets can vary widely. In fact, machine-readability is just a baseline requirement: a developer can’t start to work with a dataset until it’s in this form. Once that minimum standard is met, the critical factor is how easy it is for developers to use the dataset in new, innovative ways.

In this series of posts, we’ll draw on our experience building applications that use government data to offer some thoughts about best practices government could follow in releasing data. By taking a few straightforward steps in preparing its datasets, government can make the data much more useful to developers.

One key factor in determining ease of use for developers is the structure of the dataset, and that is the topic of our first post. Let’s start with a trivial example:

<BOOK>A Tale of Two Cities by Charles Dickens. Chapter 1. The Period. It was the best of times, it was the worst of times [...] The end.</BOOK>

This is a “well-formed” XML version of Dicken’s “A Tale of Two Cities” in its entirety. Though more usable than a PDF copy of the book, the XML document lacks basic structure and is not particularly helpful to a developer building tools to display or analyze the book. Compare that to:

<BOOK>
  <HEADER>
    <TITLE>A Tale of Two Cities</TITLE>
    <AUTHOR>Charles Dickens</AUTHOR>
  </HEADER>
  <BODY>
    <CHAPTER NUMBER="1">
      <TITLE>The Period</TITLE>
      <PARAGRAPH NUMBER="1">
        <SENTENCE NUMBER="1">It was the best of times [...]</SENTENCE>
      </PARAGRAPH>
      [...]
    </CHAPTER>
    [...]
  </BODY>
</BOOK>

This data is far more structured, and a developer can take it and immediately do lots of new things. If the developer plans to build an interface for a new e-book reader for instance, it’s easy to extract the component parts of the book for appropriate formatting. With the less-structured version, the developer needs to guess where chapters, titles, and paragraphs begin and end. Because manual analysis is infeasible for large, complex datasets, developers who have only minimally-structured data will need to build automated processing scripts to make these guesses. Developing these scripts can be difficult and time-consuming, and data quality will suffer because the scripts will inevitably make mistakes.

Whether a dataset facilitates innovative uses by developers is not a yes or no question but a matter of degree, and it depends largely on the quality of the data’s structure and the needs of specific developers. In deciding what structure to add, agencies should consider who is in the best position to add various types of structure to the data. Sometimes, the agency is in the best position. Employees of an agency may amass specialized knowledge about the data, or the agency may already internally store the data with structural details like explicit database columns. In these cases, the agency can provide this structure with little effort, relieving developers from the potentially Herculean task of reconstructing these details. In other cases, the agency may have no significant advantage over private parties.

Agencies should get as close to this dividing line as is reasonably possible to broaden the range of creative possibilities for application developers. The goal is to minimize structural obstacles that might prevent developers from tinkering with the data. Better structure leads to more innovative tools, a more transparent government, and a greater appreciation for the work done by federal agencies.

Over our next several posts, we’ll discuss choices that agencies make when releasing datasets and the ways these choices affect developers. Among other things, we’ll explore basic data format lessons, data labeling, and correction/modification of datasets. Our goal is to turn this series into a best practices white paper for government use, and we’d appreciate any comments, suggestions, or insights from readers.

Web Certification Fail: Bad Assumptions Lead to Bad Technology

It should be abundantly clear, from two recent posts here, that the current model for certifying the identity of web sites is deeply flawed. When you connect to a web site, and your browser displays an https URL and a happy lock or key icon indicating a secure connection, the odds that you’re connecting to an impostor site, despite your browser’s best efforts, are uncomfortably high.

How did this happen? The last two posts unpacked some of the detailed problems with the current system. Today I want to explore the root cause: today’s system is based on wildly unrealistic assumptions about organizations and trust.

The theory behind the system is simple. Browser vendors will identify a set of Certificate Authorities (CAs) who are trusted to certify identities. Browsers will automatically accept any identity certificate issued by any of the trusted CAs.

The first step in making this system work is identifying some CA who is trusted by everybody in the world.

If that last sentence didn’t strike you as odd, go back and read it again. That’s right, the system assumes that there is some party who is trusted by everyone in the world — a spectacularly naive assumption.

Network engineers like to joke about the “evil bit”, a hypothetical label put on each network packet, indicating whether the packet is evil. (See RFC 3514, Steve Bellovin’s classic parody standards document codifying the evil bit. I’ve always loved that the official Internet standards series accepts parody standards.) Well, the “trusted bit” for certificate authorities is pretty much as the same as the evil bit, only applied to organizations rather than network packets. Yet somehow we ended up with a design that relies on this “trusted bit”.

The reason, in part, is unclear thinking about institutional trust, abetted by the unclear language we often use in discussing trust online. For example, we tend to conflate two meanings of the word “trusted”. The first meaning of “trusted”, which is the everyday meaning, implies a judgment that a party is unlikely to misbehave. The second meaning of “trusted”, more common in military security settings, is a factual statement that someone is vulnerable to misbehavior by another. In an ideal world, we would make sure that someone was trusted in the first sense before they became trusted in the second sense, that is, we would make sure that someone was unlikely to misbehave before we we made ourselves vulnerable to their misbehavior. This isn’t easy to do — and we will forget entirely to do it if we confuse the two meanings of trusted.

The second linguistic problem is to use the passive-voice construction “A is trusted to do X” rather than the active form “B trusts A to do X.” The first form is problematic because it doesn’t say who is doing the trusting. Consider these two statements: (A) “CNNIC is a trusted certificate authority.” (B) “Everyone trusts CNNIC to be a certificate authority.” The first statement might sound plausible, but the second is obviously false.

If you try to explain to yourself why the existing web certification system is sound, while avoiding the two errors above (confusing two senses of “trusted”, and failing to say who is doing the trusting), you’ll see pretty quickly that the argument for the current system is tenuous at best. You’ll see, too, that we can’t fix the system by using different cryptography — what we need are new institutional arrangements.

Google Buzzkill

The launch of Google Buzz, the new social networking service tied to GMail, was a fiasco to say the least. Its default settings exposed people’s e-mail contacts in frightening ways with serious privacy and human rights implications. Evgeny Morozov, who specializes in analyzing how authoritarian regimes use the Internet, put it bluntly last Friday in a blog post: “If I were working for the Iranian or the Chinese government, I would immediately dispatch my Internet geek squads to check on Google Buzz accounts for political activists and see if they have any connections that were previously unknown to the government.”

According to the BBC, the Buzz development team bypassed Google’s standard trial and testing procedures in order to launch the product quickly. Apparently, the company only tested it internally with Google employees and failed to test the product with a more diverse range of users who are more likely to have brought up the issues which were so glaringly obvious after launch. Google has apologized and moved to correct the most eggregious privacy flaws, though problems – including security issues – continue to be raised. PC World has a good overview of Buzz’s evolution since launch.

Meanwhile, damage has been done not only to Google’s reputation but also to an unknown number of users who found themselves and their contacts exposed in ways they did not choose or want. Exposing vulnerable users without their knowledge or choice even for a few hours can potentially have irreversible consequences. While Google is scoring some points around the tech policy world for reacting quickly and earnestly to the strident user outcry, the Electronic Information Privacy Center (EPIC) has filed an official complaint with the FTC, and that Canada’s Privacy Commissioner has expressed disappointment and asked Google to explain itself. (UPDATE: A class complaint has been filed in San Jose, claiming that Google broke the law by sharing personal data without users’ consent.)

Earlier this week I asked people in my Twitter network how they’re feeling about Buzz after the fixes they’ve made. Some are now reassured but others aren’t. Joe Hall wrote:

@rmack totally lost me for good.. I just can’t believe that they won’t do it again. It will have to be very useful/different to get me back

Some are leaving GMail altogether. Judson Dunn reported:

@rmack my boyfriend deleted his long time gmail account for good 🙁

I was so concerned about exposing people in my GMail network during the first week after launch that I stayed off Buzz entirely until Monday afternoon. By then I felt that the worst privacy problems had been fixed, and I understood well enough how to tweak the settings that I could at least go in without doing harm to others. After playing with it a bit and poking around I posted some initial observations and invited the people in my network to respond. There are still plenty of issues – some people who claimed in Twitter that they had turned off Buzz are still there, and I think Buzz should make it easier for people to use pseudonyms or nicknames not tied to their email address if they prefer.  From Beijing, Jeremy Goldkorn of the influential media blog Danwei responded: “I like the way Buzz works now, and it seems to me that the privacy concerns have been addressed.”

I’ve noticed that some Chinese Buzz users have been using it to post and discuss material that has been censored by Chinese blog-hosting platforms and social networking sites. If Buzz becomes useful as a way to preserve and spread censored information around quickly, it seems to me that’s a plus as long as people aren’t being exposed in ways they don’t want. My friend Isaac Mao wrote:

It’s more important to Chinese to make information flowing rather than privacy concern this moment. With more hibernating animals in cave, we can’t tell too much on the risks about identity, but more on how to wake up them.

Buzz has unleashed some potentials on sharing which just follows my Sharism theory, people actually have much more stuff to share before they realize them.

But I agree with any conerns on privacy, including the risks that authority may trace publishers in China. It’s very much possible to be targeted once they were notified how profound the new tool is.

The “Great Firewall” is already at work on Buzz, at least in Beijing. While most people seem to be able to access Buzz through GMail on Chinese Internet connections, numerous people report from Beijing that at least some Google profiles – including mine and Isaac’s – are blocked, though people in Shanghai and Guangzhou say they’re not blocked. Others in China report having trouble posting comments to Buzz, though it’s unclear whether this is a technical issue with Buzz or a Chinese network blocking issue, or some strange combination of the two.

It will be interesting to see how things evolve, and whether activists in various countries find Buzz to be a useful alternative to Facebook and other platforms – or not. Whatever happens, I do think that Google fully deserves the negative press it has gotten and continues to get for the thoughtless way in which Buzz was rolled out. There are  senior people at Google whose job it is to focus on free expression issues, and others who work full time on privacy issues. Either the Buzz development team completely failed to consult with these people or were allowed to ignore them. I am inclined to believe the former instead of the latter, based on my interactions with the company through the Global Network Initiative and Google’s support for Global Voices. Call me biased or sympathetic if you want, but I don’t think that the company made a conscious dec
ision to ignore the risks it was creatin
g for human rights activists or people with abusive spouses – or anybody else with privacy concerns. However, if we do give Google the benefit of the doubt, then the only logical conclusion is that in this case, something about the company’s management and internal communications was so broken that the company was unable to prevent a new product from unintentionally doing evil. Nick Summers at Newsweek thinks the problem is broader:

Google is so convinced of the righteousness of its mission statement that it launches products heedlessly. Take Google Books—the company was so in thrall with its plan to make all hardbound knowledge searchable that it did not anticipate a $125 million legal challenge from publishers. With Google Wave, engineers got high on their own talk that they had invented a means of communication superior to e-mail—until Wave launched and users laughed at its baffling un-usability. Last week, with Buzz, Google seemed so bewitched by the possibilities of a Google-y take on social networking that it went live without thinking through the privacy implications.

Whatever the case may be in terms of Google’s internal thinking or intentions, we have a right to be concerned. Too many of us depend on Google for too many things. As I’ve written before, I believe Google has a responsibility to netizens around the world to develop more effective mechanisms to ensure that the concerns, interests, and rights of the world’s netizens are adequately incorporated into the development process.

I’d very much like to hear your ideas for how netizens’ concerns around the world – particularly from at-risk and marginalized communities who have the most to lose when Google gets things wrong – might be channeled to Google’s development teams and product managers. Rather than wait for Google to figure this out, are there mechanisms that we as netizens might be able to build?  Are there things we can proactively do to help companies like Google avoid doing evil? Can we help them to avoid hurting us – and also help them to maximize the amount of good they can do?

(Cross-posted from RConversation)

The Engine of Job Growth? Tracking SBA-backed Loans Through Recovery.gov

Last week at a Town Hall Meeting in New Hampshire, President Obama stated that “we’re going to start where most new jobs start—with small businesses,” and he encouraged Congress to transfer $30 billion from the Troubled Asset Relief Program to a new program called the Small Business Lending Fund. As this proposal was unveiled, the Administrator of the U.S. Small Business Administration (SBA) Karen Mills sat directly behind the President, reflecting the fact that the Administration’s proposal is a vote of confidence in the SBA and its existing loan programs.

The central role proposed for the SBA invites questions about existing SBA loans made with Recovery Act funds. These loans can be tracked through Recovery.gov, the official “user-friendly, public-facing website” that has evolved under the direction of the Recovery Accountability and Transparency Board, an agency created when the President signed into law the American Recovery and Reinvestment Act of 2009 (ARRA) on February 17, 2009.

Curious about how well Recovery.gov works, I analyzed a stimulus loan to a business in Red Lodge, Montana, where I live. First I accessed “Agency Reported” data through Recovery.gov, and then compared that information with what I could learn from field visits with the loan recipient and the community banker who made the loan.

What the drill-down map at Recovery.gov tells you: According to the map available at the official website, a local business called “Sheep Mountain Feed” received an $81,000 loan through the Small Business Administration’s (SBA) “Rural Lender Advantage.”

What the drill-down map at Recovery.gov doesn’t tell you: The official website does not specify how the loan proceeds were spent. Nor does the website explain if the $81,000 is the face value of the loan or the amount guaranteed by the SBA. For that matter, SBA’s role in making the loan is not clarified.

To learn more about these things, I called Sheep Mountain Feed and arranged a visit with the owner, a woman named Deb Padget who, before opening the store, had ranched 2,000 head of bison. I also met with the local banker who arranged the loan (the SBA relies on lenders to make the loans it guarantees), and an SBA employee based in Helena Montana. And for background I reviewed the June 8, 2009 Federal Register Notice relating to SBA’s temporary 90% guarantee (thanks to Princeton’s Fed Thread project).

Sheep Mountain Feed is a retail store catering to animal farmers and pet owners that sells animal feed, electric fencing, baby chicks, and other odds and ends such as buckets and horseshoes sold at any rural animal store. When Deb decided to buy the business in April of 2009, she had managed the retail store for three years, and she wanted to make some changes. Without abandoning the “large-animal” owners who had built the feed business, she saw an opportunity to focus more on pet owners. “Everybody in Red Lodge has a dog,” she told me. “Not everybody has a horse.”

She would need to buy pet supplies to take things in this new direction, and she would also need money to buy the business and remodel the interior of the store. This is how she spent the loan proceeds that she eventually received—buying and remodeling Sheep Mountain Feed, and purchasing inventory. However, the first bank she visited rejected her within ten minutes. At the second bank she tried out, she met with local loan officer and learned quickly that he was also from a North Dakota farming family. Here she got a warmer welcome, and was told that her timing was good: In March 2009, about one month before Deb’s visit, the SBA received $730 million in funding from the ARRA to offer increased loan guarantees and the temporary elimination of loan fees.

To get this “stimulus loan” Deb would need to submit a business plan with her loan application, but she’d never before needed a business plan and didn’t even have an executive summary. She was sent to an SBA employee in Billings for free counseling, and this employee helped Deb to prepare a business plan from scratch. (At one point, in order to develop Deb’s financial projections, the SBA contact called her own dog-groomer to find out about the going-rate for grooming sessions in Billings).

The U.S. Small Business Administration (SBA) was created in 1953 as an independent agency of the federal government to help people start and grow businesses. Even without the stimulus money, SBA’s so-called 7(a) loan program guarantees up to 85% of a qualifying loan made to a local business through a local bank. The guarantee is designed to induce local banks to lend more into the community by removing most of the risk of default. And as previously mentioned, in early 2009 the SBA received Recovery money to guarantee up to 90% of 7(a) loans. This is the kind of loan that Deb received.

In addition to subsidizing SBA’s temporary 90 percent guarantee, the Recovery Act also allowed SBA to temporarily waive certain fees that it charges. Usually the agency collects fees equal to three percent of the loan’s face value to cover delinquencies. Lenders and borrowers pay these fees. In this case, the community bank that made the loan and Deb would have had to pay $2,790 just to close the deal. We know this because the breakdown of the loan to Sheep Mountain Feed at USASpending.gov shows an “original subsidy cost” of $2,790. By studying the data at USASpending, and interviewing offline sources, it also emerged that $81,000 is the amount guaranteed by the SBA (Sheep Mountain Feed got $90,000).

The takeaway from this study is that Recovery.gov provides good data, but not always enough context (e.g. an explanation of SBA’s role) to understand the data. Yet in the absence of Recovery.gov, even learning that Sheep Mountain Feed received a government-backed loan would have been difficult, so the official website is a helpful starting point for people motivated to track stimulus money.

By disseminating information about a Montana-based loan to citizens in every state, including citizens not predisposed to support any specific local project, Recovery.gov provides the public with information about what the government is doing and invites feedback. How the government processes this feedback—and in general takes advantage of the insight of people inside and outside the Federal government—is an open question, but at least the Recovery Board is on it, and now it’s also the focus of a working group (pursuant to OMB’s December 8, 2009 Open Government Directive).

In that spirit, here are a few suggestions for making Recovery.gov more useful to people trying to track SBA-backed stimulus loans.

(1) Create web links to the SBA website where the agency explains how the standard and stimulus-enriched 7(a) loan program works (SBA itself does not make loans, but instead guarantees a portion of loans made and administered by banks);

(2) Create links to the Small Business Act (15 U.S.C. § 636, as amended), the relevant provisions of the American Recovery and Reinvestment Act of 2009 affecting the SBA, (ARRA, P. L. 111-5, §§501-502), and the provisions of the Department of Defense Appropriations Act, 2010 that extend the stimulus-enriched SBA program through the end of February 2010;

(3) Establish links from Recovery.gov to USASpending.gov, particularly targeted links showing the source of the stimulus loan information. Recovery.gov does explain that “Agency Reported Data” comes from three sources, including USAspending.gov, but there are no links from stimulus projects to USASpending.

This project was more about Recovery.gov than the SBA, but listening to President Obama urge the creation of a Small Business Lending Fund because it “will help small banks do even more of what our economy needs – and that’s ensure that small businesses are once again the engine of job growth in America,” there was the obvious question about the $90,000 loan to Sheep Mountain Feed: Would it create or retain any jobs? I put this question to Deb. She said that the loan “created” one full-time job, her job running the business. She’s also employing a dog-groomer part-time, and another part-time employee (a student) who works on weekends. Getting these facts is easier than knowing if the full $90,000 loan to Sheep Mountain Feed should be credited to the Recovery Act. Would the business have received the loan anyway, even without SBA’s extra 5% guarantee and the temporary elimination of $2,790.00 in fees? The only sure thing is that estimating the employment impact of the Recovery Act is complicated (it was the subject of a recent OMB Guidance Memorandum). That’s something everybody can agree on.