May 3, 2024

Government Shouldn't "Help" Citizens Pick Tough Questions for Obama

A couple of weeks ago, Julian Sanchez at Ars Technica, Ben Smith at Politico and others noted a disturbing pattern on the incoming Obama administration’s Change.gov website: polite but pointed user-submitted questions about the Blagojevich scandal and other potentially uncomfortable topics were being flagged as “inappropriate” by other visitors to the site.

In less than a week, more than a million votes-for-particular-questions were cast. The transition team closed submissions and posted answers to the five most popular questions. The usefulness and interest of these answers was sharply limited: They reiterated some of the key talking points and platform language of Obama’s campaign without providing any new information. The transition site is now hosting a second round of this process.

It shouldn’t surprise us that there are, among the Presdient-elect’s many supporters, some who would rather protect their man from inconvenient questions. And for all the enthusiastic talk about wide-open debate, a crowdsourced system that lets anyone flag an item as inappropriate can give these few a perverse kind of veto over the discussion.

If the site’s operators recognize this kind of deliberative narrowing as a problem, there are ways to deal with it. One could require a consensus judgement of “inappropriateness” by a cross-section of Change.gov users that is large enough, or is diverse with respect to geography, time of visit, amount of past involvement in the site, or any number of other criteria before taking a question out of circulation. Questions that have been preliminarily flagged as inappropriate could enter a secondary moderation queue where their appropriateness can be debated, leading to a considered “up or down” vote on whether a given question belongs in the mix. The Obama transition team could even crowdsource this problem itself, looking for lay input (or input from experts at places like Digg) about how to make sure that reasonable-but-pointed questions stay in, while off topic, off color, or otherwise unacceptable ones remain out.

But what are the incentives of the new administration’s online team? They might well find it convenient, as Julian writes, to “crowdsource a dodge” to inconvenient questions–if the users of Change.gov adopt an expansive view of “inappropriateness,” the Obama team will likely benefit slightly from soft, supportive questions in the near term, though it will run the risk of allowing substantive problems, or citizen concerns, to fester over the longer term. And that tradeoff could hold much more appeal for the median administration staffer than it does for the median American.

In other words, having the administration’s own tech people manage the moderation of questions directed at the President may be like having the fox guard the henhouse. I agree that even this is much more open than recent past administrations, but I think the more interesting question here is what would be ideal.

I suspect this key plank of the new administration’s plans will never be able to be fully realized within government. The President needs to answer questions that a nonzero number of his most enthusiastic supporters are willing to characterize as “inappropriate.” And for that to happen, the online moderation needs to take place outside of .gov. A collective move toward one of the .org alternatives, for this key activity of sifting questions, would be a great first step. That way, the goal of finding tough but honest questions can plausibly sit paramount.

Election Transparency Project Finds Ballot-Counting Bug

Yesterday, Kim Zetter at Wired News reported an amazing e-voting story about lost ballots and the public advocates who found them.

Here’s a summary: Humboldt County, California has an innovative program to put on the Internet scanned images of all the optical-scan ballots cast in the county. In the online archive, citizens found 197 ballots that were not included in the official results of the November election. Investigation revealed that the ballots disappeared from the official count due to a programming error in central tabulation software supplied by Premier (formerly known as Diebold), the county’s e-voting vendor.

The details of the programming error are jaw-dropping. Here is Zetter’s deadpan description:

Premier explained that due to a programming problem, the first “deck” or batch of ballots that are counted by the GEMS software sometimes gets randomly deleted if any subsequent deck is intentionally deleted. The GEMS system names the first deck of ballots “deck 0”, with subsequent batches called “deck 1,” “deck 2,” etc. For some reason “deck 0” is sometimes erased from the system if any other deck is erased. Since it’s common for officials to intentionally erase a deck in the normal counting process if they’ve made an error and want to rescan a deck, the chance that a GEMS system containing this flaw will delete a batch of ballots is pretty high.

The system never provides any indication to election officials when it’s deleting a batch of ballots in this manner. The problem occurs with version 1.18.19 of the GEMS software, though it’s possible that other versions have the problem as well. [County election director Carolyn] Crnich said an official in the California secretary of state’s office told her the problem was still prevalent in version 1.18.22 of Premier’s software and wasn’t fixed until version 1.18.24.

Neither Premier nor the secretary of state’s office, which certifies voting systems for use in the state, has returned calls for comment about this.

After examining Humboldt’s database, Premier determined that the “deck 0” in Humboldt was deleted at some point in between processing decks 131 and 135, but so far Crnich has been unable to determine what caused the deletion. She said she did at one point abort deck 132, instead of deleting it, when she made a mistake with it, but that occurred before election day, and the “deck 0” batch of ballots was still in the system on November 23rd, after she’d aborted deck 132. She couldn’t recall deleting any other deck after election night or after the 23rd that might have caused “deck 0” to disappear in the manner Premier described.

The deletion of “deck 0” wasn’t the only problem with the GEMS system. As I mentioned previously, the audit log not only didn’t show that “deck 0” had been deleted, it never showed that the deck existed in the first place.

The system creates a “deck 0” for each ballot type that is scanned. This means, the system should have three “deck 0” entries in the log — one for vote-by-mail ballots, one for provisional ballots, and one for regular ballots cast at the precinct. Crnich found that the log did show a “deck 0” for provisional ballots and precinct-cast ballots but none for vote-by-mail ballots, even though the machine had printed a receipt at the time that an election worker had scanned the ballots into the machine. In fact, the regular audit log provides no record of any files that were deleted, including deck 132, which she intentionally deleted. She said she had to go back to a backup of the log, created before the election, to find any indication that “deck 0” had ever been created.

I don’t know which is more alarming: that the vendor failed to treat as an emergency a programming error that silently deletes ballots, or that the tabulator’s “audit log” looks more like an after the fact reconstruction of what-must-have-happened rather than a log of what actually did happen.

The good news here is that Humboldt County’s opening of election records to the public paid off, when members of the public found important facts in the records that officials and the vendor had missed. If other jurisdictions opened their records, how many more errors would we find and fix?

Could Too Much Transparency Lead to Sunburn?

On Tuesday, the Houston Chronicle published a story about the salaries of local government employees. Headlined “Understaffing costs Houston taxpayers $150 million in overtime,” it was in many respects a typical piece of local “enterprise” journalism, where reporters go out and dig up information that the public might not already be aware is newsworthy. The story highlighted short staffing in the police department, which has too few workers for all the protection it is required to provide the citizens of Houston.

The print story used summaries and cited a few outliers, like a police sergeant who earned $95,000 in overtime. But the reporters had much more data: using Texas’s strong Public Information Act, they obtained electronic payroll data on 81,000 local government employees—essentially the entire workforce. Rather than keep this larger data set to themselves, as they might have done in a pre-Internet era, they posted the whole thing online. The notes to the database say that the Chronicle obtained even more information than it displays, and that before republishing the data, the newspaper “lumped together” what it obliquely descibes as “wellness and termination pay” into each employee’s reported base salary.

In a related blog post, Chronicle staffer Matt Stiles writes:

The editors understand this might be controversial. But this information already is available to anyone who wants to see it. We’re only compiling it in a central location, and following a trend at other news organizations publishing databases. We hope readers will find the information interesting, and, even better, perhaps spot some anomalies we’ve missed.

The value proposition here seems plausible: Among the 81,000 payroll records that have just been published, there very probably are news stories of legitimate public interest, waiting to be uncovered. Moreover (given that the Chronicle, like everyone else in the news business, is losing staff) it’s likely that crowdsourcing the analysis of this data will uncover things the reporting staff would have missed.

But it also seems likely that this release of data, by making it overwhelmingly convenient to unearth the salary of any government worker in Houston, will have a raft of side effects—where by “side” I mean that they weren’t intended by the Chronicle. For example, it’s now easy as pie for any nonprofit that raises funds from public employees in Houston to get a sense of the income of their prospects. Comparing other known data, such as approximate home values or other visible spending patterns, with information about salary can allow inferences about other sources of income. In fact, you might argue that this method—researching and linking the home value for every real estate transaction related to a city worker, and combining this data with salary information—would be an extraordinary screening mechanism for possible corruption, since those who buy above what their salary would suggest they should be able to afford must have additional income, and corruption is presumably one major reason why (generally low-paid) government workers are sometimes able to live beyond their apparent means.

More generally, it seems like there is a new world of possible synergies opened up by the wide release of this information. We almost certainly haven’t thought of all the consequences that will turn out, in retrospect, to be serious.

Houston isn’t the first place to try this—it turns out that the salaries of faculties at state schools are often quietly available for download as well, for example—but it seems to highlight a real problem. It may be good for the salaries of all public employees to be a click away, but the laws that make this possible generally weren’t passed in the last ten years, and therefore weren’t drafted with the web in mind. The legislative intent reflected in most of our current statutes, when a piece of information is statutorily required to be publicly available, is that citizens should be able to get the information by obtaining, filling out, and mailing a form, or by making a trip to a particular courthouse or library. Those small obstacles made a big difference, as their recent removal reveals: Information that you used to need a good reason to justify the cost of obtaining is now worth retrieving for the merest whim, on the off chance that it might be useful or interesting. And massive projects that require lots of retrieval, which used to be entirely impractical, can now make sense in light of any of a wide and growing range of possible motivations.

Put another way: As technology evolves, the same public information laws create novel and in some cases previously unimaginable levels of transparency. In many cases, particularly those related to the conduct of top public officials, this seems to be a clearly good thing. In others, particularly those related to people who are not public figures, it may be more of a mixed blessing or even an outright problem. I’m reminded of the “candidates” of ancient Rome—the Latin word candidatus literally means “clothed in white robes,” which would-be officeholders wore to symbolize the purity and fitness for office they claimed to possess. By putting themselves up for public office, they invited their fellow citizens to hold them to higher standards. This logic still runs strong today—for example, under the Supreme Court’s Sullivan precedent, public figures face a heightened burden if they try to sue the press for libel after critical coverage.

I worry that some kinds of progress in information technology are depleting a kind of civic ozone layer. The policy solutions here aren’t obvious—one shudders to think of a government office with the power to foreclose new, unforeseen transparencies—but it at least seems like something that legislators and their staffs ought to keep an eye on.

Copyright, Technology, and Access to the Law

James Grimmelmann has an interesting new essay, “Copyright, Technology, and Access to the Law,” on the challenges of ensuring that the public has effective knowledge of the laws. This might sound like an easy problem, but Grimmelmann combines history and explanation to show why it can be difficult. The law – which includes both legislators’ statutes and judges’ decisions – is large, complex, and ever-changing.

Suppose I gave you a big stack of paper containing all of the laws ever passed by Congress (and signed by the President). This wouldn’t be very useful, if what you wanted was to know whether some action you were contemplating would violate the law. How would you find the laws bearing on that action? And if you did find such a law, how would you determine whether it had been repealed or amended later, or how courts had interpreted it?

Making the law accessible in practice, and not just in theory, requires a lot of work. You need reliable summaries, topic-based indices, reverse-citation indices (to help you find later documents that might affect the meaning of earlier ones), and so on. In the old days of paper media, all of this had to be printed and distributed in large books, and updated editions had to be published regularly. How to make this happen was an interesting public policy problem.

The traditional answer has been copyright. Generally, the laws themselves (statutes and court opinions) are not copyrightable, but extra-value content such as summaries and indices can be copyrighted. The usual theory of copyright applies: give the creators of extra-value content some exclusive rights, and the profit motive will ensure that good content is created.

This has some similarity to our Princeton model for government transparency, which urges government to publish information in simple open formats, and leave it to private parties to organize and present the information to the public. Here government was creating the basic information (statutes and court opinions) and private parties were adding value. It wasn’t exactly our model, as government was not taking care to publish information in the form that best facilitated private re-use, but it was at least evidence for our assertion that, given data, private parties will step in and add value.

All of this changed with the advent of computers and the Internet, which made many of the previously difficult steps cheaper and easier. For example, it’s much easier to keep a website up to date than to deliver updates to the owners of paper books. Computers can easily construct citation indices, and a search engine provides much of the value of a printed index. Access to the laws can be cheaper and easier now.

What does this mean for public policy? First, we can expect more competition to deliver legal information to the public, thanks to the reduced barriers to entry. Second, as competition drives down prices we’ll see fewer entities that are solely in the business of providing access to laws; instead we’ll see more non-profits, along with businesses providing free access. More competition and lower prices will mean better and more effective access to the law for citizens. Third, copyright will still play a role by supporting the steps that remain costly, such as the writing of summaries.

Finally, it will matter more than ever exactly how government provides access to the raw information. If, as sometimes happens now, government provides the raw information in an awkward or difficult-to-use form, private actors must invest in converting it into a more usable form. These investments might not have mattered much in the past when the rest of the process was already expensive; but in the Internet age they can make a big difference. Given access to the right information in the right format, one person can produce a useful mashup or visualization tool with a few weeks of spare-time work. Government, by getting the details of data publication right, can enable a flood of private innovation, not to mention a better public debate.

New bill advances open data, but could be better for reuse

Senators Obama, Coburn, McCain, and Carper have introduced the Strengthening Transparency and Accountability in Federal Spending Act of 2008 (S. 3077), which would modify their 2006 transparency act. That first bill created USASpending.gov, a searchable web site of government outlays. USASpending.gov—which was based on software developed by OMB Watch and the Sunlight Foundation—allows end users to search across a variety of criteria. It has begun offering an API, an interface that lets developers query the data and display the results on their own sites. This allows a kind of reuse, but differs significantly from the approach suggested in our recent “Invisible Hand” paper. We urge that all the data be published in open formats. An API delivers search results, but that makes the search interface itself very important: having to work through an interface sometimes limits developers from making innovative, unforeseen uses of the data.

The new bill would expand the scope of information available via USASpending.gov, adding information about federal contracts, leases, and audit disputes, among other areas. But it would also elevate the API itself to a matter of statutory mandate. I’m all in favor of mandates that make data available and reusable, but the wording here is already a prime example of why technical standards are often better left to expert regulatory bodies than etched in statute:

” (E) programmatically search and access all data in a serialized machine readable format (such as XML) via a web-services application programming interface”

A technical expert body would (I hope) recognize that there is added value in allowing the data itself to be published so that all of it can be accessed at once. This is significantly different from the site’s current attitude; addressing the list of top contractors by dollar volume, the site’s FAQ says it “does not allow the results of these tables to be downloaded in delimited or XML format because they are not standard search results.” I would argue that standardizers of search results, whomever they may be, should not be able to disallow any data from being downloaded. There doesn’t necessarily need to be a downloadable table of top contractors, but it should be possible for citizens to download all the data so that they can compose such a table themselves if they so desire. The API approach, if it substitutes for making all the data available for download, takes us away from the most vibrant possible ecosystem of data reuse, since whenever government web sites design an interface (whether it’s a regular web interface for end users, or a code-level interface for web developers), they import assumptions about how the data will be used.

All that said, it’s easy to make the data available for download, and a straightforward additional requirement that could be added to the bill. And in any cause we owe a debt of gratitude to Senators Coburn, Obama, McCain and Carper for their pioneering, successful efforts in this area.

==

Update, June 12: Amended the list of cosponsors to include Sens. Carper and (notably) McCain. With both major presidential candidates as cosponsors, the bill seems to reflect a political consensus. The original bill back in 2006 had 48 cosponsors and passed unanimously.