April 19, 2014

Reflecting on Sunshine Week

Last Wednesday evening, I attended the D.C. Open Government Summit: Street View, which took place at the National Press Club in conjunction with Sunshine Week. The Summit was sponsored by the D.C. Open Government Coalition, a non-profit that “seeks to enhance the public’s access to government information and ensure the transparency of government operations of the District of Columbia.” The Summit successfully focused on two main ideas – using government information to innovate and using government information to inform. I left the Summit encouraged by the enthusiasm for innovation and transparency among the attendees and some District of Columbia government leaders. I was also discouraged, however, because there was a consensus that Washington, DC is still far behind cities such as New York, Kansas City, and Boston in using technology for innovation in government, and there is neither a vision nor a financial commitment from the Mayor’s office to facilitate government-wide progress.

Local Expertise is Exceedingly Valuable – Principle #7 for Fostering Civic Engagement Through Digital Technologies

One of the most rewarding and enjoyable aspects of my research has been my series of conversations with innovators in civic engagement in various cities across the country. These conversations have been enlightening for me as I think about how Washington, DC can maximize its natural advantages to foster civic engagement in its neighborhoods. The ways in which a local community uses technology to share information and solve urban problems reflect its character.

Two of the conversations that have helped shape my thinking took place earlier this year with Abby Miller, a Bloomberg Innovation Fellow and member of the Memphis Innovation Delivery Team, and John Keefe from WNYC, the NPR station in New York City. Today, I will discuss how they leverage the resources of their very different communities in very different roles – one working inside Memphis city government and the other in the media in New York City.

Government Needs to Embrace the Social Web – Principle #6 for Fostering Civic Engagement Through Digital Technologies

As Rahm Emanuel said, “You never want a serious crisis to go to waste. And what I mean by that – it’s an opportunity to do things you think you could not do before.” The Federal government shutdown has, at least temporarily, shed light on the valuable day-to-day work done by the Federal government and its employees. Now is the time for the Federal government to strengthen the connection between the public and Federal employees. The Federal government should embrace the social web as a part of its employees’ work lives.

To this point, open government has generally meant that citizens have the right to access the documents and proceedings of the government to allow for effective public oversight. Open government should include people too. Putting a human face – along with professional contact information and areas of expertise – on Agencies’ public-facing websites will facilitate transparency. Employees should have something like a Facebook-lite or a more open version of LinkedIn, where everyone’s profile is visible. Certainly, there will be limitations. For example, employees with military or law enforcement responsibilities will continue to be largely anonymous. As with e-mail, Agencies will develop oversight mechanisms. Even so, the public and Federal employees should have better access to each other.

The New Ambiguity of "Open Government"

David Robinson and I have just released a draft paper—The New Ambiguity of “Open Government”—that describes, and tries to help solve, a key problem in recent discussions around online transparency. As the paper explains, the phrase “open government” has become ambiguous in a way that makes life harder for both advocates and policymakers, by combining the politics of transparency with the technologies of open data. We propose using new terminology that is politically neutral: the word adaptable to describe desirable features of data (and the word inert to describe their absence), separately from descriptions of the governments that use these technologies.

Clearer language will serve everyone well, and we hope this paper will spark a conversation among those who focus on civic transparency and innovation. Thanks to Justin Grimes and Josh Tauberer for their helpful insight and discussions as we drafted this paper.

Download the full paper here.

Abstract:

“Open government” used to carry a hard political edge: it referred to politically sensitive disclosures of government information. The phrase was first used in the 1950s, in the debates leading up to passage of the Freedom of Information Act. But over the last few years, that traditional meaning has blurred, and has shifted toward technology.

Open technologies involve sharing data over the Internet, and all kinds of governments can use them, for all kinds of reasons. Recent public policies have stretched the label “open government” to reach any public sector use of these technologies. Thus, “open government data” might refer to data that makes the government as a whole more open (that is, more transparent), but might equally well refer to politically neutral public sector disclosures that are easy to reuse, but that may have nothing to do with public accountability. Today a regime can call itself “open” if it builds the right kind of web site—even if it does not become more accountable or transparent. This shift in vocabulary makes it harder for policymakers and activists to articulate clear priorities and make cogent demands.

This essay proposes a more useful way for participants on all sides to frame the debate: We separate the politics of open government from the technologies of open data. Technology can make public information more adaptable, empowering third parties to contribute in exciting new ways across many aspects of civic life. But technological enhancements will not resolve debates about the best priorities for civic life, and enhancements to government services are no substitute for public accountability.

What We Lose if We Lose Data.gov

In its latest 2011 budget proposal, Congress makes deep cuts to the Electronic Government Fund. This fund supports the continued development and upkeep of several key open government websites, including Data.gov, USASpending.gov and the IT Dashboard. An earlier proposal would have cut the funding from $34 million to $2 million this year, although the current proposal would allocate $17 million to the fund.

Reports say that major cuts to the e-government fund would force OMB to shut down these transparency sites. This would strike a significant blow to the open government movement, and I think it’s important to emphasize exactly why shuttering a site like Data.gov would be so detrimental to transparency.

On its face, Data.gov is a useful catalog. It helps people find the datasets that government has made available to the public. But the catalog is really a convenience that doesn’t necessarily need to be provided by the government itself. Since the vast majority of datasets are hosted on individual agency servers—not directly by Data.gov—private developers could potentially replicate the catalog with only a small amount of effort. So even if Data.gov goes offline, nearly all of the data still exist online, and a private developer could go rebuild a version of the catalog, maybe with even better features and interfaces.

But Data.gov also plays a crucial behind-the-scenes role, setting standards for open data and helping individual departments and agencies live up to those standards. Data.gov establishes a standard, cross-agency process for publishing raw datasets. The program gives agencies clear guidance on the mechanics and requirements for releasing each new dataset online.

There’s a Data.gov manual that formally documents and teaches this process. Each agency has a lead Data.gov point-of-contact, who’s responsible for identifying publishable datasets and for ensuring that when data is published, it meets information quality guidelines. Each dataset needs to be published with a well-defined set of common metadata fields, so that it can be organized and searched. Moreover, thanks to Data.gov, all the data is funneled through at least five stages of intermediate review—including national security and privacy reviews—before final approval and publication. That process isn’t quick, but it does help ensure that key goals are satisfied.
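
As a rough illustration of what a common-metadata requirement makes possible, the Python sketch below checks a dataset record for missing fields before publication. The field names in `REQUIRED_FIELDS` and the `missing_metadata` helper are hypothetical, chosen only for this example; they are not taken from the Data.gov manual, and the actual required fields may differ.

```python
# ASSUMPTION: these field names are illustrative, not the real Data.gov set.
REQUIRED_FIELDS = ("title", "description", "agency", "date_released")


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent or blank in a record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field, "")).strip()
    ]


# A hypothetical record that a data steward has only partly filled in.
draft = {"title": "Example dataset", "agency": "Example agency", "description": " "}
print(missing_metadata(draft))
```

A check like this is cheap to run at upload time, which is one reason a fixed set of fields makes datasets easier to organize and search later.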

When agency staff have data they want to publish, they use a special part of the Data.gov website, which outside users never see, called the Data Management System (DMS). This back-end administrative interface allows agency points-of-contact to efficiently coordinate publishing activities agency-wide, and it gives individual data stewards a way to easily upload, view and maintain their own datasets.

My main concern is that this invaluable but underappreciated infrastructure will be lost when IT systems are de-funded. The individual roles and responsibilities, the informal norms and pressures, and perhaps even the tacit authority to put new datasets online would likely also disappear. The loss of structure would probably mean that sharply reduced amounts of data will be put online in the future. The datasets that do get published in an ad hoc way would likely lack the uniformity and quality that the current process creates.

Releasing a new dataset online is already a difficult task for many agencies. While the current standards and processes may be far from perfect, Data.gov provides agencies with a firm footing on which they can base their transparency efforts. I don’t know how much funding is necessary to maintain these critical back-end processes, but whatever Congress decides, it should budget sufficient funds—and direct that they be used—to preserve these critically important tools.

Assessing PACER's Access Barriers

The U.S. Courts recently conducted a year-long assessment of their Electronic Public Access program which included a survey of PACER users. While the results of the assessment haven’t been formally published, the Third Branch Newsletter has an interview with Bankruptcy Judge J. Rich Leonard that discusses a few high-level findings of the survey. Judge Leonard has been heavily involved in shaping the evolution of PACER since its inception twenty years ago and continues to lead today.

The survey covered a wide range of PACER users—“the courts, the media, litigants, attorneys, researchers, and bulk data collectors”—and Judge Leonard claims they found “a remarkably high level of satisfaction”: around 80% of those surveyed were “satisfied” or “very satisfied” with the service.

If we compare public access before we had PACER to where we are now, there is clearly much success to celebrate. But the key question is not only whether current users are satisfied with the service but also whether PACER is reaching its entire audience of potential users. Are there artificial obstacles preventing potential PACER users—who admittedly would be difficult to poll—from using the service? The satisfaction statistic may be fine at face value, assuming that a representative sample of users were polled, but it could be misleading if it’s being used to gauge the overall success of PACER as a public access system.

One indicator of obstacles may be another statistic cited by Judge Leonard: “about 45% of PACER users also use CM/ECF,” the Courts’ electronic case management and filing system. To put it another way, nearly half of all PACER users are currently attorneys who practice federal law.

That number seems inordinately high to me and suggests that significant barriers to public access may exist. In particular, account registration requires all users to submit a valid credit card for billing (or alternatively a valid home address to receive log-in credentials and billing statements by mail). Even if users’ credit cards are never charged, this registration hurdle may already turn away many potential PACER users at the door.

The other barrier is obviously the cost itself. With a few exceptions, users are forced to pay a fee for each document they download, at a metered rate of eight cents per page. Judge Leonard asserts that “surprisingly, cost ranked way down” in the survey and that “most people thought they paid a fair price for what they got.”

But this doesn’t necessarily imply that cost isn’t a major impediment to access. It may just be that those surveyed—primarily lawyers—simply pass the cost of using PACER down to their clients and never bear the cost themselves. For the rest of PACER users who don’t have that luxury, the high cost of access can completely rule out certain kinds of legal research, or cause users to significantly ration and monitor their usage (as is the case even in the vast majority of our nation’s law libraries), or wholly deter users from ever using the service.

Judge Leonard rightly recognizes that it’s Congress that has authorized the collection of user fees, rather than using general taxpayer money, to fund the electronic public access program. But I wish the Courts would at least acknowledge that moving away from a fee-based model, to a system funded by general appropriations, would strengthen our judicial process and get us closer to securing each citizen’s right to equal protection under the law.

Rather than downplaying the barriers to public access, the Courts should work with Congress to establish a way forward to support a public access system that is truly open. They should study and report on the extent to which Congress already funds PACER indirectly, through Executive and Legislative branch PACER fee payments to the Judiciary, and re-appropriate those funds directly. If there is a funding shortfall, and I assume there will be, they should study the various options for closing that gap, such as additional direct appropriations or a slight increase in certain filing fees.

With our other two branches of government making great strides in openness and transparency with the help of technology, the Courts similarly need to transition away from a one-size-fits-all approach to information dissemination. Public access to the courts will be fundamentally transformed by a vigorous culture of civic innovation around federal court documents, and this will only happen if the Courts confront today’s access barriers head-on and break them down.

(Thanks to Daniel Schuman for pointing me to the original article.)

Broadband Politics and Closed-Door Negotiations at the FCC

The last seven days at the FCC have been drama-filled, and that’s not something you can often say about an administrative agency. As I noted in my last post, the FCC is considering reclassifying broadband as a “common carrier” service. This would subject the access portion of the service to some additional regulations which currently do not apply, but have (to some extent) been applied in the past. Last Thursday, the FCC voted 3-2 along party lines to pursue a Notice of Inquiry about this approach and others, in order to help solidify its ability to enforce consumer protections and implement the National Broadband Plan in the wake of the Comcast decision in the DC Circuit Court. There was a great deal of politicking and rhetoric around the vote. Then, on Monday, the Wall Street Journal reported that lobbyists were engaged in closed-door meetings at the FCC, discussing possible legislative compromises that would obviate the need for reclassification. This led to public outcry from everyone who was not involved in the meetings, and allegations of misconduct by the FCC for its failure to disclose the meetings. If you sit through my description of the intricacies of reclassification, I promise to give you the juicy bits about the controversial meetings.

The Reclassification Vote and the NOI
As I explained in my previous post, the FCC faces a dilemma. The DC Circuit said it did not have the authority under Title I of the Communications Act to enforce the broadband openness principles it espoused in 2005. This cast into doubt the FCC’s ability to not only police violations of the principles but also to implement many portions of the National Broadband Plan. In the past, the Commission would have had unquestioned authority under Title II of the Act, but in a series of decisions from 2002-2007 it voluntarily “deregulated” broadband by classifying it as a Title I service. Chairman Genachowski has floated what he calls a “Third Way” approach in which broadband is not classified as a Title I service anymore, and is not subject to all provisions of Title II, but instead is classified under Title II but with extensive “forbearance” from portions of that title.

From a legal perspective, the main question is whether the FCC has the authority to reclassify the transmission component of broadband internet service as a Title II service. This gets into intricacies of how broadband service fits into statutory definitions of “information service” (aka Title I), “telecommunications”, “telecommunications service” (aka Title II), and the like. I was going to lay these out in detail, but in the interest of getting to the juicy stuff I will simply direct you to Harold Feld’s excellent post. For the “Third Way” approach to work, the FCC’s interpretation of a “telecommunications service” will have to be articulated to include broadband internet access while not also swallowing a variety of internet services that everyone thinks should remain unregulated: sites like Facebook, content delivery networks like Akamai, and digital media providers like Netflix. However, this definition must not be so narrow that the FCC does not have jurisdiction to police the types of practices it is concerned about (for instance, providers should not be able to discriminate in their delivery of traffic simply by moving the discrimination from their transport layer of the network to the logical layer, or by partnering with an affiliated “ISP” that does discrimination for them). I am largely persuaded by Harold’s arguments, but the AT&T lobbyists present the other side as well. One argument that I don’t see anyone making (yet) is that, presuming the transmission component is subject to Title II, the FCC would seem to have a much stronger argument for exercising ancillary jurisdiction with respect to interrelated components like non-facilities-based ISPs that rely on that transmission component.

The other legal debate involves an even more arcane discussion about whether — assuming there is a “telecommunications service” offered as part of broadband service — that “telecommunications service” is something that can be regulated separately from the other “information services” (Title I) that might be offered along with it. This includes things like an email address from your provider, DNS, Usenet, and the like. Providers have historically argued that these were inseparable from the internet access component, and the so-called “Stevens Report” of 1998 introduced the notion that the “inextricably intertwined” nature of broadband service might have the result of classifying all such services as entirely Title I “information services.” To the extent that this ever made any sense, it is far from true today. What consumers believe they are purchasing is access to the internet, and all of those other services are clearly extricable from a definitional and practical standpoint (indeed, customers can and do opt for competitors for all of them on a regular basis).

But none of these legal arguments are at the fore of the current debate, which is almost entirely political. Witness, for example, John Boehner’s claim that the “Third Way” approach was a “government takeover of the Internet,” Fred Upton’s (R-MI) claim that the approach is a “blind power grab,” modest Democratic sign-on to an industry-penned and reasoning-free opposition letter, and an attempt by Republican appropriators to block funding for the FCC unless it swore off the approach. This prompted a strong response from Democratic leaders indicating that any such effort would not see the light of day. Ultimately, the FCC voted in favor of the NOI to explore the issue. Amidst this tumult, the WSJ reported that the FCC had started closed-door meetings with industry representatives in order to discuss a possible legislative compromise.

Possible Legislation and Secret Meetings
It is not against the rules to communicate with the FCC about active proceedings. Indeed, such communications are part of a healthy policymaking process that solicits input from stakeholders. The FCC typically conducts proceedings under the “permit but disclose” regime in which all discussions pertaining to the given proceeding must be described in “ex parte” filings on the docket. Ars has a good overview of the ex parte regime. The NOI passed last week is subject to these rules.

It therefore came as a surprise that a subset of industry players were secretly meeting with the FCC to discuss possible legislation that could make the NOI irrelevant. This issue is made even more egregious by the fact that the FCC just conducted a proceeding on improving ex parte disclosures, and the Chairman remarked:

“Given the complexity and importance of the issues that come before us, ex parte communications remain an essential part of our deliberative process. It is essential that industry and public stakeholders know the facts and arguments presented to us in order to express informed views.”

The Chairman’s Chief of Staff Edward Lazarus sought to explain away the obligation for ex parte disclosure, and nevertheless attached a brief disclosure letter from the meeting attendees that didn’t describe any of the details. There is perhaps a case to be made that the legislative options do not directly fall under the subject matter of the NOI, but even if this position were somehow legally justifiable it clearly falls afoul of the policy intent of the ex parte rules. Harold Feld has a great post in which he describes his nomination for “Worsht Ex Parte Ever.” The letter attached to the Lazarus post would certainly take the title if it were a formal ex parte letter. The industry participants in the meetings deserve some criticism, but ultimately the problems can only be resolved by the FCC by demanding comprehensive openness rather than perpetuating a culture of loopholes.

The public outcry continues, from both public interest groups and in the comments on the Lazarus post. If it’s true that the FCC admits internally that “they f*cked up”, they should do far more to regain the public’s trust in the integrity of the notice-and-comment process.

Update: The Lazarus post was just updated to replace the link to the brief disclosure letter with two new links to letters that describe themselves as Ex Parte letters. The first contains the exact same text as the original, and the second has a few bullet points.

Release Government Data, Early and Often

One of the key axioms of modern open government is that all public data should be published online in a raw but usable form. Usability in this case is aimed at software programmers. When government datasets are more usable, programmers are more likely to innovate in the civic sphere and build technologies, using the raw data, to enhance the relationships among citizens and with government.

The open government community has provided plenty of valuable guidance about what usability means for programmers. We proclaim that all datasets need to be: published in a format that is reasonably structured and machine-processable; well-documented; downloadable in bulk; authenticated using cryptographic digital signatures; version-controlled; permanent and citable; and the list goes on and on. These are all worthy principles to be sure, and all government datasets should strive to meet them.

But you’ll be hard-pressed to find any government datasets that exist with all of these principles pre-satisfied. While some are in better shape than others, most datasets would make programmers cringe. Data often only exist as informal working sets in proprietary Excel spreadsheets. Sometimes they are in structured databases, but schemas are undocumented, field values are ambiguous, and the semantics are only understood by the employee who created them. Datasets have errors and biases that are known but never explicitly corrected.

For a civil servant who is a data caretaker looking over the laundry list of publishing principles, there’s frequently a huge quality chasm between the dataset she owns and how people are asking to see it released. To her, publishing this data adequately just seems like a lot of extra work. The more attractive alternative is to put off the data publishing—it’s not in her job description or evaluations anyway—and move on to other work instead.

How can this chasm be bridged? A widely-adopted philosophy in software development and entrepreneurship would serve open government data well: release early and release often. And listen to your customers.

In the software development world, a working version of the product is pushed out as soon as possible even with known imperfections—an “alpha” release—so it can be subject to real use by early adopters. Early adopters can provide helpful feedback about what works, what’s broken, and what new features would be most useful to them. The software developers then iterate quickly. They incorporate the suggested fixes and features into their code and release an updated version of the product to their users. The virtuous cycle then starts again. Under this philosophy, software developers can be efficient about how to best improve their code where it matters, and users get software that works better and has more features they desire.

The “release early, release often” philosophy should be applied to government data. For the initial release, data caretakers should take the path of least resistance to get data out the door. This means publishing datasets in whatever format is most convenient, along with as much documentation as can reasonably be mustered. Documentation is especially important with an “alpha” dataset—proper warnings about its problems, instabilities and inductive limitations must be prominently displayed. (Of course, the usual privacy and legal caveats should also be applied.) Sometimes, the “alpha” release will be “good enough” for programmers to start their work, and this will minimize any superfluous work done by caretakers. This is the virtue of “release early.”

In other cases, programmers will need assistance using the dataset and will notice problem spots with the initial release. The dataset might be confusing, contain errors or be difficult to work with. A tight feedback mechanism allows the programmer to get help quickly and continue to innovate, while the data caretaker can fix problems based on real use cases and add clarifying metadata into an updated version of the dataset. Data quality and usability increase for those working with the dataset, both in and outside of government. That’s the virtue of “release often.”

And here is the big opportunity for government: no platform currently exists to engage the prime audience for government data—software programmers. Without a tight feedback mechanism, the virtuous cycle of mutual benefit cannot exist. Government is missing its best opportunity to improve data quality by neglecting useful feedback from programmers who are actually tinkering with the datasets. Society is losing out on potentially game-changing civic innovations, which otherwise would have been built if data were more usable and the uncertainty of failure reduced.

A terrific start in turning the corner would be for government to adopt an issue-tracking system for its datasets. As a public venue, it would help ensure that data caretakers are prompt in addressing developer concerns. It would also allow caretakers to organize feedback in a formal way. Such platforms are commonplace in any successful software development venture. The same needs to be true for government data in order to drive rapid quality improvements and increase developer engagement.

CITP Expands Scope of RECAP

Today, we’re thrilled to announce the next version of our RECAP technology, dramatically expanding the scope of the project.

Having had some modest success at providing public access to legal documents, we’re now taking the next logical step, offering easy public access to illegal documents.

The Internet Archive, which graciously hosts RECAP’s repository of legal documents, was strangely unreceptive to our offer to let them store the world’s most comprehensive library of illegal documents. Fortunately, the Pirate Bay was happy to step in and help.

Interested in seeing what’s available? Then you might want to watch our brief instructional video.

Best Practices for Government Datasets: Wrap-Up

[This is the fifth and final post in a series on best practices for government datasets by Harlan Yu and me. (previous posts: 1, 2, 3, 4)]

For our final post in this series, we’ll discuss several issues not touched on by earlier posts, including data signing and the use of certain non-text file formats. The relatively brief discussions of these topics should not be interpreted as an indicator of their importance. The topics simply did not fit cleanly into earlier posts.

One significant omission from earlier posts is the issue of data signing with digital signatures. Before discussing this issue, let’s briefly discuss what a digital signature is. Suppose that you want to email me an IOU for $100. Later, I may want to prove that the IOU came from you—it’s of little value if you can claim that I made it up. Conversely, you may want the ability to prove whether the document has been altered. Otherwise, I could claim that you owe me $100,000.

Digital signatures help in proving the origin and authenticity of data. These signatures require that you create two related big numbers, known as keys: a private signing key (known only by you) and a public verification key. To generate a digital signature, you plug the data and your signing key into a complicated formula. The formula spits out another big number known as a digital signature. Given the signature and your data, I can use the verification key to prove that the data came unmodified from you. Similarly, nobody can credibly sign modified data without your signing key, so you should be very careful to keep this key a secret.
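
To make the mechanics concrete, here is a minimal Python sketch of signing and verifying, using textbook RSA over a SHA-256 hash. It rests on loud assumptions: the two primes are small, fixed, and publicly known, there is no padding, and the function names are our own inventions for this example. Nothing here is secure. Real signing should rely on an established tool such as GnuPG.

```python
import hashlib

# ASSUMPTION: toy key material for illustration only. Both primes are small,
# well-known Mersenne primes, so this key pair offers no security whatsoever.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                                              # public modulus
VERIFY_KEY = 65537                                     # public verification key
SIGNING_KEY = pow(VERIFY_KEY, -1, (P - 1) * (Q - 1))   # private signing key


def digest(data: bytes) -> int:
    """Hash the data and reduce it to a number smaller than the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Combine the data and the signing key into a signature (a big number)."""
    return pow(digest(data), SIGNING_KEY, N)


def verify(data: bytes, signature: int) -> bool:
    """Check a signature using only the public values N and VERIFY_KEY."""
    return pow(signature, VERIFY_KEY, N) == digest(data)


iou = b"I owe you $100"
signature = sign(iou)
print(verify(iou, signature))                    # checks the original IOU
print(verify(b"I owe you $100,000", signature))  # checks an altered IOU
```

Only `sign` needs the private value. `verify` uses nothing secret, which is why the verification key can be handed to anyone.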

Developers may want to ensure the authenticity of government data and to prove that authenticity to users. At first glance, the solution seems to be a simple application of digital signatures: agencies sign their data, and anyone can use the signatures to authenticate an agency’s data. In spite of their initially steep learning curve, tools like GnuPG provide straightforward file signing. In practice, the situation is more complicated. First, an agency must decide what data to sign. Perhaps a dataset contains numerous documents. Developers and other users may want signatures not only for the full dataset but also for individual documents in it.

Once an agency knows what to sign, it must decide who will perform the signing. Ideally, the employee producing the dataset would sign it immediately. Unfortunately, this solution requires all such employees to understand the signature tools and to know the agency’s signing key. Widespread distribution of the signing key increases the risk that it will be accidentally revealed. Therefore, a central party is likely to sign most data. Once data is signed, an agency must have a secure channel for delivering the verification key to consumers of the data, because users cannot confirm the authenticity of signed data without this key. While signing a given file with a given key may not be hard, the surrounding issues are trickier. We offer no simple solution here, but further discussion of this topic between government agencies, developers, and the public could be useful for all parties.
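
One possible way to cover both a full dataset and its individual documents, offered here as an assumption rather than an established agency practice, is to sign a single manifest that lists a hash of every document. The Python sketch below builds such a manifest and checks one document against it. The names `build_manifest` and `check_document` and the sample file names are hypothetical, and the step of signing the manifest itself (for example with GnuPG) is not shown.

```python
import hashlib


def build_manifest(documents: dict) -> str:
    """Return one 'sha256-hex  name' line per document, sorted by name."""
    lines = []
    for name in sorted(documents):
        lines.append(f"{hashlib.sha256(documents[name]).hexdigest()}  {name}")
    return "\n".join(lines) + "\n"


def check_document(manifest: str, name: str, content: bytes) -> bool:
    """Check one document against its manifest entry, without the others."""
    expected = f"{hashlib.sha256(content).hexdigest()}  {name}"
    return expected in manifest.splitlines()


# Hypothetical two-document dataset.
dataset = {"report.txt": b"quarterly totals", "notes.txt": b"known data issues"}
manifest = build_manifest(dataset)
print(manifest)
```

With this layout a central party signs only the manifest, and a user who downloads a single document can still confirm it matches what was signed.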

Another issue that earlier posts did not address is the use of non-text spreadsheet formats, including Microsoft Excel’s XLS format. These formats can sometimes be useful because they allow the embedding of formulas and other rich information along with the data. Unfortunately, these formats are far more complex than raw text formats, so they present a greater challenge for automated processing tools. A comma-separated values (CSV) file is a straightforward text format that contains values separated by line breaks and commas. It provides an alternative to complicated spreadsheet formats. For example, the medal count from the 2010 Winter Olympics in CSV would be:

  Country,Gold,Silver,Bronze,Total
  USA,9,15,13,37
  Germany,10,13,7,30
  Canada,14,7,5,26
  Norway,9,8,6,23
  ...

Fortunately, the release of data in one format does not preclude its release in another format. Most spreadsheet programs provide an option to save data in CSV form. Agencies should release spreadsheet data in a textual format like CSV by default, but an agency should feel free to also release the data in XLS or other formats.
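
To illustrate how little code a CSV file demands of a developer, the following Python sketch loads the medal table shown above using only the standard library and checks that each row's total is consistent. The helper names `load_medals` and `inconsistent_rows` are our own for this example.

```python
import csv
import io

# The first rows of the medal table from this post, as CSV text.
MEDALS_CSV = """Country,Gold,Silver,Bronze,Total
USA,9,15,13,37
Germany,10,13,7,30
Canada,14,7,5,26
Norway,9,8,6,23
"""


def load_medals(text: str) -> list:
    """Parse the CSV text into dicts, converting medal counts to integers."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {key: (value if key == "Country" else int(value))
             for key, value in row.items()}
        )
    return rows


def inconsistent_rows(rows: list) -> list:
    """Return countries whose Total is not Gold + Silver + Bronze."""
    return [
        r["Country"]
        for r in rows
        if r["Gold"] + r["Silver"] + r["Bronze"] != r["Total"]
    ]


medals = load_medals(MEDALS_CSV)
print(max(medals, key=lambda r: r["Gold"])["Country"])
print(inconsistent_rows(medals))
```

Doing the same with an XLS file would typically require a third-party library, which is the practical cost of the richer format.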

Similarly, agencies will sometimes release large files or groups of files in a compressed or bundled format (for example, ZIP, TAR, GZ, BZ). In these cases, agencies should prominently specify where users can freely obtain software and instructions for extracting the data. Because so many means of compressing and bundling files exist, agencies should not presume that the necessary tools and steps are obvious from the data files themselves.
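
As one example of the kind of freely available instructions an agency could point to, the Python sketch below unpacks a bundled dataset with the standard library's `shutil.unpack_archive`, which picks the format from the file extension (for instance `.zip`, `.tar`, `.tar.gz`, or `.tar.bz2`). The `extract_bundle` helper and the file names are hypothetical. A single file compressed on its own as `.gz` or `.bz2` is not handled by this call and would need the `gzip` or `bz2` modules instead.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_bundle(archive_path, dest_dir) -> list:
    """Unpack an archive into dest_dir and return the extracted file paths."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive_path), str(dest_dir))
    return sorted(p for p in Path(dest_dir).rglob("*") if p.is_file())


def demo():
    """Build a tiny ZIP bundle in a temporary directory, then extract it."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "dataset.zip"
        with zipfile.ZipFile(archive, "w") as bundle:
            bundle.writestr("medals.csv", "Country,Gold\nCanada,14\n")
        files = extract_bundle(archive, Path(tmp) / "extracted")
        return [f.name for f in files], files[0].read_text()


names, first_file_text = demo()
print(names)
```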

The rules suggested throughout this series should be seen as best practices rather than hard-and-fast rules. We are still in the process of fleshing out several of these ideas ourselves, and exceptional cases sometimes justify exceptional treatment. In unusual cases, an agency may need to deviate from traditional best practices, but it should carefully consider (and perhaps document) its rationale for doing so. Rules are made to be broken, but they should not be broken for mere expedience.

Our hope is that this series will provide agencies with some points to consider prior to releasing data. Because of Data.gov and the increasing traction of openness and transparency initiatives, we expect to see many more datasets enter the public domain in the coming years. Some agencies will approach the release of bulk data with minimal previous experience. While this poses a challenge, it also presents an opportunity for committed agencies to institute good practices early, before bad habits and poor-quality legacy datasets can accumulate. When releasing new datasets, agencies will make numerous conscious and unconscious choices that impact developers. We hope to help agencies understand developers’ challenges when making these choices.

After gathering input from the community, we plan to create a technical report based on this series of posts. Thanks to numerous readers for insightful feedback; your comments have influenced and clarified our thoughts. If any FTT readers inside or outside of government have additional comments about this post or others, please do pass them along.