December 6, 2023

Collateral Freedom in China

OpenITP has just released a new report—Collateral Freedom—that studies the state of censorship circumvention tool usage in China today. From the report’s overview: This report documents the experiences of 1,175 Chinese Internet users who are circumventing their country’s Internet censorship—and it carries a powerful message for developers and funders of censorship circumvention tools. We believe […]

The New Ambiguity of "Open Government"

David Robinson and I have just released a draft paper that describes, and tries to help solve, a key problem in recent discussions around online transparency: The New Ambiguity of “Open Government.” As the paper explains, the phrase “open government” has become ambiguous in a way that makes life harder for both advocates and policymakers, because it conflates the politics of transparency with the technologies of open data. We propose new, politically neutral terminology: the word “adaptable” to describe desirable features of data (and the word “inert” to describe their absence), kept separate from descriptions of the governments that use these technologies.

Clearer language will serve everyone well, and we hope this paper will spark a conversation among those who focus on civic transparency and innovation. Thanks to Justin Grimes and Josh Tauberer for their helpful insights and discussions as we drafted this paper.

Download the full paper here.

Abstract:

“Open government” used to carry a hard political edge: it referred to politically sensitive disclosures of government information. The phrase was first used in the 1950s, in the debates leading up to passage of the Freedom of Information Act. But over the last few years, that traditional meaning has blurred, and has shifted toward technology.

Open technologies involve sharing data over the Internet, and all kinds of governments can use them, for all kinds of reasons. Recent public policies have stretched the label “open government” to reach any public sector use of these technologies. Thus, “open government data” might refer to data that makes the government as a whole more open (that is, more transparent), but it might equally well refer to politically neutral public sector disclosures that are easy to reuse but may have nothing to do with public accountability. Today a regime can call itself “open” if it builds the right kind of web site, even if it does not become more accountable or transparent. This shift in vocabulary makes it harder for policymakers and activists to articulate clear priorities and make cogent demands.

This essay proposes a more useful way for participants on all sides to frame the debate: We separate the politics of open government from the technologies of open data. Technology can make public information more adaptable, empowering third parties to contribute in exciting new ways across many aspects of civic life. But technological enhancements will not resolve debates about the best priorities for civic life, and enhancements to government services are no substitute for public accountability.

Retiring FedThread

Nearly two years ago, the Federal Register was published in a structured XML format for the first time. This was a big deal in the open government world: the Federal Register, often called the daily newspaper of our federal government, is one of our government’s most widely read publications. And while it could previously be read in paper and PDF forms, it wasn’t easy to digitally manipulate. The XML release changed all this.

When we heard this was happening, four of us here at CITP (Ari Feldman, Bill Zeller, Joe Calandrino, and I) decided to see how we might improve the way citizens interact with the Federal Register. Our big idea was to make it easy for anyone to comment paragraph-by-paragraph on any of its documents, such as a proposed regulation. The site, which we called FedThread, would provide an informal public forum for annotating these documents, and we hoped it would lead to useful online discussions about the merits and weaknesses of all kinds of federal regulatory activity. We also added other useful features, like a full-text search engine and custom RSS feeds. Building these features for the Federal Register became straightforward only because of the new XML version. We built the site in just eight days from conception to release.
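The core of paragraph-by-paragraph commenting is simple once a document arrives as structured XML: split it into paragraphs, give each a stable identifier, and key comments to those identifiers. The sketch below illustrates the idea under stated assumptions. It is not FedThread's actual code (which was built on Django), and the element names `RULE` and `P`, the sample text, and the `ParagraphComments` class are hypothetical simplifications rather than the real Federal Register schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily simplified document; real Federal Register XML
# uses a richer schema than the RULE/P elements assumed here.
SAMPLE_XML = """
<RULE>
  <P>The agency proposes to amend section 1.</P>
  <P>Comments must be received by the deadline.</P>
</RULE>
"""


def split_paragraphs(xml_text):
    """Return (paragraph_id, text) pairs, numbering <P> elements in document order."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for index, para in enumerate(root.iter("P"), start=1):
        # itertext() also collects text inside nested inline elements.
        text = "".join(para.itertext()).strip()
        pairs.append((f"p{index}", text))
    return pairs


class ParagraphComments:
    """Hypothetical in-memory store that attaches comment threads to paragraph IDs."""

    def __init__(self, paragraphs):
        self.paragraphs = dict(paragraphs)
        self.comments = defaultdict(list)

    def add(self, paragraph_id, author, body):
        # Reject comments on paragraphs that do not exist in the document.
        if paragraph_id not in self.paragraphs:
            raise KeyError(f"unknown paragraph: {paragraph_id}")
        self.comments[paragraph_id].append((author, body))

    def thread(self, paragraph_id):
        # .get avoids creating empty entries for paragraphs with no comments.
        return list(self.comments.get(paragraph_id, []))


doc = ParagraphComments(split_paragraphs(SAMPLE_XML))
doc.add("p2", "alice", "Is the comment deadline long enough?")
print(doc.thread("p2"))
```

Numbering paragraphs by position keeps the sketch short, but it has a cost: if a document is republished with a paragraph inserted, later IDs shift and existing comments would point at the wrong text, so a production system would want IDs tied to content or to identifiers carried in the source XML.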

Another trio of developers in San Francisco also saw opportunities in this free, machine-readable resource and built their own project, GovPulse, which had already won the Sunlight Foundation’s Apps for America 2 contest. Last summer, the staff of the Federal Register approached them to expand their site into what would become the new online face of the publication, Federal Register 2.0. Their approach to user comments guided users into the formal regulatory comment process, which was a great idea. Federal Register 2.0 included several features present in FedThread, and many more. Everything was built with open source tools and released to the public as open source.

This has left little reason for us to continue operating FedThread. The site has continued to reliably provide the features we developed two years ago, but our regular users will find it easy to switch to the similar (and often superior) search and subscription features on Federal Register 2.0. So we’re retiring FedThread. The code we developed will remain available, however, and we hope that enterprising developers will find components to reuse in their own projects that benefit society. For instance, our general-purpose paragraph-commenting code could be useful in a variety of projects. That code was itself adapted from another open source project, the Django Book, a free online book documenting Django, the web framework we used to build FedThread (an observation developers would call “meta”).

Ideally, this is how hacking open government should work. Free, machine-readable datasets beget new ways for citizens to explore the data and make it useful to other citizens. Along the way, developers experiment with different ideas, some of which catch on while others serve as fodder for the next great idea. This happens faster than standard government contracting, and it often produces more innovative results.

Finally, a big thanks to the GPO, NARA and the White House Open Government Initiative for making FedThread possible and for helping to demonstrate that this approach can work, and congratulations on the fantastic Federal Register 2.0.

What We Lose if We Lose Data.gov

Congress’s latest budget proposal for 2011 makes deep cuts to the Electronic Government Fund, which supports the continued development and upkeep of several key open government websites, including Data.gov, USASpending.gov and the IT Dashboard. An earlier proposal would have cut the fund from $34 million to $2 million this year; the current proposal would allocate $17 million.

Reports say that major cuts to the e-government fund would force OMB to shut down these transparency sites. This would strike a significant blow to the open government movement, and I think it’s important to emphasize exactly why shuttering a site like Data.gov would be so detrimental to transparency.

On its face, Data.gov is a useful catalog. It helps people find the datasets that government has made available to the public. But the catalog is really a convenience that doesn’t necessarily need to be provided by the government itself. Since the vast majority of datasets are hosted on individual agency servers (not directly by Data.gov), private developers could potentially replicate the catalog with only a small amount of effort. So even if Data.gov goes offline, nearly all of the data would still exist online, and a private developer could rebuild a version of the catalog, perhaps with even better features and interfaces.
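To make the point about replication concrete: if each catalog record carries a title, an agency name, and the URL where the agency hosts the file, a basic searchable catalog takes only a few dozen lines. The sketch below assumes exactly those three fields; the records, the agency names, and the `example.gov` URLs are placeholders, and harvesting the records from agency sites is left out.

```python
import re
from collections import defaultdict

# Placeholder metadata records; a real rebuild would harvest these from
# agency sites. The URLs are hypothetical, not real datasets.
RECORDS = [
    {"title": "Federal Spending by Program", "agency": "Agency A",
     "url": "https://agency-a.example.gov/spending.csv"},
    {"title": "Air Quality Readings", "agency": "Agency B",
     "url": "https://agency-b.example.gov/air.csv"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Inverted index: map each word in a title or agency name to record positions."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for word in tokenize(record["title"] + " " + record["agency"]):
            index[word].add(position)
    return index


def search(index, records, query):
    """Return the URLs of records that match every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return [records[i]["url"] for i in sorted(matches)]


index = build_index(RECORDS)
print(search(index, RECORDS, "air quality"))
```

Because the catalog stores only pointers to agency-hosted files, it can be rebuilt at any time from the metadata alone, which is why the catalog itself is the easiest part of Data.gov to replace.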

But Data.gov also plays a crucial behind-the-scenes role: setting standards for open data and helping individual departments and agencies live up to those standards. Data.gov establishes a standard, cross-agency process for publishing raw datasets. The program gives agencies clear guidance on the mechanics and requirements for releasing each new dataset online.

There’s a Data.gov manual that formally documents and teaches this process. Each agency has a lead Data.gov point-of-contact, who is responsible for identifying publishable datasets and for ensuring that published data meets information quality guidelines. Each dataset must be published with a well-defined set of common metadata fields, so that it can be organized and searched. Moreover, the Data.gov process funnels all of the data through at least five stages of intermediate review (including national security and privacy reviews) before final approval and publication. That process isn’t quick, but it does help ensure that quality, security and privacy requirements are met.
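Taken together, the metadata requirement and the review sign-offs amount to a publishing gate. The sketch below assumes hypothetical field names (the actual Data.gov metadata schema is not reproduced here) and names only the two review stages the post mentions, national security and privacy; the remaining stages are not identified in the source and would be added to the same tuple.

```python
# Hypothetical required fields; the real Data.gov metadata schema differs.
REQUIRED_FIELDS = ("title", "description", "agency", "keywords", "access_url")

# Only these two review stages are named in the post; the others are unspecified.
REQUIRED_REVIEWS = ("national security", "privacy")


def missing_fields(record):
    """Return the required metadata fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def ready_to_publish(record, approvals, required_reviews=REQUIRED_REVIEWS):
    """True only if the metadata is complete and every required review approved."""
    if missing_fields(record):
        return False
    return all(approvals.get(stage) for stage in required_reviews)


record = {
    "title": "Example Dataset",
    "description": "Placeholder description.",
    "agency": "Agency A",
    "keywords": ["example"],
    "access_url": "https://agency-a.example.gov/data.csv",
}
print(missing_fields(record))
print(ready_to_publish(record, {"national security": True}))
```

A missing approval counts as a rejection here, mirroring the idea that nothing is published until every stage has signed off.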

When agency staff have data they want to publish, they use a part of the Data.gov website that outside users never see: the Data Management System (DMS). This back-end administrative interface allows agency points-of-contact to coordinate publishing activities agency-wide, and it gives individual data stewards an easy way to upload, view and maintain their own datasets.

My main concern is that this invaluable but underappreciated infrastructure would be lost if these IT systems were defunded. The individual roles and responsibilities, the informal norms and pressures, and perhaps even the tacit authority to put new datasets online would likely disappear as well. Without this structure, far less data would probably be put online in the future, and the datasets that did get published in an ad hoc way would likely lack the uniformity and quality that the current process creates.

Releasing a new dataset online is already a difficult task for many agencies. While the current standards and processes may be far from perfect, Data.gov provides agencies with a firm footing on which to base their transparency efforts. I don’t know how much funding is necessary to maintain these back-end processes, but whatever Congress decides, it should budget sufficient funds, and direct that they be used, to preserve these critically important tools.

What are the Constitutional Limits on Online Tracking Regulations?

As the conceptual contours of Do Not Track are being worked out, an interesting question to consider is whether such a regulation—if promulgated—would survive a First Amendment challenge. Could Do Not Track be an unconstitutional restriction on the commercial speech of online tracking entities? The answer would of course depend on what restrictions a potential regulation would specify. However, it may also depend heavily on the outcome of a case currently in front of the Supreme Court—Sorrell v. IMS Health Inc.—that challenges the constitutionality of a Vermont medical privacy law.

The privacy law at issue would restrict pharmacies from selling prescription drug records to data mining companies for marketing purposes without the prescribing doctor’s consent. Each of these records contains extensive details about the doctor-patient relationship, including “the prescriber’s name and address, the name, dosage and quantity of the drug, the date and place the prescription is filled and the patient’s age and gender.” A doctor’s prescribing record can be tracked very accurately over time, and while patient names are redacted, each patient is assigned a unique identifier, so patients’ prescription histories can be tracked as well. Pharmacies have been selling these records to commercial data miners, who aggregate the data and sell compilations to pharmaceutical companies, which then market directly to individual doctors through a practice known as “detailing.” Sound familiar yet? It’s essentially brick-and-mortar behavioral advertising for prescription drugs, and the Vermont law is essentially a Do Not Track choice mechanism for it.
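The detail that redacted records still carry a per-patient identifier matters: removing names does not prevent tracking if the same patient always maps to the same token. The case materials quoted here do not say how those identifiers are generated. The sketch below assumes a keyed hash (HMAC-SHA-256) with a hypothetical secret key, one common pseudonymization technique, and hypothetical field names, to show how records stay linkable after redaction.

```python
import hashlib
import hmac

# Assumption: a secret key held by whoever de-identifies the records.
# The actual identifier scheme used in the case is not described.
SECRET_KEY = b"hypothetical-secret-key"


def pseudonym(patient_name: str) -> str:
    """Map a patient name to a stable, opaque 16-hex-digit identifier."""
    digest = hmac.new(SECRET_KEY, patient_name.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def redact(record: dict) -> dict:
    """Drop the patient's name but add a pseudonym that still links records."""
    cleaned = {key: value for key, value in record.items() if key != "patient_name"}
    cleaned["patient_id"] = pseudonym(record["patient_name"])
    return cleaned


first = redact({"patient_name": "Jane Doe", "drug": "Drug X", "prescriber": "Dr. A"})
second = redact({"patient_name": "Jane Doe", "drug": "Drug Y", "prescriber": "Dr. B"})
print(first["patient_id"] == second["patient_id"])
```

Because the mapping is deterministic, anyone holding many redacted records can reassemble a patient’s prescription history without ever seeing a name.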

The Second Circuit recently struck down the Vermont law on First Amendment grounds, ruling first that the law regulates commercial speech, and second that its restrictions fail the Central Hudson test, the four-part analysis used to determine the constitutionality of commercial speech restrictions. This ruling explicitly conflicts with two earlier First Circuit decisions, Ayotte and Mills, which upheld similar medical privacy laws in New Hampshire and Maine, respectively. To resolve the disagreement, the Supreme Court agreed in January to hear the case, and oral argument is set for April 26th.

I’m not a lawyer, but it seems like the outcome of Sorrell could have a wide-ranging impact on current and future information privacy laws, including possible Do Not Track regulations. Indeed, the petitioners recognize the potentially broad implications of their case. From the petition:

“Information technology has created new and unprecedented opportunities for data mining companies to obtain, monitor, transfer, and use personal information. Indeed, one of the defining traits of the so-called “Information Age” is this ability to amass information about individuals. Computers have made the flow of data concerning everything from personal purchasing habits to real estate records easier to collect than ever before.”

One central question in the case is whether a restriction on access to these data for marketing purposes is a restriction on legitimate commercial speech. The Second Circuit believes it is, reasoning that even “dry information” sold for profit—and already in the hands of a private actor—is entitled to First Amendment protection. In contrast, the First Circuit in Ayotte posited that the information being exchanged has “itself become a commodity,” not unlike beef jerky, so such restrictions are only a limitation on commercial conduct—not speech—and therefore do not implicate any First Amendment concerns.

A major factual difference here, as compared to online privacy and tracking, is that pharmacies are required by many state and federal laws to collect and maintain prescription drug records, so there may be more compelling reasons for the state to restrict access to this information.

In the case of online privacy, it could be argued that Internet users voluntarily supply information to the tracking servers, even though many users probably don’t intend to do this and don’t expect that it is occurring. Judge Livingston, in her dissent from the Second Circuit’s ruling in Sorrell, notes that different considerations apply where the government is “prohibiting a speaker from conveying information that the speaker already possesses,” as distinguished from situations where the government restricts access to the information itself. Applied to online communications, at what point does a server “possess” the user’s data: when the packets are received and sit in a buffer, or when they are reassembled and the data is permanently stored? Is there a constitutional difference between restrictions on collection and restrictions on use? In Zemel v. Rusk (1965), the Supreme Court stated that “the right to speak and publish does not carry with it the unrestrained right to gather information.” To what extent does this apply to government restrictions on online tracking?

The constitutionality of state and federal information privacy laws has historically and consistently been called into question, and things would be no different if (and it’s a big if) Congress grants the FTC authority over online tracking. When considering technical standards and what “tracking” means, it’s worth keeping in mind the possible constitutional challenges wherever state action is involved, since some desirable options for curbing online tracking may only be possible within a voluntary or self-regulatory framework. Where that line is drawn will depend on how the Supreme Court rules in Sorrell and how broadly it decides the case.