May 27, 2022

Archives for June 2008

New bill advances open data, but could be better for reuse

Senators Obama, Coburn, McCain, and Carper have introduced the Strengthening Transparency and Accountability in Federal Spending Act of 2008 (S. 3077), which would modify their 2006 transparency act. That first bill created, a searchable web site of government outlays.—which was based on software developed by OMB Watch and the Sunlight Foundation—allows end users to search across a variety of criteria. It has begun offering an API, an interface that lets developers query the data and display the results on their own sites. This allows a kind of reuse, but differs significantly from the approach suggested in our recent “Invisible Hand” paper. We urge that all the data be published in open formats. An API delivers search results, but that makes the search interface itself very important: having to work through an interface sometimes limits developers from making innovative, unforeseen uses of the data.

The new bill would expand the scope of information available via, adding information about federal contracts, leases, and audit disputes, among other areas. But it would also elevate the API itself to a matter of statutory mandate. I’m all in favor of mandates that make data available and reusable, but the wording here is already a prime example of why technical standards are often better left to expert regulatory bodies than etched in statute:

” (E) programmatically search and access all data in a serialized machine readable format (such as XML) via a web-services application programming interface”

A technical expert body would (I hope) recognize that there is added value in allowing the data itself to be published so that all of it can be accessed at once. This is significantly different from the site’s current attitude; addressing the list of top contractors by dollar volume, the site’s FAQ says it “does not allow the results of these tables to be downloaded in delimited or XML format because they are not standard search results.” I would argue that standardizers of search results, whomever they may be, should not be able to disallow any data from being downloaded. There doesn’t necessarily need to be a downloadable table of top contractors, but it should be possible for citizens to download all the data so that they can compose such a table themselves if they so desire. The API approach, if it substitutes for making all the data available for download, takes us away from the most vibrant possible ecosystem of data reuse, since whenever government web sites design an interface (whether it’s a regular web interface for end users, or a code-level interface for web developers), they import assumptions about how the data will be used.

All that said, it’s easy to make the data available for download, and a straightforward additional requirement that could be added to the bill. And in any cause we owe a debt of gratitude to Senators Coburn, Obama, McCain and Carper for their pioneering, successful efforts in this area.


Update, June 12: Amended the list of cosponsors to include Sens. Carper and (notably) McCain. With both major presidential candidates as cosponsors, the bill seems to reflect a political consensus. The original bill back in 2006 had 48 cosponsors and passed unanimously.

Study Shows DMCA Takedowns Based on Inconclusive Evidence

A new study by Michael Piatek, Yoshi Kohno and Arvind Krishnamurthy at the University of Washington shows that copyright owners’ representatives sometimes send DMCA takedown notices where there is no infringement – and even to printers and other devices that don’t download any music or movies. The authors of the study received more than 400 spurious takedown notices.

Technical details are summarized in the study’s FAQ:

Downloading a file from BitTorrent is a two step process. First, a new user contacts a central coordinator [a “tracker” – Ed] that maintains a list of all other users currently downloading a file and obtains a list of other downloaders. Next, the new user contacts those peers, requesting file data and sharing it with others. Actual downloading and/or sharing of copyrighted material occurs only during the second step, but our experiments show that some monitoring techniques rely only on the reports of the central coordinator to determine whether or not a user is infringing. In these cases whether or not a peer is actually participating is not verified directly. In our paper, we describe techniques that exploit this lack of direct verification, allowing us to frame arbitrary Internet users.

The existence of erroneous takedowns is not news – anybody who has seen the current system operating knows that some notices are just wrong, for example referring to unused IP addresses. Somewhat more interesting is the result that it is pretty easy to “frame” somebody so they get takedown notices despite doing nothing wrong. Given this, it would be a mistake to infer a pattern of infringement based solely on the existence of takedown notices. More evidence should be required before imposing punishment.

Now it’s not entirely crazy to send some kind of soft “warning” to a user based on the kind of evidence described in the Washington paper. Most of the people who received such warnings would probably be infringers, and if it’s nothing more than a warning (“Hey, it looks like you might be infringing. Don’t infringe.”) it could be effective, especially if the recipients know that with a bit more work the copyright owner could gather stronger evidence. Such a system could make sense, as long as everybody understood that warnings were not evidence of infringement.

So are copyright owners overstepping the law when they send takedown notices based on inconclusive evidence? Only a lawyer can say for sure. I’ve read the statute and it’s not clear to me. Readers who have an informed opinion on this question are encouraged to speak up in the comments.

Whether or not copyright owners can send warnings based on inconclusive evidence, the notification letters they actually send imply that there is strong evidence of infringement. Here’s an excerpt from a letter sent to the University of Washington about one of the (non-infringing) study computers:

XXX, Inc. swears under penalty of perjury that YYY Corporation has authorized XXX to act as its non-exclusive agent for copyright infringement notification. XXX’s search of the protocol listed below has detected infringements of YYY’s copyright interests on your IP addresses as detailed in the attached report.

XXX has reasonable good faith belief that use of the material in the manner complained of in the attached report is not authorized by YYY, its agents, or the law. The information provided herein is accurate to the best of our knowledge. Therefore, this letter is an official notification to effect removal of the detected infringement listed in the attached report. The attached documentation specifies the exact location of the infringement.

The statement that the search “has detected infringements … on your IP addresses” is not accurate, and the later reference to “the detected infringement” also misleads. The letter contains details of the purported infringement, which once again give the false impression that the letter’s sender has verified that infringement was actually occurring:

Evidentiary Information:
Notice ID: xx-xxxxxxxx
Recent Infringement Timestamp: 5 May 2008 20:54:30 GMT
Infringed Work: Iron Man
Infringing FileName: Iron Man TS Kvcd(A Karmadrome Release)KVCD by DangerDee
Infringing FileSize: 834197878
Protocol: BitTorrent
Infringing URL:
Infringers IP Address:
Infringer’s DNS Name:
Infringer’s User Name:
Initial Infringement Timestamp: 4 May 2008 20:22:51 GMT

The obvious question at this point is why the copyright owners don’t do the extra work to verify that the target of the letter is actually transferring copyrighted content. There are several possibilities. Perhaps BitTorrent clients can recognize and shun the detector computers. Perhaps they don’t want to participate in an act of infringement by sending or receiving copyrighted material (which would be necessary to know that something on the targeted computer is willing to transfer it). Perhaps it simply serves their interests better to send lots of weak accusations, rather than fewer stronger ones. Whatever the reason, until copyright owners change their practices, DMCA notices should not be considered strong evidence of infringement.

NJ Election Day: Voting Machine Status

Today is primary election day in New Jersey, for all races except U.S. President. (The presidential primary was Feb. 5.) Here’s a roundup of the voting-machine-related issues.

First, Union County found that Sequoia voting machines had difficulty reporting results for a candidate named Carlos Cedeño, reportedly because it couldn’t handle the n-with-tilde character in his last name. According to the Star-Ledger, Sequoia says that election results will be correct but there will be some kind of omission on the result tape printed by the voting machine.

Second, the voting machines in my polling place are fitted with a clear-plastic shield over the operator panel, which only allows certain buttons on the panel to be pressed. Recall that some Sequoia machines reported discrepancies in the presidential primary on Feb. 5, and Sequoia said that these happened when poll workers accidentally pressed buttons on the operator panel that were supposed to be unused. This could only have been caused by a design problem in the machines, which probably was in the software. To my knowledge, Sequoia hasn’t fixed the design problem (nor have they offered an explanation that is consistent with all of the evidence – but that’s another story), so there was likely an ongoing risk of trouble in today’s election. The plastic shield looks like a kludgy but probably workable temporary fix.

Third, voting machines were left unguarded all over Princeton, as usual. On Sunday and Monday evenings, I visited five polling places in Princeton and found unguarded voting machines in all of them – 18 machines in all. The machines were sitting in school cafeteria/gyms, entry hallways, and even in a loading dock area. In no case were there any locks or barriers stopping people from entering and walking right up to the machines. In no case did I see any other people. (This was in the evening, roughly between 8:00 and 9:00 PM). There were even handy signs posted on the street pointing the way to the polling place, showing which door to enter, and so on.

Here are some photos of unguarded voting machines, taken on Sunday and Monday: