November 25, 2024

Tinkering with Disclosed Source Voting Systems

As Ed pointed out in October, Sequoia Voting Systems, Inc. (“Sequoia”) announced then that it intended to publish the source code of their voting system software, called “Frontier”, currently under development. (Also see EKR‘s post: “Contrarianism on Sequoia’s Disclosed Source Voting System”.)

Yesterday, Sequoia made good on this promise and you can now pull the source code they’ve made available from their Subversion repository here:
http://sequoiadev.svn.beanstalkapp.com/projects/

Sequoia refers to this move in it’s release as “the first public disclosure of source code from a voting systems manufacturer”. Carefully parsed, that’s probably correct: there have been unintentional disclosures of source code (e.g., Diebold in 2003) and I know of two other voting industry companies that have disclosed source code (VoteHere, now out of business, and Everyone Counts), but these were either not “voting systems manufacturers” or the disclosures were not available publicly. Of course, almost all of the research systems (like VoteBox and Helios) have been truly open source. Groups like OSDV and OVC have released or will soon release voting system source code under open source licenses.

I wrote a paper ages ago (2006) on the use of open and disclosed source code for voting systems and I’m surprised at how well that analysis and set of recommendations has held up (the original paper is here, an updated version is in pages 11–41 of my PhD thesis).

The purpose of my post here is to highlight one point of that paper in a bit of detail: disclosed source software licenses need to have a few specific features to be useful to potential voting system evaluators. I’ll start by describing three examples of disclosed source software licenses and then talk about what I’d like to see, as a tinkerer, in these agreements.

Soghoian: 8 Million Reasons for Real Surveillance Oversight

If you’re interested at all in surveillance policy, go and read Chris Soghoian’s long and impassioned post today. Chris drops several bombshells into the debate, including an audio recording of a closed-door talk by Sprint/NexTel’s Electronic Surveillance Manager, bragging about how easy the company has made it for law enforcement to get customers’ location data — so easy that the company has serviced over eight million law enforcement requests for customer location data.

Here’s the juiciest quote:

“[M]y major concern is the volume of requests. We have a lot of things that are automated but that’s just scratching the surface. One of the things, like with our GPS tool. We turned it on the web interface for law enforcement about one year ago last month, and we just passed 8 million requests. So there is no way on earth my team could have handled 8 million requests from law enforcement, just for GPS alone. So the tool has just really caught on fire with law enforcement. They also love that it is extremely inexpensive to operate and easy, so, just the sheer volume of requests they anticipate us automating other features, and I just don’t know how we’ll handle the millions and millions of requests that are going to come in.

— Paul Taylor, Electronic Surveillance Manager, Sprint Nextel.

Chris has more quotes, facts, and figures, all implying that electronic surveillance is much more widespread that many of us had thought.

Probably, many of these surveillance requests were justified, in the sense that a fair-minded citizen would think their expected public benefit justified the intrusiveness. How many were justified, we don’t know. We can’t know — and that’s a big part of the problem.

It’s deeply troubling that this has happened without significant public debate or even much disclosure. We need to have a discussion — and quickly — about what the rules for electronic surveillance should be.

Tech Policy in the SkyMall Catalog

These days tech policy issues seem to pop up everywhere. During a recent flight delay, I was flipping through the SkyMall Catalog (“Holiday 2009” edition), and found tech policy even there.

There were lots of ads for surveillance and recording devices, some of them clearly useful for illegal purposes: the New Agent Cam HD Color Video Spy Camera (p. 14), the Original Agent Cam Color Video Spy Camera (p. 14), the Video Recording Sunglasses (p. 23), the Wireless Remote Controlled Pan and Tilt Surveillance Camera (p. 23), the Spy Pen (with hidden audio and video recorders, p. 42), the Orbitor Electronic Listening Device, and the GPS Tracking Key (p. 224)

There were also plenty of ads for media-copying technologies, of the sort that various copyright owners might find objectionable: the LP and Cassette to CD Recorder (p. 16), the Slide and Negative to Digital Picture Converter (p. 17), the Digital Photo to DVD Converter (p. 20), the Easy Ipod Media Sharer (p. 27), the One Step DVD/CD Duplicator (p. 31), the Photograph to Digital Picture Converter (p. 40), and the Crosley Encoding Turntable (converts LP records to MP3s, p. 179).

Are these things illegal? Probably not, I guess, but there are surely people out there who would want to make them illegal. And some of them are pretty good ways to get a tech policy debate started.

I’m not about to start reading the SkyMall catalog for fun. But it’s interesting to know that it offers more than just Slankets and yeti statues.

Advice on stepping up to a better digital camera

This is a bit off from the usual Freedom to Tinker post, but with tomorrow being “Black Friday” and retailers offering some steep discounts on consumer electronics, many Tinker readers will be out there buying gear or will be offering buying advice to their friends.

Over the past several months, several friends of mine have mentioned that they were considering “moving up” to a D-SLR camera and asked me for advice. I’ve been what you might term a “serious amateur” photographer since high school, when I was the head photographer for the school yearbook and newspaper. (It was a non-trivial issue for me to decide whether to make my career in photography or in computers.)

To address this, I wrote a guide to upgrading your digital camera. I’ve written this for a non-technical audience. Pass it around and enjoy.

Inaccurate Copyright Enforcement: Questionable "best" practices and BitTorrent specification flaws

[Today we welcome my Princeton Computer Science colleague Mike Freedman. Mike’s research areas include computer systems, network software, and security. He writes a technical blog about these topics at Princeton S* Network Systems — required reading for serious systems geeks like me. — Ed Felten]

In the past few weeks, Ed has been writing about targeted and inaccurate copyright enforcement. While it may be difficult to quantify the actual extent of inaccurate claims, we can at least try to understand whether copyright enforcement companies are making a “good faith” best effort to minimize any false positives. My short answer: not really.

Let’s start with a typical abuse letter that gets sent to a network provider (in this case, a university) from a “copyright enforcement” company such as the Video Protection Alliance.

This notice is intended solely for the primary Massachusetts Institute of Technology internet service account holder. Someone using this account has engaged in illegal copying or distribution (downloading or uploading) of …

Evidence:
Infringement Source: BitTorrent
Infringement Timestamp: 2009-08-28 09:33:20 PST
Infringers IP Address: 128.31.1.13
Infringers Port: 40951

The information in this notification is accurate. We have a good faith belief that use of the material in the manner complained of herein is not authorized by the copyright owner, its agent, or by operation of law. We swear under penalty of perjury, that we are authorized to act on behalf of DISCOUNT VIDEO CENTER INC..

You and everyone using this computer must immediately and permanently cease and desist the unauthorized copying and/or distribution (including, but not limited to, downloading, uploading, file sharing, file ‘swapping’ or other similar activities) of the videos and/or other content owned by DISCOUNT VIDEO CENTER INC., including, but is not limited to, the copyrighted material listed above.

DISCOUNT VIDEO CENTER INC. is prepared to pursue every available remedy including damages, recovery of attorney’s fees, costs and any and all other claims that may be available to it in a lawsuit filed against you.

While DISCOUNT VIDEO CENTER INC. is entitled to monetary damages, attorneys’ fees and court costs from the infringing party under 17 U.S.C. 504, DISCOUNT VIDEO CENTER INC. believes that it may be beneficial to settle this matter without the need of costly and time-consuming litigation. We have been authorized to offer a reasonable settlement to resolve the infringement of the works listed above. To access this settlement offer, please follow the directions below.

Settlement Offer: To access your settlement offer please copy and paste the address below into a browser and follow the instructions:

https://www.videoprotectionalliance.com/?n_id=AB-XXXXXX
Password: XXXXXXX

In other words: we have a record of you (supposedly) uploading and downloading BitTorrent content. That content is copyrighted. We could pursue costly and painful litigation, but if you want us to just go away, you can pay us now.

Now, any type of IP-based identification is not going to be perfect, especially given the wide-spread use of Network Address Translation (NAT) boxes and open WiFi at homes. Especially in dense urban areas, unapproved third parties might use their neighbor’s wireless network for Internet access, potentially leading to the wrong homeowner being blamed. And IP-based identification relies on accurate ISP mappings from IP addresses to users, as these mappings change over time (although typically slowly) given dynamic address assignment (i.e., DHCP). But one could rightly claim that such sources of false positives are rare in practice and that a enforcement company is still making a best effort to accurately identify IP addresses engaging in copyright-infringing file sharing.

So what’s a reasonable strategy to identify such infringing behavior?

Let me first give a high-level overview of how BitTorrent works. To download a particular file on BitTorrent, a client first needs to discover a set of other peers that have the file. Earlier peer-to-peer systems like Napster, Gnutella, and KaZaA had peers connect to one another somewhat randomly (or, in Napster, through a more centralized directory service). These peers would then broadcast search requests for files, downloading the content directly from those peers that responded as having matching files. In the basic BitTorrent architecture, on the other hand, the global ecosystem is split into distinct groups of users that are all trying to download a particular file. Each such group—known as a swarm—is managed by a centralized server called a tracker. The tracker keeps a list of the swarm’s peers and, for each peer, a bit-vector of which file blocks it already has. When a client joins a swarm by announcing itself to the tracker, it gets a list of other peers, and it subsequently attempts to connect to them and download file blocks. How a client discovers a particular swarm is outside the scope of the system, but there are plenty of BitTorrent search engines that allow clients to perform keyword searches. These searches return .torrent files, which includes high-level meta-data about a particular swarm, including the URL(s) at which its tracker(s) can be accessed.

So there are three phases to downloading content from BitTorrent:

  1. Finding a .torrent meta-data file
  2. Registering with the .torrent’s tracker and getting a list of peer addresses
  3. Connecting to a peer, swapping the bit-vector of which file blocks each has, and potentially downloading or uploading needed blocks

Unfortunately, the verification that copyright enforcement agencies such as the VPA use stops at #2. That is, if some random BitTorrent tracker lists your IP address as being part of a swarm, then the VPA considers this to be sufficient proof to warrant a DMCA takedown notice (such as the one above), with clear instructions on how to pay a monetary settlement. Now, a very reasonable question is whether such information should indeed constitute proof.

Last year, researchers at the University of Washington published a paper with the subtitle Why My Printer Received a DMCA Takedown Notice. Their conclusions were that:

  • Practically any Internet user can be framed for copyright infringement today.
  • Even without being explicitly framed, innocent users may still receive complaints.

The title came from the fact that they “registered” the IP address of a networked printer with BitTorrent trackers, and they subsequently received 9 DMCA takedown notices claiming that their printer was engaging in illegal file sharing. (They did not, however, receive any pre-settlement offers such as the one above, which suggests a possible escalation of enforcement techniques since then.)

I have had my own repeated experiences with such false claims. This September, for instance, a research system I operate called CoralCDN received approximately 100 pre-settlement letters, including the one above. A little background: CoralCDN is an open, free, self-organizing content distribution network (CDN). CDNs are widely used by commercial high-volume websites to scalably deliver their content, such as Hulu’s use of Akamai or CNN’s use of Level 3. CoralCDN was designed to help solve the Slashdot effect, which is when portals such as slashdot.org link to underprovisioned third-party sites and cause that site to become quickly overwhelmed by the unexpected surge of resulting traffic. CoralCDN’s answer was to provide an open CDN that would cache and serve any URL that was requested from it. To use CoralCDN, one simply appends a suffix to a URL’s hostname, i.e., http://www.cnn.com/ becomes http://www.cnn.com.nyud.net/. CoralCDN’s been running on PlanetLab—a distributed research testbed of virtualized servers, spread over several hundred universities worldwide—since March 2004. It handles requests from about 2 million users per day.

Because CoralCDN provides an open platform, one can access any URL through it via an HTTP GET request (with the exception of a small number of blacklisted domains and those for content larger than 50MB). Thus, requests to BitTorrent trackers can also use CoralCDN, as these are simply HTTP GETs with a client’s relevant information encoded in the tracker URL’s query string, e.g., http://denis.stalker.h3q.com.6969.nyud.net/announce?info_hash=(hash)&peer_id=(name)&port=52864&uploaded=231374848&downloaded=2227372596&left=0&corrupt=0&key=E0591124&numwant=200&compact=1&no_peer_id=1.

Notice that the HTTP request includes a peer’s unique name (a long random string) and a port number, but notably does not include an IP address for that client. It’s an optional parameter in the specification that many BitTorrent clients don’t include. (In fact, even if the request includes this IP parameter, some trackers ignore it.) Instead, the tracker records the network-level IP address from where the HTTP request originated (the other end of the TCP connection), together with the supplied port, as the peer’s network address.

When this request is via an HTTP proxy, things go wrong. Here, the BitTorrent client is connecting to an HTTP proxy, which in turn is connecting to the tracker. So this practice results in the tracker recording an unusable address: the combination of the proxy’s IP and the client’s port. Needless to say, the proxy isn’t running BitTorrent, let alone on that particular (often randomized) port. Not only does this design damage the client’s BitTorrent experience—other clients won’t initiate communication with it, leading to fewer opportunities for “tit-for-tat” data exchanges—but this also damages the entire swarm’s performance: Others’ requests to this hybrid address will all fail (typically with an RST response to the TCP connection request). I was rather surprised to find this flaw in the BitTorrent specification.

So how is this related to CoralCDN and the VPA? For whatever reason, some publisher started including a Coralized URL for the tracker’s location, as shown above (http://denis.stalker.h3q.com.6969.nyud.net/). I could only surmise why this was done: perhaps on the (mistaken) assumption that it would reduce load on the server, or perhaps in the hope of offloading abuse complaints to CoralCDN servers. The latter might have been useful if copyright enforcement agencies were going after the trackers, instead of the participating peers. In fact, we initially thought this was the case when these pre-settlement letters from the VPA started rolling in. More careful analysis, however, exposed the above problem: when the BitTorrent URL was Coralized, peers’ requests to the tracker were issued via CoralCDN HTTP proxies. Thus, the tracker built up a list of peer addresses of the form (CoralCDN IP : peer port), where these CoralCDN IPs correspond to PlanetLab servers located at various universities.

Hence, when the VPA began sending out pre-settlement letters claiming infringement, they sent them to network operators at tens of universities, who turned around and forwarded them to PlanetLab’s central operations and me.

What is particularly striking about this case, however, is that these reports were demonstrably false! There was no BitTorrent client running at the specified address (in the above letter, 128.31.1.13:40951), for precisely the reasons I discuss. Thus, we can fairly definitively conclude that the VPA never actually tested the peer for actual infringement: not even by trying to connect to the client’s address, let alone determining whether the client was actually uploading or download any data, and let alone valid data corresponding to the copyrighted file in question.

This begs the question as to what should be required for a company to issue a DMCA notification and pre-settlement letters that assert:

Someone using this account has engaged in illegal copying or distribution (downloading or uploading)…The information in this notification is accurate. We have a good faith belief that use of the material in the manner complained of herein is not authorized by the copyright owner.

Of course, the incentives for the VPA to actually ensure that “this notification is accurate” are pretty clear. The cost of a false positive is currently nothing, and perhaps some innocent users will even “buy protection” to make this problem and the threat of costly litigation go away.

DISCOUNT VIDEO CENTER INC. believes that it may be beneficial to settle this matter without the need of costly and time-consuming litigation. We have been authorized to offer a reasonable settlement to resolve the infringement of the works listed above.

It appears that the VPA and other such agencies have been rather effective at getting some settlement money. Our personal experience with DMCA takedown notices is that network operators are suitably afraid of litigation. Many will pull network access from machines as soon as a complaint is received, without any further verification or demonstrative network logs. In fact, many operators also sought “proof” that we weren’t running BitTorrent or engaging in file sharing before they were willing to restore access. We’ll leave the discussion about how we might prove such a negative to another day, but one can point to the chilling effect that such notices have had, when users are immediately considered guilty and must prove their innocence.

I am not arguing that copyright owners should not be able to take reasonable steps to protect their copyrighted material. I am arguing, however, that they should take similarly reasonable steps to ensure that any claimed infringement actually took place. When DMCA notices are accompanied by oaths under “penalty of perjury” and these claims are accepted as writ, as they have de facto become, there should some downside for agencies that demonstrably do not act in “good faith” to verify infringement. Even a simple TCP connection attempt would have been enough to dispel their flawed assumptions. That currently seems to be too much to ask.

Update (Dec 15): A follow-up post can be found here.