March 19, 2024

Archives for November 2009

Tech Policy in the SkyMall Catalog

These days tech policy issues seem to pop up everywhere. During a recent flight delay, I was flipping through the SkyMall Catalog (“Holiday 2009” edition), and found tech policy even there.

There were lots of ads for surveillance and recording devices, some of them clearly useful for illegal purposes: the New Agent Cam HD Color Video Spy Camera (p. 14), the Original Agent Cam Color Video Spy Camera (p. 14), the Video Recording Sunglasses (p. 23), the Wireless Remote Controlled Pan and Tilt Surveillance Camera (p. 23), the Spy Pen (with hidden audio and video recorders, p. 42), the Orbitor Electronic Listening Device, and the GPS Tracking Key (p. 224)

There were also plenty of ads for media-copying technologies, of the sort that various copyright owners might find objectionable: the LP and Cassette to CD Recorder (p. 16), the Slide and Negative to Digital Picture Converter (p. 17), the Digital Photo to DVD Converter (p. 20), the Easy Ipod Media Sharer (p. 27), the One Step DVD/CD Duplicator (p. 31), the Photograph to Digital Picture Converter (p. 40), and the Crosley Encoding Turntable (converts LP records to MP3s, p. 179).

Are these things illegal? Probably not, I guess, but there are surely people out there who would want to make them illegal. And some of them are pretty good ways to get a tech policy debate started.

I’m not about to start reading the SkyMall catalog for fun. But it’s interesting to know that it offers more than just Slankets and yeti statues.

Advice on stepping up to a better digital camera

This is a bit off from the usual Freedom to Tinker post, but with tomorrow being “Black Friday” and retailers offering some steep discounts on consumer electronics, many Tinker readers will be out there buying gear or will be offering buying advice to their friends.

Over the past several months, several friends of mine have mentioned that they were considering “moving up” to a D-SLR camera and asked me for advice. I’ve been what you might term a “serious amateur” photographer since high school, when I was the head photographer for the school yearbook and newspaper. (It was a non-trivial issue for me to decide whether to make my career in photography or in computers.)

To address this, I wrote a guide to upgrading your digital camera. I’ve written this for a non-technical audience. Pass it around and enjoy.

Inaccurate Copyright Enforcement: Questionable "best" practices and BitTorrent specification flaws

[Today we welcome my Princeton Computer Science colleague Mike Freedman. Mike’s research areas include computer systems, network software, and security. He writes a technical blog about these topics at Princeton S* Network Systems — required reading for serious systems geeks like me. — Ed Felten]

In the past few weeks, Ed has been writing about targeted and inaccurate copyright enforcement. While it may be difficult to quantify the actual extent of inaccurate claims, we can at least try to understand whether copyright enforcement companies are making a “good faith” best effort to minimize any false positives. My short answer: not really.

Let’s start with a typical abuse letter that gets sent to a network provider (in this case, a university) from a “copyright enforcement” company such as the Video Protection Alliance.

This notice is intended solely for the primary Massachusetts Institute of Technology internet service account holder. Someone using this account has engaged in illegal copying or distribution (downloading or uploading) of …

Evidence:
Infringement Source: BitTorrent
Infringement Timestamp: 2009-08-28 09:33:20 PST
Infringers IP Address: 128.31.1.13
Infringers Port: 40951

The information in this notification is accurate. We have a good faith belief that use of the material in the manner complained of herein is not authorized by the copyright owner, its agent, or by operation of law. We swear under penalty of perjury, that we are authorized to act on behalf of DISCOUNT VIDEO CENTER INC..

You and everyone using this computer must immediately and permanently cease and desist the unauthorized copying and/or distribution (including, but not limited to, downloading, uploading, file sharing, file ‘swapping’ or other similar activities) of the videos and/or other content owned by DISCOUNT VIDEO CENTER INC., including, but is not limited to, the copyrighted material listed above.

DISCOUNT VIDEO CENTER INC. is prepared to pursue every available remedy including damages, recovery of attorney’s fees, costs and any and all other claims that may be available to it in a lawsuit filed against you.

While DISCOUNT VIDEO CENTER INC. is entitled to monetary damages, attorneys’ fees and court costs from the infringing party under 17 U.S.C. 504, DISCOUNT VIDEO CENTER INC. believes that it may be beneficial to settle this matter without the need of costly and time-consuming litigation. We have been authorized to offer a reasonable settlement to resolve the infringement of the works listed above. To access this settlement offer, please follow the directions below.

Settlement Offer: To access your settlement offer please copy and paste the address below into a browser and follow the instructions:

https://www.videoprotectionalliance.com/?n_id=AB-XXXXXX
Password: XXXXXXX

In other words: we have a record of you (supposedly) uploading and downloading BitTorrent content. That content is copyrighted. We could pursue costly and painful litigation, but if you want us to just go away, you can pay us now.

Now, any type of IP-based identification is not going to be perfect, especially given the wide-spread use of Network Address Translation (NAT) boxes and open WiFi at homes. Especially in dense urban areas, unapproved third parties might use their neighbor’s wireless network for Internet access, potentially leading to the wrong homeowner being blamed. And IP-based identification relies on accurate ISP mappings from IP addresses to users, as these mappings change over time (although typically slowly) given dynamic address assignment (i.e., DHCP). But one could rightly claim that such sources of false positives are rare in practice and that a enforcement company is still making a best effort to accurately identify IP addresses engaging in copyright-infringing file sharing.

So what’s a reasonable strategy to identify such infringing behavior?

Let me first give a high-level overview of how BitTorrent works. To download a particular file on BitTorrent, a client first needs to discover a set of other peers that have the file. Earlier peer-to-peer systems like Napster, Gnutella, and KaZaA had peers connect to one another somewhat randomly (or, in Napster, through a more centralized directory service). These peers would then broadcast search requests for files, downloading the content directly from those peers that responded as having matching files. In the basic BitTorrent architecture, on the other hand, the global ecosystem is split into distinct groups of users that are all trying to download a particular file. Each such group—known as a swarm—is managed by a centralized server called a tracker. The tracker keeps a list of the swarm’s peers and, for each peer, a bit-vector of which file blocks it already has. When a client joins a swarm by announcing itself to the tracker, it gets a list of other peers, and it subsequently attempts to connect to them and download file blocks. How a client discovers a particular swarm is outside the scope of the system, but there are plenty of BitTorrent search engines that allow clients to perform keyword searches. These searches return .torrent files, which includes high-level meta-data about a particular swarm, including the URL(s) at which its tracker(s) can be accessed.

So there are three phases to downloading content from BitTorrent:

  1. Finding a .torrent meta-data file
  2. Registering with the .torrent’s tracker and getting a list of peer addresses
  3. Connecting to a peer, swapping the bit-vector of which file blocks each has, and potentially downloading or uploading needed blocks

Unfortunately, the verification that copyright enforcement agencies such as the VPA use stops at #2. That is, if some random BitTorrent tracker lists your IP address as being part of a swarm, then the VPA considers this to be sufficient proof to warrant a DMCA takedown notice (such as the one above), with clear instructions on how to pay a monetary settlement. Now, a very reasonable question is whether such information should indeed constitute proof.

Last year, researchers at the University of Washington published a paper with the subtitle Why My Printer Received a DMCA Takedown Notice. Their conclusions were that:

  • Practically any Internet user can be framed for copyright infringement today.
  • Even without being explicitly framed, innocent users may still receive complaints.

The title came from the fact that they “registered” the IP address of a networked printer with BitTorrent trackers, and they subsequently received 9 DMCA takedown notices claiming that their printer was engaging in illegal file sharing. (They did not, however, receive any pre-settlement offers such as the one above, which suggests a possible escalation of enforcement techniques since then.)

I have had my own repeated experiences with such false claims. This September, for instance, a research system I operate called CoralCDN received approximately 100 pre-settlement letters, including the one above. A little background: CoralCDN is an open, free, self-organizing content distribution network (CDN). CDNs are widely used by commercial high-volume websites to scalably deliver their content, such as Hulu’s use of Akamai or CNN’s use of Level 3. CoralCDN was designed to help solve the Slashdot effect, which is when portals such as slashdot.org link to underprovisioned third-party sites and cause that site to become quickly overwhelmed by the unexpected surge of resulting traffic. CoralCDN’s answer was to provide an open CDN that would cache and serve any URL that was requested from it. To use CoralCDN, one simply appends a suffix to a URL’s hostname, i.e., http://www.cnn.com/ becomes http://www.cnn.com.nyud.net/. CoralCDN’s been running on PlanetLab—a distributed research testbed of virtualized servers, spread over several hundred universities worldwide—since March 2004. It handles requests from about 2 million users per day.

Because CoralCDN provides an open platform, one can access any URL through it via an HTTP GET request (with the exception of a small number of blacklisted domains and those for content larger than 50MB). Thus, requests to BitTorrent trackers can also use CoralCDN, as these are simply HTTP GETs with a client’s relevant information encoded in the tracker URL’s query string, e.g., http://denis.stalker.h3q.com.6969.nyud.net/announce?info_hash=(hash)&peer_id=(name)&port=52864&uploaded=231374848&downloaded=2227372596&left=0&corrupt=0&key=E0591124&numwant=200&compact=1&no_peer_id=1.

Notice that the HTTP request includes a peer’s unique name (a long random string) and a port number, but notably does not include an IP address for that client. It’s an optional parameter in the specification that many BitTorrent clients don’t include. (In fact, even if the request includes this IP parameter, some trackers ignore it.) Instead, the tracker records the network-level IP address from where the HTTP request originated (the other end of the TCP connection), together with the supplied port, as the peer’s network address.

When this request is via an HTTP proxy, things go wrong. Here, the BitTorrent client is connecting to an HTTP proxy, which in turn is connecting to the tracker. So this practice results in the tracker recording an unusable address: the combination of the proxy’s IP and the client’s port. Needless to say, the proxy isn’t running BitTorrent, let alone on that particular (often randomized) port. Not only does this design damage the client’s BitTorrent experience—other clients won’t initiate communication with it, leading to fewer opportunities for “tit-for-tat” data exchanges—but this also damages the entire swarm’s performance: Others’ requests to this hybrid address will all fail (typically with an RST response to the TCP connection request). I was rather surprised to find this flaw in the BitTorrent specification.

So how is this related to CoralCDN and the VPA? For whatever reason, some publisher started including a Coralized URL for the tracker’s location, as shown above (http://denis.stalker.h3q.com.6969.nyud.net/). I could only surmise why this was done: perhaps on the (mistaken) assumption that it would reduce load on the server, or perhaps in the hope of offloading abuse complaints to CoralCDN servers. The latter might have been useful if copyright enforcement agencies were going after the trackers, instead of the participating peers. In fact, we initially thought this was the case when these pre-settlement letters from the VPA started rolling in. More careful analysis, however, exposed the above problem: when the BitTorrent URL was Coralized, peers’ requests to the tracker were issued via CoralCDN HTTP proxies. Thus, the tracker built up a list of peer addresses of the form (CoralCDN IP : peer port), where these CoralCDN IPs correspond to PlanetLab servers located at various universities.

Hence, when the VPA began sending out pre-settlement letters claiming infringement, they sent them to network operators at tens of universities, who turned around and forwarded them to PlanetLab’s central operations and me.

What is particularly striking about this case, however, is that these reports were demonstrably false! There was no BitTorrent client running at the specified address (in the above letter, 128.31.1.13:40951), for precisely the reasons I discuss. Thus, we can fairly definitively conclude that the VPA never actually tested the peer for actual infringement: not even by trying to connect to the client’s address, let alone determining whether the client was actually uploading or download any data, and let alone valid data corresponding to the copyrighted file in question.

This begs the question as to what should be required for a company to issue a DMCA notification and pre-settlement letters that assert:

Someone using this account has engaged in illegal copying or distribution (downloading or uploading)…The information in this notification is accurate. We have a good faith belief that use of the material in the manner complained of herein is not authorized by the copyright owner.

Of course, the incentives for the VPA to actually ensure that “this notification is accurate” are pretty clear. The cost of a false positive is currently nothing, and perhaps some innocent users will even “buy protection” to make this problem and the threat of costly litigation go away.

DISCOUNT VIDEO CENTER INC. believes that it may be beneficial to settle this matter without the need of costly and time-consuming litigation. We have been authorized to offer a reasonable settlement to resolve the infringement of the works listed above.

It appears that the VPA and other such agencies have been rather effective at getting some settlement money. Our personal experience with DMCA takedown notices is that network operators are suitably afraid of litigation. Many will pull network access from machines as soon as a complaint is received, without any further verification or demonstrative network logs. In fact, many operators also sought “proof” that we weren’t running BitTorrent or engaging in file sharing before they were willing to restore access. We’ll leave the discussion about how we might prove such a negative to another day, but one can point to the chilling effect that such notices have had, when users are immediately considered guilty and must prove their innocence.

I am not arguing that copyright owners should not be able to take reasonable steps to protect their copyrighted material. I am arguing, however, that they should take similarly reasonable steps to ensure that any claimed infringement actually took place. When DMCA notices are accompanied by oaths under “penalty of perjury” and these claims are accepted as writ, as they have de facto become, there should some downside for agencies that demonstrably do not act in “good faith” to verify infringement. Even a simple TCP connection attempt would have been enough to dispel their flawed assumptions. That currently seems to be too much to ask.

Update (Dec 15): A follow-up post can be found here.

Robots and the Law

Stanford Law School held a panel Thursday on “Legal Challenges in an Age of Robotics“. I happened to be in town so I dropped by and heard an interesting discussion.

Here’s the official announcement:

Once relegated to factories and fiction, robots are rapidly entering the mainstream. Advances in artificial intelligence translate into ever-broadening functionality and autonomy. Recent years have seen an explosion in the use of robotics in warfare, medicine, and exploration. Industry analysts and UN statistics predict equally significant growth in the market for personal or service robotics over the next few years. What unique legal challenges will the widespread availability of sophisticated robots pose? Three panelists with deep and varied expertise discuss the present, near future, and far future of robotics and the law.

The key questions are how robots differ from past technologies, and how those differences change the law and policy issues we face.

Three aspects of robots seemed to recur in the discussion: robots take action that is important in the world; robots act autonomously; and we tend to see robots as beings and not just machines.

The last issue — robots as beings — is mostly a red herring for our purposes, notwithstanding its appeal as a conversational topic. Robots are nowhere near having the rights of a person or even of a sentient animal, and I suspect that we can’t really imagine what it would be like to interact with a robot that qualified as a conscious being. Our brains seem to be wired to treat self-propelled objects as beings — witness the surprising acceptance of robot “dogs” that aren’t much like real dogs — but that doesn’t mean we should grant robots personhood.

So let’s set aside the consciousness issue and focus on the other two: acting in the world, and autonomy. These attributes are already present in many technologies today, even in the purely electronic realm. Consider, for example, the complex of computers, network equipment, and software make up Google’s data centers. Its actions have significant implications in the real world, and it is autonomous, at least in the sense that the panelists seemed to using the term “autonomous” — it exhibits complex behavior without direct, immediate human instruction, and its behavior is often unpredictable even to its makers.

In the end, it seemed to me that the legal and policy issues raised by future robots will not be new in kind, but will just be extrapolations of the issues we’re already facing with today’s complex technologies — and not a far extrapoloation but more of a smooth progression from where we are now. These issues are important, to be sure, and I was glad to hear smart panelists debating them, but I’m not convinced yet that we need a law of the robot. When it comes to the legal challenges of technology, the future will be like the past, only more so.

Still, if talking about robots will get policymakers to pay more attention to important issues in technology policy, then by all means, let’s talk about robots.

Targeted Copyright Enforcement vs. Inaccurate Enforcement

Let’s continue our discussion about copyright enforcement against online infringers. I wrote last time about how targeted enforcement can deter many possible violators even if the enforcer can only punish a few violators. Clever targeting of enforcement can destroy the safety-in-numbers effect that might otherwise shelter a crowd of would-be violators.

In the online copyright context, the implication is that large copyright owners might be able to use lawsuit threats to deter a huge population of would-be infringers, even if they can only manage to sue a few infringers at a time. In my previous post, I floated some ideas for how they might do this.

Today I want to talk about the implications of this. Let’s assume, for the sake of argument, that copyright owners have better deterrence strategies available — strategies that can deter more users, more effectively, than they have managed so far. What would this imply for copyright policy?

The main implication, I think, is to shed doubt on the big copyright owners’ current arguments in favor or broader, less accurate enforcement. These proposed enforcement strategies go by various names, such as “three strikes” and “graduated response”. What defines them is that they reduce the cost of each enforcement action, while at the same time reducing the assurance that the party being punished is actually guilty.

Typically the main source of cost reduction is the elimination of due process for the accused. For example, “three strikes” policies typically cut off someone’s Internet connection if they are accused of infringement three times — the theory being that making three accusations is much cheaper than proving one.

There’s a hidden assumption underlying the case for cheap, inaccurate enforcement: that the only way to deter infringement is to launch a huge number of enforcement actions, so that most of the would-be violators will expect to face enforcement. The main point of my previous post is that this assumption is not necessarily true — that it’s possible, at least in principle, to deter many people with a moderate number of enforcement actions.

Indeed, one of the benefits of an accurate enforcement strategy — a strategy that enforces only against actual violators — is that the better it works, the cheaper it gets. If there are few violators, then few enforcement actions will be needed. A high-compliance, low-enforcement equilibrium is the best outcome for everybody.

Cheap, inaccurate enforcement can’t reach this happy state.

Let’s say there are 100 million users, and you’re using an enforcement strategy that punishes 50% of violators, and 1% of non-violators. If half of the people are violators, you’ll punish 25 million violators, and you’ll punish 500,000 non-violators. That might seem acceptable to you, if the punishments are small. (If you’re disconnecting 500,000 people from modern communications technology, that would be a different story.)

But now suppose that user behavior shifts, so that only 1% of users are violating. Then you’ll be punishing 500,000 violators (50% of the 1,000,000 violators) along with 990,000 non-violators (1% of the 99,000,000 non-violators). Most of the people you’ll be punishing are innocent, which is clearly unacceptable.

Any cheap, inaccurate enforcement scheme will face this dilemma: it can be accurate, or it can be fair, but it can’t be both. The better is works, the more unfair it gets. It can never reach the high-compliance, low-enforcement equilibrium that should be the goal of every enforcement strategy.