June 27, 2017

Erroneous DMCA notices and copyright enforcement, part deux

A few weeks ago, I wrote about a deluge of DMCA notices and pre-settlement letters that CoralCDN experienced in late August. This article actually received a bit of press, including MediaPost, ArsTechnica, TechDirt, and, very recently, Slashdot. I’m glad that my own experience was able to shed some light on the more insidious practices that are still going on under the umbrella of copyright enforcement. More transparency is especially important at this time, given the current debate over the Anti-Counterfeiting Trade Agreement.

Given this discussion, I wanted to write a short follow-on to my previous post.

The VPA drops Nexicon

First and foremost, I was contacted by the founder of the Video Protection Alliance not long after this story broke. I was informed that the VPA has not actually developed its own technology to discover users who are actively uploading or downloading copyrighted material, but rather contracts out this role to Nexicon. (You can find a comment from Nexicon’s CTO to my previous article here.) As I was told, the VPA was contracted by certain content publishers to help reduce copyright infringement of (largely adult) content. The VPA in turn contracted Nexicon to find IP addresses that are participating in BitTorrent swarms of those specified movies. Using the IP addresses given them by Nexicon, the VPA subsequently would send pre-settlement letters to the network providers of those addresses.

The VPA’s founder also assured me that their main goal was to reduce infringement, as opposed to collecting pre-settlement money. (And that users had been let off with only a warning, or, in the cases where infringement might have been due to an open wireless network, informed how to secure their wireless network.) He also expressed surprise that there were false positives in the addresses given to them (beyond said open wireless), especially to the extent that appropriate verification was lacking. Given this new knowledge, he stated that the VPA dropped their use of Nexicon’s technology.

BitTorrent and Proxies

Second, I should clarify my claims about BitTorrent’s usefulness with an on-path proxy. While it is true that the address registered with the BitTorrent tracker is not usable, peers connecting from behind a proxy can still download content from other addresses learned from the tracker. If their requests to those addresses are optimistically unchoked, they have the opportunity to even engage in incentivized bilateral exchange. Furthermore, the use of DHT- and gossip-based discovery with other peers—the latter is termed PEX, for Peer EXchange, in BitTorrent—allows their real address to be learned by others. Thus, through these more modern discovery means, other peers may initiate connections to them, further increasing the opportunity for tit-for-tat exchanges.

Some readers also pointed out that there is good reason why BitTorrent trackers do not just accept any IP address communicated to it via an HTTP query string, but rather use the end-point IP address of the TCP connection. Namely, any HTTP query parameter can be spoofed, leading to anybody being able to add another’s IP address to the tracker list. That would make them susceptible to receiving DMCA complaints, just we experienced with CoralCDN. From a more technical perspective, their machine would also start receiving unsolicited TCP connection requests from other BitTorrent peers, an easy DoS amplification attack.

That said, there are some additional checks that BitTorrent trackers could do. For example, if the IP query string or X-Forwarded-For HTTP headers are present, only add the network IP address if it matches the query string or X-Forwarded-For headers. Additionally, some BitTorrent tracker operators have mentioned that they have certain IP addresses whitelisted as trusted proxies; in those cases, the X-Forwarded-For address is used already. Otherwise, I don’t see a good reason (plausible deniability aside) for recording an IP address that is known to be likely incorrect.

Best Practices for Online Technical Copyright Enforcement

Finally, my article pointed out a strategy that I clearly thought was insufficient for copyright enforcement: simply crawling a BitTorrent tracker for a list of registered IP addresses, and issuing a infringement notice to each IP address. I’ll add to that two other approaches that I think are either insufficient, unethical, or illegal—or all three—yet have been bandied about as possible solutions.

  • Wiretapping: It has been suggested that network providers can perform deep-packet inspection (DPI) on their customer’s traffic in order to detect copyrighted content. This approach probably breaks a number of laws (either in the U.S. or elsewhere), creates a dangerous precedent and existing infrastructure for far-flung Internet surveillance, and yet is of dubious benefit given the move to encrypted communication by file-sharing software.
  • Spyware: By surreptitiously installing spyware/malware on end-hosts, one could scan a user’s local disk in order to detect the existence of potentially copyrighted material. This practice has even worse legal and ethical implications than network-level wiretapping, and yet politicians such as Senator Orrin Hatch (Utah) have gone as far as declaring that infringers’ computers should be destroyed. And it opens users up to the real danger that their computers or information could be misused by others; witness, for example, the security weaknesses of China’s Green Dam software.

So, if one starts from the position that copyrights are valid and should be enforceable—some dispute this—what would you like to see as best practices for copyright enforcement?

The approach taken by DRM is to try to build a technical framework that restricts users’ ability to share content or to consume it in a proscribed manner. But DRM has been largely disliked by end-users, mostly in the way it creates a poor user experience and interferes with expected rights (under fair-use doctrine). But DRM is a misleading argument, as copyright infringement notices are needed precisely after “unprotected” content has already flown the coop.

So I’ll start with two properties that I would want all enforcement agencies to take when issuing DMCA take-down notices. Let’s restrict this consideration to complaints about “whole” content (e.g., entire movies), as opposed to those DMCA challenges over sampled or remixed content, which is a legal debate.

  • For any end client suspected of file-sharing, one MUST verify that the client was actually uploading or downloading content, AND that the content corresponded to a valid portion of a copyrighted file. In BitTorrent, this might be that the client sends or receives a complete file block, and that the file block hashes to the correct value specified in the .torrent file.
  • When issuing a DMCA take-down notice, the request MUST be accompanied by logged information that shows (a) the client’s IP:port network address engaged in content transfer (e.g., a record of a TCP flow); (b) the actual application request/response that was acted upon (e.g., BitTorrent-level logs); and (c) that the transferred content corresponds to a valid file block (e.g., a BitTorrent hash).

So my question to the readers: What would you add to or remove from this list? With what other approaches do you think copyright enforcement should be performed or incentivized?

Inaccurate Copyright Enforcement: Questionable "best" practices and BitTorrent specification flaws

[Today we welcome my Princeton Computer Science colleague Mike Freedman. Mike’s research areas include computer systems, network software, and security. He writes a technical blog about these topics at Princeton S* Network Systems — required reading for serious systems geeks like me. — Ed Felten]

In the past few weeks, Ed has been writing about targeted and inaccurate copyright enforcement. While it may be difficult to quantify the actual extent of inaccurate claims, we can at least try to understand whether copyright enforcement companies are making a “good faith” best effort to minimize any false positives. My short answer: not really.

Let’s start with a typical abuse letter that gets sent to a network provider (in this case, a university) from a “copyright enforcement” company such as the Video Protection Alliance.

This notice is intended solely for the primary Massachusetts Institute of Technology internet service account holder. Someone using this account has engaged in illegal copying or distribution (downloading or uploading) of …

Evidence:
Infringement Source: BitTorrent
Infringement Timestamp: 2009-08-28 09:33:20 PST
Infringers IP Address: 128.31.1.13
Infringers Port: 40951

The information in this notification is accurate. We have a good faith belief that use of the material in the manner complained of herein is not authorized by the copyright owner, its agent, or by operation of law. We swear under penalty of perjury, that we are authorized to act on behalf of DISCOUNT VIDEO CENTER INC..

You and everyone using this computer must immediately and permanently cease and desist the unauthorized copying and/or distribution (including, but not limited to, downloading, uploading, file sharing, file ‘swapping’ or other similar activities) of the videos and/or other content owned by DISCOUNT VIDEO CENTER INC., including, but is not limited to, the copyrighted material listed above.

DISCOUNT VIDEO CENTER INC. is prepared to pursue every available remedy including damages, recovery of attorney’s fees, costs and any and all other claims that may be available to it in a lawsuit filed against you.

While DISCOUNT VIDEO CENTER INC. is entitled to monetary damages, attorneys’ fees and court costs from the infringing party under 17 U.S.C. 504, DISCOUNT VIDEO CENTER INC. believes that it may be beneficial to settle this matter without the need of costly and time-consuming litigation. We have been authorized to offer a reasonable settlement to resolve the infringement of the works listed above. To access this settlement offer, please follow the directions below.

Settlement Offer: To access your settlement offer please copy and paste the address below into a browser and follow the instructions:

https://www.videoprotectionalliance.com/?n_id=AB-XXXXXX
Password: XXXXXXX

In other words: we have a record of you (supposedly) uploading and downloading BitTorrent content. That content is copyrighted. We could pursue costly and painful litigation, but if you want us to just go away, you can pay us now.

Now, any type of IP-based identification is not going to be perfect, especially given the wide-spread use of Network Address Translation (NAT) boxes and open WiFi at homes. Especially in dense urban areas, unapproved third parties might use their neighbor’s wireless network for Internet access, potentially leading to the wrong homeowner being blamed. And IP-based identification relies on accurate ISP mappings from IP addresses to users, as these mappings change over time (although typically slowly) given dynamic address assignment (i.e., DHCP). But one could rightly claim that such sources of false positives are rare in practice and that a enforcement company is still making a best effort to accurately identify IP addresses engaging in copyright-infringing file sharing.

So what’s a reasonable strategy to identify such infringing behavior?

Let me first give a high-level overview of how BitTorrent works. To download a particular file on BitTorrent, a client first needs to discover a set of other peers that have the file. Earlier peer-to-peer systems like Napster, Gnutella, and KaZaA had peers connect to one another somewhat randomly (or, in Napster, through a more centralized directory service). These peers would then broadcast search requests for files, downloading the content directly from those peers that responded as having matching files. In the basic BitTorrent architecture, on the other hand, the global ecosystem is split into distinct groups of users that are all trying to download a particular file. Each such group—known as a swarm—is managed by a centralized server called a tracker. The tracker keeps a list of the swarm’s peers and, for each peer, a bit-vector of which file blocks it already has. When a client joins a swarm by announcing itself to the tracker, it gets a list of other peers, and it subsequently attempts to connect to them and download file blocks. How a client discovers a particular swarm is outside the scope of the system, but there are plenty of BitTorrent search engines that allow clients to perform keyword searches. These searches return .torrent files, which includes high-level meta-data about a particular swarm, including the URL(s) at which its tracker(s) can be accessed.

So there are three phases to downloading content from BitTorrent:

  1. Finding a .torrent meta-data file
  2. Registering with the .torrent’s tracker and getting a list of peer addresses
  3. Connecting to a peer, swapping the bit-vector of which file blocks each has, and potentially downloading or uploading needed blocks

Unfortunately, the verification that copyright enforcement agencies such as the VPA use stops at #2. That is, if some random BitTorrent tracker lists your IP address as being part of a swarm, then the VPA considers this to be sufficient proof to warrant a DMCA takedown notice (such as the one above), with clear instructions on how to pay a monetary settlement. Now, a very reasonable question is whether such information should indeed constitute proof.

Last year, researchers at the University of Washington published a paper with the subtitle Why My Printer Received a DMCA Takedown Notice. Their conclusions were that:

  • Practically any Internet user can be framed for copyright infringement today.
  • Even without being explicitly framed, innocent users may still receive complaints.

The title came from the fact that they “registered” the IP address of a networked printer with BitTorrent trackers, and they subsequently received 9 DMCA takedown notices claiming that their printer was engaging in illegal file sharing. (They did not, however, receive any pre-settlement offers such as the one above, which suggests a possible escalation of enforcement techniques since then.)

I have had my own repeated experiences with such false claims. This September, for instance, a research system I operate called CoralCDN received approximately 100 pre-settlement letters, including the one above. A little background: CoralCDN is an open, free, self-organizing content distribution network (CDN). CDNs are widely used by commercial high-volume websites to scalably deliver their content, such as Hulu’s use of Akamai or CNN’s use of Level 3. CoralCDN was designed to help solve the Slashdot effect, which is when portals such as slashdot.org link to underprovisioned third-party sites and cause that site to become quickly overwhelmed by the unexpected surge of resulting traffic. CoralCDN’s answer was to provide an open CDN that would cache and serve any URL that was requested from it. To use CoralCDN, one simply appends a suffix to a URL’s hostname, i.e., http://www.cnn.com/ becomes http://www.cnn.com.nyud.net/. CoralCDN’s been running on PlanetLab—a distributed research testbed of virtualized servers, spread over several hundred universities worldwide—since March 2004. It handles requests from about 2 million users per day.

Because CoralCDN provides an open platform, one can access any URL through it via an HTTP GET request (with the exception of a small number of blacklisted domains and those for content larger than 50MB). Thus, requests to BitTorrent trackers can also use CoralCDN, as these are simply HTTP GETs with a client’s relevant information encoded in the tracker URL’s query string, e.g., http://denis.stalker.h3q.com.6969.nyud.net/announce?info_hash=(hash)&peer_id=(name)&port=52864&uploaded=231374848&downloaded=2227372596&left=0&corrupt=0&key=E0591124&numwant=200&compact=1&no_peer_id=1.

Notice that the HTTP request includes a peer’s unique name (a long random string) and a port number, but notably does not include an IP address for that client. It’s an optional parameter in the specification that many BitTorrent clients don’t include. (In fact, even if the request includes this IP parameter, some trackers ignore it.) Instead, the tracker records the network-level IP address from where the HTTP request originated (the other end of the TCP connection), together with the supplied port, as the peer’s network address.

When this request is via an HTTP proxy, things go wrong. Here, the BitTorrent client is connecting to an HTTP proxy, which in turn is connecting to the tracker. So this practice results in the tracker recording an unusable address: the combination of the proxy’s IP and the client’s port. Needless to say, the proxy isn’t running BitTorrent, let alone on that particular (often randomized) port. Not only does this design damage the client’s BitTorrent experience—other clients won’t initiate communication with it, leading to fewer opportunities for “tit-for-tat” data exchanges—but this also damages the entire swarm’s performance: Others’ requests to this hybrid address will all fail (typically with an RST response to the TCP connection request). I was rather surprised to find this flaw in the BitTorrent specification.

So how is this related to CoralCDN and the VPA? For whatever reason, some publisher started including a Coralized URL for the tracker’s location, as shown above (http://denis.stalker.h3q.com.6969.nyud.net/). I could only surmise why this was done: perhaps on the (mistaken) assumption that it would reduce load on the server, or perhaps in the hope of offloading abuse complaints to CoralCDN servers. The latter might have been useful if copyright enforcement agencies were going after the trackers, instead of the participating peers. In fact, we initially thought this was the case when these pre-settlement letters from the VPA started rolling in. More careful analysis, however, exposed the above problem: when the BitTorrent URL was Coralized, peers’ requests to the tracker were issued via CoralCDN HTTP proxies. Thus, the tracker built up a list of peer addresses of the form (CoralCDN IP : peer port), where these CoralCDN IPs correspond to PlanetLab servers located at various universities.

Hence, when the VPA began sending out pre-settlement letters claiming infringement, they sent them to network operators at tens of universities, who turned around and forwarded them to PlanetLab’s central operations and me.

What is particularly striking about this case, however, is that these reports were demonstrably false! There was no BitTorrent client running at the specified address (in the above letter, 128.31.1.13:40951), for precisely the reasons I discuss. Thus, we can fairly definitively conclude that the VPA never actually tested the peer for actual infringement: not even by trying to connect to the client’s address, let alone determining whether the client was actually uploading or download any data, and let alone valid data corresponding to the copyrighted file in question.

This begs the question as to what should be required for a company to issue a DMCA notification and pre-settlement letters that assert:

Someone using this account has engaged in illegal copying or distribution (downloading or uploading)…The information in this notification is accurate. We have a good faith belief that use of the material in the manner complained of herein is not authorized by the copyright owner.

Of course, the incentives for the VPA to actually ensure that “this notification is accurate” are pretty clear. The cost of a false positive is currently nothing, and perhaps some innocent users will even “buy protection” to make this problem and the threat of costly litigation go away.

DISCOUNT VIDEO CENTER INC. believes that it may be beneficial to settle this matter without the need of costly and time-consuming litigation. We have been authorized to offer a reasonable settlement to resolve the infringement of the works listed above.

It appears that the VPA and other such agencies have been rather effective at getting some settlement money. Our personal experience with DMCA takedown notices is that network operators are suitably afraid of litigation. Many will pull network access from machines as soon as a complaint is received, without any further verification or demonstrative network logs. In fact, many operators also sought “proof” that we weren’t running BitTorrent or engaging in file sharing before they were willing to restore access. We’ll leave the discussion about how we might prove such a negative to another day, but one can point to the chilling effect that such notices have had, when users are immediately considered guilty and must prove their innocence.

I am not arguing that copyright owners should not be able to take reasonable steps to protect their copyrighted material. I am arguing, however, that they should take similarly reasonable steps to ensure that any claimed infringement actually took place. When DMCA notices are accompanied by oaths under “penalty of perjury” and these claims are accepted as writ, as they have de facto become, there should some downside for agencies that demonstrably do not act in “good faith” to verify infringement. Even a simple TCP connection attempt would have been enough to dispel their flawed assumptions. That currently seems to be too much to ask.

Update (Dec 15): A follow-up post can be found here.