June 24, 2017

Inaccurate Copyright Enforcement: Questionable "best" practices and BitTorrent specification flaws

[Today we welcome my Princeton Computer Science colleague Mike Freedman. Mike’s research areas include computer systems, network software, and security. He writes a technical blog about these topics at Princeton S* Network Systems — required reading for serious systems geeks like me. — Ed Felten]

In the past few weeks, Ed has been writing about targeted and inaccurate copyright enforcement. While it may be difficult to quantify the actual extent of inaccurate claims, we can at least try to understand whether copyright enforcement companies are making a “good faith” best effort to minimize any false positives. My short answer: not really.

Let’s start with a typical abuse letter that gets sent to a network provider (in this case, a university) from a “copyright enforcement” company such as the Video Protection Alliance.

This notice is intended solely for the primary Massachusetts Institute of Technology internet service account holder. Someone using this account has engaged in illegal copying or distribution (downloading or uploading) of …

Evidence:
Infringement Source: BitTorrent
Infringement Timestamp: 2009-08-28 09:33:20 PST
Infringers IP Address: 128.31.1.13
Infringers Port: 40951

The information in this notification is accurate. We have a good faith belief that use of the material in the manner complained of herein is not authorized by the copyright owner, its agent, or by operation of law. We swear under penalty of perjury, that we are authorized to act on behalf of DISCOUNT VIDEO CENTER INC..

You and everyone using this computer must immediately and permanently cease and desist the unauthorized copying and/or distribution (including, but not limited to, downloading, uploading, file sharing, file ‘swapping’ or other similar activities) of the videos and/or other content owned by DISCOUNT VIDEO CENTER INC., including, but is not limited to, the copyrighted material listed above.

DISCOUNT VIDEO CENTER INC. is prepared to pursue every available remedy including damages, recovery of attorney’s fees, costs and any and all other claims that may be available to it in a lawsuit filed against you.

While DISCOUNT VIDEO CENTER INC. is entitled to monetary damages, attorneys’ fees and court costs from the infringing party under 17 U.S.C. 504, DISCOUNT VIDEO CENTER INC. believes that it may be beneficial to settle this matter without the need of costly and time-consuming litigation. We have been authorized to offer a reasonable settlement to resolve the infringement of the works listed above. To access this settlement offer, please follow the directions below.

Settlement Offer: To access your settlement offer please copy and paste the address below into a browser and follow the instructions:

https://www.videoprotectionalliance.com/?n_id=AB-XXXXXX
Password: XXXXXXX

In other words: we have a record of you (supposedly) uploading and downloading BitTorrent content. That content is copyrighted. We could pursue costly and painful litigation, but if you want us to just go away, you can pay us now.

Now, any type of IP-based identification is not going to be perfect, especially given the wide-spread use of Network Address Translation (NAT) boxes and open WiFi at homes. Especially in dense urban areas, unapproved third parties might use their neighbor’s wireless network for Internet access, potentially leading to the wrong homeowner being blamed. And IP-based identification relies on accurate ISP mappings from IP addresses to users, as these mappings change over time (although typically slowly) given dynamic address assignment (i.e., DHCP). But one could rightly claim that such sources of false positives are rare in practice and that a enforcement company is still making a best effort to accurately identify IP addresses engaging in copyright-infringing file sharing.

So what’s a reasonable strategy to identify such infringing behavior?

Let me first give a high-level overview of how BitTorrent works. To download a particular file on BitTorrent, a client first needs to discover a set of other peers that have the file. Earlier peer-to-peer systems like Napster, Gnutella, and KaZaA had peers connect to one another somewhat randomly (or, in Napster, through a more centralized directory service). These peers would then broadcast search requests for files, downloading the content directly from those peers that responded as having matching files. In the basic BitTorrent architecture, on the other hand, the global ecosystem is split into distinct groups of users that are all trying to download a particular file. Each such group—known as a swarm—is managed by a centralized server called a tracker. The tracker keeps a list of the swarm’s peers and, for each peer, a bit-vector of which file blocks it already has. When a client joins a swarm by announcing itself to the tracker, it gets a list of other peers, and it subsequently attempts to connect to them and download file blocks. How a client discovers a particular swarm is outside the scope of the system, but there are plenty of BitTorrent search engines that allow clients to perform keyword searches. These searches return .torrent files, which includes high-level meta-data about a particular swarm, including the URL(s) at which its tracker(s) can be accessed.

So there are three phases to downloading content from BitTorrent:

  1. Finding a .torrent meta-data file
  2. Registering with the .torrent’s tracker and getting a list of peer addresses
  3. Connecting to a peer, swapping the bit-vector of which file blocks each has, and potentially downloading or uploading needed blocks

Unfortunately, the verification that copyright enforcement agencies such as the VPA use stops at #2. That is, if some random BitTorrent tracker lists your IP address as being part of a swarm, then the VPA considers this to be sufficient proof to warrant a DMCA takedown notice (such as the one above), with clear instructions on how to pay a monetary settlement. Now, a very reasonable question is whether such information should indeed constitute proof.

Last year, researchers at the University of Washington published a paper with the subtitle Why My Printer Received a DMCA Takedown Notice. Their conclusions were that:

  • Practically any Internet user can be framed for copyright infringement today.
  • Even without being explicitly framed, innocent users may still receive complaints.

The title came from the fact that they “registered” the IP address of a networked printer with BitTorrent trackers, and they subsequently received 9 DMCA takedown notices claiming that their printer was engaging in illegal file sharing. (They did not, however, receive any pre-settlement offers such as the one above, which suggests a possible escalation of enforcement techniques since then.)

I have had my own repeated experiences with such false claims. This September, for instance, a research system I operate called CoralCDN received approximately 100 pre-settlement letters, including the one above. A little background: CoralCDN is an open, free, self-organizing content distribution network (CDN). CDNs are widely used by commercial high-volume websites to scalably deliver their content, such as Hulu’s use of Akamai or CNN’s use of Level 3. CoralCDN was designed to help solve the Slashdot effect, which is when portals such as slashdot.org link to underprovisioned third-party sites and cause that site to become quickly overwhelmed by the unexpected surge of resulting traffic. CoralCDN’s answer was to provide an open CDN that would cache and serve any URL that was requested from it. To use CoralCDN, one simply appends a suffix to a URL’s hostname, i.e., http://www.cnn.com/ becomes http://www.cnn.com.nyud.net/. CoralCDN’s been running on PlanetLab—a distributed research testbed of virtualized servers, spread over several hundred universities worldwide—since March 2004. It handles requests from about 2 million users per day.

Because CoralCDN provides an open platform, one can access any URL through it via an HTTP GET request (with the exception of a small number of blacklisted domains and those for content larger than 50MB). Thus, requests to BitTorrent trackers can also use CoralCDN, as these are simply HTTP GETs with a client’s relevant information encoded in the tracker URL’s query string, e.g., http://denis.stalker.h3q.com.6969.nyud.net/announce?info_hash=(hash)&peer_id=(name)&port=52864&uploaded=231374848&downloaded=2227372596&left=0&corrupt=0&key=E0591124&numwant=200&compact=1&no_peer_id=1.

Notice that the HTTP request includes a peer’s unique name (a long random string) and a port number, but notably does not include an IP address for that client. It’s an optional parameter in the specification that many BitTorrent clients don’t include. (In fact, even if the request includes this IP parameter, some trackers ignore it.) Instead, the tracker records the network-level IP address from where the HTTP request originated (the other end of the TCP connection), together with the supplied port, as the peer’s network address.

When this request is via an HTTP proxy, things go wrong. Here, the BitTorrent client is connecting to an HTTP proxy, which in turn is connecting to the tracker. So this practice results in the tracker recording an unusable address: the combination of the proxy’s IP and the client’s port. Needless to say, the proxy isn’t running BitTorrent, let alone on that particular (often randomized) port. Not only does this design damage the client’s BitTorrent experience—other clients won’t initiate communication with it, leading to fewer opportunities for “tit-for-tat” data exchanges—but this also damages the entire swarm’s performance: Others’ requests to this hybrid address will all fail (typically with an RST response to the TCP connection request). I was rather surprised to find this flaw in the BitTorrent specification.

So how is this related to CoralCDN and the VPA? For whatever reason, some publisher started including a Coralized URL for the tracker’s location, as shown above (http://denis.stalker.h3q.com.6969.nyud.net/). I could only surmise why this was done: perhaps on the (mistaken) assumption that it would reduce load on the server, or perhaps in the hope of offloading abuse complaints to CoralCDN servers. The latter might have been useful if copyright enforcement agencies were going after the trackers, instead of the participating peers. In fact, we initially thought this was the case when these pre-settlement letters from the VPA started rolling in. More careful analysis, however, exposed the above problem: when the BitTorrent URL was Coralized, peers’ requests to the tracker were issued via CoralCDN HTTP proxies. Thus, the tracker built up a list of peer addresses of the form (CoralCDN IP : peer port), where these CoralCDN IPs correspond to PlanetLab servers located at various universities.

Hence, when the VPA began sending out pre-settlement letters claiming infringement, they sent them to network operators at tens of universities, who turned around and forwarded them to PlanetLab’s central operations and me.

What is particularly striking about this case, however, is that these reports were demonstrably false! There was no BitTorrent client running at the specified address (in the above letter, 128.31.1.13:40951), for precisely the reasons I discuss. Thus, we can fairly definitively conclude that the VPA never actually tested the peer for actual infringement: not even by trying to connect to the client’s address, let alone determining whether the client was actually uploading or download any data, and let alone valid data corresponding to the copyrighted file in question.

This begs the question as to what should be required for a company to issue a DMCA notification and pre-settlement letters that assert:

Someone using this account has engaged in illegal copying or distribution (downloading or uploading)…The information in this notification is accurate. We have a good faith belief that use of the material in the manner complained of herein is not authorized by the copyright owner.

Of course, the incentives for the VPA to actually ensure that “this notification is accurate” are pretty clear. The cost of a false positive is currently nothing, and perhaps some innocent users will even “buy protection” to make this problem and the threat of costly litigation go away.

DISCOUNT VIDEO CENTER INC. believes that it may be beneficial to settle this matter without the need of costly and time-consuming litigation. We have been authorized to offer a reasonable settlement to resolve the infringement of the works listed above.

It appears that the VPA and other such agencies have been rather effective at getting some settlement money. Our personal experience with DMCA takedown notices is that network operators are suitably afraid of litigation. Many will pull network access from machines as soon as a complaint is received, without any further verification or demonstrative network logs. In fact, many operators also sought “proof” that we weren’t running BitTorrent or engaging in file sharing before they were willing to restore access. We’ll leave the discussion about how we might prove such a negative to another day, but one can point to the chilling effect that such notices have had, when users are immediately considered guilty and must prove their innocence.

I am not arguing that copyright owners should not be able to take reasonable steps to protect their copyrighted material. I am arguing, however, that they should take similarly reasonable steps to ensure that any claimed infringement actually took place. When DMCA notices are accompanied by oaths under “penalty of perjury” and these claims are accepted as writ, as they have de facto become, there should some downside for agencies that demonstrably do not act in “good faith” to verify infringement. Even a simple TCP connection attempt would have been enough to dispel their flawed assumptions. That currently seems to be too much to ask.

Update (Dec 15): A follow-up post can be found here.

A Freedom-of-Speech-based Approach To Limiting Filesharing – Part III: Smoke, smoke!

Over the past two days we have seen that filesharing is vulnerable to spamming, and that as a defense, the filesharers have used the IP block list to exclude the spammers from sharing files. Today I discuss how I think lawyers and laypeople should look at the legal issues. Since I am most decidedly not a lawyer, nothing I say here should be considered definitive. Hopefully, it is at least interesting.

An analogy:

Washington Square, in New York City, was for many years a place where drugs were sold. A fellow would stand around quietly saying to passersby “Smoke, smoke!” However, this so-called “steerer” held no drugs. His role was simply to direct the buyer to the “pitcher”, who had the drugs somewhere nearby, and who kept silent.

Even the strongest defender of free-speech rights understands that the “steerer’s” words are not just speech. His words are not similar to those of this article, though both simply say that someone in the park is selling. He is as legally responsible for the sale as the “pitcher”, because they are, according to legal terminology, “acting in concert”. He is a drug dealer who may never touch any drugs. Note also that the “steerer” receives payments from the illegal transactions – though it is not in fact legally necessary to be able to prove the payments to establish that he’s “acting in concert”. All that’s required is that the “steerer” and the “pitcher” share “community of purpose” in facilitating the illegal transaction.

In the Napster case, the court held that Napster, even though it did not have any copyrighted data on its servers, was liable for contributory infringement. To use Napster, a downloader would login to Napster’s central server, which connected the user to another user who had a file that was being searched for. Since it was Napster’s role to hook up the parties illegally exchanging files, it is reasonable to see this as analogous to the “steerer” in Washington Square – Napster didn’t have the infringing materials, but that really isn’t a defense.

The gnutella network is decentralized to solve the legal problem presented by the Napster decision. Nonetheless, there is something still centralized in gnutella: the IP block list. Users of LimeWire get their block list from LimeWire and only from LimeWire. Accordingly, if Napster was like the “steerer” in Washington Square, LimeWire furthers the “community of purpose” in a different way; it is someone who gives negative information rather than affirmative. He’s someone paid to stand in the park pointing out who are cheaters selling bad drugs, allowing the purchasers to find the good stuff.

What is a legitimate P2P spam filtering authority versus one that shares “community of purpose” with infringers? The former could legitimately act to keep the network from being flooded by those selling weight loss drugs, without facilitating infringing. There is probably no bright-line rule, but it is reasonably clear that LimeWire is well on the wrong side of any possible grey area.

It’s useful to compare gnutella spam cop LimeWire with e-mail spam cop AOL.

LimeWire does not clearly advertise its spam cop role as a feature of its software, and does not discuss its block list. (The LimeWire web site has only the cryptic description “We’re always working to protect you from viruses and unwanted sharing.”) There is no discussion anywhere about what sorts of sites and files it is blocking and for what reason. No notification is given by LimeWire to a site when it is blocked, nor is there any way given to contact LimeWire to remove yourself from the block list.

In comparison, blocking e-mail spam is, for AOL, a major selling point. AOL does not block bulk e-mailers (many of which are legitimate) on a whim. Every e-mail rejected by AOL is bounced with a notification to the sender, and there are detailed instructions to bulk e-mailers as to what they need to do to avoid running afoul of AOL’s filters. There is a way to contact AOL to remove oneself from the block list, if one is legitimate. The whole process is transparent.

It is clear that a legitimate spam cop cannot block spoofers, since any search for a non-infringing file would be unmolested by spoofs, yet it appears that LimeWire does block MediaDefender. In fact, LimeWire appears to be quietly promising to do so, when it says that it protects against “unwanted sharing”, whatever that is.

Lastly, it appears that LimeWire’s statements in court conceal what it is doing.

As we mentioned in the first post, there is an ongoing case, Arista v Lime Group. In its motion for Summary Judgement, LimeWire states

Likewise, LW does not have the ability to control the manner in which users employ the LimeWire software. Unlike the Napster defendants, LW does not maintain central servers containing files or indices of files. … LW’s system is like that analysed by the Ninth Circuit in Grokster, “truly decentralized”. … LW no more controls the actions of its customers than do any of the thousands of companies that provide hardware or other software used in connection with the internet.

This omits any discussion of LimeWire’s centralized block list. LW assuredly does control the manner in which LimeWire users employ the LimeWire software, because if a site is added to the IP block list, it is no longer visible to most LimeWire users. This is very far from the normal situation applying in other software used in connection with the internet.

Moreover, the plaintiffs’ attorneys appear to be unaware of the blocking of spoofs, as their reply motion makes no mention of it (nor the other hidden features of LimeWire software discussed yesterday).

While it might be possible to run a legitimate spam-blocking service for P2P networks, it would look rather different from what LimeWire is doing.

Conclusion

The best way to regulate filesharing effectively is to analyze the various players’ roles on free-speech grounds. The individual filesharers (when they share infringing material) are certainly violating the law, but in a small way that probably can’t be reasonably controlled. The publishers of the software that allows the network to run (including LimeWire) are exercising free speech – the fact that their code can be made to do something illegal should be irrelevant. However, LimeWire is facilitating infringing because of the way it runs its IP block list. If LimeWire were shut down, the gnutella network become useless for downloading infringing music. Because of their actions to keep the network safe for infringers – their “acting in concert” – LimeWire should be liable for contributory infringement.

This course will avoid free speech restrictions that trouble many. In terms of preventing infringing, it also will be far more productive than trying to target the small fish. It is an effective measure that respects rights.

[This series of posts has been a somewhat shortened version of an article here.]

A Freedom-of-Speech Approach To Limiting Filesharing – Part I: Filesharing and Spam

[Today we kick off a series of three guest posts by Mitch Golden. Mitch was a professor of physics when, in 1995, he was bitten by the Internet bug and came to New York to become an entrepreneur and consultant. He has worked on a variety of Internet enterprises, including one in the filesharing space. As usual, the opinions expressed in these posts are Mitch’s alone. — Ed]

The battle between the record labels and filesharers has been somewhat out of the news a bit of late, but it rages on still. There is an ongoing court case Arista Records v LimeWire, in which a group of record labels are suing to have LimeWire held accountable for the copyright infringing done by its users. Though this case has attracted less attention than similar cases before it, it may raise interesting issues not addressed in previous cases. Though I am a technologist, not a lawyer, this series of posts will advocate a way of looking at the issues, including legal, using a freedom-of-speech based approach, which leads to some unusual conclusions.

Let’s start by reviewing some salient features of filesharing.

Filesharing is a way for a group of people – who generally do not know one another – to allow one another to see what files they collectively have on their machines, and to exchange desired files with each other. There are at least two components to a filesharing system: one allows a user who is looking for a particular file to see if someone has it, and another that allows the file to be transferred from one machine to the other.

One of the most popular filesharing programs in current use is LimeWire, which uses a protocol called gnutella. Gnutella is decentralized, in the sense that neither the search nor the exchange of files requires any central server. It is possible, therefore, for people to exchange copyrighted files – in violation of the law – without creating any log of the search or exchange in a central repository.

The gnutella protocol was originally created by developers from Nullsoft, the company that had developed the popular music player WinAmp, shortly after it was acquired by AOL. AOL was at that time merging with Time Warner, a huge media company, and so the idea that they would be distributing a filesharing client was quite unamusing to management. Work was immediately discontinued; however, the source for the client and the implementation of the protocol had already been released under the GPL, and so development continued elsewhere. LimeWire made improvements both to the protocol and the interface, and their client became quite popular.

The decentralized structure of filesharing does not serve a technical purpose. In general, centralized searching is simpler, quicker and more efficient, and so, for example, to search the web we use Google or Yahoo, which are gigantic repositories. In filesharing, the decentralized search structure instead serves a legal purpose: to diffuse the responsibility so no particular individual or organization can be held accountable for promoting the illegal copying of copyright materials. At the time the original development was going on, the Napster case was in the news, in which the first successful filesharing service was being sued by the record labels. The outcome of that case a few months later resulted in Napster being shut down, as the US courts held it (which was a centralized search repository) responsible for the copyright infringing file sharing its users were doing.

Whatever their legal or technical advantages, decentralized networks, by virtue of their openness, are vulnerable to a common problem: spam. For example, because anyone may send anyone else an e-mail, we are all subject to a deluge of messages trying to sell us penny stocks and weight loss remedies. Filesharing too is subject this sort of cheating. If someone is looking for, say, Rihanna’s recording Disturbia, and downloads an mp3 file that purports to be such, what’s to stop a spammer from instead serving a file with an audio ad for a Canadian pharmacy?

Spammers on the filesharing networks, however, have more than just the usual commercial motivations in mind. In general, there are four categories of fake files that find their way onto the network.

  • Commercial spam
  • Pornography and Ads for Pornography
  • Viruses and trojans
  • Spoof files

The last of these has no real analogue to anything people receive in e-mail It works as follows: if, for example, Rihanna’s record label wants to prevent you from downloading Disturbia, they might hire a company called MediaDefender. MediaDefender’s business is to put as many spoof files as possible on gnutella that purport to be Disturbia, but instead contain useless noise. If MediaDefender can succeed in flooding the network so that the real Disturbia is needle in a haystack, then the record label has thwarted gnutella’s users from violating their copyright.

Since people are still using filesharing, clearly a workable solution has been found to the problem of spoof files. In tomorrow’s post, I discuss this solution, and in the following post, I suggest its legal ramifications.