December 5, 2024

A Freedom-of-Speech Approach To Limiting Filesharing – Part I: Filesharing and Spam

[Today we kick off a series of three guest posts by Mitch Golden. Mitch was a professor of physics when, in 1995, he was bitten by the Internet bug and came to New York to become an entrepreneur and consultant. He has worked on a variety of Internet enterprises, including one in the filesharing space. As usual, the opinions expressed in these posts are Mitch’s alone. — Ed]

The battle between the record labels and filesharers has been somewhat out of the news of late, but it rages on still. There is an ongoing court case, Arista Records v. LimeWire, in which a group of record labels is suing to have LimeWire held accountable for the copyright infringement committed by its users. Though this case has attracted less attention than similar cases before it, it may raise interesting issues not addressed in previous cases. Though I am a technologist, not a lawyer, this series of posts will advocate a way of looking at the issues, including the legal ones, using a freedom-of-speech based approach, which leads to some unusual conclusions.

Let’s start by reviewing some salient features of filesharing.

Filesharing is a way for a group of people – who generally do not know one another – to allow one another to see what files they collectively have on their machines, and to exchange desired files with each other. There are at least two components to a filesharing system: one that allows a user who is looking for a particular file to see if someone has it, and another that allows the file to be transferred from one machine to the other.
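To make the two components concrete, here is a minimal Python sketch. It is a toy model, not gnutella or any real client: the Peer class, the search and transfer functions, and the file names are all hypothetical, and everything runs in one process rather than over a network.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """A hypothetical peer: a name plus the files it holds, keyed by filename."""
    name: str
    files: dict = field(default_factory=dict)


def search(peers, wanted):
    """Component 1: find which peers have a file with the wanted name."""
    return [p for p in peers if wanted in p.files]


def transfer(source, dest, filename):
    """Component 2: copy the file from one peer's machine to the other's."""
    dest.files[filename] = source.files[filename]


alice = Peer("alice", {"song.mp3": b"audio-bytes"})
bob = Peer("bob", {"notes.txt": b"hello"})
carol = Peer("carol")

# Carol looks for a file, then fetches it from the first peer that has it.
holders = search([alice, bob], "song.mp3")
if holders:
    transfer(holders[0], carol, "song.mp3")
```

In a real system the two steps can be handled by entirely different machinery, which is why a design can centralize one and decentralize the other.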

One of the most popular filesharing programs in current use is LimeWire, which uses a protocol called gnutella. Gnutella is decentralized, in the sense that neither the search nor the exchange of files requires any central server. It is possible, therefore, for people to exchange copyrighted files – in violation of the law – without creating any log of the search or exchange in a central repository.

The gnutella protocol was originally created by developers from Nullsoft, the company that had developed the popular music player WinAmp, shortly after it was acquired by AOL. AOL was at that time merging with Time Warner, a huge media company, and so the idea that they would be distributing a filesharing client was quite unamusing to management. Work was immediately discontinued; however, the source for the client and the implementation of the protocol had already been released under the GPL, and so development continued elsewhere. LimeWire made improvements both to the protocol and the interface, and their client became quite popular.

The decentralized structure of filesharing does not serve a technical purpose. In general, centralized searching is simpler, quicker and more efficient, and so, for example, to search the web we use Google or Yahoo, which are gigantic repositories. In filesharing, the decentralized search structure instead serves a legal purpose: to diffuse the responsibility so no particular individual or organization can be held accountable for promoting the illegal copying of copyrighted materials. At the time the original development was going on, the Napster case was in the news, in which the first successful filesharing service was being sued by the record labels. The outcome of that case a few months later resulted in Napster being shut down, as the US courts held it (which was a centralized search repository) responsible for the copyright-infringing filesharing its users were doing.

Whatever their legal or technical advantages, decentralized networks, by virtue of their openness, are vulnerable to a common problem: spam. For example, because anyone may send anyone else an e-mail, we are all subject to a deluge of messages trying to sell us penny stocks and weight loss remedies. Filesharing too is subject to this sort of cheating. If someone is looking for, say, Rihanna’s recording Disturbia, and downloads an mp3 file that purports to be such, what’s to stop a spammer from instead serving a file with an audio ad for a Canadian pharmacy?

Spammers on the filesharing networks, however, have more than just the usual commercial motivations in mind. In general, there are four categories of fake files that find their way onto the network.

  • Commercial spam
  • Pornography and Ads for Pornography
  • Viruses and trojans
  • Spoof files

The last of these has no real analogue to anything people receive in e-mail. It works as follows: if, for example, Rihanna’s record label wants to prevent you from downloading Disturbia, they might hire a company called MediaDefender. MediaDefender’s business is to put as many spoof files as possible on gnutella that purport to be Disturbia, but instead contain useless noise. If MediaDefender can succeed in flooding the network so that the real Disturbia is a needle in a haystack, then the record label has thwarted gnutella’s users from violating their copyright.
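A rough way to see why flooding works is to do the arithmetic. The sketch below assumes, purely for illustration, that a user picks uniformly at random among all search results and cannot tell real copies from spoofs in advance; the function names and the counts are hypothetical.

```python
from fractions import Fraction


def chance_of_real(n_real, n_spoof):
    """Probability that one randomly chosen search result is a real copy."""
    return Fraction(n_real, n_real + n_spoof)


def expected_downloads(n_real, n_spoof):
    """Average number of tries until the first real copy, if every try is an
    independent random pick (a geometric distribution with mean 1/p)."""
    return 1 / chance_of_real(n_real, n_spoof)


# Hypothetical numbers: 10 real copies hidden among 990 spoofs.
p = chance_of_real(10, 990)
tries = expected_downloads(10, 990)
```

Under these assumptions, p is 1/100 and a user would need about 100 downloads on average to find one real copy. The more spoofs are added, the worse the odds become.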

Since people are still using filesharing, clearly a workable solution has been found to the problem of spoof files. In tomorrow’s post, I discuss this solution, and in the following post, I suggest its legal ramifications.

Study Shows DMCA Takedowns Based on Inconclusive Evidence

A new study by Michael Piatek, Yoshi Kohno and Arvind Krishnamurthy at the University of Washington shows that copyright owners’ representatives sometimes send DMCA takedown notices where there is no infringement – and even to printers and other devices that don’t download any music or movies. The authors of the study received more than 400 spurious takedown notices.

Technical details are summarized in the study’s FAQ:

Downloading a file from BitTorrent is a two step process. First, a new user contacts a central coordinator [a “tracker” – Ed] that maintains a list of all other users currently downloading a file and obtains a list of other downloaders. Next, the new user contacts those peers, requesting file data and sharing it with others. Actual downloading and/or sharing of copyrighted material occurs only during the second step, but our experiments show that some monitoring techniques rely only on the reports of the central coordinator to determine whether or not a user is infringing. In these cases whether or not a peer is actually participating is not verified directly. In our paper, we describe techniques that exploit this lack of direct verification, allowing us to frame arbitrary Internet users.

The existence of erroneous takedowns is not news – anybody who has seen the current system operating knows that some notices are just wrong, for example referring to unused IP addresses. Somewhat more interesting is the result that it is pretty easy to “frame” somebody so they get takedown notices despite doing nothing wrong. Given this, it would be a mistake to infer a pattern of infringement based solely on the existence of takedown notices. More evidence should be required before imposing punishment.
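The difference between the two monitoring approaches can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the study’s method or real BitTorrent code: ToyTracker, both accuse functions, and the addresses (taken from the ranges reserved for documentation) are hypothetical, and the toy tracker is assumed to record whatever address an announce claims.

```python
class ToyTracker:
    """Toy stand-in for a tracker: it lists every address it is told about."""

    def __init__(self):
        self.swarm = set()

    def announce(self, claimed_ip):
        # Assumption of this sketch: the claimed address is not checked.
        self.swarm.add(claimed_ip)
        return sorted(self.swarm)


def accuse_from_tracker_list(tracker):
    """Indirect monitoring: treat every listed address as an infringer."""
    return set(tracker.swarm)


def accuse_after_direct_check(tracker, peers_serving_data):
    """Direct monitoring: keep only addresses that actually serve file data."""
    return {ip for ip in tracker.swarm if ip in peers_serving_data}


tracker = ToyTracker()
tracker.announce("192.0.2.10")    # a peer that really is sharing
tracker.announce("198.51.100.7")  # a printer's address, announced by someone else

serving = {"192.0.2.10"}          # who would answer a request for file data
indirect = accuse_from_tracker_list(tracker)
direct = accuse_after_direct_check(tracker, serving)
```

Here the indirect approach accuses both addresses, including the printer, while the direct check accuses only the peer that is actually sharing.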

Now it’s not entirely crazy to send some kind of soft “warning” to a user based on the kind of evidence described in the Washington paper. Most of the people who received such warnings would probably be infringers, and if it’s nothing more than a warning (“Hey, it looks like you might be infringing. Don’t infringe.”) it could be effective, especially if the recipients know that with a bit more work the copyright owner could gather stronger evidence. Such a system could make sense, as long as everybody understood that warnings were not evidence of infringement.

So are copyright owners overstepping the law when they send takedown notices based on inconclusive evidence? Only a lawyer can say for sure. I’ve read the statute and it’s not clear to me. Readers who have an informed opinion on this question are encouraged to speak up in the comments.

Whether or not copyright owners can send warnings based on inconclusive evidence, the notification letters they actually send imply that there is strong evidence of infringement. Here’s an excerpt from a letter sent to the University of Washington about one of the (non-infringing) study computers:

XXX, Inc. swears under penalty of perjury that YYY Corporation has authorized XXX to act as its non-exclusive agent for copyright infringement notification. XXX’s search of the protocol listed below has detected infringements of YYY’s copyright interests on your IP addresses as detailed in the attached report.

XXX has reasonable good faith belief that use of the material in the manner complained of in the attached report is not authorized by YYY, its agents, or the law. The information provided herein is accurate to the best of our knowledge. Therefore, this letter is an official notification to effect removal of the detected infringement listed in the attached report. The attached documentation specifies the exact location of the infringement.

The statement that the search “has detected infringements … on your IP addresses” is not accurate, and the later reference to “the detected infringement” also misleads. The letter contains details of the purported infringement, which once again give the false impression that the letter’s sender has verified that infringement was actually occurring:

Evidentiary Information:
Notice ID: xx-xxxxxxxx
Recent Infringement Timestamp: 5 May 2008 20:54:30 GMT
Infringed Work: Iron Man
Infringing FileName: Iron Man TS Kvcd(A Karmadrome Release)KVCD by DangerDee
Infringing FileSize: 834197878
Protocol: BitTorrent
Infringing URL: http://tmts.org.uk/xbtit/announce.php
Infringers IP Address: xx.xx.xxx.xxx
Infringer’s DNS Name: d-xx-xx-xxx-xxx.dhcp4.washington.edu
Infringer’s User Name:
Initial Infringement Timestamp: 4 May 2008 20:22:51 GMT

The obvious question at this point is why the copyright owners don’t do the extra work to verify that the target of the letter is actually transferring copyrighted content. There are several possibilities. Perhaps BitTorrent clients can recognize and shun the detector computers. Perhaps they don’t want to participate in an act of infringement by sending or receiving copyrighted material (which would be necessary to know that something on the targeted computer is willing to transfer it). Perhaps it simply serves their interests better to send lots of weak accusations, rather than fewer stronger ones. Whatever the reason, until copyright owners change their practices, DMCA notices should not be considered strong evidence of infringement.

Comcast's Disappointing Defense

Last week, Comcast offered a defense in the FCC proceeding challenging the technical limitations it had placed on BitTorrent traffic in its network. (Back in October, I wrote twice about Comcast’s actions.)

The key battle line is whether Comcast is just managing its network reasonably in the face of routine network congestion, as it claims, or whether it is singling out certain kinds of traffic for unnecessary discrimination, as its critics claim. The FCC process has generated lots of verbiage, which I can’t hope to discuss, or even summarize, in this post.

I do want to call out one aspect of Comcast’s filing: the flimsiness of its technical argument.

Here’s one example (p. 14-15).

As Congresswoman Mary Bono Mack recently explained:

The service providers are watching more and more of their network monopolized by P2P bandwidth hogs who command a disproportionate amount of their network resources. . . . You might be asking yourself, why don’t the broadband service providers invest more into their networks and add more capacity? For the record, broadband service providers are investing in their networks, but simply adding more bandwidth does not solve [the P2P problem]. The reason for this is P2P applications are designed to consume as much bandwidth as is available, thus more capacity only results in more consumption.

(emphasis in original). The flaws in this argument start with the fact that the italicized segment is wrong. P2P protocols don’t aim to use more bandwidth rather than less. They’re not sparing with bandwidth, but they don’t use it for no reason, and there does come a point where they don’t want any more.

But even leaving aside the merits of the argument, what’s most remarkable here is that Comcast’s technical description of BitTorrent cites as evidence not a textbook, nor a standards document, nor a paper from the research literature, nor a paper by the designer of BitTorrent, nor a document from the BitTorrent company, nor the statement of any expert, but a speech by a member of Congress. Congressmembers know many things, but they’re not exactly the first group you would turn to for information about how network protocols work.

This is not the only odd source that Comcast cites. Later (p. 28) they claim that the forged TCP Reset packets that they send shouldn’t be called “forged”. For this proposition they cite some guy named George Ou who blogs at ZDNet. They give no reason why we should believe Mr. Ou on this point. My point isn’t to attack Mr. Ou, who for all I know might actually have some relevant expertise. My point is that if this is the most authoritative citation Comcast can find, then their argument doesn’t look very solid. (And, indeed, it seems pretty uncontroversial to call these particular packets “forged”, given that they mislead the recipient about (1) which IP address sent the packet, and (2) why the packet was sent.)
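The first sense of “forged” can be stated as a simple check. The sketch below is only a model of the idea, not a packet parser: ResetPacket and is_forged are hypothetical names, the addresses come from documentation ranges, and the actual_sender field stands for a fact the recipient normally cannot observe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResetPacket:
    claimed_source: str  # the address written in the packet's source field
    actual_sender: str   # who really sent it (invisible to the recipient)


def is_forged(packet):
    """A packet is forged in sense (1) if it names a sender that did not send it."""
    return packet.claimed_source != packet.actual_sender


genuine = ResetPacket(claimed_source="192.0.2.10", actual_sender="192.0.2.10")
injected = ResetPacket(claimed_source="192.0.2.10", actual_sender="203.0.113.1")
```

The recipient sees only claimed_source, so the two packets look alike to it even though only the first was sent by the peer it names.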

Comcast is a big company with plenty of resources. It’s a bit depressing that they would file arguments like this with the FCC, an agency smart enough to tell the difference. Is this really the standard of technical argumentation in FCC proceedings?

Could Use-Based Broadband Pricing Help the Net Neutrality Debate?

Yesterday, thanks to a leaked memo, it came to light that Time Warner Cable intends to try out use-based broadband pricing on a few of its customers. It looks like the plan is for several tiers of use, with the heaviest users possibly paying overage charges on a per-byte basis. In confirming its plans to Reuters, Time Warner pointed out that its heaviest-using five percent of customers generate the majority of data traffic on the network, but still pay as though they were typical users. Under the new proposal, pricing would be based on the total amount of data transferred, rather than the peak throughput on a connection.
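A tiered plan with overage charges might be computed along these lines. This is a sketch only: the leaked plan’s actual tiers and prices are not given here, so the caps, prices, and overage rate below are invented, and the charge is per gigabyte rather than per byte to keep the numbers readable.

```python
def monthly_bill(gb_used, tiers, overage_per_gb):
    """Price for one month of use.

    tiers is a list of (cap_gb, price) pairs sorted by cap. A customer pays
    the price of the lowest tier that covers their use. Above the top cap
    they pay the top price plus a charge for each extra gigabyte.
    """
    for cap_gb, price in tiers:
        if gb_used <= cap_gb:
            return price
    top_cap, top_price = tiers[-1]
    return top_price + (gb_used - top_cap) * overage_per_gb


# Hypothetical tiers: up to 5 GB for $30, up to 20 GB for $40, up to 40 GB for $55.
TIERS = [(5, 30), (20, 40), (40, 55)]

light = monthly_bill(3, TIERS, overage_per_gb=1)
typical = monthly_bill(15, TIERS, overage_per_gb=1)
heavy = monthly_bill(50, TIERS, overage_per_gb=1)
```

With these made-up numbers the light user pays $30, the typical user $40, and the heavy user $65, which is the $55 top tier plus $10 of overage.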

If the current, flattened pricing is based on what the connection is worth to a typical customer, who makes only limited use of the connection, then the heaviest five percent of users (let’s call them super-users as shorthand) are reaping a surplus. Bandwidth use might be highly elastic with respect to price, but I think it is also true that the super-users do reap a great deal more benefit from their broadband connections than other users do – think of those who pioneer video consumption online, for example.

What happens when network operators fail to see this surplus? They have marginally less incentive to build out the network and drive down the unit cost of data transfer. If the pricing model changed so that network providers’ revenue remained the same in total but was based directly on how much the network is used, then the price would go down for the lightest users and up for the heaviest. If a tiered structure left prices the same for most users and raised them on the heaviest, operators’ total revenue would go up. In either case, networks would have an incentive to encourage innovative, high-bandwidth uses of their networks – regardless of what kind of use that is.
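The revenue-neutral case can be worked through with made-up numbers. In the sketch below, the customer count, the usage figures, and the $40 flat price are all hypothetical; they are chosen only so that five percent of customers generate a majority of the traffic, and usage is assumed not to change when the price does.

```python
def revenue_neutral_rate(flat_price, usage_gb):
    """Per-gigabyte price that brings in the same total as the flat price."""
    total_revenue = flat_price * len(usage_gb)
    return total_revenue / sum(usage_gb)


# Hypothetical network: 95 typical customers at 2 GB, 5 super-users at 42 GB.
usage = [2] * 95 + [42] * 5
flat_price = 40

rate = revenue_neutral_rate(flat_price, usage)  # dollars per gigabyte
light_bill = rate * 2
heavy_bill = rate * 42
heavy_share = (42 * 5) / sum(usage)             # fraction of traffic from super-users
```

With these numbers the rate is $10 per gigabyte, so a typical customer’s bill falls from $40 to $20 while a super-user’s rises to $420, and total revenue stays at $4,000. In practice heavy users would likely cut back in response, which this sketch ignores.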

Gigi Sohn of Public Knowledge has come out in favor of Time Warner’s move on these and other grounds. It’s important to acknowledge that network operators still have familiar, monopolistic reasons to intervene against traffic that competes with phone service or cable. But under the current pricing structure, they’ve had a relatively strong argument to discriminate in favor of the traffic they can monetize, and against the traffic they can’t. By allowing them to monetize all traffic, a shift to use-based pricing would weaken one of the most persuasive reasons network operators have to oppose net neutrality.

Universal Didn't Ignore Digital, Just Did It Wrong

Techies have been chortling all week about comments made by Universal Music CEO Doug Morris to Wired’s Seth Mnookin. Morris, despite being in what is now a technology-based industry, professed extreme ignorance about the digital world. Here’s the money quote:

Morris insists there wasn’t a thing he or anyone else could have done differently. “There’s no one in the record company that’s a technologist,” Morris explains. “That’s a misconception writers make all the time, that the record industry missed this. They didn’t. They just didn’t know what to do. It’s like if you were suddenly asked to operate on your dog to remove his kidney. What would you do?”

Personally, I would hire a vet. But to Morris, even that wasn’t an option. “We didn’t know who to hire,” he says, becoming more agitated. “I wouldn’t be able to recognize a good technology person — anyone with a good bullshit story would have gotten past me.” Morris’ almost willful cluelessness is telling. “He wasn’t prepared for a business that was going to be so totally disrupted by technology,” says a longtime industry insider who has worked with Morris. “He just doesn’t have that kind of mind.”

Morris’s explanation isn’t just pathetic, it’s also wrong. The problem wasn’t that the company had no digital strategy. They had a strategy, and they had technologists on the payroll who were supposed to implement it. But their strategy was a bad one, combining impractical copy-protection schemes with locked-down subscription services that would appeal to few if any customers.

The most interesting side of the story is that Universal’s strategy is improving now – they’re selling unencumbered MP3s, for example – even though the same proud technophobe is still in charge.

Why the change?

The best explanation, I think, is a fear that Apple would use its iPod/iTunes technologies to grab control of digital music distribution. If Universal couldn’t quite understand the digital transition, it could at least recognize a threat to its distribution channel. So it responded by competing – that is, trying to give customers what they wanted.

Still, if I were a Universal shareholder I wouldn’t let Morris off the hook. What kind of manager, in an industry facing historic disruption, is uninterested in learning about the source of that disruption? A CEO can’t be an expert on everything. But can’t the guy learn just a little bit about technology?