March 28, 2024

GAO Data: Porn Rare on P2P; Filters Ineffective

P2P nets have fewer pornographic images than the Web, and P2P porn filters are ineffective, according to data in a new report from the U.S. Government Accountability Office (GAO).

Mind you, the report’s summary text says pretty much the opposite, but where I come from, data gets more credibility than spin. The data can be found on pages 58-69 of the report. (My PDF reader calls those pages 61-72. To add to the confusion, the pages include images of PowerPoint slides bearing the numbers 53-64.)

The researchers did searches for images, using six search terms (three known to be associated with porn and three innocuous ones) on three P2P systems (Warez, Kazaa, Morpheus) and three search engines (Google, MSN, Yahoo). They looked at the resulting images and classified each image as adult porn, child porn, cartoon porn, adult erotica, cartoon erotica, or other. For brevity, I’ll lump together all of the porn and erotica categories into a meta-category that I’ll call “porne”, so that there are two categories, porne and non-porne.

The first observation from the data is that P2P nets have relatively few porne images compared to the Web. The eighteen P2P searches found a total of 277 porne images. The eighteen Web searches found at least 655 porne images. Because the Web searches returned so many images, the researchers had to cut off the analysis after the first 100 images of each Web search, so the actual number of Web porne images might have been much larger. (No such truncation was necessary on the P2P searches.)

The obvious conclusion is that if you want to regulate communications technology to keep porne away from kids, you should start with the Web, because it’s a much bigger danger than P2P.

The report also looked at the effectiveness of the porn blocking facilities built into some of the products. The data show pretty clearly that the filters are ineffective at distinguishing porne from non-porne images.

Two of the P2P systems, Kazaa and Morpheus, have built-in porn blocking. The report did the same searches, with and without blocking enabled, and compared the results. They report the data in an odd format, but I have reorganized their data into a more enlightening form. First, let’s look at the results for the three search terms “known to be associated with pornography”. For each term, I’ll report two figures of merit: what percentage of the porne images was blocked by the filter, and what percentage of the non-porne images was (erroneously) blocked by the filter. Here are the results:

Product  | % Porne Blocked | % Non-porne Blocked
---------|-----------------|--------------------
Kazaa    | 100%            | 100%
Morpheus | 83%             | 69%

Kazaa blocks all of the porne, by the clever expedient of blocking absolutely everything it sees. For non-porne images, Kazaa has a 100% error rate. Morpheus does only slightly better, blocking 83% of the porne, while erroneously blocking “only” 69% of the non-porne. In all, it’s a pretty poor performance.
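The two figures of merit can be computed from raw image counts for the same search run with the filter off and on. The sketch below assumes that whatever the filtered search returns is a subset of the unfiltered results, so the share blocked is one minus the ratio of filtered to unfiltered counts. The function names and the counts in the example are hypothetical, since the report's raw counts are not reproduced here.

```python
from typing import Optional, Tuple


def pct_blocked(found_off: int, found_on: int) -> Optional[float]:
    """Percentage of images that disappear when the filter is turned on.

    Assumes the filtered results are a subset of the unfiltered ones. If
    that fails (e.g. the network changed between runs), the result can be
    negative. Returns None when the unfiltered search found nothing.
    """
    if found_off == 0:
        return None  # nothing there to block; the percentage is undefined
    return 100.0 * (1 - found_on / found_off)


def filter_merit(porne_off: int, porne_on: int,
                 nonporne_off: int, nonporne_on: int
                 ) -> Tuple[Optional[float], Optional[float]]:
    """Return (% porne blocked, % non-porne erroneously blocked)."""
    return (pct_blocked(porne_off, porne_on),
            pct_blocked(nonporne_off, nonporne_on))


# Hypothetical counts, for illustration only.
print(filter_merit(40, 10, 50, 25))   # (75.0, 50.0)
print(filter_merit(10, 0, 20, 0))     # (100.0, 100.0)
```

A filter that blocks everything, as Kazaa does on the porn-associated terms, scores 100% in both columns no matter what the nonzero unfiltered counts are, which is why the second column matters as much as the first.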

Here are the results for searches on innocuous search terms (ignoring one term which never yielded any porne):

Product  | % Porne Blocked | % Non-porne Blocked
---------|-----------------|--------------------
Kazaa    | 100%            | -9%
Morpheus | -150%           | 0%

You may be wondering where the negative percentages come from. According to the report, more images are found with the filters turned on than when they are turned off. If the raw data are to be believed, turning on the Morpheus filter more than doubles the amount of porne you can find! There’s obviously something wrong with the data, and the likely explanation is that the searches were done at different times, when very different sets of files were available. This is pretty sloppy experimental technique, enough to cast doubt on the whole report. (One expects better from the GAO.)
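Assuming each percentage is computed as one minus the ratio of filtered to unfiltered counts (the natural reading of the table, though the formula is not spelled out), the arithmetic behind the negative entries is short. Here $N_{\text{off}}$ and $N_{\text{on}}$ are symbols introduced for illustration: the porne images found with the filter off and on.

```latex
B = 100\% \times \left(1 - \frac{N_{\text{on}}}{N_{\text{off}}}\right),
\qquad B < 0 \iff N_{\text{on}} > N_{\text{off}} \quad (N_{\text{off}} > 0)

B = -150\% \;\Rightarrow\; 1 - \frac{N_{\text{on}}}{N_{\text{off}}} = -1.5
\;\Rightarrow\; N_{\text{on}} = 2.5\,N_{\text{off}}
```

So Morpheus's -150% means the filtered search turned up two and a half times as much porne as the unfiltered one, which is the "more than doubles" figure above.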

But we can salvage some value from this experiment if we assume that even though the total number of files on the P2P net changed from one measurement to the next, the fraction of files that were porne stayed about the same. (If this is not true, then we can’t really trust any of the experiments in the report.) Making this assumption, we can then calculate the percentage of available files that are porne, both with and without blocking.

Product  | % Porne, without Filter | % Porne, with Filter
---------|-------------------------|---------------------
Kazaa    | 27%                     | 0%
Morpheus | 20%                     | 38%
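The salvage step amounts to comparing shares rather than raw counts. Here is a minimal sketch under the same stability assumption the text makes. The function names and the counts in the example are hypothetical, not the report's data.

```python
from typing import Optional, Tuple


def porne_share(porne: int, nonporne: int) -> Optional[float]:
    """Percentage of returned images classified as porne (None if no images)."""
    total = porne + nonporne
    if total == 0:
        return None
    return 100.0 * porne / total


def compare_filter(off: Tuple[int, int], on: Tuple[int, int]
                   ) -> Tuple[Optional[float], Optional[float]]:
    """Porne share without and with the filter.

    Each argument is a (porne, non-porne) count pair. Comparing shares
    instead of counts assumes the porne fraction of what was available
    stayed roughly constant between the two runs, even if totals changed.
    """
    return porne_share(*off), porne_share(*on)


# Hypothetical counts: a filter helps only if the second share is lower.
print(compare_filter((30, 70), (0, 40)))   # (30.0, 0.0)
print(compare_filter((1, 3), (3, 1)))      # (25.0, 75.0)
```

The first example mirrors Kazaa's pattern (the share drops to zero); the second mirrors Morpheus's (the share rises with the filter on).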

The Kazaa filter successfully blocks all of the porne, but we don’t know how much of the non-porne it erroneously blocks. The Morpheus filter does a terrible job, actually making things worse. You could do better by just flipping a coin to decide whether to block each image.
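The coin-flip comparison can be made precise. Suppose a blocker ignores content and drops each image independently with probability $q$ (with $0 \le q < 1$; a fair coin is $q = 1/2$). Write $P$ and $N$ for the porne and non-porne images available; these symbols are introduced here for illustration.

```latex
\mathbb{E}[\text{porne shown}] = (1-q)\,P, \qquad
\mathbb{E}[\text{non-porne shown}] = (1-q)\,N

\frac{(1-q)\,P}{(1-q)\,P + (1-q)\,N} = \frac{P}{P+N}
```

A random blocker therefore keeps the expected porne and non-porne counts in the same proportion as no filter at all: about 20% porne in the Morpheus case, versus 38% with the Morpheus filter on. (This compares expected counts; any single run fluctuates around that share.)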

So here’s the bottom line on P2P porne filters: you can have a filter that massively overblocks innocuous images, or you can have a filter that sometimes makes things worse and can’t reliably beat a coin flip. Or you can face the fact that these filters don’t help.

(The report also looked at the effectiveness of the built-in porn filters in Web search engines, but due to methodological problems those experiments don’t tell us much.)

The policy prescription here is clear. Don’t mandate the use of filters, because they don’t seem to work. And if you want filters to improve, it might be a good idea to fully legalize research on filtering systems, so people like Seth Finkelstein can finish the job the GAO started.

BitTorrent: The Next Main Event

Few tears will be shed if Grokster and StreamCast are driven out of business as a result of the Supreme Court’s decision. The companies are far from lovable, and their technology is yesterday’s news anyway.

A much more important issue is what the rules will be for the next generation of technologies. Here the Court did not offer the clarity we might have hoped for, opting instead for what Tim Wu has described as the Miss Manners rule, under which vendors must avoid showing an unseemly interest in infringing uses of their products. This would appear to protect vendors who are honestly uninterested in fostering infringement, as well as those who are very interested but manage to hide it.

Lower courts will be left to apply the Grokster Court’s inducement rule to the facts of other file distribution technologies. How far will lower courts go? Will they go too far?

The litmus test is BitTorrent. Here is a technology that is widely used for both infringing and non-infringing purposes, with infringement probably predominating today. And yet: It was originally created to support noninfringing sharing (of concert recordings, with permission). Its creator, Bram Cohen, seems interested only in noninfringing uses, and has said all the right things about infringement – so consistently that one can only conclude he is sincere. BitTorrent is nicely engineered, offering novel benefits to infringing and noninfringing users alike. It is available for free, so there is no infringement-based business model. In short, BitTorrent looks like a clear example of the kind of dual-use technology that ought to pass the Court’s active inducement test.

A court that followed the Grokster analysis closely would have to let BitTorrent off the hook. To do otherwise, I think, would be to institute a de facto predominant-use test, finding BitTorrent liable because too many of its users infringed. This might be dressed up as an inducement analysis, but it would be clear to everybody what was going on. Given the squishiness of the Grokster analysis, we can’t rule this out.

So the stage is set for the next phase of the copyright/technology litigation war. The music and movie industries don’t want to live in a world where BitTorrent is allowed to exist. The Supreme Court didn’t give them enough yesterday to kill BitTorrent. So the industries’ goal will be to stretch the Grokster rule, just as they tried to stretch the Sony rule before hitting a sandbar in the Grokster district court. We’ll see a careful campaign of litigation against peer-to-peer services, trying to gradually stretch the noose of inducement liability until it fits around BitTorrent’s neck. Failing that, we’ll see a push to get Congress to codify (the industries’ interpretation of) the Grokster rule.

The real winners, as usual, are the copyright lawyers.

Patry: The Court Punts

William Patry (a distinguished copyright lawyer) offers an interesting take on Grokster. He says that the court was unable to come to agreement on how to apply the Sony Betamax precedent to Grokster, and so punted the issue.

Legality of Design Decisions, and Footnote 12 in Grokster

As a technologist I find the most interesting, and scariest, part of the Grokster opinion to be the discussion of product design decisions. The Court seems to say that Sony bars liability based solely on product design (p. 16):

Sony barred secondary liability based on presuming or imputing intent to cause infringement solely from the design or distribution of a product capable of substantial lawful use, which the distributor knows is in fact used for infringement.

And again (on p. 17),

Sony’s rule limits imputing culpable intent as a matter of law from the characteristics or uses of a distributed product.

But when it comes time to lay out the evidence of intent to foster infringement, we get this (p. 22):

Second, this evidence of unlawful objective is given added significance by MGM’s showing that neither company attempted to develop filtering tools or other mechanisms to diminish the infringing activity using their software. While the Ninth Circuit treated the defendants’ failure to develop such tools as irrelevant because they lacked an independent duty to monitor their users’ activity, we think this evidence underscores Grokster’s and StreamCast’s intentional facilitation of their users’ infringement.

It’s hard to square this with the previous statements that intent is not to be inferred from the characteristics of the product. Perhaps the answer is in footnote 12, which the Court hangs off the last word in the previous quote:

Of course, in the absence of other evidence of intent, a court would be unable to find contributory infringement liability merely based on a failure to take affirmative steps to prevent infringement, if the device otherwise was capable of substantial noninfringing uses. Such a holding would tread too close to the Sony safe harbor.

So it seems that product design decisions are not to be questioned, unless there is some other evidence of bad intent to open the door.

To make things worse, the Court here criticizes Grokster and StreamCast for making a very reasonable engineering decision. There is every reason to believe that filtering technology would add to the cost and complexity of the companies’ software, without substantially reducing infringement. (We discussed this issue in the computer science professors’ brief.) In short, the Court here engages in exactly the kind of design second-guessing that technologists fear.

Legitimate technologists will still worry that a well-funded plaintiff can cook up a stew of product design second-guessing, business model second-guessing, and occasional failures of copyright compliance by low-level employees, into an active inducement case. This risk existed before, and the Court today hasn’t done much to reduce it.

BitTorrent Search

BitTorrent.com released a new search facility yesterday, making it slightly easier to find torrent files on the Net. This is an odd strategic move by BitTorrent.com – it doesn’t help the company’s customers much, but mostly just muddles the company’s public messaging.

[Backstory about BitTorrent: The BitTorrent technology allows efficient Internet distribution of large files to many recipients, without creating a central network bottleneck. In current released versions of BitTorrent, you locate content by getting a torrent file from a standard web server. The torrent file points to the location of a “tracker” which in turn keeps track of where on the net you can go to get pieces of the content. (A new beta version eliminates the tracker, which is an interesting development but is largely irrelevant to the issues I’m discussing today.)]
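To make the indirection concrete, here is a minimal sketch of a single-file torrent file and the first request a client sends to its tracker. It follows the field names in the BitTorrent protocol specification (BEP 3) as I understand them: a bencoded dictionary holding an `announce` tracker URL and an `info` dictionary, and a tracker request that identifies the content by the SHA-1 hash of the bencoded `info`. The tracker URL, file name, peer ID, and 256 KiB piece size are hypothetical, and this is not code from the BitTorrent software; it also skips actually contacting the tracker, and it ignores the trackerless beta.

```python
import hashlib
from urllib.parse import urlencode

PIECE_LENGTH = 256 * 1024  # assumed piece size; real torrents vary


def bencode(value) -> bytes:
    """Minimal bencoder for ints, strings/bytes, lists, and dicts."""
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_metainfo(name: str, data: bytes, announce: str) -> dict:
    """Build a single-file torrent: tracker URL plus per-piece SHA-1 hashes."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + PIECE_LENGTH]).digest()
        for i in range(0, len(data), PIECE_LENGTH)
    )
    return {
        "announce": announce,  # where to find the tracker
        "info": {
            "name": name,
            "piece length": PIECE_LENGTH,
            "pieces": pieces,  # lets peers verify each piece they receive
            "length": len(data),
        },
    }


def announce_url(meta: dict, peer_id: bytes, port: int) -> str:
    """First tracker request: 'who has the content with this info hash?'"""
    info_hash = hashlib.sha1(bencode(meta["info"])).digest()
    query = urlencode({
        "info_hash": info_hash,  # 20-byte SHA-1 of the bencoded info dict
        "peer_id": peer_id,      # 20-byte ID this client picks for itself
        "port": port,            # where this client accepts peer connections
        "uploaded": 0,
        "downloaded": 0,
        "left": meta["info"]["length"],
    })
    # The tracker's bencoded reply lists peers; those peers, not the
    # tracker or the torrent file, are what actually serve the content.
    return meta["announce"] + "?" + query


meta = make_metainfo("concert.flac", b"x" * 10, "http://tracker.example/announce")
print(announce_url(meta, b"-XX0001-123456789012", 6881))
```

Each step only hands over an address for the next one: the torrent file names a tracker, and the tracker names peers. None of this depends on who wrote the client, which is why the protocol is separable from the company and the software.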

The term “BitTorrent” is used to refer to three separate things:

  • a company, which I’ll call “BitTorrent.com”,
  • a software product called BitTorrent, distributed for free, with source code, by BitTorrent.com,
  • the communication protocols that enable users’ systems to communicate, which are implemented by the BitTorrent software but can be implemented by other software programs.

Blending the three together is sometimes a harmless rhetorical shortcut, but at other times leads to faulty reasoning. For example, a court could hypothetically shut down BitTorrent.com (if the company were found to be a lawbreaker) but it could neither undistribute the software code that was already in users’ hands, nor uncreate the protocol. Critics who are thinking sloppily (or want their audiences to think sloppily) sometimes ignore these distinctions. BitTorrent.com, the company, may have a business incentive to blur the distinctions, in order to make the company’s role seem more important than it really is.

The new BitTorrent.com search facility seems to be entirely separate, functionally, from the BitTorrent software and protocols. Anybody could have created this search facility; and indeed others have. Google, for instance, happens to offer a fairly complete torrent search facility. A BitTorrent.com search for “sith” returns quite a few files claiming to be the new Star Wars movie; but so does a Google search for “sith filetype:torrent”. There’s no reason, functionally, why BitTorrent.com had to be the one to offer a torrent search engine. An independent search engine would work just as well.

Is BitTorrent.com search legal? I’ll leave that one to the lawyers; but I’ll point out two things. First, the DMCA provides a safe harbor against indirect infringement for search engines that follow certain takedown procedures on receiving infringement complaints. BitTorrent.com will apparently follow those procedures, and so the safe harbor may apply. Second, the connection from BitTorrent.com to any infringing content is quite indirect: a BitTorrent.com search result gives the address of a torrent file; the torrent file gives the address of a tracker; the tracker gives the addresses of client computers; and the client computers are the ones that actually distribute infringing content. (The new trackerless version of BitTorrent changes the details, but doesn’t reduce the number of steps.) There are at least three intermediaries between BitTorrent.com and any infringing material.

Even if the search facility is legal, it seems like a bad strategic move by BitTorrent.com. Large copyright interests have been trying to paint BitTorrent as having a pro-infringement agenda; but thus far their efforts have had only limited success because Bram Cohen (the software’s creator) and BitTorrent.com have carefully dissociated themselves from infringement and have conspicuously designed their technology for the benefit of noninfringing users.

As Joe Gratz argues, the new BitTorrent.com search facility, regardless of the merits, will make it easier for BitTorrent.com’s critics to paint the company as having a secret pro-infringement agenda. And that by itself is enough to make an in-house search engine a big mistake for the company.

BitTorrent.com needs to remember that it can be killed by Washington politics. But politicians need to remember, too, that it is the BitTorrent protocol, not the company, that is changing the world. Killing the company will not kill the protocol. A protocol is an idea; and in a free society ideas cannot be killed.