August 23, 2017

GAO Data: Porn Rare on P2P; Filters Ineffective

P2P nets have fewer pornographic images than the Web, and P2P porn filters are ineffective, according to data in a new report from the U.S. Government Accountability Office (GAO).

Mind you, the report’s summary text says pretty much the opposite, but where I come from, data gets more credibility than spin. The data can be found on pages 58-69 of the report. (My PDF reader calls those pages 61-72. To add to the confusion, the pages include images of PowerPoint slides bearing the numbers 53-64.)

The researchers did searches for images, using six search terms (three known to be associated with porn and three innocuous ones) on three P2P systems (Warez, Kazaa, Morpheus) and three search engines (Google, MSN, Yahoo). They looked at the resulting images and classified each image as adult porn, child porn, cartoon porn, adult erotica, cartoon erotica, or other. For brevity, I’ll lump together all of the porn and erotica categories into a meta-category that I’ll call “porne”, so that there are two categories, porne and non-porne.

The first observation from the data is that P2P nets have relatively few porne images, compared to the Web. The eighteen P2P searches found a total of 277 porne images. The eighteen Web searches found at least 655 porne images. But they had to cut off the analysis after the first 100 images of each Web search, because the Web searches returned so many images, so the actual number of Web porne images might have been much larger. (No such truncation was necessary on the P2P searches.)

The obvious conclusion is that if you want to regulate communications technology to keep porne away from kids, you should start with the Web, because it’s a much bigger danger than P2P.

The report also looked at the effectiveness of the porn blocking facilities built into some of the products. The data show pretty clearly that the filters are ineffective at distinguishing porne from non-porne images.

Two of the P2P systems, Kazaa and Morpheus, have built-in porn blocking. The report did the same searches, with and without blocking enabled, and compared the results. They report the data in an odd format, but I have reorganized their data into a more enlightening form. First, let’s look at the results for the three search terms “known to be associated with pornography”. For each term, I’ll report two figures of merit: what percentage of the porne images was blocked by the filter, and what percentage of the non-porne images was (erroneously) blocked by the filter. Here are the results:

Product     % Porne Blocked     % Non-porne Blocked
Kazaa       100%                100%
Morpheus    83%                 69%

Kazaa blocks all of the porne, by the clever expedient of blocking absolutely everything it sees. For non-porne images, Kazaa has a 100% error rate. Morpheus does only slightly better, blocking 83% of the porne, while erroneously blocking “only” 69% of the non-porne. In all, it’s a pretty poor performance.
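For readers who want to check the arithmetic, here is a minimal sketch of how the two figures of merit can be computed from image counts taken with the filter off and on. The function name `percent_blocked` and all of the counts are my own illustrative assumptions; the counts are chosen only to reproduce the percentages in the table and are not the GAO's raw numbers.

```python
def percent_blocked(unfiltered_count, filtered_count):
    """Percentage of images that disappear when the filter is switched on."""
    if unfiltered_count == 0:
        raise ValueError("no images found without the filter; rate is undefined")
    return 100.0 * (unfiltered_count - filtered_count) / unfiltered_count


# Hypothetical counts (NOT from the report), picked to match the table above.
# Each entry is (count with filter off, count with filter on).
examples = {
    "Kazaa": {"porne": (40, 0), "non-porne": (60, 0)},
    "Morpheus": {"porne": (100, 17), "non-porne": (100, 31)},
}

for product, categories in examples.items():
    for category, (off, on) in categories.items():
        print(f"{product} {category}: {percent_blocked(off, on):.0f}% blocked")
```

The same function applies to both columns: a high value is good in the porne column and bad in the non-porne column, which is why a filter that blocks everything scores 100% in both.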

Here are the results for searches on innocuous search terms (ignoring one term which never yielded any porne):

Product     % Porne Blocked     % Non-porne Blocked
Kazaa       100%                -9%
Morpheus    -150%               0%

You may be wondering where the negative percentages come from. According to the report, in some cases more images were found with the filters turned on than with them turned off. If the raw data are to be believed, turning on the Morpheus filter more than doubles the amount of porne you can find! There’s obviously something wrong with the data, and the likely explanation is that the searches were done at different times, when very different sets of files were available. This is pretty sloppy experimental technique – enough to cast doubt on the whole report. (One expects better from the GAO.)
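To see what a negative "percent blocked" means in terms of raw counts, it helps to convert it back into the ratio of images found with the filter on to images found with it off. This is a small sketch under the assumption that the percentage is defined as (off - on) / off; the helper name `count_ratio` is mine.

```python
def count_ratio(percent_blocked):
    """Ratio of (images found with filter on) to (images found with filter off).

    Assumes percent_blocked = 100 * (off - on) / off, so on / off = 1 - p / 100.
    """
    return 1.0 - percent_blocked / 100.0


# Percentages taken from the table of innocuous-term results above.
for label, pct in [("Morpheus porne", -150), ("Kazaa non-porne", -9), ("Kazaa porne", 100)]:
    print(f"{label}: {count_ratio(pct):.2f}x as many images with the filter on")
```

A figure of -150% corresponds to 2.5 times as many images with the filter on, which is where "more than doubles" comes from.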

But we can salvage some value from this experiment if we assume that even though the total number of files on the P2P net changed from one measurement to the next, the fraction of files that were porne stayed about the same. (If this is not true, then we can’t really trust any of the experiments in the report.) Making this assumption, we can then calculate the percentage of available files that are porne, both with and without blocking.

Product     % Porne, without Filter     % Porne, with Filter
Kazaa       27%                         0%
Morpheus    20%                         38%

The Kazaa filter successfully blocks all of the porne, but we don’t know how much of the non-porne it erroneously blocks. The Morpheus filter does a terrible job, actually making things worse. You could do better by just flipping a coin to decide whether to block each image.
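The coin-flip comparison can be made concrete with a short sketch. The counts below are hypothetical, chosen only to match the Morpheus percentages in the table, and `porne_fraction` is a name I made up. A coin flip blocks porne and non-porne at the same rate, so on average it leaves the porne fraction unchanged, whereas the measured filter raises it.

```python
def porne_fraction(porne_count, non_porne_count):
    """Percentage of the returned images that are porne."""
    total = porne_count + non_porne_count
    if total == 0:
        raise ValueError("no images returned; fraction is undefined")
    return 100.0 * porne_count / total


# Hypothetical result sets (NOT the report's raw counts).
no_filter = (20, 80)      # 20 porne, 80 non-porne
coin_flip = (10, 40)      # expected outcome of blocking each image with probability 1/2
morpheus_like = (38, 62)  # a result set that is 38% porne, as in the table

print(f"no filter:     {porne_fraction(*no_filter):.0f}% porne")
print(f"coin flip:     {porne_fraction(*coin_flip):.0f}% porne")
print(f"Morpheus-like: {porne_fraction(*morpheus_like):.0f}% porne")
```

Under these assumptions the coin flip holds the porne share at 20%, while the filter's results are 38% porne, which is the sense in which a random filter would do better.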

So here’s the bottom line on P2P porne filters: you can have a filter that massively overblocks innocuous images, or you can have a filter that sometimes makes things worse and can’t reliably beat a coin flip. Or you can face the fact that these filters don’t help.

(The report also looked at the effectiveness of the built-in porn filters in Web search engines, but due to methodological problems those experiments don’t tell us much.)

The policy prescription here is clear. Don’t mandate the use of filters, because they don’t seem to work. And if you want filters to improve, it might be a good idea to fully legalize research on filtering systems, so people like Seth Finkelstein can finish the job the GAO started.

Comments

  1. Clinton Blackmore says:

    Interesting.

    It seems to me there are different categories of people who want to put filters in place:

    1. Those who think filters should (or do) work, and want to solve the problem (the problem being that people can access porn).
    2. Those who know it doesn’t work (or don’t care), but want to appear to be trying to solve the problem.

    Even if filters did work, it becomes, like so many things, an arms race. So, if a site that says “porn” is blocked, it will, for example, change to saying “pr0n”. [Incidentally, I wonder if this blog article will be blocked by some tools.]

    The most effective filter by far is the person doing the search. You have a much higher chance of not finding porn if you don’t go looking for it. [True, it does occasionally come up; that is why it is really nice that Google gives you a blurb from the site in context, so you can usually tell at a glance and avoid it.]

  2. Of course, some p2p networks will only return a certain number of results per query, to limit the network congestion that queries cause.

  3. Anonymous says:

    With the filter on, the results turned up LESS normal pornography and MORE cartoon and child porn. Way to keep our children safe.

  4. It seems rather obvious that there are better filters available than the ones being implemented on the p2p networks described in this article. 150 percent error is not a very good filter.

  5. Maybe the research did not account for pr0n-movies disguised as popular software packages. This is common practice for kids who want to conceal these files from their parents; however, the hash mechanism in some P2P systems makes these files show up when you search either for the pr0n or the software.

  6. If you want to keep your children safe, stop letting them download on p2p networks. That seems pretty careless to me. Besides that, there are worse things on the net than p2p networks and porn: there are idiots with nothing better to do, who are more dangerous than any other living creature.

    Fact: Most murders are caused out of boredom.

    Learn how to raise your kids…