Archives for 2003

Online Porn and Bad Science

Declan McCullagh reports
on yesterday’s House Government Reform Committee hearings on porn and
peer-to-peer systems. (I’m sure there is some porn on these systems,
as there is in every place where large groups of people gather.)
There’s plenty to chew on in the story; Frank Field says it “sounds
like a nasty meeting.”

But I want to focus on the factual claims made by one witness. Declan writes:

Randy Saaf, president of P2P-tracking firm MediaDefender, said his
investigations of child pornography on P2P networks found over 321,000
files “that appeared to be child pornography by their names and file
types,” and said that “over 800 universities had files on their
networks that appeared to be child pornography.”

But MediaDefender, and one of the government studies released on
Thursday, reviewed only the file names and not the actual contents of
the image files. A similar approach used in a 1995 article [i.e., the
now-notorious Rimm study – EWF] that appeared in the Georgetown
University law journal drew strong criticism from academics for having
a flawed methodology that led to incorrect estimates of the amount of
pornography on the Internet.

Characterizing a file as porn based on its name alone is obviously
lame, if your goal is to make an accurate estimate of how much porn is
out there. (And that is the goal, isn’t it?)

It’s no excuse to say that it’s infeasible to examine 321,000 files
by hand to see if they are really porn. If you actually care
whether 321,000 is even close to correct, you can examine a small
random sample of the files. If you check, say, ten randomly chosen
files and only five of them are really porn, then you can be pretty
sure that 321,000 is far wrong. There’s no excuse for not doing this,
if your goal is to give the most accurate testimony to Congress.
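The sanity check described above takes only a few lines of Python. Here is a minimal sketch; the 321,000 figure is from the testimony, but the sample size and the 42 true positives are hypothetical stand-ins for hand-checked labels:

```python
import math

def estimate_true_count(flagged_count, sample_labels, z=1.96):
    """Estimate how many flagged files are genuine, given hand-checked
    labels (True = really porn) for a random sample of flagged files."""
    n = len(sample_labels)
    p = sum(sample_labels) / n
    # Normal-approximation 95% confidence interval for the proportion
    margin = z * math.sqrt(p * (1 - p) / n)
    low = max(0.0, p - margin) * flagged_count
    high = min(1.0, p + margin) * flagged_count
    return p * flagged_count, (low, high)

# Hypothetical: hand-check 100 randomly chosen flagged files
# and suppose 42 of them turn out to be genuine.
labels = [i < 42 for i in range(100)]
estimate, (low, high) = estimate_true_count(321_000, labels)
```

Even a sample of 100 files pins the true count down to a range far below the headline number, which is exactly the point: a small random sample is cheap and tells you whether the big number is even plausible.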

UPDATE (8:30 AM, March 18): According to a Dawn Chmielewski story at the San Jose Mercury News, a government study found that 42% of files found on Kazaa via “search terms known to be associated with child porn” were actually child porn.

Too Late

Julian Bigelow, who was chief engineer on the IAS computer (the architectural forerunner of today’s machines) died about three weeks ago at the age of 89. Today I learned where he had lived.

For the last seven years I sat at the breakfast table each morning and looked out at the red house behind mine. I never knew – until it was too late – that the man who lived there was one of the pioneers of my field.

Grimmelmann on the Berkeley DRM Conference

James Grimmelmann at LawMeme offers a typically insightful and entertaining summary of the recent Berkeley DRM Conference. Here’s my favorite part:

And thus, the sixty-four dollar question: Is any of this [DRM technology] really going to work? The question tends to come up about once per panel; most of the panelists do their best to avoid dealing with it. The techies are split. The ones who go to great pains to say that they don’t speak for their companies say “no, DRM is a pipe dream.” The ones who don’t include these disclaimers either avoid the question or say “well, we’re doing our best.” The content industry reps treat effective DRM as almost a foregone conclusion. It must exist, because if it doesn’t, well, that would be too horrible a future to contemplate.

The lawyers in attendance, strangely enough, don’t seem to care whether DRM can work. I would have thought that the technical feasibility of effective mass-market DRM was the critical threshold question, but apparently not. I suppose it’s because they’re so accustomed to speaking in hypotheticals.

Reader Replies on Congestion and the Commons

Thanks to all of the readers who responded to my query about why the Internet’s congestion control mechanisms aren’t destroyed by selfish noncompliance. Due to the volume of responses, I can’t do all of you credit here, but I’ll do my best to summarize.

Jordan Lampe, Grant Henninger, and David Spalding point out that “Internet accelerator” utilities (like these) do exist, but they don’t seem to come from mainstream vendors. Users may be leery of these products or their vendors. Wim Lewis suggests that these utilities may work by opening multiple connections to download material in parallel, which probably qualifies as a way of gaming the system to get more than one’s “fair share” of bandwidth.
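A back-of-the-envelope model shows why opening parallel connections pays off, assuming TCP divides a bottleneck link roughly evenly per connection. The link capacity and connection counts below are made-up illustrative numbers:

```python
def bandwidth_share(my_connections, other_connections, link_capacity=10.0):
    """Idealized model: TCP fairness is per-connection, so a host that
    opens several parallel connections gets a proportionally larger
    slice of a shared bottleneck link (capacity in Mbps, illustrative)."""
    total = my_connections + other_connections
    return link_capacity * my_connections / total

# One connection among ten total gets 1 Mbps; open four parallel
# connections against the same nine competitors and you get about
# three times as much, without violating TCP's rules on any one of them.
polite = bandwidth_share(1, 9)
accelerated = bandwidth_share(4, 9)
```

Each individual connection still backs off politely; the “cheating” is entirely in the aggregate.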

Many readers argue that the incentive to cheat is less than I had suggested.

Aaron Swartz, Russell Borogove, and Kevin Marks argue that it’s not so easy to cheat the congestion control system. You can’t just transmit at full speed, since you don’t want to oversaturate any network links with your own traffic. Still, I think that it’s possible to get some extra bandwidth by backing off more slowly than normal, and by omitting certain polite features such as the so-called “slow start” rule.
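The “back off more slowly” strategy is easy to see in a toy simulation. This is not a faithful TCP model; the 0.9 backoff factor is an arbitrary stand-in for a gentler, impolite decrease, and rates are in arbitrary units:

```python
def simulate(rounds=1000, capacity=100.0):
    """Toy model of two flows sharing a link: on congestion the polite
    flow halves its rate (standard multiplicative decrease) while the
    cheater backs off only slightly; otherwise both add one unit."""
    polite, cheater = 1.0, 1.0
    polite_total = cheater_total = 0.0
    for _ in range(rounds):
        if polite + cheater > capacity:  # link congested this round
            polite *= 0.5                # standard TCP-style backoff
            cheater *= 0.9               # gentler, "impolite" backoff
        else:                            # additive increase for both
            polite += 1.0
            cheater += 1.0
        polite_total += polite
        cheater_total += cheater
    return polite_total, cheater_total

polite_total, cheater_total = simulate()
```

In this model the gentler-backoff flow ends up carrying substantially more traffic over time, which is the incentive problem in a nutshell.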

Aaron Swartz, Wim Lewis, Mark Gritter, and Seth Finkelstein argue that most congestion happens at the endpoints of the Net: either at the link connecting directly to the server, or at the “last mile” link to the user’s desktop. These links are not really shared, since they exist only for the benefit of one party (the server or the user, respectively); so the bandwidth you gained by cheating would be stolen only from yourself.

Carl Witty and Karl-Friedrich Lenz argue that most of the relevant Net activity consists of downloads from big servers, so these server sites are the most likely candidates for cheating. Big servers have a business incentive to keep the Net running smoothly, so they are less likely to cheat.

Mark Gritter argues that most Net connections are short-lived and so don’t give congestion control much of a chance to operate, one way or the other.

All of these arguments imply that the incentive to cheat is not as large as I had suggested. Even so, if the incentive is nonzero, at least for some users, we would expect to see more cheating than we do.

Russell Borogove and John Gilmore argue that if cheating became prevalent, ISPs and backbone providers could deploy countermeasures to selectively drop cheaters’ packets, thereby lowering the benefit of cheating. This is plausible, but it doesn’t explain the apparent lack of cheating we see. The greedy strategy for users is to cheat now, and then stop cheating when ISPs start fighting back. But users don’t cheat much now.

Wim Lewis and Carl Witty suggest that if we’re looking for cheaters, we might look first at users who are already breaking or stretching the rules, such as porn sites or peer-to-peer systems.

Finally, Mark Gritter observes that defections do happen now, though in indirect ways. Some denial of service attacks operate by causing congestion, and some protocols related to streaming video or peer-to-peer queries appear to bend the rules. Perhaps the main vehicle for cheating will be through new protocols and services and not by modification of existing ones.

Thanks to all of you for an amazing demonstration of the collective mind of the Net at work.

Ultimately, I think there’s still a mystery here, though it’s smaller than I originally imagined.

Congestion Control and the Tragedy of the Commons

I have been puzzling lately over why the Internet’s congestion control mechanisms work. They are a brilliant bit of engineering, but they fail utterly to account for the incentives of the Internet’s users. By any rational analysis, they ought to fail spectacularly, causing the Net to grind to a halt. And yet, for some unfathomable reason, these mechanisms do work.

Let me explain. As a starting point, think about the cars on a busy highway. If there aren’t many cars, the road is underutilized, carrying only a fraction of its capacity. Add more cars, and the road is used more efficiently, carrying more cars per minute past any given point. Add too many cars, though, and you’ll cause a traffic jam. Traffic slows, and the road becomes much less efficient as only a few cars per minute manage to crawl past each point. The road is in congestion.

Now think of the Internet as a highway, and each packet of data on the Net as a car. Adding more traffic increases the Net’s throughput, but only up to a point. Adding too much traffic leads to congestion, with a rapid dropoff in efficiency. If too many people are sending too much data, the Net slows to a crawl.

To address this problem, the TCP protocol (upon which are built most of the popular Net services, including email and the web) includes a “congestion control” mechanism. The mechanism is subtle in its details but pretty simple in its basic concept. Whenever two computers are talking via TCP, and they detect possible congestion on the path between them, they slow down their conversation. If everybody does this, congestion is avoided, since the onset of congestion causes everybody to back off enough to stave off an Internet traffic jam.
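The basic shape of the mechanism, known as additive-increase/multiplicative-decrease, can be sketched as a toy loop. Here a fixed threshold stands in for real congestion signals such as packet loss, and the numbers are illustrative:

```python
def congestion_window_trace(rounds=20, loss_threshold=16):
    """Toy AIMD loop: grow the sending window by one segment per
    round-trip; halve it when the network signals congestion
    (modeled here as the window exceeding a fixed threshold)."""
    window = 1.0
    trace = []
    for _ in range(rounds):
        if window > loss_threshold:  # congestion detected (e.g. loss)
            window /= 2              # multiplicative decrease: back off
        else:
            window += 1              # additive increase: probe for bandwidth
        trace.append(window)
    return trace

trace = congestion_window_trace()
```

The result is the familiar sawtooth: the sender keeps probing upward until the network pushes back, then cuts its rate sharply and starts probing again.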

If you back off in response to congestion, you’re making the Internet a better place. You’re accepting a slowdown in your communication, in order to make the Internet faster for everybody else.

This is a perfect Tragedy of the Commons setup. We’re all better off if everybody backs off. But backing off is voluntary, and we each have a selfish motive to skip the backoff and just grab as much bandwidth as we can.

The mystery is this: Why hasn’t the tragedy happened? Virtually everybody does back off, and the Net doesn’t collapse under congestion. This happens despite the fact that a Net inhabited by rationally self-interested people should apparently behave otherwise. What’s going on?

Nobody seems to have an adequate explanation. One theory is that the average person doesn’t know how to cheat; but others could make and sell products that offer better Net performance by not backing off. Another theory is that Microsoft supplies most of the Net’s software and is making the choice for most consumers; and Microsoft’s self-interest is in having a useful Net. But again, why don’t others show up selling add-on “booster” products that cheat? A third theory is that people really are altruistic on the Net, behaving in a more civil and community-minded fashion than they do in real life. That seems pretty unlikely.

I’m stumped. Do any of you have an explanation for this?