May 8, 2024

Who'll Stop the Spam-Bots?

The FTC has initiated Operation Spam Zombies, a program that asks ISPs to work harder to detect and isolate spam-bots on their customers’ computers. Randy Picker has a good discussion of this.

A bot is a malicious, long-lived software agent that sits on a computer and carries out commands at the behest of a remote badguy. (Bots are sometimes called zombies. This makes for more colorful headlines, but the cognoscenti prefer “bot”.) Bots are surprisingly common; perhaps 1% of computers on the Internet are infected by bots.

Like any successful parasite, a bot tries to limit its impact on its host. A bot that uses too many resources, or that too obviously destabilizes its host system, is more likely to be detected and eradicated by the user. So a clever bot tries to be unobtrusive.

One of the main uses of bots is for sending spam. Bot-initiated spam comes from ordinary users’ machines, with only a modest volume coming from each machine; so it is difficult to stop. Nowadays the majority of spam probably comes from bots.

Spam-bots exhibit the classic economic externality of Internet security. A bot on your machine doesn’t bother you much. It mostly harms other people, most of whom you don’t know; so you lack a sufficient incentive to find and remove bots on your system.

What the FTC hopes is that ISPs will be willing to do what users aren’t. The FTC is urging ISPs to monitor their networks for telltale spam-bot activity, and then to take action, up to and including quarantining infected machines (i.e., cutting off or reducing their network connectivity).
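
The FTC's suggestion isn't spelled out in technical detail, but to make the idea concrete, here is a minimal sketch of the kind of monitoring an ISP might do: count outbound SMTP (port 25) connections per customer machine in a flow log and flag machines whose volume looks bot-like. The log format, field layout, and threshold below are my own assumptions for illustration, not anything from the FTC program.

    # Toy spam-bot detector: flag customer IPs that open an unusually large
    # number of outbound SMTP (port 25) connections. The flow-log format
    # (src_ip dst_ip dst_port, whitespace-separated) and the threshold are
    # assumptions for illustration only.
    from collections import Counter

    SMTP_PORT = "25"
    THRESHOLD = 1000  # outbound SMTP connections per log window; arbitrary

    def flag_suspected_bots(flowlog_path):
        counts = Counter()
        with open(flowlog_path) as f:
            for line in f:
                fields = line.split()
                if len(fields) < 3:
                    continue
                src_ip, dst_ip, dst_port = fields[:3]
                if dst_port == SMTP_PORT:
                    counts[src_ip] += 1
        # Return the customer machines whose SMTP volume looks bot-like.
        return [ip for ip, n in counts.items() if n > THRESHOLD]

    if __name__ == "__main__":
        print(flag_suspected_bots("flows.log"))

Anything flagged this way would still need a human or a second check before quarantining, since legitimate mail servers also send lots of SMTP traffic.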

It would be good if ISPs did more about the spam-bot problem. But unfortunately, the same externality applies to ISPs as to users. If an ISP's customer hosts a spam-bot, most of the spam sent by the bot goes to other ISPs, so the harm from that spam-bot falls mostly on others. ISPs will have an insufficient incentive to fight bots, just as users do.

A really clever spam-bot could make this externality worse, by making sure not to direct any spam to the local ISP. That would reduce the local ISP’s incentive to stop the bot to almost zero. Indeed, it would give the ISP a disincentive to remove the bot, since removing the bot would lower costs for the ISP’s competitors, leading to tougher price competition and lower profits for the ISP.

That said, there is some hope for ISP-based steps against bot-spam. There aren’t too many big ISPs, so they may be able to agree to take steps against bot-spam. And voluntary steps may help to stave off unpleasant government regulation, which is also in the interest of the big ISPs.

There are interesting technical issues here too. If ISPs start monitoring aggressively for bots, the bots will get stealthier, kicking off an interesting arms race. But that’s a topic for another day.

GAO Data: Porn Rare on P2P; Filters Ineffective

P2P nets have fewer pornographic images than the Web, and P2P porn filters are ineffective, according to data in a new report from the U.S. Government Accountability Office (GAO).

Mind you, the report’s summary text says pretty much the opposite, but where I come from, data gets more credibility than spin. The data can be found on pages 58-69 of the report. (My PDF reader calls those pages 61-72. To add to the confusion, the pages include images of PowerPoint slides bearing the numbers 53-64.)

The researchers did searches for images, using six search terms (three known to be associated with porn and three innocuous ones) on three P2P systems (Warez, Kazaa, Morpheus) and three search engines (Google, MSN, Yahoo). They looked at the resulting images and classified each image as adult porn, child porn, cartoon porn, adult erotica, cartoon erotica, or other. For brevity, I’ll lump together all of the porn and erotica categories into a meta-category that I’ll call “porne”, so that there are two categories, porne and non-porne.

The first observation from the data is that P2P nets have relatively few porne images compared to the Web. The eighteen P2P searches found a total of 277 porne images. The eighteen Web searches found at least 655 porne images; and because the Web searches returned so many images, the researchers cut off their analysis after the first 100 images of each Web search, so the actual number of Web porne images might have been much larger. (No such truncation was necessary for the P2P searches.)

The obvious conclusion is that if you want to regulate communications technology to keep porne away from kids, you should start with the Web, because it’s a much bigger danger than P2P.

The report also looked at the effectiveness of the porn blocking facilities built into some of the products. The data show pretty clearly that the filters are ineffective at distinguishing porne from non-porne images.

Two of the P2P systems, Kazaa and Morpheus, have built-in porn blocking. The report did the same searches, with and without blocking enabled, and compared the results. They report the data in an odd format, but I have reorganized their data into a more enlightening form. First, let’s look at the results for the three search terms “known to be associated with pornography”. For each term, I’ll report two figures of merit: what percentage of the porne images was blocked by the filter, and what percentage of the non-porne images was (erroneously) blocked by the filter. Here are the results:

Product % Porne Blocked % Non-porne Blocked
Kazaa 100% 100%
Morpheus 83% 69%

Kazaa blocks all of the porne, by the clever expedient of blocking absolutely everything it sees. For non-porne images, Kazaa has a 100% error rate. Morpheus does only slightly better, blocking 83% of the porne, while erroneously blocking “only” 69% of the non-porne. In all, it’s a pretty poor performance.
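
For readers who want to check the arithmetic, my figure of merit is just the fractional reduction in images found when the filter is switched on, computed separately for porne and non-porne images. The counts in the sketch below are hypothetical placeholders chosen only to produce numbers of the same flavor as the tables; they are not the GAO's data. It also shows how the figure can go negative, which matters for the next table.

    # Percent blocked = how much the filter reduces the number of images found.
    # A negative value means *more* images were found with the filter on, which
    # is what happens in some of the GAO's innocuous-term searches.
    def percent_blocked(found_without_filter, found_with_filter):
        if found_without_filter == 0:
            return None  # undefined: there was nothing to block
        reduction = found_without_filter - found_with_filter
        return 100.0 * reduction / found_without_filter

    # Hypothetical counts, for illustration only (not the GAO's data):
    print(percent_blocked(12, 2))   # 83.3  -> "83% blocked"
    print(percent_blocked(11, 12))  # -9.1  -> a negative "blocking" rate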

Here are the results for searches on innocuous search terms (ignoring one term which never yielded any porne):

Product % Porne Blocked % Non-porne Blocked
Kazaa 100% -9%
Morpheus -150% 0%

You may be wondering where the negative percentages come from. According to the report, more images were found with the filters turned on than with them turned off. If the raw data are to be believed, turning on the Morpheus filter more than doubles the amount of porne you can find! There's obviously something wrong with the data. The likely explanation is that the with-filter and without-filter searches were done at different times, when very different sets of files were available. That is sloppy experimental technique, sloppy enough to cast doubt on the whole report. (One expects better from the GAO.)

But we can salvage some value from this experiment if we assume that even though the total number of files on the P2P net changed from one measurement to the next, the fraction of files that were porne stayed about the same. (If this is not true, then we can’t really trust any of the experiments in the report.) Making this assumption, we can then calculate the percentage of available files that are porne, both with and without blocking.

Product % Porne, without Filter % Porne, with Filter
Kazaa 27% 0%
Morpheus 20% 38%

The Kazaa filter successfully blocks all of the porne, but we don’t know how much of the non-porne it erroneously blocks. The Morpheus filter does a terrible job, actually making things worse. You could do better by just flipping a coin to decide whether to block each image.
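
Here is the salvage calculation spelled out, under the assumption stated above (that the porne fraction of available files stays roughly stable between measurements). Again, the counts are hypothetical stand-ins, chosen only to show how a filter can leave the user facing a higher porne fraction than before.

    # Fraction of returned images that are porne, with and without the filter.
    # If the fraction goes *up* when the filter is on, the filter is making
    # things worse, as the Morpheus row suggests.
    def porne_fraction(porne_count, total_count):
        if total_count == 0:
            return None
        return 100.0 * porne_count / total_count

    # Hypothetical counts, for illustration only (not the GAO's data):
    without_filter = porne_fraction(porne_count=10, total_count=50)  # 20.0
    with_filter = porne_fraction(porne_count=3, total_count=8)       # 37.5
    print(without_filter, with_filter)  # the filtered results are more porne-heavy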

So here’s the bottom line on P2P porne filters: you can have a filter that massively overblocks innocuous images, or you can have a filter that sometimes makes things worse and can’t reliably beat a coin flip. Or you can face the fact that these filters don’t help.

(The report also looked at the effectiveness of the built-in porn filters in Web search engines, but due to methodological problems those experiments don’t tell us much.)

The policy prescription here is clear. Don’t mandate the use of filters, because they don’t seem to work. And if you want filters to improve, it might be a good idea to fully legalize research on filtering systems, so people like Seth Finkelstein can finish the job the GAO started.

Content Filtering and Security

Buggy security software can make you less secure. Indeed, a growing number of intruders are exploiting bugs in security software to gain access to systems. Smart system administrators have known for a long time to be careful about deploying new “security” products.

A company called Audible Magic is trying to sell “content filtering” systems to universities and companies. The company’s CopySense product is a computer that sits at the boundary between an organization’s internal network and the Internet. CopySense watches the network traffic going by, and tries to detect P2P transfers that involve infringing content, in order to log them or block them. It’s not clear how accurate the system’s classifiers are, as Audible Magic does not allow independent evaluation. The company claims that CopySense improves security, by blocking dangerous P2P traffic.
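
Since Audible Magic keeps CopySense's internals secret, the following is emphatically not how their product works; it is only a crude sketch of what a boundary box that "watches traffic and flags P2P transfers" might look like in principle, flagging packets sent to a hard-coded list of ports historically associated with P2P clients. Real content identification is presumably far more involved, fingerprinting the content itself rather than just looking at ports.

    # Crude illustration of a passive boundary monitor: sniff packets and log
    # flows aimed at ports once commonly used by P2P clients. This is NOT how
    # CopySense works (its design is secret); the port list is illustrative.
    from scapy.all import sniff, TCP, IP  # requires scapy and root privileges

    P2P_PORTS = {1214, 6346, 6347, 6881}  # historical FastTrack, Gnutella, BitTorrent defaults

    def log_p2p_candidate(pkt):
        if IP in pkt and TCP in pkt and pkt[TCP].dport in P2P_PORTS:
            print(f"possible P2P flow: {pkt[IP].src} -> {pkt[IP].dst}:{pkt[TCP].dport}")

    sniff(filter="tcp", prn=log_p2p_candidate, store=False)

The security worry in the next paragraph applies to any box in this position, however sophisticated its classifier is.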

It seems just as likely that CopySense makes enterprise networks less secure. CopySense boxes run general-purpose operating systems, so they are prone to security bugs that could allow an outsider to seize control of them. And a compromised CopySense system would be very bad news, an ideal listening post for the intruder, positioned to watch all incoming and outgoing network traffic.

How vulnerable is CopySense? We have no way of knowing, since Audible Magic doesn’t allow independent evaluation of the product. You have to sign an NDA to get access to a CopySense box.

This in itself should be cause for suspicion. Hard experience shows that companies that are secretive about the design of their security technology tend to have weaker systems than companies that are more open. If I were an enterprise network administrator, I wouldn’t trust a secret design like CopySense.

Audible Magic could remedy this problem and show confidence in their design by lifting their restrictive NDA requirements, allowing independent evaluation of their product and open discussion of its level of security. They could do this tomorrow. Until they do, their product should be considered risky.

Intellectual Property, Innovation, and Decision Architectures

Tim Wu has an interesting new draft paper on how public policy in areas like intellectual property affects which innovations are pursued. It’s often hard to tell in advance which innovations will succeed. Organizational economists distinguish centralized decision structures, in which one party decides whether to proceed with a proposed innovation, from decentralized structures, in which any one of several parties can decide to proceed.

This distinction gives us a new perspective on when intellectual property rights should be assigned, and what their optimal scope is. In general, economists favor decentralized decision structures in economic systems, based on the observation that free market economies perform better than planned centralized economies. This suggests – even accepting the useful incentives created by intellectual property – at least one reason to be cautious about the assignment of broad rights. The danger is that centralization of investment decision-making may block the best or most innovative ideas from coming to market. This concern must be weighed against the desirable ex ante incentives created by an intellectual property grant.

This is an interesting observation that opens up a whole series of questions, which Wu discusses briefly. I can't do his discussion justice here, so I'll just extract two issues he raises.

The first issue is whether the problems with centralized management can be overcome by licensing. Suppose Alice owns a patent that is needed to build useful widgets. Alice has centralized control over any widget innovation, and she might make bad decisions about which innovations to invest in. Suppose Bob believes that quabbling widgets will be a big hit, but Alice doesn’t like them and decides not to invest in them. If Bob can pay Alice for the right to build quabbling widgets, then perhaps Bob’s good sense (in this case) can overcome Alice’s doubts. Alice is happy to take Bob’s money in exchange for letting him sell a product that she thinks will fail; and quabbling widgets get built. If the story works out this way, then the centralization of decisionmaking by Alice isn’t much of a problem, because anyone who has a better idea (or thinks they do) can just cut a deal with Alice.

But exclusive rights won’t always be licensed efficiently. The economic literature considers the conditions under which efficient licensing will occur. Suffice it to say that this is a complicated question, and that one should not simply assume that efficient licensing is a given. Disruptive technologies are especially likely to go unlicensed.

Wu also discusses, based on his analysis, which kinds of industries are the best candidates for strong grants of exclusive rights.

An intellectual property regime is most clearly desirable for mature industries, by definition technologically stable, and with low or negative economic growth…. [I]f by definition profit margins are thin in a declining industry, it will be better to have only the very best projects come to market…. By the same logic, the case for strong intellectual property protections may be at its weakest in new industries, which can be described as industries that are expanding rapidly and where technologies are changing quickly…. A [decentralized] decision structure may be necessary to uncover the innovative ideas that are the most valuable, at the costs of multiple failures.

As they say in the blogosphere, read the whole thing.

Is the FCC Ruling Out VoIP on PCs?

The FCC has issued an order requiring VoIP systems that interact with the old-fashioned phone network to provide 911 service. Carriers have 120 days to comply.

It won’t be easy for VoIP carriers to provide the 911 service that people have come to expect from the traditional phone system. The biggest challenge in providing 911 on VoIP is knowing where the caller is located.

In the traditional phone system, it’s easy to know the caller’s location. The phone company strings wires from its facility to customers’ homes and offices. Every call starts on a phone company wire, and the phone company knows where each of those wires originates; so they know the caller’s location. The phone company routes 911 calls to the appropriate local emergency call center, and they provide the call center with the caller’s location. One big advantage of this system is that it works even if the caller doesn’t know his location precisely (or can’t communicate it clearly).

Things are different in the VoIP world. Suppose I’m running a VoIP application on my laptop. I can make and receive VoIP calls whenever my laptop is connected to the Internet, whether I’m at home, or in my office, or in a hotel room in Zurich. My VoIP endpoint and my VoIP phone number can be used anywhere. No longer can the carrier map my phone number to a single, fixed location. My number goes wherever my laptop goes.

How can a VoIP carrier know where my laptop is at any given moment? I'm not sure. The carrier could try to see which IP address (i.e., which address on the Internet) my packets are coming from, and then figure out the physical location of that IP address. That will work well if I connect to the Net in the simplest possible way, but more sophisticated connection setups will foil it. For example, my VoIP network packets will probably appear to come from the Princeton computer science department, regardless of whether I'm at my office, at home, or in a hotel somewhere. How will my VoIP carrier know where I am?
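
To make the carrier-side approach concrete, here is a minimal sketch of an IP-to-location lookup. The tiny prefix table stands in for whatever commercial IP-to-location database a carrier might buy; the prefixes and labels are placeholders. The point is that the answer is only as good as the source address, and behind a tunnel or gateway the source address tells you where the gateway is, not where I am.

    # Sketch of carrier-side geolocation of a VoIP caller by source IP.
    # The prefix table below is a stand-in for a commercial geolocation
    # database; the prefixes and place labels are placeholders.
    import ipaddress

    FAKE_GEO_DB = {
        ipaddress.ip_network("198.51.100.0/24"): "campus gateway",
        ipaddress.ip_network("203.0.113.0/24"): "residential ISP, some city",
    }

    def locate_caller(packet_src_ip):
        addr = ipaddress.ip_address(packet_src_ip)
        for prefix, place in FAKE_GEO_DB.items():
            if addr in prefix:
                return place
        return "unknown"

    # If my VoIP packets are tunneled through my department's network, the
    # carrier sees the gateway's address and "locates" me at the campus,
    # no matter where the laptop actually is.
    print(locate_caller("198.51.100.42"))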

Another approach is to have my laptop try to figure out where it is, by looking at its current IP address (and other available information). This won’t work too well, either. Often all my laptop can deduce from its IP address is that there is a fancy firewall between it and the real Internet. That’s true for me at home, and in most hotels. I suppose you could put a GPS receiver in future laptops, but that won’t help me today.
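
Here is the kind of check a laptop can do on its own, and why it doesn't get you very far: if the address the machine sees on its own interface is a private (RFC 1918) address, all it has really learned is that it sits behind a NAT or firewall, which is exactly my situation at home and in most hotels. A minimal sketch using Python's standard library:

    # What can the laptop deduce about its location from its own address?
    # Often just "I'm behind a NAT", which is no location at all.
    import ipaddress
    import socket

    def local_address_hint():
        # "Connect" a UDP socket toward a documentation address to learn which
        # local address the OS would use; no packets are actually transmitted.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            s.connect(("192.0.2.1", 9))
            local_ip = s.getsockname()[0]
        finally:
            s.close()
        if ipaddress.ip_address(local_ip).is_private:
            return f"{local_ip}: private (RFC 1918) address; behind NAT, location unknown"
        return f"{local_ip}: public address; maybe geolocatable, maybe not"

    print(local_address_hint())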

We could try to invent some kind of Internet-location-tracking protocol, which would be quite complicated, and would raise significant privacy issues. It’s not clear how to let 911 call centers track me, without also making me trackable by many others who have no business knowing where I am.

Tim Lee at Technology Liberation Front suggests creating a protocol that lets Internet-connected devices learn their geographic location. (It might be an extension of DHCP.) This is probably feasible technically, but it would take a long time to be adopted. And it surely won't be deployed widely within 120 days.
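
To give a flavor of what such a protocol might look like at the wire level, here is a toy encoding of a "here are your coordinates" option that a DHCP-like protocol could hand to a device. The option number, field layout, and precision are all invented for illustration; this is not an existing DHCP option, just a sketch of the idea.

    # Toy wire format for a hypothetical "geolocation" option in a DHCP-style
    # protocol: option code, length, then latitude and longitude as signed
    # 32-bit fixed-point degrees (scaled by 10**6). All of this is invented
    # for illustration; it is not an existing DHCP option.
    import struct

    HYPOTHETICAL_GEO_OPTION = 224  # drawn from DHCP's site-specific option range

    def encode_geo_option(lat_deg, lon_deg):
        payload = struct.pack("!ii", int(lat_deg * 10**6), int(lon_deg * 10**6))
        return struct.pack("!BB", HYPOTHETICAL_GEO_OPTION, len(payload)) + payload

    def decode_geo_option(data):
        code, length = struct.unpack("!BB", data[:2])
        lat, lon = struct.unpack("!ii", data[2:2 + length])
        return code, lat / 10**6, lon / 10**6

    blob = encode_geo_option(40.350, -74.652)   # roughly Princeton, NJ
    print(decode_geo_option(blob))

Even with an encoding like this, someone still has to tell the local network where it is, and the privacy questions in the previous paragraph remain.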

All in all, this looks like a big headache for VoIP providers, especially for ones who use existing standard software and hardware. Maybe VoIP providers will take a best-effort approach and then announce their compliance; but that will probably fail as stories about VoIP 911 failures continue to show up in the media.

Of course, VoIP carriers can avoid these rules by avoiding interaction with the old-fashioned phone network. VoIP systems that don't provide a way to make and receive calls with old-fashioned phone users won't be required to provide 911 service. So the real effect of the FCC's order may be to cut off interaction between the old and new phone systems, which won't really help anyone.