January 6, 2025

Debate: Will Spam Get Worse?

This week I participated in Business Week Online’s Debate Room feature, where two people write short essays on opposite sides of a proposition.

The proposition: “Regardless of how hard IT experts work to intercept the trillions of junk e-mails that bombard hapless in-boxes, the spammers will find ways to defeat them.” I argued against, concluding that “We’ll never be totally free of spam, but in the long run it’s a nuisance—not a fundamental threat—to the flourishing of the Internet.”

Wikipedia Leads; Will Search Engines NoFollow?

Wikipedia has announced that all of its outgoing hyperlinks will now include the rel=”nofollow” attribute, which instructs search engines to disregard the links. Search engines infer a page’s importance by seeing who links to it – pages that get many links, especially from important sites, are deemed important and are ranked highly in search results. A link is an implied endorsement: “link love”. Adding nofollow withholds Wikipedia’s link love – and Wikipedia, being a popular site, has lots of link love to give.
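
To make the mechanism concrete, here is a minimal sketch of link-based ranking in which nofollow links pass no credit. It is a simplified PageRank-style iteration, not any real search engine's algorithm; the function name, the damping value, and the example sites are all illustrative assumptions.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Toy link-based ranking (hypothetical, for illustration only).

    links: list of (source, target, nofollow) tuples.
    Links flagged nofollow are ignored, so they pass no "link love".
    """
    pages = sorted({page for s, t, _ in links for page in (s, t)})
    followed = {page: [] for page in pages}
    for source, target, nofollow in links:
        if not nofollow:
            followed[source].append(target)

    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for source in pages:
            targets = followed[source]
            if targets:
                # Split this page's endorsement among the pages it links to.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no followed links spreads its weight evenly.
                share = damping * rank[source] / n
                for page in pages:
                    new_rank[page] += share
        rank = new_rank
    return rank


# Two popular pages link to Wikipedia, which links to site_a.
with_love = [("blog", "wikipedia", False),
             ("news", "wikipedia", False),
             ("wikipedia", "site_a", False)]
# Same graph, but Wikipedia's outgoing link carries nofollow.
without_love = [("blog", "wikipedia", False),
                ("news", "wikipedia", False),
                ("wikipedia", "site_a", True)]

print(rank_pages(with_love)["site_a"] > rank_pages(without_love)["site_a"])
```

In this toy graph, site_a ranks higher when Wikipedia's link is followed than when it is marked nofollow, which is the withholding effect described above.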

Nofollow is intended as an anti-spam measure. Anybody can edit a Wikipedia page, so spammers can and do insert links to their unwanted sites, thereby leeching off the popularity of Wikipedia. Nofollow will reduce spammers’ incentives by depriving them of any link love. Or that’s the theory, at least. Bloggers tried using nofollow to attack comment spam, but it didn’t reduce spam: the spammers were still eager to put their spammy text in front of readers.

Is nofollow a good idea for Wikipedia? It depends on your general attitude toward Wikipedia. The effect of nofollow is to reduce Wikipedia’s influence on search engine rankings (to zero). If you think Wikipedia is mostly good, then you want it to have influence and you’ll dislike its use of nofollow. If you think Wikipedia is unreliable and random, then you’ll be happy to see its influence reduced.

As with regular love, it’s selfish to withhold link love. Sometimes Wikipedia links to a site that competes with it for attention. Without Wikipedia’s link love, the other site will rank lower, and it could lose traffic to Wikipedia. Whether intended or not, this is one effect of Wikipedia’s action.

There are things Wikipedia could do to restore some of its legitimate link love without helping spammers. It could add nofollow only to links that are suspect – links that are new, or were added by a user without a solid track record on the site, or that have not yet survived several rewrites of a page, or some combination of such factors. Even a simple policy of using nofollow for the first two weeks might work well enough. Wikipedia has the data to make these kinds of distinctions, and it’s not too much to ask for a site of its importance to do the necessary programming.
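
A selective policy of this kind could be sketched roughly as follows. This is not Wikipedia's code: the record fields, the function name, and every threshold except the two-week window are hypothetical, chosen only to show how the factors might combine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LinkRecord:
    """Hypothetical metadata a wiki might keep about an outgoing link."""
    added_at: datetime            # when the link was inserted
    editor_good_edits: int        # editor's count of edits that were not reverted
    page_rewrites_survived: int   # page revisions the link has survived


def needs_nofollow(link, now,
                   min_age=timedelta(weeks=2),  # the "first two weeks" idea
                   min_good_edits=50,           # assumed threshold
                   min_rewrites=3):             # assumed threshold
    """Return True if the link still looks suspect and should get nofollow."""
    is_new = now - link.added_at < min_age
    untrusted_editor = link.editor_good_edits < min_good_edits
    unvetted = link.page_rewrites_survived < min_rewrites
    # Any one warning sign is enough here; other combinations are possible.
    return is_new or untrusted_editor or unvetted


now = datetime(2007, 2, 1)
fresh = LinkRecord(now - timedelta(days=3), 500, 0)
settled = LinkRecord(now - timedelta(days=90), 500, 10)
print(needs_nofollow(fresh, now), needs_nofollow(settled, now))
```

The simple two-week policy mentioned above corresponds to setting min_good_edits and min_rewrites to zero, so that only the link's age matters.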

But the one element missing so far in this discussion is the autonomy of the search engines. Wikipedia is asking search engines not to assign link love, but the search engines don’t have to obey. Wikipedia is big enough, and quirky enough, that the search engines’ ranking algorithms probably have Wikipedia-specific tweaks already. The search engines have surely studied whether Wikipedia’s link love is reliable enough – and if it’s not, they are surely compensating, perhaps by ignoring (or reducing the weight of) Wikipedia links, or perhaps by a rule such as ignoring links for the first few weeks.

Whether or not Wikipedia uses nofollow, the search engines are free to do whatever they think will optimize their page ranking accuracy. Wikipedia can lead, but the search engines won’t necessarily nofollow.

Spam is Back

A quiet trend broke into the open today, when the New York Times ran a story by Brad Stone on the recent increase in email spam. The story claims that the volume of spam has doubled in recent months, which seems about right. Many spam filters have been overloaded, sending system administrators scrambling to buy more filtering capacity.

Six months ago, the conventional wisdom was that we had gotten the upper hand on spammers by using more advanced filters that relied on textual analysis, and by identifying and blocking the sources of spam. One smart venture capitalist I know declared spam to be a solved problem.

But now the spammers have adopted new tactics: sending spam from botnets (armies of compromised desktop computers), sending images rather than text, adding randomly varying noise to the messages to make them harder to analyze, and providing fewer URLs in messages. The effect of these changes is to neutralize the latest greatest antispam tools; and so the spammers are pulling back ahead, for now.
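
To illustrate why randomly varying noise is effective, consider a deliberately naive filter that recognizes known spam by an exact fingerprint of the message text. This is an assumed toy model, not a description of how the filters of the day worked; the function names and the sample message are made up.

```python
import hashlib
import random


def fingerprint(message):
    """Exact-match fingerprint: any change to the text changes the digest."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def add_noise(message, rng, length=8):
    """Append a short random string, as a spammer might to vary each copy."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    noise = "".join(rng.choice(letters) for _ in range(length))
    return message + " " + noise


rng = random.Random(0)  # seeded so the example is repeatable
base = "Buy this penny stock today"
blocklist = {fingerprint(base)}

copy_one = add_noise(base, rng)
copy_two = add_noise(base, rng)

print(fingerprint(base) in blocklist)      # the unmodified message is caught
print(fingerprint(copy_one) in blocklist)  # a noisy copy slips through
print(fingerprint(copy_two) in blocklist)  # and so does the next one
```

Real filters are far more tolerant of small changes than this one, but the same pressure applies: every feature the filter keys on is something the sender can try to randomize.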

In the long view, not much has changed. The arms race will continue, with each side deploying new tricks in response to the other side’s moves, unless one side is forced out by economics, which looks unlikely.

To win, the good guys must make the cost of sending a spam message exceed the expected payoff from that message. A spammer’s per-message cost and payoff are both very small, and probably getting smaller. The per-message payoff is probably decreasing as spammers are forced to new payoff strategies (e.g., switching from selling bogus “medical” products to penny-stock manipulation). But their cost to send a message is also dropping as they start to use other people’s computers (without paying) and those computers get more and more capable. Right now the cost is dropping faster, so spam is increasing.
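
The break-even condition can be written as a small calculation. The numbers below are invented purely for illustration; nothing here is a measured cost or response rate.

```python
def payoff_per_message(response_rate, revenue_per_response):
    """Expected revenue from sending one message."""
    return response_rate * revenue_per_response


def campaign_profit(messages, cost_per_message,
                    response_rate, revenue_per_response):
    """Expected profit of a campaign: volume times (payoff minus cost)."""
    margin = payoff_per_message(response_rate, revenue_per_response) - cost_per_message
    return messages * margin


# Assumed figures: 1 response per 100,000 messages, $20 per response.
RATE, REVENUE = 1e-5, 20.0

# Sending through a botnet (someone else's computers): very low cost.
cheap = campaign_profit(10_000_000, 1e-4, RATE, REVENUE)
# If defenders push the per-message cost above the payoff, spam loses money.
costly = campaign_profit(10_000_000, 5e-4, RATE, REVENUE)

print(cheap > 0, costly < 0)
```

With these assumed figures the payoff is a small fraction of a cent per message, so spam stays profitable until the per-message cost rises above that amount. That is why cheaper sending through other people's computers translates directly into more spam.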

From the good guys’ perspective, the cost of spam filtering is increasing. Organizations are buying new spam-filtering services and deploying more computers to run them. The switch to image-based spam will force filters to use image analysis, which chews up a lot more computing power than the current textual analysis. And the increased volume of spam will make things even worse. Just as the good guys are trying to raise the spammers’ costs, the spammers’ tactics are raising the good guys’ costs.

Spam is a growing problem in other communication media too. Blog comment spam is rampant – this blog gets about eight hundred spam comments a day. At the moment our technology is managing them nicely (thanks to Akismet), but that could change. If the blog spammers get as clever as the email spammers, we’ll be in big trouble.

Why So Little Attention to Botnets?

Our collective battle against botnets is going badly, according to Ryan Naraine’s recent article in eWeek.

What’s that? You didn’t know we were battling botnets? You’re not alone. Though botnets are a major cause of Internet insecurity problems, few netizens know what they are or how they work.

In this context, a “bot” is a malicious software agent that gets installed on an unsuspecting user’s computer. Bots get onto computers by exploiting security flaws. Once there, they set up camp and wait unobtrusively for instructions. Bots work in groups, called “botnets”, in which many thousands of bots (hundreds of thousands, sometimes) all over the Net work together at the instruction of a remote bad guy.

Botnets can send spam or carry out coordinated security attacks on targets elsewhere on the Net. Attacks launched by botnets are very hard to stop because they come from so many places all at once, and tracking down the sources just leads to innocent users with infected computers. There is an active marketplace in which botnets are sold and leased.

Estimates vary, but a reasonable guess is that between one and five percent of the computers on the net are infected with bots. Some computers have more than one bot, although bots nowadays often try to kill each other.

Bots exploit the classic economic externality of network security. A well-designed bot on your computer tries to stay out of your way, only attacking other people. An infection on your computer causes harm to others but not to you, so you have little incentive to prevent the harm.

Nowadays, bots often fight over territory, killing other bots that have infected the same machine, or beefing up the machine’s defenses against new bot infections. For example, Brian Krebs reports that some bots install legitimate antivirus programs to defend their turf.

If bots fight each other, a rationally selfish computer owner might want his computer to be infected by bots that direct their attacks outward. Such bots would help to defend the computer against other bots that might harm the computer owner, e.g. by spying on him. They’d be the online equivalent of the pilot fish that swim into sharks’ mouths with impunity, to clean the sharks’ teeth.

Botnets live today on millions of ordinary users’ computers, leading to nasty attacks. Some experts think we’re losing the war against botnets. Yet there isn’t much public discussion of the problem among nonexperts. Why not?

Spamhaus Tests U.S. Control Over Internet

In a move sure to rekindle debate over national control of the Internet, a US court may soon issue an order stripping London-based spamhaus.org of its Internet name.

Here’s the backstory. Spamhaus, an anti-spam organization headquartered in London, publishes ROKSO, the “Register of Known Spam Operations”. Many sites block email from ROKSO-listed senders, as an anti-spam tactic. A US company called e360 sued Spamhaus, claiming that Spamhaus had repeatedly and wrongly put e360 on ROKSO, and asking the court to award monetary damages and issue an injunction ordering e360’s removal from ROKSO.

Spamhaus lost the case, apparently due to bad legal maneuvering. Faced with a U.S. lawsuit, Spamhaus had two choices: it could challenge the court’s jurisdiction over it, or it could accept jurisdiction and defend the case on the merits. It started to defend on the merits, but then switched strategies, declaring the court had no jurisdiction and refusing to participate in the proceedings. The court said that Spamhaus had accepted its jurisdiction, and it proceeded to issue a default judgment against Spamhaus, ordering it to pay $11.7M in damages (which it apparently can’t pay), and issuing an injunction ordering Spamhaus to (a) take e360 off ROKSO and keep it off, and (b) post a notice saying that previous listings of e360 had been erroneous.

Spamhaus has ignored the injunction. As I understand it, courts have broad authority to enforce their injunctions against noncompliant parties. In this case, the court is considering (but hasn’t yet issued) an order that would revoke Spamhaus’s use of the spamhaus.org name; the order would require ICANN and Tucows, the domain name registrar, to shut off service for the spamhaus.org name, so that anybody trying to go to spamhaus.org would get a domain-not-found error. (ICANN says it’s up to Tucows to comply with any such order.)

There are several interesting questions here. (1) Is it appropriate under U.S. law for the judge to do this? (2) If the spamhaus.org name is revoked, how will Spamhaus and its users respond? (3) If U.S. judges can revoke domain name registrations, what are the international implications?

I’ll leave Question 1 for the lawyers to argue.

The other two questions are actually interrelated. Question 3 is about how much extra power (if any) the US has by virtue of history and of having ICANN, the central naming authority, within its borders. The relevance of any US power depends on whether affected parties could work around any assertion of US power, which gets us back to Question 2.

Suppose that spamhaus.org gets shut down. Spamhaus could respond by registering spamhaus.uk. Would the .uk registry, which is run or chartered by the UK government, comply with a US court order to remove Spamhaus’s registration? My guess would be no. But even if the .uk registry complied and removed spamhaus.uk, that decision would not depend on any special US relationship to ICANN.

The really sticky case would be a dispute over a valuable name in .com. Suppose a US court ordered ICANN to yank a prominent .com name belonging to a non-US company. ICANN could fight, but being based in the US, it would probably have to comply in the end. Such a decision, if seen as unfair outside the US, could trigger a sort of constitutional crisis for the Net. The result wouldn’t be pretty. As I’ve written before, ICANN is far from perfect, but the alternatives could be a lot worse.

(via Slashdot)