July 2, 2022

Archives for June 2004


Today I’ll be speaking on a panel at the USENIX Conference in Boston, on “The Politicization of [Computer] Security.” The panel is 10:30-noon, Eastern time. The other panelists are Jeff Grove (ACM), Gary McGraw (Cigital), and Avi Rubin (Johns Hopkins).

If you’re attending the panel, feel free to provide real-time narration/feedback/discussion in the comments section of this post. I’ll be reading the comments periodically during the panel, and I’ll encourage the other panelists to do so too.

Victims of Spam Filtering

Eric Rescorla wrote recently about three people who must have lots of trouble getting their email through spam filters: Jose Viagra, Julia Cialis, and Josh Ambien. I feel especially sorry for poor Jose, who through no fault of his own must get nothing but smirks whenever he says his name.

Anyway, this reminded me of an interesting problem with Bayesian spam filters: they’re trained by the bad guys.

[Background: A Bayesian spam filter uses human advice to learn how to recognize spam. A human classifies messages into spam and non-spam. The Bayesian filter assigns a score to each word, depending on how often that word appears in spam vs. non-spam messages. Newly arrived messages are then classified based on the scores of the words they contain. Words used mostly in spam, such as “Viagra”, get negative scores, so messages containing them tend to get classified as spam. Which is good, unless your name is Jose Viagra.]
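[A minimal sketch of the scoring idea described above, not the code of any particular filter. The function names, the whitespace tokenization, and the add-one smoothing are all assumptions made for illustration, and class priors are ignored. The sign convention follows this post: negative scores lean toward spam.]

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences in human-labeled (text, is_spam) pairs."""
    spam, ham = Counter(), Counter()
    for text, labeled_spam in messages:
        (spam if labeled_spam else ham).update(text.lower().split())
    return spam, ham

def word_score(word, spam, ham):
    """Smoothed log-odds of non-spam vs. spam; negative means spammy."""
    vocab = len(set(spam) | set(ham))
    p_ham = (ham[word] + 1) / (sum(ham.values()) + vocab)
    p_spam = (spam[word] + 1) / (sum(spam.values()) + vocab)
    return math.log(p_ham / p_spam)

def is_spam(text, spam, ham):
    """Classify a new message by summing the scores of its words."""
    return sum(word_score(w, spam, ham) for w in text.lower().split()) < 0

# Hypothetical training data, labeled by a human.
labeled = [
    ("buy viagra now", True),
    ("cheap viagra pills", True),
    ("lunch at noon tomorrow", False),
    ("panel notes from boston", False),
]
spam_counts, ham_counts = train(labeled)
```

With this toy training set, "viagra" gets a negative score, so is_spam("hello from jose viagra", spam_counts, ham_counts) comes out true even though the rest of the message is harmless.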

Many spammers have taken to lacing their messages with sections of “word salad”: meaningless strings of innocuous-looking words, included in the hope that their positive scores will outweigh the negative ones and tip the recipient’s Bayesian filter toward classifying the message as non-spam.

Now suppose a big spammer wanted to poison a particular word, so that messages containing that word would be (mis)classified as spam. The spammer could sprinkle the target word throughout the word salad in his outgoing spam messages. When users classified those messages as spam, the targeted word would develop a negative score in the users’ Bayesian spam filters. Later, messages with the targeted word would likely be mistaken for spam.

This attack could even be carried out against a particular targeted user. By feeding that user a steady diet of spam (or pseudo-spam) containing the target word, a malicious person could build up a highly negative score for that word in the targeted user’s filter.

Of course, this won’t work, or will be less effective, for words that have appeared frequently in a user’s legitimate messages in the past. But it might work for a word that is about to become more frequent, such as the name of a person in the news, or a political party. For example, somebody could have tried to poison “Fahrenheit” just before Michael Moore’s movie was released, or “Whitewater” in the early days of the Clinton administration.
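[A rough sketch of the poisoning effect, under simplifying assumptions: the filter scores a word by the smoothed fraction of spam and non-spam messages that contain it, and the user has already classified 1000 messages of each kind. The function names and all of the numbers are hypothetical.]

```python
import math

def score(spam_with_word, ham_with_word, total_spam, total_ham):
    """Smoothed log-odds for one word; negative means the filter leans spam."""
    p_ham = (ham_with_word + 1) / (total_ham + 2)
    p_spam = (spam_with_word + 1) / (total_spam + 2)
    return math.log(p_ham / p_spam)

def poisoned_score(ham_with_word, poison_msgs, total_spam=1000, total_ham=1000):
    """Score of the target word after the user marks poison_msgs spams containing it."""
    return score(poison_msgs, ham_with_word, total_spam + poison_msgs, total_ham)

# A word the user has never seen starts out neutral...
fresh_before = poisoned_score(ham_with_word=0, poison_msgs=0)
# ...and turns strongly negative after 50 poisoned spams.
fresh_after = poisoned_score(ham_with_word=0, poison_msgs=50)
# A word with 200 prior legitimate uses stays positive under the same attack.
familiar_after = poisoned_score(ham_with_word=200, poison_msgs=50)
```

Under these assumptions, the same fifty messages that drive a previously unseen word well below zero leave a familiar word's score above zero, which is why the attack is most plausible against words that are about to become common.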

There is a general lesson here about the use of learning methods in security. Learning is attractive, because it can adapt to the bad guys’ behavior. But the fact that the bad guys are teaching the system how to behave can also be a serious drawback.

"Tech" Lobbyists Slow to Respond to Dangerous Bills

Dan Gillmor, among others, bemoans the lack of effective lobbying by technology companies. Exhibit A is their weak and disorganized response to various bills, such as the Hatch INDUCE/IICA Act, that would give the movie and music industries veto power over the development of new technology. It’s true that large tech companies have been slow and clumsy in addressing these issues; but that’s not the whole story.

The other part of the story is that the interests of a few large tech companies don’t necessarily coincide with those of the technology industry as a whole, or of the users of technology. Giving the entertainment industry a veto over new technologies would have two main effects: it would slow the pace of technical innovation, and it would create barriers to entry in the tech markets. Incumbent companies may be perfectly happy to see slower innovation and higher barriers to entry, especially if the entertainment-industry veto contained some kind of grandfather clause, either implicit or explicit, that allowed incumbent products to stay in the market. Such a clause seems likely should a veto be imposed.

Just to be clear, an entertainment-industry veto would surely hurt the tech incumbents. It’s just that it would hurt their upstart competitors more. So it’s not entirely surprising that the incumbents would have some mixed feelings about veto proposals, though it is disappointing that the incumbents aren’t standing up for the industry as a whole.

What can be done about this? I don’t see an easy answer. In Washington, it seems to be standard procedure to mistake the voices of a few incumbents for those of a whole industry. Certainly, the incumbents have no interest in contradicting that assumption. Our best hope is that the incumbents will see it in their own long-term interest to foster a fast-moving, highly competitive industry.