Eric Rescorla wrote recently about three people who must have lots of trouble getting their email through spam filters: Jose Viagra, Julia Cialis, and Josh Ambien. I feel especially sorry for poor Jose, who through no fault of his own must get nothing but smirks whenever he says his name.
Anyway, this reminded me of an interesting problem with Bayesian spam filters: they’re trained by the bad guys.
[Background: A Bayesian spam filter uses human advice to learn how to recognize spam. A human classifies messages into spam and non-spam. The Bayesian filter assigns a score to each word, depending on how often that word appears in spam vs. non-spam messages. Newly arrived messages are then classified based on the scores of the words they contain. Words used mostly in spam, such as “Viagra”, get negative scores, so messages containing them tend to get classified as spam. Which is good, unless your name is Jose Viagra.]
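The training step described above can be illustrated with a toy sketch (this is not any particular filter's actual algorithm; it uses spam probabilities near 1.0 where the post speaks of "negative scores", and the sample messages are invented):

```python
# Toy per-word spam scoring in the spirit of a Bayesian filter.
# High probability = spammy; real filters also smooth these estimates.
from collections import Counter

def word_spam_probabilities(spam_msgs, ham_msgs):
    """Estimate a spam score for each word from human-labeled messages."""
    spam_counts = Counter(w for m in spam_msgs for w in m.lower().split())
    ham_counts = Counter(w for m in ham_msgs for w in m.lower().split())
    probs = {}
    for word in set(spam_counts) | set(ham_counts):
        s = spam_counts[word] / len(spam_msgs)   # rate in spam
        h = ham_counts[word] / len(ham_msgs)     # rate in non-spam
        probs[word] = s / (s + h)
    return probs

spam = ["buy viagra now", "cheap viagra online"]
ham = ["meeting tonight", "see you tonight"]
probs = word_spam_probabilities(spam, ham)
# "viagra" occurs only in spam -> score 1.0; "tonight" only in ham -> 0.0
```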
Many spammers have taken to lacing their messages with sections of “word salad” containing meaningless strings of innocuous-looking words, in the hopes that the word salad will trigger positive associations in the recipient’s Bayesian filter.
Now suppose a big spammer wanted to poison a particular word, so that messages containing that word would be (mis)classified as spam. The spammer could sprinkle the target word throughout the word salad in his outgoing spam messages. When users classified those messages as spam, the targeted word would develop a negative score in the users’ Bayesian spam filters. Later, messages with the targeted word would likely be mistaken for spam.
This attack could even be carried out against a particular targeted user. By feeding that user a steady diet of spam (or pseudo-spam) containing the target word, a malicious person could build up a highly negative score for that word in the targeted user’s filter.
Of course, this won’t work, or will be less effective, for words that have appeared frequently in a user’s legitimate messages in the past. But it might work for a word that is about to become more frequent, such as the name of a person in the news, or a political party. For example, somebody could have tried to poison “Fahrenheit” just before Michael Moore’s movie was released, or “Whitewater” in the early days of the Clinton administration.
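The poisoning attack sketched in the last few paragraphs can be simulated with the same kind of toy model (all messages and frequencies here are invented, and a real filter smooths its estimates, so the shift would be less abrupt than this):

```python
# Simulating the word-poisoning attack: seeding a target word
# ("fahrenheit") into spam word salad drives its spam score up.
def word_spam_prob(word, spam_msgs, ham_msgs):
    spam_freq = sum(word in m.lower().split() for m in spam_msgs) / len(spam_msgs)
    ham_freq = sum(word in m.lower().split() for m in ham_msgs) / len(ham_msgs)
    if spam_freq + ham_freq == 0:
        return 0.5  # word never seen: treat as neutral
    return spam_freq / (spam_freq + ham_freq)

ham = ["lunch tomorrow", "project update attached"]
clean_spam = ["buy pills now"] * 10
seeded_spam = ["buy pills now fahrenheit"] * 10  # word salad carries the target

before = word_spam_prob("fahrenheit", clean_spam, ham)  # 0.5 (neutral)
after = word_spam_prob("fahrenheit", seeded_spam, ham)  # 1.0 (looks purely spammy)
```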
There is a general lesson here about the use of learning methods in security. Learning is attractive, because it can adapt to the bad guys’ behavior. But the fact that the bad guys are teaching the system how to behave can also be a serious drawback.
I don’t think word filtering can stop spam. The problem is that the spammers have also had a good hard look at how filtering works. Spammers have gone out of their way to find ways to get their messages past the filters.
The other thing that keeps this from being an issue is that most filters are “train on error”. Independent of the issues others have raised about whether the attack could even be effective, in order to execute the attack you have to send a spam that:
1) is unambiguously spam such that the person reading it will train it as spam
2) evaded the filters the first time and needs to be retrained
3) contains the attack words
In order to affect the filters, the attacker needs to send multiple attack spams, all of which fit these criteria. In other words, the attacker has to be able to evade the filters at will with specially crafted spam messages, which is a tall order. It’s interesting to think about, but I think this is not something that could be practically executed in the real world.
My own, somewhat more detailed, comment is available here.
Summary, though, is that I don’t agree that this “poisoning” approach would necessarily work on Bayesian filters — my (somewhat crude) understanding is that they actually minimize the possibility of one word poisoning an otherwise good message.
From A Plan for Spam:
Because it is measuring probabilities, the Bayesian approach considers all the evidence in the email, both good and bad. Words that occur disproportionately rarely in spam (like “though” or “tonight” or “apparently”) contribute as much to decreasing the probability as bad words like “unsubscribe” and “opt-in” do to increasing it. So an otherwise innocent email that happens to include the word “sex” is not going to get tagged as spam.
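Graham's combining rule from A Plan for Spam makes the quoted point concrete: per-word probabilities are multiplied together, so one spammy word cannot dominate several innocent ones. A sketch, with invented example probabilities:

```python
# Naive-Bayes combination used in "A Plan for Spam":
# p = (p1*...*pn) / (p1*...*pn + (1-p1)*...*(1-pn))
import math

def combined_spam_prob(word_probs):
    prod = math.prod(word_probs)
    inv = math.prod(1 - p for p in word_probs)
    return prod / (prod + inv)

# "sex" alone looks spammy (0.95), but three innocent words
# ("though", "tonight", "apparently", each 0.05-0.10) pull the
# combined probability far below any spam threshold.
score = combined_spam_prob([0.95, 0.05, 0.10, 0.10])  # ~0.012
```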
The argument is flawed for a very simple reason: only a really stupid Bayesian filter decides spamicity based on just one word. Even Mr. Viagra will probably use the word “Viagra” in most of his emails, so a Bayesian filter trained on his own messages would learn to treat the word as neutral (probability of being spam around 0.5).
To better understand the mechanism by which a real Bayesian filter (bogofilter, in this case) calculates spamicity, read for example this thread on the bogofilter mailing list and the original article about using Bayesian methods to catch spam.
Have a nice day,
Matej
Isn’t it clear, then, that the atomic unit for identifying spam shouldn’t be the word? My understanding of how Razor works is that it hashes a putative spam (submitted by a Razor user), or uses something very much like a hash that also allows for a “similarity percentage.” That is, this putative spam may not look exactly like an earlier spam, but it’s 90% like an earlier spam.
Point being: the atomic unit ought to be the message itself, not individual words.
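Razor's actual fuzzy hashing is more sophisticated (Nilsimsa-style digests), but the message-as-atomic-unit idea can be illustrated with plain Jaccard similarity over word 3-grams. Everything below is an invented stand-in, not Razor's algorithm:

```python
# Message-level similarity: compare whole messages, not single words.
def shingles(text, n=3):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a, b):
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

known_spam = "click here now for amazing free prizes today"
variant    = "click here now for amazing free prizes tomorrow"
unrelated  = "the quarterly report is attached for review"

# The near-duplicate scores high (about 0.71 here); the unrelated
# message scores 0.0.
```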
Why can’t I find _this_ filter in Eudora (or Lotus Notes, which I use at work; I don’t know about Outlook)?
“if the sender is in an address book I can access, then put the e-mail in my in-box, otherwise put it in a ‘possible-spam’ folder to be further filtered”
This is an “opt-in” plan of sorts, since I automatically opt to accept mail from co-workers in the organization’s address book, outside contractors in the organization’s address book (people we do business with, maybe), and personal contacts in my own address book. As long as the address books are protected from theft, I shouldn’t get a lot of things passing the filter.
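A minimal sketch of that rule, with invented address-book contents (a real deployment would pull these from the mail client's address books):

```python
# Whitelist known senders; quarantine everything else for filtering.
def route_message(sender, address_books):
    known = set().union(*address_books)
    return "inbox" if sender in known else "possible-spam"

org_book = {"coworker@example.com", "contractor@example.com"}
personal_book = {"friend@example.com"}
books = [org_book, personal_book]

route_message("friend@example.com", books)    # -> "inbox"
route_message("stranger@example.net", books)  # -> "possible-spam"
```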
“You could argue that the spammers are simply trying to convince users not to use the filters and to view the spam. It’s a stretch, but sure.”
Greg, as the admin of a mail server for a small web-hosting company, they’ve convinced me to turn off the Bayes filter for this very reason.
I sent a question to one of the candidates in the Canadian election about a month ago. A few days ago, I finally got a reply, starting off with this :
>Many apologies for not replying to your questions sooner!! I must have come across as quite dismissive. Sorry. I had been receiving so many junk emails from companies carrying “Brand name” vitamins and software etc. that I blocked the word “brand” with my spam filter…lesson learned!
At least she eventually checked her “possible spam” folder.
“What good does it do spammers to prevent messages about Moore’s movie from getting to users?”
Uh… the fact that the user has to turn off his spam filter to be able to receive legitimate emails?
Your argument is pretty heavily flawed, as Greg points out.
As a practical example, consider all the spam concerning mortgages. I recently bought a home and took out a mortgage, communicating regularly with several different lenders by email. Not once was an email even close to being marked as spam by SpamAssassin; indeed, its Bayesian component gave them all a Bayes score of 0.
So don’t worry. Spammers don’t have that much power.
Rod.
This isn’t really much of a concern.
The scenario you propose, where a word is about to appear in emails more frequently, doesn’t benefit spammers. What good does it do spammers to prevent messages about Moore’s movie from getting to users? You could argue that the spammers are simply trying to convince users not to use the filters and to view the spam. It’s a stretch, but sure.
Even in this case, the saturation of one strongly negative word would only fool the user’s filter if a message contained no positive words as well. If the user received a message with their friend’s email address in it, that would be a highly positive token, more so than the negativity of “fahrenheit”. The same would be true if the user received regular messages from their DVD store or a movie-review website.
To argue that the spammer could send large volumes of seeded messages to make this word look so bad that it would override the other words’ positive scores would also be a fallacy. Bayesian statistics also takes into account the probability of the event independent of the data. So if the user were receiving 1000 spams for every legitimate message, a legitimate message containing the word “Fahrenheit” would be far more meaningful.
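The point about priors can be made concrete with Bayes' rule; the likelihoods and priors below are invented for illustration:

```python
# Posterior spam probability for a message containing one word,
# combining the word's per-class likelihood with the class prior.
def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam):
    p_ham = 1 - p_spam
    num = p_word_given_spam * p_spam
    return num / (num + p_word_given_ham * p_ham)

# Same word likelihoods (seen in 30% of spam, 1% of ham), different priors:
even_prior = posterior_spam(0.30, 0.01, 0.5)   # high: spam is common
rare_spam  = posterior_spam(0.30, 0.01, 0.01)  # much lower: spam is rare
```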
A downside to Bayesian analysis on spam
BoingBoing has a link up to this article by Edward Felten. In it, he mentions that it is possible for spammers to poison the net with a bias against certain words if they want. By using a certain word…