A quiet trend broke into the open today, when the New York Times ran a story by Brad Stone on the recent increase in email spam. The story claims that the volume of spam has doubled in recent months, which seems about right. Many spam filters have been overloaded, sending system administrators scrambling to buy more filtering capacity.
Six months ago, the conventional wisdom was that we had gotten the upper hand on spammers by using more advanced filters that relied on textual analysis, and by identifying and blocking the sources of spam. One smart venture capitalist I know declared spam to be a solved problem.
But now the spammers have adopted new tactics: sending spam from botnets (armies of compromised desktop computers), sending images rather than text, adding randomly varying noise to the messages to make them harder to analyze, and providing fewer URLs in messages. The effect of these changes is to neutralize the latest greatest antispam tools; and so the spammers are pulling back ahead, for now.
In the long view, not much has changed. The arms race will continue, with each side deploying new tricks in response to the other side’s moves, unless one side is forced out by economics, which looks unlikely.
To win, the good guys must make the cost of sending a spam message exceed the expected payoff from that message. A spammer’s per-message cost and payoff are both very small, and probably getting smaller. The per-message payoff is probably decreasing as spammers are forced to new payoff strategies (e.g., switching from selling bogus “medical” products to penny-stock manipulation). But their cost to send a message is also dropping as they start to use other people’s computers (without paying) and those computers get more and more capable. Right now the cost is dropping faster, so spam is increasing.
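The break-even condition in this paragraph can be put in rough numbers. The sketch below is only an illustration: the response rate, revenue, and per-message costs are made-up assumptions, not figures from the Times story, and the function names are hypothetical.

```python
def expected_profit_per_message(cost, response_rate, revenue_per_response):
    """Expected payoff of one message (rate times revenue) minus its sending cost."""
    return response_rate * revenue_per_response - cost


def is_worth_sending(cost, response_rate, revenue_per_response):
    """Spam pays only while the expected payoff exceeds the per-message cost."""
    return expected_profit_per_message(cost, response_rate, revenue_per_response) > 0


# Assumed, illustrative numbers: one $20 sale per 100,000 messages sent.
response_rate = 1 / 100_000
revenue = 20.0

# Assumed cost when sending through a botnet: close to nothing per message.
botnet_cost = 0.00001
# Assumed cost if each message somehow cost a tenth of a cent to send.
raised_cost = 0.001

print(is_worth_sending(botnet_cost, response_rate, revenue))
print(is_worth_sending(raised_cost, response_rate, revenue))
```

With these assumed numbers the expected payoff is two hundredths of a cent per message, so the campaign pays at botnet prices and stops paying once sending costs a tenth of a cent.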
From the good guys’ perspective, the cost of spam filtering is increasing. Organizations are buying new spam-filtering services and deploying more computers to run them. The switch to image-based spam will force filters to use image analysis, which chews up a lot more computing power than the current textual analysis. And the increased volume of spam will make things even worse. Just as the good guys are trying to raise the spammers’ costs, the spammers’ tactics are raising the good guys’ costs.
Spam is a growing problem in other communication media too. Blog comment spam is rampant – this blog gets about eight hundred spam comments a day. At the moment our technology is managing them nicely (thanks to Akismet), but that could change. If the blog spammers get as clever as the email spammers, we’ll be in big trouble.
Unless we find a way to make spam bounce back to the sender as undeliverable, we’re going to be stuck with it. If I send an email to a bad address, it gets bounced back to me as undeliverable. If I could somehow make my address appear invalid to any sender I choose, that would be the way to go. Someone might know how to write a piece of software that would do this.
Frank:
It won’t deter it. But your example is not “unsolicited bulk email” or whatever we are calling spam right now.
Your example is thousands of individuals deciding to spend the money/cpu cycles/whatever to send the text message – which is a different problem.
In snail mail terms you are talking about Tom Hanks’s bags of fan mail, while this discussion is about the “You could already have won $1 million…” junk mail that we all receive. They are quite different problems, with different solutions.
My opinion of all this is that it is down to the ISPs. They could essentially stop this by setting as customer defaults:
1. Block SMTP connections to anything other than the ISP’s mail server
2. Rate limit email sending
3. Reduce the rate as the volume goes up
4. Allow customers to opt out of these limits on a case by case basis.
5. Give users properly configured ADSL/Cable routers instead of USB modems (to slow down infection in the first place).
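Points 2 to 4 of the list above could look something like the following sketch. It is one possible reading, not a description of what any ISP runs: the class name, the one-hour window, the free allowance, and the doubling rule are all assumptions chosen for illustration.

```python
from collections import deque


class DecliningRateLimiter:
    """Per-customer send limiter whose required gap between messages grows with volume.

    All thresholds here are illustrative assumptions.
    """

    def __init__(self, window_seconds=3600, free_messages=20, base_delay=1.0, exempt=False):
        self.window_seconds = window_seconds
        self.free_messages = free_messages  # messages per window allowed without delay
        self.base_delay = base_delay        # seconds; doubles with each further message
        self.exempt = exempt                # point 4: customer has opted out of the limits
        self.sent_times = deque()

    def required_gap(self, now):
        """Seconds that must have passed since the last accepted message."""
        # Forget messages that have fallen out of the window.
        while self.sent_times and now - self.sent_times[0] >= self.window_seconds:
            self.sent_times.popleft()
        extra = len(self.sent_times) - self.free_messages
        if self.exempt or extra < 0:
            return 0.0
        # Point 3: the permitted rate falls as volume rises (the gap doubles each time).
        return self.base_delay * (2 ** extra)

    def try_send(self, now):
        """Return True and record the message if it may be sent at time `now`."""
        gap = self.required_gap(now)
        if self.sent_times and now - self.sent_times[-1] < gap:
            return False
        self.sent_times.append(now)
        return True


limiter = DecliningRateLimiter(free_messages=3)
# A burst of five messages in the same second: only the free allowance gets through.
print([limiter.try_send(now=0.0) for _ in range(5)])
```

A customer who sends a few “hotmails” never notices the limiter, while an infected PC trying to send continuously is pushed to longer and longer waits within the hour.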
How many home broadband users really need to send 100+ emails per hour, 24 hours a day? Let’s face it, most of the PCs in these botnets are clueless Joe Bloggs users who bought a broadband connection to plug straight into their never-patched Dell PC. These are the users who have never heard of email, but do send a few “hotmails” to their friends from time to time.
Get rid of these low-hanging fruit and you make things much harder for spammers to build their bot-nets.
“Also – Here is what CPU cycles will not deter.”
I noticed that you did not respond to this?
“If you are Amazon, DoubleClick, or Digital Impact – and it takes you 1 millisecond to send a message…”
You seem to assume that what these large spammers will do is buy more hardware to continue to send bulk email at their current rates, rather than reduce the volume they send out and target it better. But it’s easy to see which of those two would be more cost-effective for them. And it isn’t the hardware upgrades.
You also forget that the cost is incurred only when sending potentially unwanted email. If someone okays your mail, sending them more is as cheap as it is now. *Only* the cost of making new contacts (or harassing the hell out of people who have already rejected your previous umpteen messages) would increase.
[proceeds to push micropayments]
Won’t wash as I already said. Forget the banking/credit card companies. The added user mental work and the added user hoop-jumping entailed by the proliferation of logins and registrations and privacy and financial risks are showstoppers *all by themselves*.
“If these assumptions were correct companies such as bitpass.com and peppercoin.com would not be in business today.”
I had never heard of either of those until two seconds ago. Why? Because they’re really obscure. Why? Because they’re just the latest couple of examples of a long history of repeated attempts to launch new micropayment systems on the net, and they all keep blowing up on the pad, that’s why.
Also – Here is what CPU cycles will not deter.
While CPU cycles might slow down the illegitimate marketer sending tons of crap, they do nothing to let mid- or high-profile individuals relax the guards on their digital points of contact.
Read this (look for the reference to Bill Gates) and tell me how CPU cycles would address it:
http://www.msnbc.msn.com/Default.aspx?id=9557182&site=newsweek&uart=6&uarc=Rating
I’m sorry but there is indeed an environmental cost consequence to CPU burning. Here’s an example: If you are Amazon, DoubleClick, or Digital Impact – and it takes you 1 millisecond to send a message and now a new system artificially increases that time to 1 second you will invest in higher power machines (or more machines) to accomplish the same work you did yesterday. It is no different than ISPs having to “add” CPU power to process inbound spam. These CPUs cost “money” to run, they cost money to buy and they cost money to cool. None of us need this added cost to our virtual or physical society. (Want to prove this for yourself? Those of you who have a CPU “speed” control on your laptop, try lowering the speed. Yes, things will get slower but the laptop will also run cooler. And cooler does mean less electrical energy all around.)
Now, at first glance it would seem logical to shift the cost to the sender via CPU overhead. And if that were the best solution I would applaud it and give in, accepting that the “waste” of costly CPU cycle burning was simply a part of doing business.
But, let’s think outside the box for a minute. What I am asking you to consider is a method that does not result in an added artificial cost at all. In fact, it results in zero cost for the legitimate desirable message. Therefore, it is not a system of payment (pay-per-message would destroy email!); rather, it is a system to guarantee respect “with cash”.
Now, if an occasional guarantee is exercised by a recipient (it is unlikely but will happen on rare occasion) the act of moving the cash supporting the guarantee is very doable. The problem many have with micropayment processing is that they make assumptions regarding the transaction cost using data from traditional banking & credit card companies, then they self-impose limiting logic with these banking model limitations. If these assumptions were correct companies such as bitpass.com and peppercoin.com would not be in business today.
But then again – the world is covered with statements like: “Who would ever buy from an auction, from someone they don’t know and probably can’t trust, over the Internet?”
What environmental cost? This isn’t paper mail we’re talking about here.
The idea is to make mass mailing expensive, right? Can you suggest a better way? Actually paying money is out of the question; not only won’t people stand for it, but the cost of sending an email would be dominated by the transaction costs unless the “postage” was insanely high. Talk about waste. That’s why micropayment schemes never take off — the transaction costs. Consider the example of a Web site going from free viewing of any page to 1 cent per page view. What actually happens at the user’s end?
They go from just click and you see the page to click and see a login/register page; deep linking’s kaput.
They have to register at every site that does this, which quickly adds up to a trillion userid/password pairs to memorize. Unless they do something dumb, like use the same password everywhere; then it’s just a trillion userids (since they are surely not going to be able to use the same one everywhere without encountering cases of it already being taken), which is really just as bad.
Next they have to think carefully about every link they follow or page they view to be frugal, or they’ll be nickel-and-dimed to death before they know it. The mental stress transaction cost alone just skyrocketed.
Then the processing of all these penny transfers probably costs about a dime per page. So they actually have to charge 11 cents per pageview or be losing money, not just 1, but most of the money gets pocketed by Paypal or some equivalent.
To top it off, before long all the forgotten passwords and similar headaches create gripes and the solution pushed is, no doubt, not going back to free pageviews but going forward to a universal logon and universal online banking system.
Then the privacy and fraud problems we’ve had online so far start to look like tempests in assorted teapots compared to what happens next. Just for starters, everyone and their marketing department will insist on having input into what information you have to disclose to get a universal id. The result will be like previous attempts at a universal logon. Remember M$ Passport? Ten thousand forms to fill in, including having to check dozens of “I don’t want to receive foobar’s special offers” boxes or be spammed to death. More personal disclosure and forms to fill in than for a $20,000 car loan. Hours just to create a new throwaway Hotmail account to use for some dumb site that demanded registration info and probably planned to spam you. What a pain! And of course all of this information gets harvested by some kind of super-ChoicePoint which proceeds to sell it to anyone who can pay their price — marketers, politicians, identity thieves, con artists, lawyers, spammers … in short, scum varying only in the precise shade of green and level of stench emanating therefrom.
And changing the id is a huge pain, as much so as changing your phone number or primary email is now, and this stress-barrier-to-exit is exploited to push shoddy behavior at users, and with it being tied into your money and all … well, if you thought phone company nickel-and-diming was bad, just you wait.
End result: a penny per pageview actually makes the Web a lot more expensive, slow, annoying, and unsafe to use.
Transaction costs, my friend. Transaction costs.
The “point” is to make it expensive for the spammer, not just any “big guy”. Why do you think email became the killer app in the first place?
Waste is waste, no matter how you present it. And if there is no reason for it, waste should not be artificially created, “especially” if there is an environmental cost connected with it.
Isn’t the whole *point* to make it expensive for “big guys” to do mass e-mailings?
The CPU work is performed by the sender’s computer, so only if the sender is a big datacenter does this need “extra heating and cooling” and so forth. 😛
Crosbie has a good idea too: make the task something useful in the DC sense. Your cycles to a worthy cause become the postage. It has to be work that can be checked quickly though. Problems in NP make sense for this, since then it’s quick to verify a piece of work relative to the power needed to do it to begin with. Factoring bignums is an example (checking the factors takes a single multiplication, though factoring is not known to be NP-complete), but maybe not too practical in value. Helping the CIA crack North Korean crypto would be another example. 🙂 (Different servers might provide different sorts of work. There’d need to be a language for describing scientific computations that could be easily interpreted and couldn’t carry viruses and other malware. A bytecode-interpreted, sandboxed LISP or OCAML perhaps, or even fortrash…something stripped down and functional (in the formal sense of “a functional language”) incapable of any kind of side effects (such as disk or registry access) and easy to implement cross-platform. Or you could just use Java, which would be sure to bog down a would-be spammer’s computer until it creaked along like a stone age abacus!)
CPU work only needs to be expended when attempting to form new relationships – and it doesn’t need to be performed by intermediaries – only the originator.
It may even be possible to provide a work dispensary that enables useful work to be performed, but where the worker is certified uncompensated.
Neo – That sounds a great deal like MS’s PennyBlack project.
What I don’t understand is that organizations like Google, Amazon, Yahoo, and Citibank send millions of emails a day. Why on earth would you want to tax them with a requirement to provide all this extra processing power?
Remember that all these CPU cycles you’re advocating we burn take electrons. So that means we must burn more coal, heat more data centers and therefore cool more data centers. Not to mention you’ll be forcing the big guys to buy bigger hardware just to do what they did before with a fraction of CPU power. Does this not sound wasteful?
A commonly used definition is “unsolicited BULK email”. Commercial or non-commercial, doesn’t matter. Of course, what threshold of #recipients makes it “bulk” depends.
How about we replace the existing email system by gradually phasing in CMTP (complex mail transport protocol :)), where the following occurs:
* Sending machine contacts recipient’s CMTP server.
* CMTP server responds with a cryptographic challenge that contains a) a random string (new each time) and b) an encryption of that string and another random, secret string (also new) using a known algorithm. The encryption is deliberately weak enough to crack in a few minutes with a modern computer.
* Sending machine has to brute-force the encryption to determine the secret string.
* Sending machine sends the secret string to the server.
* Server will now accept mail.
This must be done once PER RECIPIENT.
Sending to targeted mailing lists will be slow enough to limit sending frequency. Sending untargeted mail to everyone would take forever or a massive botnet; the volume of mail a given machine (or botnet) can generate is drastically reduced. Sending mail is time-consuming enough that it is no longer worth a spammer’s while sending untargeted crap. Your software that will save 250 people thousands would be worth the time it takes to email these companies using CMTP. Gen@r1c v1agr@ would not be.
Technical stuff:
a) CMTP would need an algorithm that can quickly convolve two known strings, and whose output is unique to the pair of strings, but where recovering the second string from the first and the output is slow. (If the convolution is too slow the CMTP server is very easy to DoS.)
b) CMTP has to keep pace with Moore’s Law. Lengthening the strings is the obvious way to keep it computationally expensive to send mail, and that’s controlled by the server that doesn’t want to be spammed, you’ll note.
c) Fine tuning would be needed to make spam too expensive to be worthwhile, but not make legitimate use of email likewise too expensive.
d) There’s room for individual variation in this system; a CMTP server delivering mail for a group of people that really should not be and don’t wish to be bothered with anything not narrowly and specifically on topic can use slightly longer strings than everyone else currently is, making sending mail to them take a while.
e) Whitelisted senders might be permitted through, if CMTP includes optional sender authentication. Unauthenticated mail still requires the computational hoop-jumping, as does authenticated mail from senders not specifically authorized. The latter may get a quicker computational challenge than the former. It should be a requirement that the system accept unauthenticated mail after a not-unreasonable challenge (one that takes many hours to solve on a mid-range PC would be considered unreasonable) however.
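The challenge and response steps of CMTP might be sketched as follows. This is a toy under stated assumptions, not a specification: SHA-256 stands in for the “convolve two strings” algorithm from point a), the secret is only three lowercase letters so the example runs in well under a second rather than the few minutes the proposal calls for, and every function name is hypothetical.

```python
import hashlib
import itertools
import secrets
import string

ALPHABET = string.ascii_lowercase  # assumed alphabet for the secret string


def combine(public: str, secret: str) -> str:
    """Stand-in for the 'convolve' step: cheap to compute, slow to invert."""
    return hashlib.sha256((public + ":" + secret).encode("utf-8")).hexdigest()


def issue_challenge(secret_length: int = 3):
    """Server side: a fresh random public string plus the digest of (public, secret)."""
    public = secrets.token_hex(8)
    secret = "".join(secrets.choice(ALPHABET) for _ in range(secret_length))
    challenge = (public, combine(public, secret), secret_length)
    return challenge, secret  # the server keeps `secret` to itself


def solve_challenge(challenge):
    """Sender side: brute-force candidate secrets until the digest matches."""
    public, digest, secret_length = challenge
    for candidate in itertools.product(ALPHABET, repeat=secret_length):
        guess = "".join(candidate)
        if combine(public, guess) == digest:
            return guess
    return None


def accept_mail(expected_secret: str, answer) -> bool:
    """Server side: a single comparison decides whether mail will be accepted."""
    return answer is not None and secrets.compare_digest(expected_secret, answer)


challenge, secret = issue_challenge()
answer = solve_challenge(challenge)
print(accept_mail(secret, answer))
```

The asymmetry is the point: the server does one hash to issue a challenge and one comparison to check it, while the sender does up to 26 to the power of `secret_length` hashes. Each extra character multiplies the sender’s work by 26, which is the server-controlled knob that points b) and d) describe.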
Until recently I’ve accepted the term “unsolicited commercial email”. But I’m beginning to think that this definition or terminology is inadequate. I have had several people email me their resumes (related to commercial activity, unsolicited, but valuable and appropriate use of email from a stranger).
I have been carefully researching companies involved in CNC machining and custom manufacturing that may want to make use of my company’s (GreenTree Software) Workorder Management System, and I have identified about 150 potential customers who would specifically benefit from the software. I don’t know these companies, but I suspect the software would save them thousands a year, and I suspect that some of them would be interested in purchasing the software. As a really small company I do want to keep my costs low, and at the same time I want to respect everyone I’m doing business with. I see a difference between MASS quantities of untargeted emails (spammers with CDs holding millions of email addresses) and companies that legitimately have identified another company through careful research and investigation.
I agree that spam is really bad, and I’ve seen that 87% of my incoming email is currently spam, but I’ve employed a number of defensive strategies and am using a really effective anti-spam tool. Perhaps some of the readers here will benefit from reading the spamtips I’ve posted online here: http://www.greentreesoftware.ca/support/nospam.php
I would suggest that SPAM is by definition “unsolicited” + “untargeted” + “antisocial”
It doesn’t need to be commercial, it could be political or ideological.
The antisocial part illustrates how spam cares more about selfish benefit than respecting the wishes of others.
My thoughts…
Here’s my point in a nutshell – All of the technology we’ve thrown at this problem has been very creative stuff. But, the problem is that all the cost is shouldered by the network, the ISPs, the ESPs and the recipients. And when the technology du jour is foiled the cost again is absorbed by the recipients. Yet for the spammer – it’s all gain and no pain.
Now, many propose a redesign of SMTP so we can have accurate knowledge of the sender’s ID. My analogy to this is – Just because I know the phone numbers (and addresses) of everyone in my phone book it doesn’t mean I want to (or have the time to) take a call from each of them. Again, the expense of dealing with this is mine and it shouldn’t be. If the legal route is your desired approach these costs are also shouldered at my expense.
However, the one thing I am near certain of, is that you will never see someone offering you a real cash guarantee for any product or service unless they are near certain you will have an appreciation for their offering. Now, in this example, the cost is shouldered by the abuser – and you are in complete control. A) The sender’s costs are directly tied to the frequency and level of their abusive behavior B) If the recipient abuses the sender (by taking a guarantee unjustly) it will be less likely they would receive a future guarantee. In this model email remains untaxed for those of us that respect the channel, therefore the majority of us will still benefit from all that email (as it was designed) has to offer.
Tel –
You state – “if some low-priority stranger misses out on a reply then tough luck”
What if there were one message of interest mixed in among the many “low” priority messages? And how are you proposing a “key word” search to ensure relevance? This sounds like a technology approach to understanding what’s going on in life (and in your head) at this very moment in time – aka filters (rule-based or statistical), both of which have proven to be very much prone to the “cat & mouse” dilemma.
Getting back to Identification – Think of the number of legitimate business level internet users, now add to them the number of illegitimate businesses. Are you saying that knowing the source ID gives you the ability to defer contact with zero interruption on your part? Again, if you are stating there is some form of technology that can sort through the content and can determine (unattended) “reasonable relevance” I believe that the same technology would have been able to stop the onslaught we are all seeing today.
The law is not an effective tool for this job, but making it easy for the receiver to decide what they are interested in should do the trick. SMTP identification in itself isn’t enough… the point is that the receiver must have the ability to both identify the sender and to arbitrarily defer fetching the body of the message should they so choose. They pay first attention to the senders that they know are interesting and then keyword search the remaining proposals to pull in enough to provide some “reasonable” level of unknown senders.
People can only cope with so much input signal anyhow so if some low-priority stranger misses out on a reply then tough luck — worse things happen in this world every day.
It is possible to do whitelist filtering at the SMTP level because the sender must announce their own email address BEFORE the main message. Slam the connection shut before the message body comes through and save the bandwidth. Operating a junk-mail account does waste bandwidth but that’s always a cost you can defer to gmail. Having said that, a better designed transport mechanism could do even more to reduce bandwidth wasted on useless messages.
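The “slam the connection shut before the body” idea rests on SMTP’s command order: the envelope sender arrives in a `MAIL FROM` line before the `DATA` command that carries the message. A rough sketch of the decision follows; the whitelist contents, the function name, and the reply wording are assumptions, and a real server would hook this into its SMTP session handling.

```python
import re

# Hypothetical whitelist; in practice it would come from the user's address book.
WHITELIST = {"alice@example.org", "bob@example.net"}

MAIL_FROM = re.compile(r"^MAIL FROM:\s*<([^>]*)>", re.IGNORECASE)


def reply_to_mail_from(command_line: str, whitelist=WHITELIST) -> str:
    """Return the SMTP reply a whitelisting server might give to a MAIL FROM line."""
    match = MAIL_FROM.match(command_line.strip())
    if match is None:
        return "501 Syntax error in parameters or arguments"
    sender = match.group(1).lower()
    if sender in whitelist:
        return "250 OK"
    # Refuse before DATA, so the message body is never transferred.
    return "550 Sender not on whitelist"


print(reply_to_mail_from("MAIL FROM:<alice@example.org>"))
print(reply_to_mail_from("MAIL FROM:<stranger@example.com>"))
```

One caveat applies: the envelope sender is whatever the connecting machine claims it is, so this saves bandwidth but does nothing against a spammer who forges a whitelisted address.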
OK, I’m late to this party, but this is a great thread! Now, let’s think outside the box here. The web is a new medium. One that has yielded numerous new benefits (and will yield countless more), but also obstacles that will require the use of the power within it to resolve.
First, my comments thus far –
For – Jim Horning :
Ah, yes, economics is the key, BUT the value & control of that value must be personal (i.e., under your control) – Otherwise, there will always be some level of abusive behavior.
Email does want to be free – BUT what if we could keep it that way?
For Devonavar:
e-postage: You are correct, e-postage does tax the wrong people. Free email is part of what allowed the Internet to explode. While I do not condone the idea of e-stamps, I do disagree with the argument that bot-nets will cost the sender hundreds of dollars. It will cost little and save us all. Why? A) First of all, your cost would be limited to the number of e-stamps you purchased. (How many of you run around with hundreds of dollars of postal stamps in your pocket?) and B) When you do purchase $5 worth of postage you care for it like cash. If you lose it, you take better care of it next time. So, if you purchased $5 worth of e-stamps and then didn’t protect your computer, then shame on whom? While it’s only 5 dollars, you will find a better way to protect your computer.
Peter Clay: the cost of pollution – Exactly, the cost should be directly connected to the frequency of the abuse HOWEVER, laws simply won’t work. You need to put the owner of the email box in control of the “potential” fine. If you send junk, then you should be subject to shoulder the cost of Undesirable Interruptions.
Avi Flamholz – Good solutions are proprietary & costly: Good point – If they are not proprietary, they would be useless but again, this expense should not be shouldered by the recipient. Rather, you are the mailbox owner – It is the “inbound” unsolicited information that should come to you with a personal guarantee of value.
Tel – Fix SMTP so Identification is guaranteed – Identification will not work. Think about it. Sure, ID will stop those within our jurisdiction from breaking the law. But there are two problems – A) what about those outside the jurisdiction and b) what about those NOT breaking the law? Let’s return to economics for a moment. What would happen to the amount of junk snail mail you get today if the cost of postal stamps dropped to one penny (which is about a hundred times more costly than sending spam.) Ah, and to make the analogy fair, let’s assume the USPS is throwing in the paper and the printing costs for duplication. How would your post box look now?
Lesage – The cost of bandwidth abuse – Perfectly stated!
Mike – The cost of false positives – Finally, the fact that I, as a sender, cannot guarantee delivery of my message (without e-postage) is costing us all a fortune in wasted time.
In short – this entire problem is born of the fact that “your contact points” are valuable to the commercial world. No other commercial media has this problem – because – they control the value of their channel. And if they do have the problem it is a by-product of how they are poorly controlling their value.
So, your email address (and cell number, SMS id, IM & VoIP) are your personally owned portals (real estate, just like a billboard). As long as you do not have control over the “value” of that real estate, people will find a way to push stuff at you as fast as humanly possible – and the very beauty of the Internet is that it has given them the power to push at “light speed”.
Don’t throw the baby out with the bath water here. Just give “everyone” personal value control.
There was an excellent article in this week’s Forbes “Why Spam Won’t go Away” – http://www.forbes.com/2006/12/11/spam-security-email-tech-security-cz_bs_1212spam.html?partner=alerts%22
The only thing this writer missed is that Personal Message Bonding does not require re-engineering email. It is in fact fully functional and very much in use today – We’ve just had to wait our turn because others (like e-stamp promoters) have given economics such a bad name and we are often confused with them. Additionally, there are still many others that believed spam could be controlled with technology – and that path is clearly being seen as a very costly black hole to everyone.
As always, I invite you to push the envelope of Personal Value Control as discussing the details helps me ensure that I’ve not missed something in truly correcting the email value chain for all parties.
Legitimate email from sun.com regarding a Java 1.6.0 beta bug report landed in gmail’s spam filter. Fortunately I review it — all 100+ messages per day.
It needs to be easier to whitelist senders. Anything you submit on a web site that may result in legitimate mail should produce a response page containing a code you can use to whitelist the mail. The mail that gets sent contains the code. The code is random for each submission, so the spammers can’t confuse it. Likewise if you sign up at anywhere like ebay or paypal part of the signup process should include a code that’s not exposed later on (as your buyer or seller ID for instance) but appears in all legitimate email sent by or via the site.
Mail clients need a quick way to accept such codes. I’d suggest making them handle a “whitelist:szEjd83Kd90” sort of URL, with codes like suggested above being links of this format. Create an account somewhere, get a code; submit a bug report, get a code; etc. A compliant mail client will let you just click the link and have email from that source pass your spam filtering automatically. Spammers may include random instances in their spams, but only a minuscule fraction will happen to coincide with a code the target’s whitelisted. Whitelist code links could include an expiration date too, so one-offs like bug reports that won’t generate ongoing mail don’t permanently bloat your whitelist. The code links could also apply the code to an IP mask. Mail clients receiving mail with the code from outside the specified address range would blackhole it instead of pass it as being a near-certain sporgery. Mail from legitimate ISP mail exchange nodes could contain a header X-Whitelist: with the same code (and the ISP’s IP range). Compliant clients would use this to whitelist mail from the sender if possible, doing so a) automatically if you replied and b) manually upon request. If the header were missing, the client would fall back on whitelisting the sender, though that’s more spoof-susceptible. (The code in the header would convolve the sender’s account ID with the recipient email address, so the code from the sender’s mail to someone else couldn’t be used to spoof mail from them. The sender’s mails to each different recipient would have a different code, though to the same recipient a consistent code.)
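The per-recipient codes and `whitelist:` links described above could be sketched like this. It is only one way to read the proposal: HMAC-SHA256 stands in for “convolve the sender’s account ID with the recipient email address”, the 16-character truncation is arbitrary, and `WhitelistStore` and the other names are hypothetical. The IP-mask and `X-Whitelist:` header parts are left out.

```python
import hashlib
import hmac


def recipient_code(site_key: bytes, account_id: str, recipient: str) -> str:
    """Derive a code that is stable for one (sender account, recipient) pair.

    HMAC-SHA256 is an assumed stand-in for the 'convolve' step.
    """
    message = f"{account_id}|{recipient.lower()}".encode("utf-8")
    return hmac.new(site_key, message, hashlib.sha256).hexdigest()[:16]


def whitelist_link(code: str) -> str:
    """Build the kind of link a compliant mail client would register."""
    return f"whitelist:{code}"


class WhitelistStore:
    """Client-side store of accepted codes, each with an optional expiry time."""

    def __init__(self):
        self._codes = {}  # code -> expiry timestamp, or None for no expiry

    def accept_link(self, link: str, expires_at=None):
        """Register the code from a clicked whitelist link."""
        scheme, _, code = link.partition(":")
        if scheme != "whitelist" or not code:
            raise ValueError("not a whitelist link")
        self._codes[code] = expires_at

    def passes(self, code_in_mail: str, now: float) -> bool:
        """True if incoming mail carries a registered, unexpired code."""
        if code_in_mail not in self._codes:
            return False
        expiry = self._codes[code_in_mail]
        return expiry is None or now < expiry


key = b"site-secret-key"  # assumed secret held only by the sending site
code = recipient_code(key, "bugtracker-4711", "me@example.org")
store = WhitelistStore()
store.accept_link(whitelist_link(code), expires_at=1000.0)
print(store.passes(code, now=10.0), store.passes(code, now=2000.0))
```

Because the code depends on both the sender’s account and the recipient’s address, a code lifted from mail to one person is useless for spoofing mail to another, and the expiry keeps one-off submissions such as bug reports from bloating the whitelist.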
Once the above was widely adopted, autonuking nonwhitelisted mail that contained rich media would become doable.
The final complexity is dealing with server-side spam filtering, where it’s more efficient but harder for end-users to control. There would need to be a process for propagating a whitelisting from the mail client upstream to the POP or IMAP server — an extension to the POP protocol perhaps. This way that whitelisted Java bug report would not be bollixed by gmail’s spam filter (even if you don’t POP your gmail, as long as you configure your mail client to be able to do so, when you click the whitelist-our-reply link on the successful-submission form your client tells gmail to whitelist it) or your ISP’s.
The client for my Symantec spam filter at work is broken so I have been observing the spam evolve over the last few months. Most of it is pharmaceutical e-mails. First I noticed they went from text to images. Now just recently the images went from solid backgrounds to multi-colored and gradient backgrounds. Guess it’s so that the letters don’t look like letters to a computer that might be analyzing them.
My problem with spam is not spammers getting through a whitelist, etc.; it’s the false positives from the spam filter. I don’t like the thought that the existence of this high volume of spam might affect my life. I almost missed several legitimate e-mails regarding my mortgage because the spam filter had already blacklisted references to my bank. So even though Gmail’s spam filter has only let through a handful of spam into my inbox, I am more concerned that I still have to review all of the messages in the junk mail box to make sure I’m not missing something.
A lot of the suggestions that involve responding to requests based on summaries etc. don’t solve the problem because the spammers just need to spoof the summaries into something realistic enough that you will accept it. But I think the real crime is simply the fact that we have to deal with this stuff. I think the spammers should all go to jail, although I accept the fact that my wishes are difficult to judiciously enforce.
Widespread, successful filtering including default filtering of new users’ mail would reduce the amount of revenue to the spammers enough to make them have to scale back and better target their operations.
Some comments miss the point: the problem is that spam is clogging the Internet. Just Google-News “Spam” to see how much. Our business, which relies on quick email communication between N America and Australasia, has been severely impacted in the last few days by emails either being delayed, or not getting through at all. So even if YOU have a successful strategy for de-spamming your inbox, you are still being hurt by spam.
I’ve found the best way to deal with spam is white list filtering. This is where you only accept email from known senders, and it blocks over 99% of spam.
It takes time to set up your white list, but it is definitely worth the effort. You don’t need to change your email program, and it costs you nothing, other than time, to set it up.
It works, because no matter how smart the spammers are, they can’t know the email addresses of all your friends!
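A minimal sketch of the whitelist check described above, assuming the filter only looks at the From header exactly as written. The function name and the example addresses are made up for illustration; real mail programs expose this through their own filter settings.

```python
from email.utils import parseaddr


def is_whitelisted(from_header, whitelist):
    """Return True if the address in the From header is on the whitelist.

    The comparison is case-insensitive; the whitelist is expected to hold
    lower-case addresses.
    """
    _display_name, address = parseaddr(from_header)
    return address.lower() in whitelist


# Hypothetical whitelist of known senders.
whitelist = {"alice@example.org", "bob@example.net"}

print(is_whitelisted("Alice <Alice@Example.org>", whitelist))
print(is_whitelisted("Cheap Meds <bulk@example.com>", whitelist))
```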
Spammers rely on the fact that no blocking filtering system is perfect, especially when their message is in an image.
You can also avoid having to read their lousy messages, by reading your emails in plain text instead of html format.
http://www.hiphil.ws/mmo/blog.html
Greg,
This is really interesting. My experience with Akismet has been great. The discrepancy must be caused by some difference between our blogs. I would love to know what that is.
I started using Akismet a month ago because the trio of antispam plugins for MT weren’t doing a damn thing, for some reason. But I’ve found that Akismet tends toward sending legitimate e-mail to my junk folder even on its default settings. At the moment it’s letting through probably 20 spams a day, which I can handle, and no one has complained to me their comments were being “held for moderation.” But when I stopped getting comments from my friends shortly after Akismet was installed, and I couldn’t leave a comment on my own blog, I knew something was amiss. So I’m not thrilled with Akismet but it’s decent.
I have a number of email accounts. One is a junk account that collects heaps; I don’t read it, I just search for keywords and do a massive purge now and then. That’s fine for all the things that insist I don’t exist unless I have an email address.
I also have a personal mailbox that is 100% whitelist, and people who are interested tend to put themselves onto the whitelist; I also take the trouble to whitelist people myself when I think of it. That covers enough people to fill a mailbox, which is all I get time to pay attention to anyhow. I’ve found it a very nice solution, easy to implement, easy to use, mostly compatible with existing email. Other people can find their own methods if they don’t like doing it my way, but the point is that technical solutions do exist.
I’ll also point out that the whole design of email is broken. A three-way handshake is necessary:
[1] sender fires a UDP packet to the receiver containing only an essential summary of the message contents plus an identification for collecting the message.
[2] receiver decides which of the recently collected summaries are interesting and makes a TCP connection to request collection of some particular message.
[3] sender delivers the full message to the TCP connection.
This forces the sender to properly identify themselves and to maintain a constant identification for a reasonable time period. It also gives the receiver the chance to sort their collection priorities based on the message summaries. The sender can also check to see if their messages were collected (with a retransmit of the original UDP if they think it might help). Someone please pull-finger and implement this.
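The three steps above could be sketched roughly like this. It is an in-process simulation only: no UDP or TCP sockets are opened, and the class names, the dictionary layout of the summary, and the "directory" used to find a sender are all assumptions made for the example, not part of any existing protocol.

```python
import uuid


class Sender:
    def __init__(self, identity):
        self.identity = identity
        self.outbox = {}        # message id -> full body awaiting collection
        self.collected = set()  # ids the receiver actually came back for

    def announce(self, summary, body):
        """Step 1: build the small summary datagram (would go out over UDP)."""
        msg_id = uuid.uuid4().hex
        self.outbox[msg_id] = body
        return {"from": self.identity, "id": msg_id, "summary": summary}

    def collect(self, msg_id):
        """Step 3: deliver the full message (would go over the TCP connection)."""
        body = self.outbox.get(msg_id)
        if body is not None:
            self.collected.add(msg_id)
        return body


class Receiver:
    def __init__(self, is_interesting):
        self.is_interesting = is_interesting  # receiver's own policy function
        self.pending = []                     # summaries not yet acted on

    def on_summary(self, datagram):
        self.pending.append(datagram)

    def fetch_interesting(self, directory):
        """Step 2: choose which summaries to collect and request only those."""
        inbox = []
        for datagram in self.pending:
            if self.is_interesting(datagram):
                sender = directory[datagram["from"]]
                body = sender.collect(datagram["id"])
                if body is not None:
                    inbox.append((datagram["from"], body))
        self.pending.clear()
        return inbox
```

Because the sender keeps `collected`, it can tell which messages were never fetched and decide whether to resend the summary.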
I have my Princeton mail forwarded to a GMail acct. So I have both Proofpoint and GMail spam filtering going over everything I get. Personally, I haven’t had a spam message hit my inbox in at least a month. Maybe the problem is not that spammers are winning, but that the good solutions are proprietary and (in the case of Proofpoint) quite expensive?
Also, I think this image-spam thing is, at least for now, a bit of a red herring. Gmail, for example, blocks out images. So if I get some such email I won’t be able to read it unless I ask to – defeating most of the purpose of the spam. The presence of an inline image in an HTML email also seems to be a strong indication that a message is spam. The combination of that fact with whitelisting seems like it could take us pretty far in blocking image-spam. Unless we get to the point where the majority of real email also contains inline pictures.
Michael,
Apparently spam does pay. Surveys show that 3-5 percent of email users have purchased something from a spammer at some point, and that the stocks named in buy-this-stock-now spams do show a price bump. Unfortunately even a minuscule amount of revenue per message is enough.
Regarding the gibberish text in current spams, that is designed to confuse text-based analysis tools. Think of it as colorful camouflage.
“While I can see how we manage to reduce the spammer’s profit margin, how do we go about increasing their operating costs?”
ISPs blanket-blocking outbound SMTP not through their gateway would be a start. (Anything with destination port 25, destination IP not their MX box.) So much for a lotta botnet spam. ISPs throttling throughput per customer at their MX to a small number of emails per hour will not harm the vast majority of legitimate users, but will kill most spam that does go through their MX. Users that want to run legitimate mailing lists can be accommodated by the ISP hosting the list itself, and managing subscriptions (and honoring unsubscribe requests) itself so that these lists are certainly legitimately opt-in and unsubscribable, rather than being abusable to send spam.
Next we can have mail clients that filter out all rich-media mail not from an address you’ve previously sent mail to. Unsolicited mail from unknown senders (or known ones using new addresses) would have to be plain text (and pass Bayesian filtering) and stimulate a reply before they could send rich media and expect it to get through. This would become widely understood netiquette once such clients became widespread. Webmail providers (e.g. gmail) could deploy this right now. New versions of Thunderbird, (yuck) Eudora, and (double yuck) Outlook would cover nearly everyone else.
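A sketch of that client-side rule using Python's standard email package. The function name and the `previously_mailed` set are hypothetical. The check only decides whether a message is eligible to go on to Bayesian filtering; it is not a spam verdict by itself.

```python
from email import message_from_string
from email.utils import parseaddr


def passes_rich_media_rule(raw_message, previously_mailed):
    """Accept anything from an address we have written to before;
    from everyone else, accept only messages made up entirely of plain text."""
    msg = message_from_string(raw_message)
    _name, sender = parseaddr(msg.get("From", ""))
    if sender.lower() in previously_mailed:
        return True
    # walk() visits every MIME part; containers are skipped, leaves are checked.
    return all(
        part.get_content_type() == "text/plain"
        for part in msg.walk()
        if not part.is_multipart()
    )
```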
Bayesian filtering needs smartening up, too. Strings of single letters separated by whitespace need to be treated as words, and substrings of “high spam quotient” words that don’t occur in non-spam words need to be indicators of spam, e.g. “harmacy” to catch “fharmacy” and all the zillions of other variations on that word popping up lately. Currently, Thunderbird’s filter doesn’t recognize “p h a r m a c y” at all, nor (until encountered at least once) a new “foo-armacy” variation. The fact that this comment made it through this blog’s filter is proof enough, I trust. 😉 (Of course, so that “spammy” topics can be discussed, all such filters need whitelists and fairly-automatic populating of those with messages from a consistent sender or IP that has a large proportion of past messages passing the filters and then not being explicitly deleted either.)
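The two tokenizer changes suggested above might look something like this. It is only a preprocessing sketch, not how Thunderbird's filter or any other real filter works; the fragment list and the "frag:" feature prefix are made up, and the learned spam weights are left to whatever Bayesian filter consumes the tokens.

```python
import re

# Three or more single letters separated by whitespace, e.g. "p h a r m a c y".
SPACED_WORD = re.compile(r"\b(?:[A-Za-z]\s+){2,}[A-Za-z]\b")


def collapse_spaced_letters(text):
    """Join runs of whitespace-separated single letters into one word."""
    return SPACED_WORD.sub(lambda m: re.sub(r"\s+", "", m.group(0)), text)


def tokens(text, fragments=("harmacy",)):
    """Return word tokens plus one extra feature per spammy fragment found."""
    words = re.findall(r"[a-z]+", collapse_spaced_letters(text).lower())
    features = list(words)
    for word in words:
        for fragment in fragments:
            if fragment in word:
                features.append("frag:" + fragment)
    return features


print(tokens("Cheap p h a r m a c y and fharmacy"))
```

One limitation of this simple pattern: a stray single-letter word directly after a spaced-out word (such as "a") gets merged into it.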
Penny stocks (or any other stocks) are traded in a world full of regulation. Those regulations could potentially be changed to make it more difficult to profit from pump-and-dump schemes. Of course, it’s just a matter of time before some enterprising Wall Street fund starts watching for these pump-and-dump spams and shorts the stocks. Then you start having an interesting game theoretic competition between the spammers and the smart Wall Street types. If I had to make a guess at it, the net effect of anti-spam investing would be to neutralize any possible gains that a spammer could effect.
The solution has to be legal not technical, and it has to be fair and supported by a majority of people involved in email. Personally I think it should be seen like pollution: all this junk being dumped on people has a cleanup cost, and I think spammers should be made to pay that cost.
Then you run into the problem of doing this internationally.
http://www-static.cc.gatech.edu/~avr/publications/p396-ramachandran-sigcomm06.pdf
Ed,
Do spammers really make any money? Penny stocks, Viagra — is anyone fooled at this point (even 1 in a million)? What explains the complete gibberish text email spams? It seems to me that a significant portion of large-scale spam has become an anti-social attack on email as a system of communication, and is not profit-motivated.
And qualitatively, spam is different today than before. Rather than having a small number of victims of Nigerian wire scams, the principal impact of the current generation of spam is just lost productivity (both in actively deleting unwanted emails and losing valuable ones.) As each round of spy versus spy continues, it seems harder to make money and more likely to extract social costs.
The only people that I can see who benefit obviously financially from spam are the providers of spam filters….
e-Postage: This doesn’t fare very well with bot-nets. Not only do the Spammers not get charged, but all of a sudden, dozens of innocent, non-tech-savvy users get dinged hundreds of dollars for all of the SPAM their computers have been sending out without their knowledge.
Internet usage prohibition: Disconnecting the companies whose products are being advertised would have the inadvertent effect of putting more or less “random” companies out of business. Typically, companies do not contract a “SPAM” company to advertise for them. They contract an advertising firm, which subcontracts to a company which subcontracts to another company which subcontracts to Spammers. The end result is that the companies whose products are being sold via spam are often unaware of the form that their marketing is taking. They know that they have hired someone to give them exposure, and they probably have some sense of how well it is working, but they don’t necessarily know the nuts and bolts of how it works.
Also, Ed implied that the oh-so-common penny stock spam doesn’t even come from the company that is advertised — these are typically day traders looking to make a quick buck off of some innocent company’s stock. It is completely unfair to punish these companies for being advertised in some completely unrelated trader’s scam.
I’m surprised that there haven’t been more (any?) ‘insider’ stories from the spam bad guys. There have been kiss-and-tell stories in nearly every other criminal enterprise; why not here?
I should post a pointer to “Why Your Spam Solution Doesn’t Work”
E-postage will kill mailing-lists. And nobody has figured out how to make it work anyway. Plus big mailers don’t want to pay it.
User-education is running up against the fact that spammers look for the idiots anyway. It’s an imperfect world.
[…] require postage.
While there have been many conceptual arguments over “e-stamps” over the years, a pinch of empirical evidence is probably worth a ton of blather.
According to an April 2006 survey (Pew Internet and American Life Project, the Associated Press and AOL):
(p.3)
As Isaac Scarborough notes:
Does anyone have any good numbers on what percentage of SMS spam is being injected via free internet gateways?
The only mail that needs a cost is unsolicited mail.
Mail between mutually familiar correspondents doesn’t need such a cost.
I want to accept unsolicited e-mail from complete strangers who can demonstrate they have correspondence of value to me.
That demonstration needn’t benefit me, it could simply be the expenditure of a cent of CPU time.
I may miss out on interesting correspondence from unfamiliar senders with large audiences, but I suggest such senders should publish their messages and I may find them if they meet my interest criteria.
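One way "the expenditure of a cent of CPU time" could be demonstrated is a hashcash-style proof of work, sketched below. This is an assumption about how such a scheme might work, not something specified in the comment: the stamp format, the function names, and the difficulty of 12 bits are all illustrative, and a real deployment would need a much higher difficulty plus protection against stamp reuse.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count the zero bits at the start of a bytes digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(recipient, message_id, difficulty=12):
    """Sender's side: search for a nonce whose stamp hashes below the target.
    Expected work is about 2**difficulty hash computations."""
    for nonce in count():
        stamp = f"{recipient}:{message_id}:{nonce}"
        digest = hashlib.sha256(stamp.encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return stamp


def verify(stamp, recipient, difficulty=12):
    """Receiver's side: one hash is enough to check the sender's work."""
    if not stamp.startswith(recipient + ":"):
        return False
    digest = hashlib.sha256(stamp.encode()).digest()
    return leading_zero_bits(digest) >= difficulty
```

The asymmetry is the point: minting costs the sender thousands of hashes per recipient, while checking costs the receiver one. Familiar correspondents on a whitelist could skip the stamp entirely.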
Pardon me if I repeat myself: Prohibition should have told you. Nancy Reagan’s war on drugs should have told you. Just like with drugs and users, there will always be spam, as long as there are people who buy from spammers.
I imagine large billboards, and TV-ads: Don’t click and don’t buy. It’s not a technical problem.
I’ve said it before: The solution is economic.
We must raise the cost of sending spam. The easiest way to do that is to require postage. Peering ISPs could pay for the difference between inbound and outbound traffic. It would be up to them how to pass on the costs of large outbound flows, but in any case, the largest burden would fall on the spam-friendly ISPs.
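To make the settlement idea concrete, here is a small arithmetic sketch. The ISP names and message counts are invented, and the one-cent price is just one illustrative value; amounts are kept in whole cents to avoid rounding problems.

```python
def settlements(flows, price_cents=1):
    """flows[(a, b)] is the number of messages ISP a sent to ISP b.
    Returns {(payer, payee): cents owed} for the net difference per pair."""
    owed = {}
    seen = set()
    for (a, b) in flows:
        pair = frozenset((a, b))
        if a == b or pair in seen:
            continue
        seen.add(pair)
        net = flows.get((a, b), 0) - flows.get((b, a), 0)
        if net > 0:
            owed[(a, b)] = net * price_cents
        elif net < 0:
            owed[(b, a)] = -net * price_cents
    return owed


# Hypothetical monthly traffic between a spam-friendly ISP and an ordinary one.
flows = {("bulk-isp", "home-isp"): 100_000, ("home-isp", "bulk-isp"): 2_000}
print(settlements(flows))
```

With these made-up numbers the bulk sender's ISP owes 98,000 cents for the month, while two ISPs with balanced traffic would owe each other nothing.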
I would be personally very happy to spend a penny or a nickel per email I sent as the price of radically reducing spam, should my ISP decide to charge senders directly.
Email doesn’t “want to be free” any more than information does.
Jim H.
Unless you want to “dumb down” what email means.
cm,
For a fairly long time (“internet time”; but see Odlyzko) many email users expressed strong resistance to anything but plain text (7-bit ASCII). This was normally justified on grounds of interoperability…. those users didn’t want to receive Microsoft®-proprietary coded email on their boxen.
billswift: Some emails come with background images, images in the signature, or custom character fonts. The standard allows inline disposition, mail clients are rendering images and other content (compare also problems with MS Outlook automatically executing scripts or programs), hence they can be considered legit parts of messages. Unless you want to “dumb down” what email means.
I’m curious. Why not just filter anything that’s not text?
We’re talking e-mail after all, not web pages.
I agree that the solution to spam is to reduce the spammer’s profit margin; the question becomes how to effectively do that. Given the recent shift from bogus sales and Nigerian payoff scams to pump-and-dump stock scams it’s evident that educating users not to respond to spam is working but not working well enough to eradicate the problem. While I can see how we manage to reduce the spammer’s profit margin, how do we go about increasing their operating costs?
It’s a pretty good illustration of the flaws of hoping for a technological solution to a social problem – there can be technologists on the side of evil too :-(.
“But now the spammers have adopted new tactics: sending spam from botnets (armies of compromised desktop computers), sending images rather than text, adding randomly varying noise to the messages to make them harder to analyze, and providing fewer URLs in messages.”
But this also happened earlier this year. I’m not receiving radically different spam now than I did half a year ago. It’s just that the volume has gone up significantly.