October 22, 2018

Cheap CAPTCHA Solving Changes the Security Game

ZDNet’s “Zero Day” blog has an interesting post on the gray-market economy in solving CAPTCHAs.

CAPTCHAs are those online tests that ask you to type in a sequence of characters from a hard-to-read image. By doing this, you prove that you’re a real person and not an automated bot – the assumption being that bots cannot decipher the CAPTCHA images reliably. The goal of CAPTCHAs is to raise the price of access to a resource, by requiring a small quantum of human attention, in the hope that legitimate human users will be willing to expend a little attention but spammers, password guessers, and other unwanted users will not.

It’s no surprise, then, that a gray market in CAPTCHA-solving has developed, and that that market uses technology to deliver CAPTCHAs efficiently to low-wage workers who solve many CAPTCHAs per hour. It’s no surprise, either, that there is vigorous competition between CAPTCHA-solving firms in India and elsewhere. The going rate, for high-volume buyers, seems to be about $0.002 per CAPTCHA solved.

I would happily pay that rate to have somebody else solve the CAPTCHAs I encounter. I see two or three CAPTCHAs a week, so this would cost me about twenty-five cents a year. I assume most of you, and most people in the developed world, would happily pay that much to never see CAPTCHAs. There’s an obvious business opportunity here, to provide a browser plugin that recognizes CAPTCHAs and outsources them to low-wage solvers – if some entrepreneur can overcome transaction costs and any legal issues.

Of course, the fact that CAPTCHAs can be solved for a small fee, and even that most users are willing to pay that fee, does not make CAPTCHAs useless. They still do raise the cost of spamming and other undesired behavior. The key question is whether imposing a $0.002 fee on certain kinds of accesses deters enough bad behavior. That’s an empirical question that is answerable in principle. We might not have the data to answer it in practice, at least not yet.

Another interesting question is whether it’s good public policy to try to stop CAPTCHA-solving services. It’s not clear whether governments can actually hinder CAPTCHA-solving services enough to raise the price (or risk) of using them. But even assuming that governments can raise the price of CAPTCHA-solving, the price increase will deter some bad behavior but will also prevent some beneficial transactions such as outsourcing by legitimate customers. Whether the bad behavior deterred outweighs the good behavior deterred is another empirical question we probably can’t answer yet.

On the first question – the impact of cheap CAPTCHA-solving – we’re starting a real-world experiment, like it or not.

30th Anniversary of First Spam Email; No End in Sight

Today marks the 30th anniversary of (what is reputed to be) the first spam email. Here’s the body of the email:

DIGITAL WILL BE GIVING A PRODUCT PRESENTATION OF THE NEWEST MEMBERS OF THE DECSYSTEM-20 FAMILY; THE DECSYSTEM-2020, 2020T, 2060, AND 2060T. THE DECSYSTEM-20 FAMILY OF COMPUTERS HAS EVOLVED FROM THE TENEX OPERATING SYSTEM AND THE DECSYSTEM-10 (PDP-10) COMPUTER ARCHITECTURE. BOTH THE DECSYSTEM-2060T AND 2020T OFFER FULL ARPANET SUPPORT UNDER THE TOPS-20 OPERATING SYSTEM. THE DECSYSTEM-2060 IS AN UPWARD EXTENSION OF THE CURRENT DECSYSTEM 2040 AND 2050 FAMILY. THE DECSYSTEM-2020 IS A NEW LOW END MEMBER OF THE DECSYSTEM-20 FAMILY AND FULLY SOFTWARE COMPATIBLE WITH ALL OF THE OTHER DECSYSTEM-20 MODELS.

WE INVITE YOU TO COME SEE THE 2020 AND HEAR ABOUT THE DECSYSTEM-20 FAMILY AT THE TWO PRODUCT PRESENTATIONS WE WILL BE GIVING IN CALIFORNIA THIS MONTH. THE LOCATIONS WILL BE:

TUESDAY, MAY 9, 1978 – 2 PM
HYATT HOUSE (NEAR THE L.A. AIRPORT)
LOS ANGELES, CA

THURSDAY, MAY 11, 1978 – 2 PM
DUNFEY’S ROYAL COACH
SAN MATEO, CA
(4 MILES SOUTH OF S.F. AIRPORT AT BAYSHORE, RT 101 AND RT 92)

A 2020 WILL BE THERE FOR YOU TO VIEW. ALSO TERMINALS ON-LINE TO OTHER DECSYSTEM-20 SYSTEMS THROUGH THE ARPANET. IF YOU ARE UNABLE TO ATTEND, PLEASE FEEL FREE TO CONTACT THE NEAREST DEC OFFICE FOR MORE INFORMATION ABOUT THE EXCITING DECSYSTEM-20 FAMILY.

This is relatively mild by the standards of today’s spam. The message announced legitimate events relating to legitimate products in which the recipients might plausibly be interested. The sender was apparently unaware that this kind of message was against the rules.

Yet this message has much in common with today’s spam. The message used ALL CAPS, which was more common in those days but not the universal practice for email. The list of recipients was long. The message was incorrectly formatted – the original had more recipients than the email software of the day could handle, so what was supposed to be the recipient list actually spilled over into the body of the email, apparently unnoticed by the sender.

At the time, the Net’s rules forbade commercial activity, so the message was against the rules. Beyond the rule violation,the message’s propriety was widely questioned, and people debated what to do about it. (Brad Templeton has posted parts of the debate.)

Thirty years later, there is more spam than ever and no end is in sight. This shouldn’t be surprising, because the spam problem is fundamentally driven by economics. If anyone can send to anyone, and the cost of sending is nearly zero, many messages will be sent. Distinguishing unwanted email from wanted email is notoriously difficult – often you have to read a message to decide whether reading it was a waste of time. In this environment, spam will be a fact of life. The surprise, if anything, is that we have done as well as we have in coping with it.

spammers gone wild

I’m sure this sort of behavior is old news, but it’s still really annoying.  Starting last night and continuing as I’m writing this, some annoying spammer has been forging my email address as the “From” line of a variety of spams.  This is causing a staggering volume of backscatter, mostly of the “Delivery Status Notification (failure)” variety.  Sampling these messages, I’m seeing several interesting things.

  1. The spammer is using my proper email address (dwallach@…) on each message, but a different “real” name on each one.  The name “Dan Wallach” does not appear anywhere.
  2. I forward everything to Gmail.  Gmail considers all of this backscatter to be spam.  That’s probably the correct answer, but I’m not sure I want to train my own DSPAM to do the same thing.  (DSPAM runs locally, and then I save a local copy and forward to Gmail.)  If I send a real message and it legitimately bounces, I want to know about it.  If I train DSPAM that all of these delivery status notifications are spam, it will inevitably throw away anything from “mailer-daemon”.  I’m unclear on whether that’s good or bad.
  3. You could easily build a bounce-message validator.  Every backscatter seems to have the original message ID in it, somewhere.  If the backscatter mentions a message ID that my system actually generated, then the backscatter is allowed.  Otherwise it’s dropped.  (This idea appears to be a variation of VERP; I’d make the message ID be a keyed MAC of a sequence number.)
  4. A large number of these spams have a message body consisting entirely of “Take a look at yourself :)”  and linking to “video.exe” on a variety of different web sites.  Gmail helpfully rewrites those links such that they can track that I clicked on it.  This would also seem to give them an opportunity to give me an anti-virus warning, but they don’t do any such thing.  (“video.exe” is one of the common names used by the Storm worm.)
  5. Many spams include links that redirect through Google’s PageAd server to yet another server.  I clicked on one of them.  It appears that the PageAd redirector worked, but then Firefox’s “badware” detector caught the destination as being bad, ultimately taking me to stopbadware.org.  Go Firefox!
  6. Some legit antispam firewall products (including Barracuda) are helpfully telling me my message “was blocked by our Spam Firewall. The email you sent with the following subject has NOT BEEN DELIVERED”.  This is clearly broken behavior.  Just drop it and move on!
  7. Several of the backscatter messages are actually validation messages (sender address verification).  This has been largely discredited due to a variety of practical problems, never mind common-case annoyance to normal users.
  8. One of the spammers seems to be quite keen to sell replicas of expensive wristwatches, and those links take you to some kind of seemingly real online store, albeit with a funky DNS name.  Somehow, even if I did want a fake expensive watch, I’m not sure I’d be comfortable typing my credit card number into a web site whose name is a list of random characters and who (clearly) is closely related to the underworld of lecherous spammers.

EDIT: fixed post that had gone out before it was done.