November 26, 2024

30th Anniversary of First Spam Email; No End in Sight

Today marks the 30th anniversary of (what is reputed to be) the first spam email. Here’s the body of the email:

DIGITAL WILL BE GIVING A PRODUCT PRESENTATION OF THE NEWEST MEMBERS OF THE DECSYSTEM-20 FAMILY; THE DECSYSTEM-2020, 2020T, 2060, AND 2060T. THE DECSYSTEM-20 FAMILY OF COMPUTERS HAS EVOLVED FROM THE TENEX OPERATING SYSTEM AND THE DECSYSTEM-10 (PDP-10) COMPUTER ARCHITECTURE. BOTH THE DECSYSTEM-2060T AND 2020T OFFER FULL ARPANET SUPPORT UNDER THE TOPS-20 OPERATING SYSTEM. THE DECSYSTEM-2060 IS AN UPWARD EXTENSION OF THE CURRENT DECSYSTEM 2040 AND 2050 FAMILY. THE DECSYSTEM-2020 IS A NEW LOW END MEMBER OF THE DECSYSTEM-20 FAMILY AND FULLY SOFTWARE COMPATIBLE WITH ALL OF THE OTHER DECSYSTEM-20 MODELS.

WE INVITE YOU TO COME SEE THE 2020 AND HEAR ABOUT THE DECSYSTEM-20 FAMILY AT THE TWO PRODUCT PRESENTATIONS WE WILL BE GIVING IN CALIFORNIA THIS MONTH. THE LOCATIONS WILL BE:

TUESDAY, MAY 9, 1978 – 2 PM
HYATT HOUSE (NEAR THE L.A. AIRPORT)
LOS ANGELES, CA

THURSDAY, MAY 11, 1978 – 2 PM
DUNFEY’S ROYAL COACH
SAN MATEO, CA
(4 MILES SOUTH OF S.F. AIRPORT AT BAYSHORE, RT 101 AND RT 92)

A 2020 WILL BE THERE FOR YOU TO VIEW. ALSO TERMINALS ON-LINE TO OTHER DECSYSTEM-20 SYSTEMS THROUGH THE ARPANET. IF YOU ARE UNABLE TO ATTEND, PLEASE FEEL FREE TO CONTACT THE NEAREST DEC OFFICE FOR MORE INFORMATION ABOUT THE EXCITING DECSYSTEM-20 FAMILY.

This is relatively mild by the standards of today’s spam. The message announced legitimate events relating to legitimate products in which the recipients might plausibly be interested. The sender was apparently unaware that this kind of message was against the rules.

Yet this message has much in common with today’s spam. The message used ALL CAPS, which was more common in those days but not the universal practice for email. The list of recipients was long. The message was incorrectly formatted – the original had more recipients than the email software of the day could handle, so what was supposed to be the recipient list actually spilled over into the body of the email, apparently unnoticed by the sender.

At the time, the Net’s rules forbade commercial activity, so the message was against the rules. Beyond the rule violation,the message’s propriety was widely questioned, and people debated what to do about it. (Brad Templeton has posted parts of the debate.)

Thirty years later, there is more spam than ever and no end is in sight. This shouldn’t be surprising, because the spam problem is fundamentally driven by economics. If anyone can send to anyone, and the cost of sending is nearly zero, many messages will be sent. Distinguishing unwanted email from wanted email is notoriously difficult – often you have to read a message to decide whether reading it was a waste of time. In this environment, spam will be a fact of life. The surprise, if anything, is that we have done as well as we have in coping with it.

spammers gone wild

I’m sure this sort of behavior is old news, but it’s still really annoying.  Starting last night and continuing as I’m writing this, some annoying spammer has been forging my email address as the “From” line of a variety of spams.  This is causing a staggering volume of backscatter, mostly of the “Delivery Status Notification (failure)” variety.  Sampling these messages, I’m seeing several interesting things.

  1. The spammer is using my proper email address (dwallach@…) on each message, but a different “real” name on each one.  The name “Dan Wallach” does not appear anywhere.
  2. I forward everything to Gmail.  Gmail considers all of this backscatter to be spam.  That’s probably the correct answer, but I’m not sure I want to train my own DSPAM to do the same thing.  (DSPAM runs locally, and then I save a local copy and forward to Gmail.)  If I send a real message and it legitimately bounces, I want to know about it.  If I train DSPAM that all of these delivery status notifications are spam, it will inevitably throw away anything from “mailer-daemon”.  I’m unclear on whether that’s good or bad.
  3. You could easily build a bounce-message validator.  Every backscatter seems to have the original message ID in it, somewhere.  If the backscatter mentions a message ID that my system actually generated, then the backscatter is allowed.  Otherwise it’s dropped.  (This idea appears to be a variation of VERP; I’d make the message ID be a keyed MAC of a sequence number.)
  4. A large number of these spams have a message body consisting entirely of “Take a look at yourself :)”  and linking to “video.exe” on a variety of different web sites.  Gmail helpfully rewrites those links such that they can track that I clicked on it.  This would also seem to give them an opportunity to give me an anti-virus warning, but they don’t do any such thing.  (“video.exe” is one of the common names used by the Storm worm.)
  5. Many spams include links that redirect through Google’s PageAd server to yet another server.  I clicked on one of them.  It appears that the PageAd redirector worked, but then Firefox’s “badware” detector caught the destination as being bad, ultimately taking me to stopbadware.org.  Go Firefox!
  6. Some legit antispam firewall products (including Barracuda) are helpfully telling me my message “was blocked by our Spam Firewall. The email you sent with the following subject has NOT BEEN DELIVERED”.  This is clearly broken behavior.  Just drop it and move on!
  7. Several of the backscatter messages are actually validation messages (sender address verification).  This has been largely discredited due to a variety of practical problems, never mind common-case annoyance to normal users.
  8. One of the spammers seems to be quite keen to sell replicas of expensive wristwatches, and those links take you to some kind of seemingly real online store, albeit with a funky DNS name.  Somehow, even if I did want a fake expensive watch, I’m not sure I’d be comfortable typing my credit card number into a web site whose name is a list of random characters and who (clearly) is closely related to the underworld of lecherous spammers.

EDIT: fixed post that had gone out before it was done.

Bizarre Undervote on iVotronic in France

In France, most municipalities use paper ballots in elections, but a few places have begun using DRE (direct-recording electronic) machines. Pierre Muller, a French computer scientist, has recently sent me a report of a malfunction by an ES&S iVotronic machine in a recent municipal election.

In this spring’s elections (and he believes this also happened last year), there have been some unexplained “undervotes” on iVotronic machines. Below is a printout from an iVotronic machine. There’s a line “UnderVotes For Above Contest: 1”. Since the voter is required by the user-interface to choose between a candidate and the choice “vote blanc” [none of the above], undervotes should not be possible.

This event is similar in some ways to the Sequoia AVC Advantage bug observed in New Jersey on February 5, 2008. In both cases it appears that the machine is producing results that should not be possible, and in both cases local election officials are unable to explain how these results could legitimately be obtained.

Here is the relevant portion of the printout:

I’ve also prepared a larger image of the full printout, annotated with my English translation.

voting ID requirements and the Supreme Court

Last week, I posted here about voter ID requirements.  There was a case pending before the U.S. Supreme Court on the same topic.  It seems Indiana was trying to require voters to present ID in order to vote.  Lawsuit.  In the end, the court found that the requirement wasn’t particularly onerous (the New York Times’s article is as good as any for a basic summary, or go straight to the ruling).

Unsurprisingly, there has been a lot of hang-wringing on this (see, for example, this New York Times unsigned editorial).  We can expect similar legislation elsewhere now that the Court has made it pretty difficult to challenge these sorts of laws (see, for example, the ongoing battle to pass this sort of legislation in Texas).

As I wrote last time, I’m not particularly opposed to voters being required to present ID.  However, ID needs to be easy to get for anybody who is elgible to vote.  For most people, this is easy.  The big question we’d all like to know is the size of the population for which it’s not easy.  Consider, as a hypothetical example, an elderly Texas woman who never drove a car.  If she’s over 75 years old, the state’s centralized birth certificate registry won’t (officially) have her records.  It could well require detective work to produce sufficient documentation to get her a state ID card.  Who’s going to pay for that?

The big technical question, of course, is whether the root desires behind the voter ID requirement can be addressed in some more effective fashion than ID requirement.  What are those root desires?

  1. Prevent legitimate citizens from registering to vote and voting in more than one locale
  2. Prevent registered voters from casting multiple votes in their own name
  3. Prevent registered voters from impersonating other registered voters
  4. Prevent anyone, including malicious poll workers, from casting votes on behalf of registered voters who have chosen not to vote
  5. Prevent non-eligible people (non-citizens, felons, etc.) from registering to vote
  6. Detect changes in registered voters’ eligibility status, quickly and accurately

Which problems can be solved by purple ink on a voter’s thumb?  #1 and #2 are readily solved, since a second attempt to vote will be forbidden.  #3 is disincentivized, because the impersonator will be unable to vote under his or her own name.  #4-6 will require other technologies.

Okay, which problems can be solved by having required voter ID?  Let’s assume, for the sake of discussion, we have a centralized state database keyed off the voter’s ID card number, but individual polling places do not have real-time access to this database.  Also, let’s assume that voter ID cards do not have any computational power: no smart cards, no crypto, etc.  #1 is ostensibly solved by the central database.  #2 cannot be prevented (at least, in a world with early voting or voting centers, where a voter has multiple places where he or she can legitimately vote), but it can be detected, and is thus disincentivized.  #3 is solved.  #4 is largely unsolved: if malicious poll workers want to forge signatures in the poll book, they may or may not be detected.  (In a recount situation, written signatures should be verified, but it’s unclear what the accuracy of that checking process might be.)

You could try to solve #4 with smartcards that issue digital signatures, but that’s a whole different can of worms.  Since the smartcard doesn’t really know what it’s being asked to sign, this could be exploited by an attacker.  (Example: you need to present your ID in a variety of different circumstances, such as proving your age to enter a bar.  The bouncer could “swipe” your card and use that as a way of getting a forged signature on an election record.)

What about #5 and #6?  These are really back-end database problems.  Requiring voters to present ID doesn’t have any impact.  However, having a database that is keyed off the voters’ ID cards significantly improves #5 and #6 and could ostensibly help reduce a variety of errors in the process.

Curiously, it seems that most of the benefit of requiring ID occurs in the back-end database, rather than on the day of the election.  The only real benefit of presenting ID, on election day, occurs in vote centers, early voting locations, and so forth.  When there may be millions of eligible voters who could use a vote center, traditional paper poll books are unworkable.  With a database keyed from ID card numbers, a voter’s records can be efficiently looked up and verified.  While this isn’t a security problem, improving the efficiency of the voting process is still a worthwhile goal.

Future of News Workshop, May 14-15 in Princeton

We’ve got a great lineup of speakers for our upcoming “Future of News” workshop. It’s May 14-15 in Princeton. It’s free, and if you register we’ll feed you lunch.

Agenda

Wednesday, May 14, 2008

9:30 – 10:45 Registration
10:45 – 11:00 Welcoming Remarks
11:00 – 12:00 Keynote talk by Paul Starr
12:00 – 1:30 Lunch, Convocation Room
1:30 – 3:00 Panel 1: The People Formerly Known as the Audience
3:00 – 3:30 Break
3:30 – 5:00 Panel 2: Economics of News
5:00 – 6:00 Reception

Thursday, May 15, 2008

8:15 – 9:30 Continental Breakfast
9:30 – 10:30 Featured talk by David Robinson
10:30 – 11:00 Break
11:00 – 12:30 Panel 3: Data Mining, Interactivity and Visualization
12:30 – 1:30 Lunch, Convocation Room
1:30 – 3:00 Panel 4: The Medium’s New Message
3:00 – 3:15 Closing Remarks

Panels

Panel 1: The People Formerly Known as the Audience:

How effectively can users collectively create and filter the stream of news information? How much of journalism can or will be “devolved” from professionals to networks of amateurs? What new challenges do these collective modes of news production create? Could informal flows of information in online social networks challenge the idea of “news” as we know it?

Panel 2: Economics of News:

How will technology-driven changes in advertising markets reshape the news media landscape? Can traditional, high-cost methods of newsgathering support themselves through other means? To what extent will action-guiding business intelligence and other “private journalism”, designed to create information asymmetries among news consumers, supplant or merge with globally accessible news?

  • Gordon Crovitz, former publisher, The Wall Street Journal
  • Mark Davis, Vice President for Strategy, San Diego Union Tribune
  • Eric Alterman, Distinguished Professor of English, Brooklyn College, City University of New York, and Professor of Journalism at the CUNY Graduate School of Journalism

Panel 3: Data Mining, Visualization, and Interactivity:

To what extent will new tools for visualizing and artfully presenting large data sets reduce the need for human intermediaries between facts and news consumers? How can news be presented via simulation and interactive tools? What new kinds of questions can professional journalists ask and answer using digital technologies?

Panel 4: The Medium’s New Message:

What are the effects of changing news consumption on political behavior? What does a public life populated by social media “producers” look like? How will people cope with the new information glut?

  • Clay Shirky, Adjunct Professor at NYU and author of Here Comes Everybody: The Power of Organizing Without Organizations.
  • Markus Prior, Assistant Professor of Politics and Public Affairs in the Woodrow Wilson School and the Department of Politics at Princeton University.
  • JD Lasica, writer and consultant, co-founder and editorial director of Ourmedia.com, president of the Social Media Group.

Panelists’ bios.

For more information, including (free) registration, see the main workshop page.