January 10, 2025

Dissecting the Witty Worm

A clever new paper by Abhishek Kumar, Vern Paxson, and Nick Weaver analyzes the Witty Worm, which infected computers running certain security software in March 2004. By analyzing the spray of random packets Witty sent around the Internet, they managed to learn a huge amount about Witty’s spread, including exactly where the virus was injected into the net, and which sites might have been targeted specially by the attacker. They can track with some precision exactly who infected whom as Witty spread.

They did this using data from a “network telescope”. The “telescope” sits in a dark region of the Internet: a region containing 0.4% of all valid IP addresses but no real computers. The telescope records every single network packet that shows up in this dark region. Since there are no ordinary computers in the region, any packets that do show up must be sent in error, or sent to a randomly chosen address.

Witty, like many worms, spread by sending copies of itself to randomly chosen IP addresses. An infected machine would send 20,000 copies of the worm to random addresses, then do a little damage to the local machine, then send 20,000 more copies of the worm to random addresses, then do more local damage, and so on, until the local machine died. When one of the random packets happened to arrive at a vulnerable machine, that machine would be infected and would join the zombie army pumping out infection packets.

Whenever an infected machine happened to send an infection packet into the telescope’s space, the telescope would record the packet and the researchers could deduce that the source machine was infected. So they could figure out which machines were infected, and when each infection began.

Even better, they realized that infected machines were generating the sequence of “random” addresses to attack using something called a Linear Congruential PseudoRandom Number Generator, which is a special kind of deterministic procedure that is sometimes used to crank out a sequence of numbers that looks random, but isn’t really random in the sense that coin-flips are random. Indeed, a LCPRNG has the property that if you can observe its output, then you can predict which “random” numbers it will generate in the future, and you can even calculate which “random” numbers it generated in the past. Now here’s the cool part: the infection packets arriving at the telescope contained “random” addresses that were produced by a LCPRNG, so the researchers could reconstruct the exact state of the LCPRNG on each infected machine. And from that, they could reconstruct the exact sequence of infection attempts that each infected machine made.

Now they knew pretty much everything there was to know about the spread of the Witty worm. They could even reconstruct the detailed history of which machine infected which other machine, and when. This allowed them to trace the infection back to its initial source, “Patient Zero”, which operated from a particular IP address owned by a “European retail ISP”. They observed that Patient Zero did not follow the usual infection rules, meaning that it was running special code designed to launch the worm, apparently by spreading the worm to a “hit list” of machines suspected to be vulnerable. A cluster of machines on the hit list happened to be at a certain U.S. military installation, suggesting that the perpetrator had inside information that machines at that installation would be vulnerable.

The paper goes on from there, using the worm’s spread as a natural experiment on the behavior of the Internet. Researchers often fantasize about doing experiments where they launch packets from thousands of random sites on the Internet and measure the packets’ propagation, to learn about the Internet’s behavior. The worm caused many machines to send packets to the telescope, making it a kind of natural experiment that would have been very difficult to do directly. Lots of useful information about the Internet can be extracted from the infection packets, and the authors proceeded to deduce facts about the local area networks of the infected machines, how many disks they had, the speeds of various network bottlenecks, and even when each infected machine had last been rebooted before catching the worm.

This is not a world-changing paper, but it is a great example of what skilled computer scientists can do with a little bit of data and a lot of ingenuity.

Coming: Mobile Phone Viruses

Clive Thompson at Slate has a scary-sounding new piece about cellphone viruses. As phones get smart – as they start running general-purpose operating systems and having complex software interfaces – they will tend to develop the kinds of software bugs that viruses can exploit. And as phones become more capable, virus-infected phones will be able to do more harm.

What will the viruses do after they break in? Thompson predicts that they’ll make expensive calls to overseas pay-services, running up the victim’s phone bill and transferring money to the pay-service owners, who presumably will be in cahoots with the virus authors. That might happen, but I don’t think it’s the most likely scenario.

The best bet, I think, is that cellphone viruses will look like PC viruses. In the PC world, many viruses are written for kicks, with no specific intent to cause harm (though harm often results when the virus spreads out of control). I would expect to see such mostly-harmless viruses in the cellphone world; and indeed that is what we apparently see with the CommWarrior virus described in the article. Other PC viruses aim to spy on the user, or to install a bot on the computer so that it can be commandeered later to send spam or launch denial of service attacks. All of this is likely in the cellphone world.

Will cellphones be able to resist viruses more effectively than PCs do? Thompson suspects they will:

Phone executives like to say that it’s easy for them to contain worms because their networks are gated communities. Verizon and Sprint can install antivirus software on their servers to automatically delete infected multimedia messages before they reach their victims.

The mobile-phone industry could solve the viral problem by developing an open-source, Linux-style cellular operating system. But that’s about as likely as Motorola and Nokia announcing that all your cell phone calls are going to be free.

I’m not as hopeful. Phone execs like to think of their networks as gated communities; but in the smartphone world all of the action is on the smartphone devices, not in the networks themselves, and the providers have less control over smartphone software than they think. Their communities may be gated, but the gates will have well-known holes (that’s how viruses will get in), and there will be plenty of third-party application software coming in and out. A smart device is only useful if it is configurable, and configurability is the enemy of the sort of regimented configuration control that they are invoking. Third-party services and applications provide tremendous value to users, but as users switch to such services the network providers lose control over users’ data.

The open-source argument is pretty weak too. An open-source operating system may have fewer security flaws (and even that is subject to debate) but the claim that it will have no known flaws, or nearly none, isn’t credible.

The more useful smartphones get, the more they will adopt a software structure like that of PCs, with all of the benefits and problems that come with such a structure – including viruses.

Cornell Researchers on P2P Quality Control

Kevin Walsh and Emin Gün Sirer, of Cornell University, have a new paper on Credence, a system for detecting unwanted files in P2P networks. It’s a kind of reputation system for files, designed to detect in advance that certain files are not what they claim to be. One use of this technology is to detect spoofed files inserted into P2P nets by copyright owners.

Credence is really a reputation system for files. Users cast votes, which are simple thumbs-up or thumbs-down verdicts on particular files, saying whether a file is what it claims to be. Every vote is digitally signed by the user who cast it, so that recipients can verify the authenticity of votes they are given. If you’re not sure whether a file is genuine, you can ask people to send you votes, and then you can combine the votes in a special way to yield a quality score for the file.

P2P systems are open, and they generally don’t register their users (or at least they don’t prevent fraudulent or repeated registrations) so users cannot reliably be identified by their real names. Instead, users make up pseudonyms for themselves. Suppose Alice wants to join the system. She makes a pseudonym for herself, by generating a cryptographic key-pair, consisting of a private key and a public key. Alice keeps the private key secret, and uses it to put digital signatures on the votes she casts. Along with each vote, she distributes a copy of the public key, which anybody can use to verify the digital signature. The public key also serves as a “name” for Alice’s pseudonym. The key attribute of this system is that if Bob wants to forge a vote, that is, to create a vote that appears to have come from Alice’s pseudonym, he must somehow determine Alice’s private key, which is essentially impossible if Alice does her cryptography correctly. In short, the cryptography ensures that anybody can make a pseudonym, but only the creator of a pseudonym can cast votes on its behalf.

This only solves half of the problem, because an adversary can create as many pseudonyms as he likes, and have them cast false votes (i.e, votes in favor of the validity of files that are actually invalid, or against the validity of files that are actually valid). So you can’t just add up all of the votes you receive; you need some way to tell whose votes to trust and whose to ignore. Here Credence uses a simple rule – trust people who tend to vote the same way that you do. Suppose Alice knows that files A, B, and C are valid, and that files X, Y, and Z are not valid. If some pseudonym “Bob” has votes in favor of A, B, and C, and against X, Y, and Z, then Alice concludes that “Bob” tends to vote accurately. If another pseudonym “Charlie” votes the opposite way on those six files, then Alice concludes that votes from “Charlie” tend to be the opposite of the truth. So if she sees some new file that “Bob” says is valid and “Charlie” says is invalid, Alice will conclude that the file is valid. Each party’s vote on the new file gets a weight, equal to the correlation between that party’s votes and Alice’s votes on other files. (The paper hints at further mechanisms that assign trust to people whose votes correlate with those of other people Alice trusts.)

This scheme presents would-be adversaries with a dilemma. If Alice’s votes are truthful, then if you want to mislead Alice about one file, you have to earn her trust by telling her the truth about other files. You can tell occasional lies, but on the whole you have to be a truth-teller. (You can achieve the same effect by lying about almost everything, and telling the truth about just one file. Then Alice will conclude that you are a habitual liar, and will count your votes with negative weight, giving credence to the opposite of what you say. Again, you have to provide Alice with many useful-to-her votes in order to trick her once.)

It looks like this method will work, if it can be implemented efficiently in a real network. The real question, I think, is whether it will scale up to enormous P2P networks containing huge numbers of files. Here I have serious doubts. The paper’s authors don’t claim that users have to know about all of the votes cast in the system. But they’re not entirely clear on how individual users can efficiently get the votes they need to make good decisions, if the network is very large. In the long run, I don’t think this scaling problem is insurmountable; but more research is required to solve it.

Pharm Policy

I wrote Monday about pharming attacks, in which a villain corrupts the DNS system, which translates textual names (like “www.freedom-to-tinker.com”) into the IP addresses (like “216.157.129.231”) that are used to route traffic on the Internet. By doing this, the villain can impersonate an Internet site convincingly. Today I want to talk about how to address this problem.

The best approach would be to secure the DNS system. We know how to do this. Solutions involve having authoritative DNS servers put some kind of digital signature on the information they give out, so that a computer receiving DNS translation information can verify that the information is endorsed by an authoritative server. Such a system, if universally deployed, would put the pharmers out of business. Unfortunately, secure DNS is not widely deployed.

A partial solution, for web access at least, is to access websites via secure (HTTPS) connections. The user, on seeing a valid site, would notice the lock icon on his browser, and would know that his machine was connected to the legitimate owner of the URL that his browser was displaying. A pharmer could make accesses to “www.citibank.com” go to his evil site, but he couldn’t fool the secure-connection mechanism, so he could not make the lock icon on the user’s browser light up.

This approach works fine, as long as users notice the lock icon and refuse to deal with sites that don’t use secure connections. Will users be so vigilant? Probably not. In practice, many sites fail to use secure connections, and browsers give subtle indications of whether a connection is secure but don’t scream about insecure connections. (How could they, when insecure connections are so common?)

One drawback of relying on secure web connections is that it doesn’t protect other communication services, such as email and instant messaging. Pharmers might try to attract a user’s email or IM connections to hostile servers. We know how to secure email, by assigning encryption keys to individuals and having them encrypt and digitally sign their email. Standard email programs even know how to handle encryption and signing. But, again, few people use these facilities.

You may have noticed a common pattern here: each of these mechanisms would be effective if widely adopted, but aren’t used much in practice. In each case, we have a collective action problem. If nearly everybody adopted one of these technologies, then the holdouts would have an incentive to adopt it too; but until a critical mass of adoption is reached, there is little incentive for others to join.

Consider secure web connections. If nearly every website used secure connections, then insecure connections would be rare enough that browsers could issue prominent warnings whenever they saw an insecure connection. This would give legitimate websites a strong incentive to use secure connections, in order to avoid scaring users away. Today, insecure connections are so common that they don’t attract any suspicion. (An online banking site that used insecure connections would be odd, and might arouse suspicion from alert users; but we’re far from the point when browsers expect secure connections from everybody.)

A similar problem holds for secure email. I could digitally sign my outgoing email, but this wouldn’t do much to prevent forged messages in practice. A forged message would of course be unsigned, but unless unsigned messages were rare, nobody would be taken aback on seeing one. But if almost all messages were digitally signed, than an unsigned message would be rare enough to arouse suspicion, and might trigger a prominent warning from the user’s email program.

In all of these cases, there is a tipping point, where the authentication technology is used so widely that failing to use it attracts suspicion. Once the tipping point is reached, the remaining holdouts will switch to using the technology. Assuming we agree that it would be good to adopt one of these technologies, how can we get to the tipping point?

Unwanted Calls and Spam on VoIP

Fred Cohen is predicting that VoIP will bring with it a flood of unsolicited commercial phone calls. (VoIP, or “Voice over Internet Protocol,” systems deliver telephone-like service, making connections via the Internet rather than using the wires of the plain old telephone system.) Cohen argues that VoIP will drive down the cost of international calling to nearly zero, thereby making international telemarketing calls very cheap. He also argues that small overseas call centers will violate the U.S. Do Not Call List with impunity.

This comes on top of concerns about SPIT, or Spam over Internet Telephony. SPIT sends machine-generated voice calls to the phones or voicemail boxes of VoIP users; Cohen worries about VoIP-mediated calls from live people. In a previous article about SPIT, VoIP vendors argue, unconvincingly, that they can handle the SPIT problem.

The root cause of this problem is the same as for email spam. Whenever a communication technology (1) allows anybody to communicate with anybody else, (2) at very low cost, unsolicited and unwanted communication will be a problem. We saw it with spam, and now we’ll see it with SPIT and VoIP telemarketing.

End-users can try to protect themselves from VoIP annoyances by using some of the same methods used against email spam. Whitelists (lists of trusted people), blacklists (lists of suspected spammers), challenge-response, and ultimately even automatic classification and filtering of voice messages, all seem likely to be tried at some point in the future. But as with email spam, don’t expect them to solve the problem, but only to reduce the annoyance level somewhat.

An even more interesting question is whether service providers can address the problem, perhaps by ejecting bad actors from their networks. This depends on how a particular network is structured. Some networks are closely controlled; these will have some chance of ejecting villains. Some networks rely on open protocols, so that nobody is in a position of control – the villains will just connect to the network as they please, and perhaps reconnect periodically under new names. Things get more challenging when different networks connect to each other, so that their legitimate clients can talk to each other. If a closed network connects to an open one, villains on the open network may be able to reach customers of the closed network, despite the best efforts of the closed network’s administrator.

Can’t we just use closed networks instead of open ones? If only it were so simple. Open networks have important advantages over closed ones; and many people will choose open networks because of these advantages, and in spite of the possibly heavier spam load on open networks. They may well be right to make that choice.

Because all of this calling will be done on the Internet, an open and tremendously flexible network, there are many creative attacks on these problems. For example, an open authentication infrastructure might provide a kind of CallerID service for VoIP, or even a certification of non-spammerness. Expect the technological battle to go on for years.