A clever new paper by Abhishek Kumar, Vern Paxson, and Nick Weaver analyzes the Witty Worm, which infected computers running certain security software in March 2004. By analyzing the spray of random packets Witty sent around the Internet, they managed to learn a huge amount about Witty’s spread, including exactly where the worm was injected into the net, and which sites might have been targeted specially by the attacker. They can even trace, with some precision, who infected whom as Witty spread.
They did this using data from a “network telescope”. The “telescope” sits in a dark region of the Internet: a region containing 0.4% of all valid IP addresses but no real computers. The telescope records every single network packet that shows up in this dark region. Since there are no ordinary computers in the region, any packets that do show up must be sent in error, or sent to a randomly chosen address.
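(According to the comments below, the telescope in question monitored a “/8” block of addresses. Assuming that, the 0.4% figure is easy to sanity-check with a line of arithmetic; the numbers below are just that check, not anything taken from the paper itself.)

```python
# Sanity check of the 0.4% figure, assuming the telescope monitors a /8
# block of addresses (as discussed in the comments below).
total_ipv4 = 2**32            # all possible IPv4 addresses (~4.3 billion)
slash_8    = 2**24            # addresses in one /8 block, e.g. 141.x.x.x (~16.8 million)
print(slash_8 / total_ipv4)   # 0.00390625 -- roughly 0.4% of the address space
```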
Witty, like many worms, spread by sending copies of itself to randomly chosen IP addresses. An infected machine would send 20,000 copies of the worm to random addresses, then do a little damage to the local machine, then send 20,000 more copies of the worm to random addresses, then do more local damage, and so on, until the local machine died. When one of the random packets happened to arrive at a vulnerable machine, that machine would be infected and would join the zombie army pumping out infection packets.
Whenever an infected machine happened to send an infection packet into the telescope’s space, the telescope would record the packet and the researchers could deduce that the source machine was infected. So they could figure out which machines were infected, and when each infection began.
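To get a feel for why infected machines show up so quickly: if targets are chosen uniformly at random and the telescope covers roughly 1/256 of the address space (a /8, per the comments below), then each batch of 20,000 infection packets should land about 78 packets in the telescope on average. Here is a minimal sketch of the bookkeeping described above; the log format and names are hypothetical, just to illustrate the idea of recovering, from the telescope’s packet log, which sources were infected and when each was first seen.

```python
# Hypothetical sketch: given a telescope log of (timestamp, source_ip) pairs
# for captured Witty packets, recover the infected sources and the time each
# one was first observed.  (First observation is only an upper bound on the
# actual infection time; the paper's analysis is considerably finer.)
def first_seen(telescope_log):
    earliest = {}
    for ts, src in telescope_log:
        if src not in earliest or ts < earliest[src]:
            earliest[src] = ts
    return earliest

# Toy usage with made-up data:
log = [(100.0, "10.1.2.3"), (101.5, "10.9.8.7"), (102.0, "10.1.2.3")]
print(first_seen(log))   # {'10.1.2.3': 100.0, '10.9.8.7': 101.5}
```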
Even better, they realized that infected machines were generating the sequence of “random” addresses to attack using something called a linear congruential pseudorandom number generator (LCPRNG), which is a deterministic procedure that is sometimes used to crank out a sequence of numbers that looks random, but isn’t really random in the sense that coin-flips are random. Indeed, an LCPRNG has the property that if you can observe its output, then you can predict which “random” numbers it will generate in the future, and you can even calculate which “random” numbers it generated in the past. Now here’s the cool part: the infection packets arriving at the telescope contained “random” addresses that were produced by an LCPRNG, so the researchers could reconstruct the exact state of the LCPRNG on each infected machine. And from that, they could reconstruct the exact sequence of infection attempts that each infected machine made.
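To make the “run it backward” property concrete, here is a minimal sketch of a 32-bit linear congruential generator in Python. The multiplier and increment are just illustrative constants (the ones used by a common C-library rand()); Witty’s actual generator, and the extra work the authors did to reconstruct its hidden state from the partial bits that appear in each packet, differ in detail, so treat this as a toy model of the idea rather than the worm’s code.

```python
# Toy 32-bit LCG: x_{n+1} = (A*x_n + C) mod M.  Because A is odd it has a
# modular inverse mod 2^32, so the generator can be stepped backward as well
# as forward -- which is exactly what makes "random" LCG output predictable.
A, C, M = 214013, 2531011, 2**32      # illustrative constants, not necessarily Witty's
A_INV = pow(A, -1, M)                 # modular inverse of A (requires Python 3.8+)

def forward(x):
    """Next state of the generator."""
    return (A * x + C) % M

def backward(x):
    """Previous state: invert x = (A*prev + C) mod M."""
    return ((x - C) * A_INV) % M

state = 0xDEADBEEF
assert backward(forward(state)) == state   # one observed state determines the
                                           # whole past and future sequence
```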
Now they knew pretty much everything there was to know about the spread of the Witty worm. They could even reconstruct the detailed history of which machine infected which other machine, and when. This allowed them to trace the infection back to its initial source, “Patient Zero”, which operated from a particular IP address owned by a “European retail ISP”. They observed that Patient Zero did not follow the usual infection rules, indicating that it was running special code designed to launch the worm, apparently by spreading the worm to a “hit list” of machines suspected to be vulnerable. A cluster of machines on the hit list happened to be at a certain U.S. military installation, suggesting that the perpetrator had inside information that machines at that installation would be vulnerable.
The paper goes on from there, using the worm’s spread as a natural experiment on the behavior of the Internet. Researchers often fantasize about doing experiments where they launch packets from thousands of random sites on the Internet and measure the packets’ propagation, to learn about the Internet’s behavior. The worm caused many machines to send packets to the telescope, making it a kind of natural experiment that would have been very difficult to do directly. Lots of useful information about the Internet can be extracted from the infection packets, and the authors proceeded to deduce facts about the local area networks of the infected machines, how many disks they had, the speeds of various network bottlenecks, and even when each infected machine had last been rebooted before catching the worm.
This is not a world-changing paper, but it is a great example of what skilled computer scientists can do with a little bit of data and a lot of ingenuity.
Sean,
You’re right to point out that they haven’t actually traced the who-infected-whom tree all the way back to Patient Zero, although it seems likely they could do that. I shortened the Patient Zero story a bit for brevity. It is true that the original injector could have spoofed its IP address. The main worm code didn’t do that, but the source could well have done so.
The authors describe their analysis of “patient zero”, the IP address that appears to have been the origin of the attack. However, this wasn’t determined by actually tracking back contacts between hosts to an origin; rather, it appears, they detected packets coming from an IP address that didn’t fit the original pattern, presumably because that host was running different software, designed for worm injection. They don’t say so explicitly, but they presumably identified the source host from the IP address in the packets themselves. However, since they know this host was running software different from the worm, couldn’t that program have been using IP address spoofing, thus casting doubt on whether that IP address was actually the source of the worm?
Note that it’s also possible for an attacker who knows the telescope’s address to feed the telescope deliberately misleading information about what is happening.
I don’t know if the telescope’s addresses have been published. But I’ll bet they wouldn’t be too hard to find out.
Are the IP addresses of the telescope network publicly known? If so, it ought to be easy for future worms to avoid detection.
I must say that this was one of the more interesting academic papers I’ve read. Of particular note was their breakdown of the LCPRNG used and how it was broken. It will be interesting to see whether they did, in fact, identify the host computer and the worm author.
Jordan,
As I understand it, their primary analysis used one telescope that listened on a /8, which is 0.4% of the address space. Some parts of their analysis used a second telescope listening on another /8 (another 0.4%), but this second telescope recorded only 10% of the packets (randomly chosen, I think) that it saw. As you say, those who care about such details should read the whole paper, which has more good stuff than I could summarize in a few hundred words.
Ed,
Thanks for pointing to this paper. I haven’t finished reading the whole thing yet, but I believe they were watching 0.8% of the net (they had two telescopes that they worked with, if I read correctly). What you said also fits with this, but since there are different kinds of telescopes (I believe they used /8s here, meaning each contained 16.7 million addresses, e.g. 141.*.*.*), it’s a little misleading. But then, those who are concerned about more of the details will be reading the whole thing, so…