A clever new paper by Abhishek Kumar, Vern Paxson, and Nick Weaver analyzes the Witty Worm, which infected computers running certain security software in March 2004. By analyzing the spray of random packets Witty sent around the Internet, they managed to learn a huge amount about Witty’s spread, including exactly where the virus was injected into the net, and which sites might have been targeted specially by the attacker. They can track with some precision exactly who infected whom as Witty spread.
They did this using data from a “network telescope”. The “telescope” sits in a dark region of the Internet: a region containing 0.4% of all valid IP addresses but no real computers. The telescope records every single network packet that shows up in this dark region. Since there are no ordinary computers in the region, any packets that do show up must be sent in error, or sent to a randomly chosen address.
Witty, like many worms, spread by sending copies of itself to randomly chosen IP addresses. An infected machine would send 20,000 copies of the worm to random addresses, then do a little damage to the local machine, then send 20,000 more copies of the worm to random addresses, then do more local damage, and so on, until the local machine died. When one of the random packets happened to arrive at a vulnerable machine, that machine would be infected and would join the zombie army pumping out infection packets.
Whenever an infected machine happened to send an infection packet into the telescope’s space, the telescope would record the packet and the researchers could deduce that the source machine was infected. So they could figure out which machines were infected, and when each infection began.
Even better, they realized that infected machines were generating the sequence of “random” addresses to attack using something called a Linear Congruential PseudoRandom Number Generator, which is a special kind of deterministic procedure that is sometimes used to crank out a sequence of numbers that looks random, but isn’t really random in the sense that coin-flips are random. Indeed, a LCPRNG has the property that if you can observe its output, then you can predict which “random” numbers it will generate in the future, and you can even calculate which “random” numbers it generated in the past. Now here’s the cool part: the infection packets arriving at the telescope contained “random” addresses that were produced by a LCPRNG, so the researchers could reconstruct the exact state of the LCPRNG on each infected machine. And from that, they could reconstruct the exact sequence of infection attempts that each infected machine made.
Now they knew pretty much everything there was to know about the spread of the Witty worm. They could even reconstruct the detailed history of which machine infected which other machine, and when. This allowed them to trace the infection back to its initial source, “Patient Zero”, which operated from a particular IP address owned by a “European retail ISP”. They observed that Patient Zero did not follow the usual infection rules, meaning that it was running special code designed to launch the worm, apparently by spreading the worm to a “hit list” of machines suspected to be vulnerable. A cluster of machines on the hit list happened to be at a certain U.S. military installation, suggesting that the perpetrator had inside information that machines at that installation would be vulnerable.
The paper goes on from there, using the worm’s spread as a natural experiment on the behavior of the Internet. Researchers often fantasize about doing experiments where they launch packets from thousands of random sites on the Internet and measure the packets’ propagation, to learn about the Internet’s behavior. The worm caused many machines to send packets to the telescope, making it a kind of natural experiment that would have been very difficult to do directly. Lots of useful information about the Internet can be extracted from the infection packets, and the authors proceeded to deduce facts about the local area networks of the infected machines, how many disks they had, the speeds of various network bottlenecks, and even when each infected machine had last been rebooted before catching the worm.
This is not a world-changing paper, but it is a great example of what skilled computer scientists can do with a little bit of data and a lot of ingenuity.