January 20, 2025

Dissecting the Witty Worm

A clever new paper by Abhishek Kumar, Vern Paxson, and Nick Weaver analyzes the Witty Worm, which infected computers running certain security software in March 2004. By analyzing the spray of random packets Witty sent around the Internet, they managed to learn a huge amount about Witty’s spread, including exactly where the worm was injected into the net, and which sites might have been targeted specially by the attacker. They can track, with considerable precision, who infected whom as Witty spread.

They did this using data from a “network telescope”. The “telescope” sits in a dark region of the Internet: a region containing 0.4% of all valid IP addresses but no real computers. The telescope records every single network packet that shows up in this dark region. Since there are no ordinary computers in the region, any packets that do show up must be sent in error, or sent to a randomly chosen address.

Witty, like many worms, spread by sending copies of itself to randomly chosen IP addresses. An infected machine would send 20,000 copies of the worm to random addresses, then do a little damage to the local machine, then send 20,000 more copies of the worm to random addresses, then do more local damage, and so on, until the local machine died. When one of the random packets happened to arrive at a vulnerable machine, that machine would be infected and would join the zombie army pumping out infection packets.

Whenever an infected machine happened to send an infection packet into the telescope’s space, the telescope would record the packet and the researchers could deduce that the source machine was infected. So they could figure out which machines were infected, and when each infection began.
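To get a feel for why the telescope catches essentially every infected machine, here is a rough back-of-the-envelope sketch. It uses the 0.4% and 20,000 figures from above, and it assumes (my simplification, not a claim from the paper) that every infection packet goes to an independent, uniformly random address. The function names are made up for illustration.

```python
TELESCOPE_FRACTION = 0.004   # share of the address space the telescope watches
PACKETS_PER_BATCH = 20_000   # copies sent between rounds of local damage


def expected_hits(packets, fraction=TELESCOPE_FRACTION):
    """Average number of packets that land in the telescope's address space."""
    return packets * fraction


def prob_seen(packets, fraction=TELESCOPE_FRACTION):
    """Chance that at least one of `packets` random packets hits the telescope."""
    return 1.0 - (1.0 - fraction) ** packets


if __name__ == "__main__":
    print("expected hits per batch:", expected_hits(PACKETS_PER_BATCH))
    for n in (100, 250, 1_000, PACKETS_PER_BATCH):
        print(f"P(seen within {n} packets) = {prob_seen(n):.6f}")
```

Under these assumptions a single batch puts roughly 80 packets into the telescope, so an infected machine shows up almost immediately after it starts scanning. That is what makes it reasonable to read the first observed packet as the approximate start of the infection.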

Even better, they realized that infected machines were generating the sequence of “random” addresses to attack using something called a Linear Congruential PseudoRandom Number Generator (LCPRNG), a deterministic procedure that is sometimes used to crank out a sequence of numbers that looks random, but isn’t really random in the sense that coin-flips are random. Indeed, an LCPRNG has the property that if you can observe enough of its output to recover its internal state, then you can predict which “random” numbers it will generate in the future, and you can even calculate which “random” numbers it generated in the past. Now here’s the cool part: the infection packets arriving at the telescope contained “random” addresses that were produced by an LCPRNG, so the researchers could reconstruct the exact state of the LCPRNG on each infected machine. And from that, they could reconstruct the exact sequence of infection attempts that each infected machine made.
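Here is a minimal sketch of why a linear congruential generator can be run in both directions once its state is known. The constants are an illustrative assumption on my part (they are the ones widely documented for the Microsoft C runtime’s rand()), not something stated in this post, and the function names are hypothetical. The sketch also skips the harder step of recovering the full state from the partial output visible in packets.

```python
# Illustrative constants (an assumption): multiplier and increment widely
# documented for the Microsoft C runtime's rand(), with a 32-bit state.
A = 214013
C = 2531011
M = 2**32

# A is odd, so it has a multiplicative inverse modulo 2**32.
# pow() with a negative exponent and a modulus needs Python 3.8 or later.
A_INV = pow(A, -1, M)


def step_forward(state):
    """Next state: x -> (A*x + C) mod M."""
    return (A * state + C) % M


def step_backward(state):
    """Previous state: undo the addition, then undo the multiplication."""
    return (A_INV * (state - C)) % M


def forward_sequence(state, n):
    """The next n states after `state`."""
    out = []
    for _ in range(n):
        state = step_forward(state)
        out.append(state)
    return out


def backward_sequence(state, n):
    """The n states that came before `state`, most recent first."""
    out = []
    for _ in range(n):
        state = step_backward(state)
        out.append(state)
    return out


if __name__ == "__main__":
    observed = 0x12345678           # pretend this state was recovered from packets
    print("future:", forward_sequence(observed, 3))
    print("past:  ", backward_sequence(observed, 3))
```

Because each step is an invertible map on the 32-bit state, one recovered state pins down the whole sequence, past and future. That is the property that lets an observer who sees only a sliver of a machine’s packets fill in all the ones that went elsewhere.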

Now they knew pretty much everything there was to know about the spread of the Witty worm. They could even reconstruct the detailed history of which machine infected which other machine, and when. This allowed them to trace the infection back to its initial source, “Patient Zero”, which operated from a particular IP address owned by a “European retail ISP”. They observed that Patient Zero did not follow the usual infection rules, meaning that it was running special code designed to launch the worm, apparently by spreading the worm to a “hit list” of machines suspected to be vulnerable. A cluster of machines on the hit list happened to be at a certain U.S. military installation, suggesting that the perpetrator had inside information that machines at that installation would be vulnerable.

The paper goes on from there, using the worm’s spread as a natural experiment on the behavior of the Internet. Researchers often fantasize about doing experiments where they launch packets from thousands of random sites on the Internet and measure the packets’ propagation, to learn about the Internet’s behavior. The worm caused many machines to send packets to the telescope, making it a kind of natural experiment that would have been very difficult to do directly. Lots of useful information about the Internet can be extracted from the infection packets, and the authors proceeded to deduce facts about the local area networks of the infected machines, how many disks they had, the speeds of various network bottlenecks, and even when each infected machine had last been rebooted before catching the worm.

This is not a world-changing paper, but it is a great example of what skilled computer scientists can do with a little bit of data and a lot of ingenuity.

On a New Server

This site is on the new server now, using WordPress. Please let me know, in the comments, if you see any problems.

About This Site

Ed Felten says:

Hi, I’m Ed Felten. In my day job, I’m a Professor of Computer Science and Public Affairs at Princeton University, and Director of Princeton’s Center for InfoTech Policy.

Alex Halderman says:

Hi, I’m J. Alex Halderman. In my afternoon and night job, I’m a graduate student in Computer Science at Princeton University.

Dan Wallach says:

Hi, I’m Dan Wallach. During the day, I’m an associate professor in the department of computer science at Rice University. Back in the day, I got my PhD at Princeton working for Ed. These days, I’m spending most of my time working on electronic voting security.

We, and the other authors listed in the sidebar, write this weblog. The focus is on issues related to legal regulation of technology, and especially on legal attempts to restrict the right of technologists and citizens to tinker with technological devices. But we reserve the right to write about anything that strikes our fancy.

Needless to say, we speak only for ourselves. Nothing we write here is endorsed by our employers, our fellow contributors on this blog, or by anyone else except the author. Even we are not too sure about some of this stuff. Posts by others, including our fellow bloggers, guest bloggers and other contributors, reflect their opinions, not necessarily ours.

We welcome comments, suggestions, and polite argumentation. If you send us an email about something we’ve written here, we’ll assume (unless you tell us otherwise) that we have your permission to quote your message on the site. Or you can post a comment to the site yourself.

Material in the Comments section is contributed by others. We can’t vouch for its accuracy and it doesn’t necessarily reflect our opinions. We reserve the right to remove comments that are clearly off-topic or highly offensive; but otherwise we’ll leave the comments alone.

(We also use automated tools to fight comment spam. When these tools see indications of spamminess in a comment – according to whatever criteria the tools’ authors chose to use – they will remove a comment or hold it for human inspection. We look at the held comments periodically and release any that are not spam. If your comments seem to disappear or be mysteriously delayed for hours, this is probably the explanation. We apologize for any inconvenience, but we have found automated anti-spam tools necessary given the volume of comment spam we face.)

Unless noted otherwise, the author of each post owns the copyright on that post. (Commenters may own the copyright on their comments – ask a copyright lawyer – but we assume that commenters give our readers permission to redistribute or use their comments under the same terms that apply to our material on which they are commenting.) Everything else that is copyrightable is copyrighted by Edward W. Felten, J. Alex Halderman, and Dan S. Wallach. Thanks to the Sonny Bono Copyright Term Extension Act of 1998, our copyrights on this site will expire early in the 22nd century.

Creative Commons License
Unless noted otherwise, material on Freedom to Tinker is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.

BitTorrent Search

BitTorrent.com released a new search facility yesterday, making it slightly easier to find torrent files on the Net. This is an odd strategic move by BitTorrent.com – it doesn’t help the company’s customers much, but mostly just muddles the company’s public messaging.

[Backstory about BitTorrent: The BitTorrent technology allows efficient Internet distribution of large files to many recipients, without creating a central network bottleneck. In current released versions of BitTorrent, you locate content by getting a torrent file from a standard web server. The torrent file points to the location of a “tracker” which in turn keeps track of where on the net you can go to get pieces of the content. (A new beta version eliminates the tracker, which is an interesting development but is largely irrelevant to the issues I’m discussing today.)]
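To make “the torrent file points to the location of a tracker” concrete: a torrent file is a small dictionary in BitTorrent’s “bencoding” format, and the tracker’s URL sits under the key “announce”. The sketch below is a bare-bones decoder written for illustration only. The sample file is made up and heavily trimmed (a real one also carries piece hashes), and tracker.example is a placeholder, not a real tracker.

```python
def bdecode(data, i=0):
    """Decode one bencoded value starting at data[i]; return (value, next_index)."""
    lead = data[i:i + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencoded data at offset {i}")


# A made-up, trimmed-down torrent file; the URL is a placeholder.
TRACKER_URL = b"http://tracker.example/announce"
SAMPLE_TORRENT = (
    b"d8:announce" + str(len(TRACKER_URL)).encode() + b":" + TRACKER_URL
    + b"4:infod6:lengthi1024e4:name8:demo.binee"
)

if __name__ == "__main__":
    meta, _ = bdecode(SAMPLE_TORRENT)
    print("tracker:", meta[b"announce"].decode())
    print("content:", meta[b"info"][b"name"].decode())
```

Note what the file does not contain: the content itself, or the addresses of anyone who has it. It names a tracker, and the tracker is the one that knows which peers to ask.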

The term “BitTorrent” is used to refer to three separate things:

  • a company, which I’ll call “BitTorrent.com”,
  • a software product called BitTorrent, distributed for free, with source code, by BitTorrent.com,
  • the communication protocols that enable users’ systems to communicate, which are implemented by the BitTorrent software but can be implemented by other software programs.

Blending the three together is sometimes a harmless rhetorical shortcut, but at other times leads to faulty reasoning. For example, a court could hypothetically shut down BitTorrent.com (if the company were found to be a lawbreaker) but it could neither undistribute the software code that was already in users’ hands, nor uncreate the protocol. Critics who are thinking sloppily (or want their audiences to think sloppily) sometimes ignore these distinctions. BitTorrent.com, the company, may have a business incentive to blur the distinctions, in order to make the company’s role seem more important than it really is.

The new BitTorrent.com search facility seems to be entirely separate, functionally, from the BitTorrent software and protocols. Anybody could have created this search facility; and indeed others have. Google, for instance, happens to offer a fairly complete torrent search facility. A BitTorrent.com search for “sith” returns quite a few files claiming to be the new Star Wars movie; but so does a Google search for “sith filetype:torrent”. There’s no reason, functionally, why BitTorrent.com had to be the one to offer a torrent search engine. An independent search engine would work just as well.

Is BitTorrent.com search legal? I’ll leave that one to the lawyers; but I’ll point out two things. First, the DMCA provides a safe harbor against indirect infringement for search engines that follow certain takedown procedures on receiving infringement complaints. BitTorrent.com will apparently follow those procedures, and so the safe harbor may apply. Second, the connection from BitTorrent.com to any infringing content is quite indirect: a BitTorrent.com search result gives the address of a torrent file; the torrent file gives the address of a tracker; the tracker gives the addresses of client computers; and the client computers are the ones that actually distribute infringing content. (The new trackerless version of BitTorrent changes the details, but doesn’t reduce the number of steps.) There are at least three intermediaries between BitTorrent.com and any infringing material.

Even if the search facility is legal, it seems like a bad strategic move by BitTorrent.com. Large copyright interests have been trying to paint BitTorrent as having a pro-infringement agenda; but thus far their efforts have had only limited success because Bram Cohen (the software’s creator) and BitTorrent.com have carefully dissociated themselves from infringement and have conspicuously designed their technology for the benefit of noninfringing users.

As Joe Gratz argues, the new BitTorrent.com search facility, regardless of the merits, will make it easier for BitTorrent.com’s critics to paint the company as having a secret pro-infringement agenda. And that by itself is enough to make an in-house search engine a big mistake for the company.

BitTorrent.com needs to remember that it can be killed by Washington politics. But politicians need to remember, too, that it is the BitTorrent protocol, not the company, that is changing the world. Killing the company will not kill the protocol. A protocol is an idea; and in a free society ideas cannot be killed.

A Land Without Music

Here’s a story I heard recently from an anonymous source. Based on the source’s identity and some of the details of the story, I believe it to be true. I have omitted some details here, to protect the source.

A well-known company, running a massively multiplayer virtual world, was considering adding a new space to their world. Due to the nature of the space, characters there would probably want to make music. So the programmers created a set of virtual musical instruments, and tools for players to create their own instruments. The plan was that players would get virtual instruments and make music, for all of the reasons people make music in the real world.

But management nixed the idea, on advice from lawyers, because of concerns about copyright infringement. The problem was that players might use their virtual instruments to play copyrighted songs, and the game company might be sued for contributory or vicarious copyright infringement, for failing to prevent this.

Stop for a minute to think about this. All kinds of virtual objects exist in this virtual world, including a wide variety of weapons. But saxophones? Too risky. Presumably management would have approved a magic saxophone that was only capable of playing non-copyrighted songs, but the engineers had no idea how such a thing could be built.

To put this in context, recall that programmable virtual instruments are widely sold and used in the real world. They’re called synthesizers, and they’re really just computers that can be programmed to play any sequence of sounds, whether copyrighted or not. It’s not so easy to draw a principled line between real-world synthesizers and game-world instruments that makes one legal and the other illegal.

Perhaps the company was being overly cautious and the lawsuit risk was illusory. But I’m not so sure. This would hardly be the most farfetched copyright lawsuit we have seen.