April 19, 2014

Dissecting the Witty Worm

A clever new paper by Abhishek Kumar, Vern Paxson, and Nick Weaver analyzes the Witty Worm, which infected computers running certain security software in March 2004. By analyzing the spray of random packets Witty sent around the Internet, they managed to learn a huge amount about Witty’s spread, including exactly where the worm was injected into the net, and which sites might have been targeted specially by the attacker. They can even track, with some precision, who infected whom as Witty spread.

They did this using data from a “network telescope”. The “telescope” sits in a dark region of the Internet: a region containing 0.4% of all valid IP addresses but no real computers. The telescope records every single network packet that shows up in this dark region. Since there are no ordinary computers in the region, any packets that do show up must have been sent in error, or sent to a randomly chosen address.

Witty, like many worms, spread by sending copies of itself to randomly chosen IP addresses. An infected machine would send 20,000 copies of the worm to random addresses, then do a little damage to the local machine, then send 20,000 more copies of the worm to random addresses, then do more local damage, and so on, until the local machine died. When one of the random packets happened to arrive at a vulnerable machine, that machine would be infected and would join the zombie army pumping out infection packets.

Whenever an infected machine happened to send an infection packet into the telescope’s space, the telescope would record the packet and the researchers could deduce that the source machine was infected. So they could figure out which machines were infected, and when each infection began.
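Some back-of-the-envelope arithmetic shows why the telescope could spot essentially every infected machine. The sketch below assumes, as a simplification, that each infection packet goes to a uniformly random address and that the telescope covers 0.4% of the address space; the function names are just illustrative. Under those assumptions, a burst of 20,000 packets should put about 80 of them into the telescope, and the chance that a machine sending even one burst goes entirely unseen is vanishingly small.

```python
TELESCOPE_FRACTION = 0.004   # share of IPv4 space the telescope watches (figure from the post)
PACKETS_PER_BURST = 20_000   # infection packets sent between bouts of local damage


def expected_hits(packets: int, fraction: float = TELESCOPE_FRACTION) -> float:
    """Expected number of uniformly random probes that land in the telescope."""
    return packets * fraction


def prob_seen(packets: int, fraction: float = TELESCOPE_FRACTION) -> float:
    """Probability that at least one of `packets` uniformly random probes hits the telescope."""
    return 1.0 - (1.0 - fraction) ** packets


print(expected_hits(PACKETS_PER_BURST))  # about 80
print(prob_seen(PACKETS_PER_BURST))      # essentially 1.0
print(prob_seen(1_000))                  # roughly 0.98
```

Even a machine that managed only 1,000 probes before dying would most likely have been noticed.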

Even better, they realized that infected machines were generating the sequence of “random” addresses to attack using something called a Linear Congruential PseudoRandom Number Generator, which is a special kind of deterministic procedure that is sometimes used to crank out a sequence of numbers that looks random but isn’t really random in the sense that coin flips are random. Indeed, an LCPRNG has the property that if you can observe its output, then you can predict which “random” numbers it will generate in the future, and you can even calculate which “random” numbers it generated in the past. Now here’s the cool part: the infection packets arriving at the telescope contained “random” addresses that were produced by an LCPRNG, so the researchers could reconstruct the exact state of the LCPRNG on each infected machine. And from that, they could reconstruct the exact sequence of infection attempts that each infected machine made.
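Here is a minimal sketch of why an LCPRNG is predictable in both directions. It assumes the common form x' = (a·x + c) mod 2^32 and borrows the constants of the Microsoft C runtime’s rand() purely for illustration; it also assumes the full internal state is visible, whereas the researchers had to work from the addresses in the packets, which expose only part of the generator’s output. Because a is odd, it has an inverse modulo 2^32, so each step can be undone as easily as it is taken.

```python
# Illustrative LCG constants (Microsoft C runtime rand()); an assumption, not taken from the paper.
A = 214013
C = 2531011
M = 2 ** 32

A_INV = pow(A, -1, M)  # modular inverse of A; exists because A is odd (needs Python 3.8+)


def step_forward(state: int) -> int:
    """Next state: x' = (A*x + C) mod M."""
    return (A * state + C) % M


def step_backward(state: int) -> int:
    """Previous state: x = A^-1 * (x' - C) mod M (Python's % keeps the result non-negative)."""
    return (A_INV * (state - C)) % M


def run(state: int, n: int) -> list[int]:
    """Generate the next n states starting from `state`."""
    out = []
    for _ in range(n):
        state = step_forward(state)
        out.append(state)
    return out


seq = run(12345, 5)
observed = seq[2]                          # suppose only this state is observed
print(step_forward(observed) == seq[3])    # predicts the future
print(step_backward(observed) == seq[1])   # recovers the past
```

Stepping forward from an observed state predicts the next “random” targets; stepping backward recovers the ones already probed, which is what lets an analyst replay a machine’s earlier infection attempts.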

Now they knew pretty much everything there was to know about the spread of the Witty worm. They could even reconstruct the detailed history of which machine infected which other machine, and when. This allowed them to trace the infection back to its initial source, “Patient Zero”, which operated from a particular IP address owned by a “European retail ISP”. They observed that Patient Zero did not follow the usual infection rules, meaning that it was running special code designed to launch the worm, apparently by spreading the worm to a “hit list” of machines suspected to be vulnerable. A cluster of machines on the hit list happened to be at a certain U.S. military installation, suggesting that the perpetrator had inside information that machines at that installation would be vulnerable.
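Once every infected machine has a reconstructed “infected by” link, tracing the outbreak to its source is a matter of following those links backward until they run out. The sketch below uses a small, entirely hypothetical set of records (documentation-range addresses, invented timestamps) and a hypothetical helper name. In the real analysis, identifying Patient Zero also relied on noticing that its behavior broke the usual rules, which a simple walk like this cannot detect.

```python
from typing import Optional

# Hypothetical reconstructed records: victim -> (infecting host, infection time in seconds).
# None of these addresses or times come from the paper.
INFECTED_BY: dict[str, tuple[Optional[str], float]] = {
    "198.51.100.7": (None, 0.0),  # no recorded infecter: candidate Patient Zero
    "203.0.113.20": ("198.51.100.7", 12.5),
    "192.0.2.44": ("203.0.113.20", 40.1),
    "192.0.2.99": ("203.0.113.20", 41.0),
}


def trace_to_source(host: str) -> list[str]:
    """Follow 'infected by' links from host back to a host with no recorded infecter."""
    chain = [host]
    parent, when = INFECTED_BY[host]
    while parent is not None:
        parent_parent, parent_when = INFECTED_BY[parent]
        # Each infecter must have been infected strictly earlier; this also rules out cycles.
        if parent_when >= when:
            raise ValueError(f"{parent} was not infected before {chain[-1]}")
        chain.append(parent)
        parent, when = parent_parent, parent_when
    return chain


print(trace_to_source("192.0.2.44"))
```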

The paper goes on from there, using the worm’s spread as a natural experiment on the behavior of the Internet. Researchers often fantasize about doing experiments where they launch packets from thousands of random sites on the Internet and measure the packets’ propagation, to learn about the Internet’s behavior. The worm caused many machines to send packets to the telescope, making it a kind of natural experiment that would have been very difficult to do directly. Lots of useful information about the Internet can be extracted from the infection packets, and the authors proceeded to deduce facts about the local area networks of the infected machines, how many disks they had, the speeds of various network bottlenecks, and even when each infected machine had last been rebooted before catching the worm.
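Estimates like the speed of a network bottleneck can rest on a scaling argument: if a host’s probes are spread uniformly over the address space, the telescope sees about 0.4% of them, so dividing the observed rate by that fraction estimates the host’s total sending rate. The sketch below assumes uniform targeting and treats the average packet size as a caller-supplied assumption; the function names are hypothetical, and this is not taken from the paper’s actual estimation procedure.

```python
TELESCOPE_FRACTION = 0.004  # share of address space the telescope covers (figure from the post)


def estimate_total_rate(observed_pps: float, fraction: float = TELESCOPE_FRACTION) -> float:
    """Scale the packet rate seen at the telescope up to the host's estimated total send rate."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return observed_pps / fraction


def estimate_bits_per_second(observed_pps: float, avg_packet_bytes: float,
                             fraction: float = TELESCOPE_FRACTION) -> float:
    """Convert the estimated total packet rate into bits per second for an assumed packet size."""
    return estimate_total_rate(observed_pps, fraction) * avg_packet_bytes * 8
```

For instance, a host seen at 8 packets per second would be estimated to send about 2,000 packets per second in total; with an assumed average of 1,000 bytes per packet, that is about 16 Mbit/s.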

This is not a world-changing paper, but it is a great example of what skilled computer scientists can do with a little bit of data and a lot of ingenuity.

On a New Server

This site is on the new server now, using WordPress. Please let me know, in the comments, if you see any problems.

About This Site

Ed Felten says:

Hi, I’m Ed Felten. In my day job, I’m a Professor of Computer Science and Public Affairs at Princeton University, and Director of Princeton’s Center for InfoTech Policy.

Alex Halderman says:

Hi, I’m J. Alex Halderman. In my afternoon and night job, I’m a graduate student in Computer Science at Princeton University.

Dan Wallach says:

Hi, I’m Dan Wallach. During the day, I’m an associate professor in the department of computer science at Rice University. Back in the day, I got my PhD at Princeton working for Ed. These days, I’m spending most of my time working on electronic voting security.

We, and the other authors listed in the sidebar, write this weblog. The focus is on issues related to legal regulation of technology, and especially on legal attempts to restrict the right of technologists and citizens to tinker with technological devices. But we reserve the right to write about anything that strikes our fancy.

Needless to say, we speak only for ourselves. Nothing we write here is endorsed by our employers, our fellow contributors on this blog, or by anyone else except the author. Even we are not too sure about some of this stuff. Posts by others, including our fellow bloggers, guest bloggers and other contributors, reflect their opinions, not necessarily ours.

We welcome comments, suggestions, and polite argumentation. If you send us an email about something we’ve written here, we’ll assume (unless you tell us otherwise) that we have your permission to quote your message on the site. Or you can post a comment to the site yourself.

Material in the Comments section is contributed by others. We can’t vouch for its accuracy and it doesn’t necessarily reflect our opinions. We reserve the right to remove comments that are clearly off-topic or highly offensive; but otherwise we’ll leave the comments alone.

(We also use automated tools to fight comment spam. When these tools see indications of spamminess in a comment – according to whatever criteria the tools’ authors chose to use – they will remove a comment or hold it for human inspection. We look at the held comments periodically and release any that are not spam. If your comments seem to disappear or be mysteriously delayed for hours, this is probably the explanation. We apologize for any inconvenience, but we have found automated anti-spam tools necessary given the volume of comment spam we face.)

Unless noted otherwise, the author of each post owns the copyright on that post. (Commenters may own the copyright on their comments – ask a copyright lawyer – but we assume that commenters give our readers permission to redistribute or use their comments under the same terms that apply to our material on which they are commenting.) Everything else that is copyrightable is copyrighted by Edward W. Felten, J. Alex Halderman, and Dan S. Wallach. Thanks to the Sonny Bono Copyright Term Extension Act of 1998, our copyrights on this site will expire early in the 22nd century.

Unless noted otherwise, material on Freedom to Tinker is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.

BitTorrent Search

BitTorrent.com released a new search facility yesterday, making it slightly easier to find torrent files on the Net. This is an odd strategic move by BitTorrent.com – it doesn’t do much for the company’s customers, and mostly just muddles the company’s public messaging.

[Backstory about BitTorrent: The BitTorrent technology allows efficient Internet distribution of large files to many recipients, without creating a central network bottleneck. In current released versions of BitTorrent, you locate content by getting a torrent file from a standard web server. The torrent file points to the location of a "tracker," which in turn keeps track of where on the net you can go to get pieces of the content. (A new beta version eliminates the tracker, which is an interesting development but is largely irrelevant to the issues I'm discussing today.)]
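Torrent files use a simple encoding called bencoding, and the tracker’s address is stored under the key announce. The minimal decoder below is a sketch for illustration: it handles the four bencode types (integers, byte strings, lists, dictionaries) without the validation a real client would do, and the sample torrent, including its tracker URL, is made up and omits fields such as the piece hashes that a real torrent requires.

```python
def bdecode(data: bytes):
    """Decode a complete bencoded value (minimal sketch, little validation)."""

    def parse(i: int):
        # Returns (decoded value, index just past it).
        c = data[i:i + 1]
        if c == b"i":                      # integer: i<digits>e
            end = data.index(b"e", i)
            return int(data[i + 1:end]), end + 1
        if c == b"l":                      # list: l<items>e
            items, i = [], i + 1
            while data[i:i + 1] != b"e":
                item, i = parse(i)
                items.append(item)
            return items, i + 1
        if c == b"d":                      # dictionary: d<key><value>...e
            result, i = {}, i + 1
            while data[i:i + 1] != b"e":
                key, i = parse(i)
                value, i = parse(i)
                result[key] = value
            return result, i + 1
        if c.isdigit():                    # byte string: <length>:<bytes>
            colon = data.index(b":", i)
            start = colon + 1
            end = start + int(data[i:colon])
            return data[start:end], end
        raise ValueError(f"malformed bencode at offset {i}")

    value, end = parse(0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


# Toy torrent: the tracker URL and file details are invented.
SAMPLE_TORRENT = (
    b"d8:announce35:http://tracker.example.com/announce"
    b"4:infod6:lengthi1024e4:name8:demo.bin12:piece lengthi262144eee"
)

meta = bdecode(SAMPLE_TORRENT)
print(meta[b"announce"].decode())  # the tracker a client would contact next
```

The client then asks that tracker for the addresses of peers holding pieces of the content, which is the next link in the chain described above.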

The term “BitTorrent” is used to refer to three separate things:

  • a company, which I’ll call “BitTorrent.com”,
  • a software product called BitTorrent, distributed for free, with source code, by BitTorrent.com,
  • the communication protocols that enable users’ systems to communicate, which are implemented by the BitTorrent software but can be implemented by other software programs.

Blending the three together is sometimes a harmless rhetorical shortcut, but at other times leads to faulty reasoning. For example, a court could hypothetically shut down BitTorrent.com (if the company were found to be a lawbreaker) but it could neither undistribute the software code that was already in users’ hands, nor uncreate the protocol. Critics who are thinking sloppily (or want their audiences to think sloppily) sometimes ignore these distinctions. BitTorrent.com, the company, may have a business incentive to blur the distinctions, in order to make the company’s role seem more important than it really is.

The new BitTorrent.com search facility seems to be entirely separate, functionally, from the BitTorrent software and protocols. Anybody could have created this search facility; and indeed others have. Google, for instance, happens to offer a fairly complete torrent search facility. A BitTorrent.com search for “sith” returns quite a few files claiming to be the new Star Wars movie; but so does a Google search for “sith filetype:torrent”. There’s no reason, functionally, why BitTorrent.com had to be the one to offer a torrent search engine. An independent search engine would work just as well.

Is BitTorrent.com search legal? I’ll leave that one to the lawyers; but I’ll point out two things. First, the DMCA provides a safe harbor against indirect infringement for search engines that follow certain takedown procedures on receiving infringement complaints. BitTorrent.com will apparently follow those procedures, and so the safe harbor may apply. Second, the connection from BitTorrent.com to any infringing content is quite indirect: a BitTorrent.com search result gives the address of a torrent file; the torrent file gives the address of a tracker; the tracker gives the addresses of client computers; and the client computers are the ones that actually distribute infringing content. (The new trackerless version of BitTorrent changes the details, but doesn’t reduce the number of steps.) There are at least three intermediaries between BitTorrent.com and any infringing material.

Even if the search facility is legal, it seems like a bad strategic move by BitTorrent.com. Large copyright interests have been trying to paint BitTorrent as having a pro-infringement agenda; but thus far their efforts have had only limited success because Bram Cohen (the software’s creator) and BitTorrent.com have carefully dissociated themselves from infringement and have conspicuously designed their technology for the benefit of noninfringing users.

As Joe Gratz argues, the new BitTorrent.com search facility, regardless of the merits, will make it easier for BitTorrent.com’s critics to paint the company as having a secret pro-infringement agenda. And that by itself is enough to make an in-house search engine a big mistake for the company.

BitTorrent.com needs to remember that it can be killed by Washington politics. But politicians need to remember, too, that it is the BitTorrent protocol, not the company, that is changing the world. Killing the company will not kill the protocol. A protocol is an idea; and in a free society ideas cannot be killed.

A Land Without Music

Here’s a story I heard recently from an anonymous source. Based on the source’s identity and some of the details of the story, I believe it to be true. I have omitted some details here, to protect the source.

A well-known company, running a massive multi-player virtual world, was considering adding a new space to their world. Due to the nature of the space, characters there would probably want to make music. So the programmers created a set of virtual musical instruments, and tools for players to create their own instruments. The plan was that players would get virtual instruments and make music, for all of the reasons people make music in the real world.

But management nixed the idea, on advice from lawyers, because of concerns about copyright infringement. The problem was that players might use their virtual instruments to play copyrighted songs, and the game company might be sued for contributory or vicarious copyright infringement, for failing to prevent this.

Stop for a minute to think about this. All kinds of virtual objects exist in this virtual world, including a wide variety of weapons. But saxophones? Too risky. Presumably management would have approved a magic saxophone that was only capable of playing non-copyrighted songs, but the engineers had no idea how such a thing could be built.

To put this in context, recall that programmable virtual instruments are widely sold and used in the real world. They’re called synthesizers, and they’re really just computers that can be programmed to play any sequence of sounds, whether copyrighted or not. It’s not so easy to draw a principled line between real-world synthesizers and game-world instruments that makes one legal and the other illegal.

Perhaps the company was being overly cautious and the lawsuit risk was illusory. But I’m not so sure. This would hardly be the most farfetched copyright lawsuit we have seen.

Moving to New Server

I’ll be moving this site to a new server over the next few days, so there may be a few glitches. Please let me know if any problems persist.

Broadcast Flag and Compatibility

National Journal Tech Daily (an excellent publication, but behind a paywall) has an interesting story, by Sarah Lai Stirland, about an exchange between Mike Godwin of Public Knowledge and some entertainment industry lobbyists, at a DC panel last week. Godwin argued that the FCC’s broadcast flag rule, if it is reinstated, will end up regulating a very broad range of devices.

Godwin said any regulations concerning digital television copy-protection schemes would necessarily have to affect any devices that hook up to digital television receivers. That technical fact could have far-reaching implications, such as making gadgets incompatible with each other and crimping technology companies’ ability to innovate, he said.

“I don’t want to be the legislator or the legislative staff person in charge of shutting off connectivity and compatibility for consumers, and I don’t think you want to do that either,” he told a roomful of technology policy lobbyists and congressional staffers. “It’s going to make consumers’ lives hell.”

Godwin’s talk drew a sharp protest from audience member Rick Lane, vice president of government affairs at News Corp.

“Compatibility is not a goal,” he said, pointing out that there are currently a plethora of consumer electronics and entertainment products that are not interoperable. Lane was seconded by NBC Universal’s Senior Counsel for Government Relations Alec French, who also was in the audience.

To consumers, compatibility is a goal. When devices don’t work together, that is a problem to be solved, not an excuse to mandate even more incompatibility.

The FCC and Congress had better be careful in handling the digital TV issue, or they’ll be blamed for breaking the U.S. television system. Mandating incompatibility, via the Broadcast Flag, will not be a popular policy, especially at a time when Congress is talking about shutting off analog TV broadcasts.

The most dangerous place in Washington is between Americans and their televisions.

Is the FCC Ruling Out VoIP on PCs?

The FCC has issued an order requiring VoIP systems that interact with the old-fashioned phone network to provide 911 service. Carriers have 120 days to comply.

It won’t be easy for VoIP carriers to provide the 911 service that people have come to expect from the traditional phone system. The biggest challenge in providing 911 on VoIP is knowing where the caller is located.

In the traditional phone system, it’s easy to know the caller’s location. The phone company strings wires from its facility to customers’ homes and offices. Every call starts on a phone company wire, and the phone company knows where each of those wires originates; so they know the caller’s location. The phone company routes 911 calls to the appropriate local emergency call center, and they provide the call center with the caller’s location. One big advantage of this system is that it works even if the caller doesn’t know his location precisely (or can’t communicate it clearly).

Things are different in the VoIP world. Suppose I’m running a VoIP application on my laptop. I can make and receive VoIP calls whenever my laptop is connected to the Internet, whether I’m at home, or in my office, or in a hotel room in Zurich. My VoIP endpoint and my VoIP phone number can be used anywhere. No longer can the carrier map my phone number to a single, fixed location. My number goes wherever my laptop goes.

How can a VoIP carrier know where my laptop is at any given moment? I’m not sure. The carrier could try to see which IP address (i.e., which address on the Internet) my packets are coming from, and then figure out the physical location of that IP address. That will work well if I connect to the Net in the simplest possible way; but more sophisticated ways of connecting will defeat it. For example, my VoIP network packets will probably appear to come from the Princeton computer science department, regardless of whether I’m at my office, at home, or in a hotel somewhere. How will my VoIP carrier know where I am?
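A carrier-side lookup of this kind amounts to finding the most specific registered address block that contains a packet’s source address. In this sketch the table, its prefixes (documentation ranges), its location labels, and the function name are all hypothetical. The weakness described above shows up directly: if the packets are relayed through a department’s network, the lookup returns the department’s entry no matter where the laptop physically is.

```python
import ipaddress
from typing import Optional

# Hypothetical carrier table: address block -> registered street location.
PREFIX_LOCATIONS = {
    ipaddress.ip_network("198.51.100.0/24"): "Example university department building",
    ipaddress.ip_network("198.51.100.128/25"): "Example department annex",
    ipaddress.ip_network("203.0.113.0/24"): "Example home broadband pool",
}


def guess_location(source_ip: str) -> Optional[str]:
    """Return the location of the most specific known prefix containing source_ip, if any."""
    addr = ipaddress.ip_address(source_ip)
    matches = [net for net in PREFIX_LOCATIONS if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest-prefix match
    return PREFIX_LOCATIONS[best]


print(guess_location("198.51.100.10"))
```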

Another approach is to have my laptop try to figure out where it is, by looking at its current IP address (and other available information). This won’t work too well, either. Often all my laptop can deduce from its IP address is that there is a fancy firewall between it and the real Internet. That’s true for me at home, and in most hotels. I suppose you could put a GPS receiver in future laptops, but that won’t help me today.
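From the laptop’s side, the problem is that an address handed out behind a home router or hotel firewall usually comes from private address space, which by design identifies no particular place. Python’s standard ipaddress module can test for this; the helper name below is hypothetical.

```python
import ipaddress


def address_is_private(local_ip: str) -> bool:
    """True if local_ip is in reserved private space (e.g. behind NAT), which names no location."""
    return ipaddress.ip_address(local_ip).is_private


for ip in ["192.168.1.23", "10.0.0.5", "172.16.4.2", "8.8.8.8"]:
    print(ip, address_is_private(ip))
```

Even a False result only means the address is public; it still says nothing reliable about where the laptop physically sits.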

We could try to invent some kind of Internet-location-tracking protocol, which would be quite complicated, and would raise significant privacy issues. It’s not clear how to let 911 call centers track me, without also making me trackable by many others who have no business knowing where I am.

Tim Lee at Technology Liberation Front suggests creating a protocol that lets Internet-connected devices learn their geographic location. (It might be an extension of DHCP.) This is probably feasible technically, but it would take a long time to be adopted. And it surely won’t be deployed widely within 120 days.

All in all, this looks like a big headache for VoIP providers, especially for ones who use existing standard software and hardware. Maybe VoIP providers will take a best-effort approach and then announce their compliance; but that approach will probably not hold up as stories about VoIP 911 failures continue to show up in the media.

Of course, VoIP carriers can avoid these rules by avoiding interaction with the old-fashioned phone network. VoIP systems that don’t provide a way to make and receive calls with old-fashioned phone users won’t be required to provide 911 service. So the real effect of the FCC’s order may be to cut off interaction between the old and new phone systems, which won’t really help anyone.

Why I Can't Tinker with my Household Cleaner

John Mark Ockerbloom emailed an interesting story about Federal regulation of tinkering with household chemicals, which I quote here with permission:

I just washed our kitchen floor tonight. And (I admit guiltily) I haven’t done it in a while – usually I “let” my wife do it. So I look at the small print on the label of our Lysol All-Purpose Cleaner to remind me what to do, as I fill a bucket with water. And there I see the words “It is a violation of federal law to use this product in a manner inconsistent with its labeling.”

And I get to wondering: what law? And why? It’s all well and good that, for now at least, I can tinker with my digital TV signal, and I’m right with you in wanting to be able to tinker with my software. But supposing I like hacking cleaners or chemicals instead of code – why can’t I tinker with my cleaner like I should be able to with my computer?

Maybe, I think, it’s just like one of those overbroad warnings you sometimes see affixed to copyright notices. And I start to think about what the *real* regulation might be – using it to make a drug? An explosive? Poison? What can I really do with this, and what can’t I? And why?

Fifteen minutes after firing up Google, I think I have my answer. Technically, under federal law I really *can’t* do anything with it other than what the label tells me to do – unless it’s to use a lower dose than what’s on the label. The relevant law appears to be in Title 7, Chapter 6, Subchapter II: Environmental Pesticides. 7 USC 136j tells me it really is illegal to use pesticides in a manner inconsistent with the labeling. And “pesticide” is defined in 7 USC 136(u). I don’t see that it obviously includes my Lysol bottle, but another web page tells me that the EPA considers this definition to include disinfectants. My Lysol bottle claims to disinfect, so I gather this makes it a pesticide, and therefore subject to this federal law, and thus illegal to hack, as it were.

Now, my question: is this legal overreaching, and if so, where exactly is the overreach? I can see a legitimate reason for a government (if not federal, then at least state) to stop people from polluting the environment, as might occur if they dump too much bug killer – or even household cleaner – into the air or water. But can they really prevent me from doing anything other than what the labeler allows with my cleaner? (Like trying to mix it with other ingredients to make a safer cleaner, doing cleaner-vs-Twinkie endurance experiments, or seeing if I can separate some ingredients from the rest – in all cases being careful to contain and control the liquid and its vapors, and keep them on my property?) Or is the EPA, or Congress, overstepping its powers somehow?

And if not, then is there anything stopping Congress from giving, say, the FCC broad powers to prohibit using, say, digital media or devices in a manner inconsistent with the labeling their manufacturers give them? (There might be the First Amendment… or there might not be. It’s not clear it would apply in all electronic tinkering cases, and cases like Pacifica show that it can be overridden in some cases where it would seem to apply.)

I’d be interested in finding out more, before I start seeing notices saying “it is illegal to record this program in a manner inconsistent with the presence of commercials” on my TV…

He also notes this:

I do see that it’s possible to get an “experimental use permit” to tinker with things considered pesticides, described in 7 USC 136c. Though it sounds like it’s not trivial to get; among other things, there’s a sentence in 136c that states “The Administrator may issue an experimental use permit only if the Administrator determines that the applicant needs such permit in order to accumulate information necessary to register a pesticide under section 136a of this title.”

Course Blog: Lessons Learned

This semester I had students in my course, “Information Technology and the Law,” write for a course blog. This was an experiment, but it worked out quite well. I will definitely do it again.

We required each of the twenty-five students in the course to post at least once a week. Each student was assigned a particular day of the week on which his or her entries were due. We divided the due dates evenly among the seven days of the week, to ensure an even flow of new posts, and to facilitate discussion among the students. The staggered due dates worked nicely, and had the unexpected benefit of evening out the instructors’ and students’ blog reading workload.
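The post doesn’t say exactly how the due dates were assigned; one simple way to divide twenty-five students evenly across seven days is round-robin assignment, sketched below with a hypothetical roster and helper name. It leaves four days with four students and three days with three, so the daily load never differs by more than one post.

```python
from collections import Counter

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def assign_due_days(students: list[str]) -> dict[str, str]:
    """Spread students across the week round-robin so daily counts differ by at most one."""
    return {s: DAYS[i % len(DAYS)] for i, s in enumerate(students)}


students = [f"student{n:02d}" for n in range(1, 26)]  # hypothetical roster of 25
schedule = assign_due_days(students)
print(Counter(schedule.values()))
```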

To be honest, I’m not sure how religiously students read the blog. Many entries had comments from other students, but I suspect that many students read the blog irregularly. My guess is that most of them read it, most of the time.

We told students that they should write 400-500 words each week, about any topic related to the course. As expected, most students wrote about the topics we were discussing in class at the moment. Some students would read ahead a bit and then post about the readings before we discussed them in class. Others would reflect on recent in-class discussions. In both cases, the blogging helped to extend the class discussion. A few students wrote about material outside the readings, but within the course topic.

One of the biggest benefits, which I didn’t fully appreciate in advance, was that students got to see the writing their peers submitted. This was valuable not only for the exchange of ideas, but also in helping students improve their writing. Often students learn about the standard of performance only by reading comments from a grader; here they could see what their peers were producing.

To protect students’ privacy, we gave them the option of writing under a pseudonym. Seven of twenty-five students used a pseudonym. Students had to reveal their pseudonym to the instructors, but it was up to them whether to reveal it to the other students in the course. A few students chose pseudonyms that would be obvious to people in the course; for example, one student used his first name. Most of the others seemed willing to reveal their pseudonyms to the rest of the class, though not everyone had occasion to do so.

I was pleasantly surprised by the quality of the writing. Most of it was good, and some was top-notch. Comments from peers, and from outsiders, were also helpful. However, it seems unlikely that many outsiders would read such a course blog, given the sheer volume of postings.

The logistics worked out pretty well. We used WordPress, with comment moderation enabled (to fend off comment spam). We sent out a brief email with instructions at the beginning, and students caught on quickly.

On the whole, the course blog worked out better than expected, and I will use the same method in the future.

[If any students from the course read this, please chime in in the comments. I already submitted course grades, so you can be brutally honest.]