August 28, 2016

Archives for September 2005


Net Governance Debate Heats Up

European countries surprised the U.S. Wednesday by suggesting that an international body rather than the U.S. government should have ultimate control over certain Internet functions. According to Tom Wright’s story in the International Herald Tribune,

The United States lost its only ally [at the U.N.’s World Summit on the Information Society] late Wednesday when the EU made a surprise proposal to create an intergovernmental body that would set principles for running the Internet. Currently the U.S. Commerce Department approves changes to the Internet’s “root zone files”, which are administered by the Internet Corporation for Assigned Names and Numbers, or Icann, a nonprofit organization based in Marina del Rey, California.

As often happens, this discussion seems to confuse control over Internet naming with control over the Internet as a whole. Note the juxtaposition: the EU wants a new body to “set principles for running the Internet”; currently the U.S. controls naming via Icann.

This battle would be simpler and less intense if it were only about naming. What is really at issue is who will have the perceived legitimacy to regulate the Internet. The U.S. fears a U.N.-based regulator, as do I. Much of the international community fears and resents U.S. hegemony over the Net. (General anti-Americanism plays a role too, as in the Inquirer’s op-ed.)

The U.S. would have cleaner hands in this debate if it swore off broad regulation of the Net. It’s hard for the U.S. to argue against creating a new Internet regulator when the U.S. itself looks eager to regulate the Net. Suspicion is strong that the U.S. will regulate the Net to the advantage of its entertainment and e-commerce industries. Here’s the Register’s story:

The UN’s special adviser for internet governance, Nitin Desai, told us that the issue of control was particularly stark for developing nations, where the internet is not so much an entertainment or e-commerce medium but a vital part of the country’s infrastructure.

[Brazilian] Ambassador Porto clarified that point further: “Nowadays our voting system in Brazil is based on ICTs [information and communication technologies], our tax collection system is based on ICTs, our public health system is based on ICTs. For us, the internet is much more than entertainment, it is vital for our constituencies, for our parliament in Brazil, for our society in Brazil.” With such a vital resource, he asked, “how can one country control the Internet?”

The U.S. says flatly that it will not agree to an international governance scheme at this time.

If the U.S. doesn’t budge, and the international group tries to go ahead on its own, we might possibly see a split, where a new entity I’ll call “UNCANN” coexists with ICANN, with each of the two claiming authority over Internet naming. This won’t break the Internet, since each user will choose to pay attention to either UNCANN or ICANN. To the extent that UNCANN and ICANN assign names differently, there will be some confusion when UNCANN users talk to ICANN users. I wouldn’t expect many differences, though, so probably the creation of UNCANN wouldn’t make much difference, except in two respects. First, the choice to point one’s naming software at UNCANN or ICANN would probably take on symbolic importance, even if it made little practical difference. Second, UNCANN’s aura of legitimacy as a naming authority would make it easier for UNCANN to issue regulatory decrees that were taken seriously by the states that would ultimately have to implement them.

This last issue, of regulatory legitimacy, is the really important one. All the talk about naming is a smokescreen.

My guess is that the Geneva meeting will break up with much grumbling but no resolution of this issue. The EU and the rest of the international group won’t move ahead with their own naming authority, and the U.S. will tread more carefully in the future. That’s the best outcome we can hope for in the short term.

In the longer term, this issue will have to be resolved somehow. Until it is, many people around the world will keep asking the question, “Who runs the Internet?”, and not liking the answer.


The Pizzaright Principle

Lately, lots of bogus arguments for copyright expansion have been floating around. A handy detector for bogus arguments is the Pizzaright Principle.

Pizzaright – the exclusive right to sell pizza – is a new kind of intellectual property right. Pizzaright law, if adopted, would make it illegal to make or serve a pizza without a license from the pizzaright owner.

Creating a pizzaright would be terrible policy, of course. We’re much better off letting the market decide who can make and sell pizza.

The Pizzaright Principle says that if you make an argument for expanding copyright or creating new kinds of intellectual property rights, and if your argument serves equally well as an argument for pizzaright, then your argument is defective. It proves too much. Whatever your argument is, it had better rest on some difference between pizzaright and the exclusive right you want to create.

Let’s apply the Pizzaright Principle to two well-known bogus arguments for intellectual property expansion.

Suppose Alice argues that extending the term of copyright is good, because it gives the copyright owner a revenue stream that can be invested in creating new works. She could equally well argue that pizzaright is good, because it gives the pizzaright owner a revenue stream that can be invested in creating new pizzas.

(The flaw in Alice’s argument is that the decision whether to invest in a new copyrighted work, or a new pizza, is rationally based only on the cost of the investment and the expected payoff. Making a transfer payment to the would-be investor doesn’t change his decision, assuming that capital markets are efficient.)

Suppose that Bob argues that the profitability of broadcasting may be about to decrease, so broadcasters should be given new intellectual property rights. He could equally well argue that if the pizza business has become less profitable, a pizzaright should be created.

(The flaw in Bob’s argument is the failure to show that the new right furthers the interests of society as a whole, as opposed to the narrow interests of the broadcasters or pizzamakers.)

The Pizzaright Principle is surprisingly useful. Try it out on the next IP expansion argument you hear.


Secure Flight: Shifting Goals, Vague Plan

The Transportation Security Administration (TSA) released Friday a previously confidential report by the Secure Flight Working Group (SFWG), an independent expert committee on which I served. The committee’s charter was to study the privacy implications of the Secure Flight program. The final report is critical of TSA’s management of Secure Flight.

(Besides me, the committee members were Martin Abrams, Linda Ackerman, James Dempsey, Daniel Gallington, Lauren Gelman, Steven Lilienthal, Bruce Schneier, and Anna Slomovic. Members received security clearances and had access to non-public information; but everything I write here is based on public information. I should note that although the report was meant to reflect the consensus of the committee members, readers should not assume that every individual member agrees with everything said in the report.)

Secure Flight is a successor to existing programs that do three jobs. First, they vet air passengers against a no-fly list, which contains the names of people who are believed to pose a danger to aviation and so are not allowed to fly. Second, they vet passengers against a watch list, which contains the names of people who are believed to pose a more modest danger and so are subject to a secondary search at the security checkpoint. Third, they vet passengers’ reservations against the CAPPS I criteria, and subject those who meet the criteria to a secondary search. (The precise CAPPS I criteria are not public, but it is widely believed that the criteria include whether the passenger paid cash for the ticket, whether the ticket is one-way, and other factors.)

The key section of the report is on pages 5-6. Here’s the beginning of that section:

The SFWG found that TSA has failed to answer certain key questions about Secure Flight: First and foremost, TSA has not articulated what the specific goals of Secure Flight are. Based on the limited test results presented to us, we cannot assess whether even the general goal of evaluating passengers for the risk they represent to aviation security is a realistic or feasible one or how TSA proposes to achieve it. We do not know how much or what kind of personal information the system will collect or how data from various sources will flow through the system.

The lack of clear goals for the program is a serious problem (p. 5):

The TSA is under a Congressional mandate to match domestic airline passenger lists against the consolidated terrorist watch list. TSA has failed to specify with consistency whether watch list matching is the only goal of Secure Flight at this state. The Secure Flight Capabilities and Testing Overview, dated February 9, 2005 (a non-public document given to the SFWG), states in the Appendix that the program is not looking for unknown terrorists and has no intention of doing so. On June 29, 2005, Justin Oberman (Assistant Administrator, Secure Flight/Registered Traveler [at TSA]) testified to a Congressional committee that “Another goal proposed for Secure Flight is its use to establish “Mechanisms for … violent criminal data vetting.” Finally, TSA has never been forthcoming about whether it has an additional, implicit goal – the tracking of terrorism suspects (whose presence on the terrorist watch list does not necessarily signify intention to commit violence on a flight).

The report also notes that TSA had not answered questions about what the system’s architecture would be, whether Secure Flight would be linked to other TSA systems, whether and how the system would use commercial data sources, and how oversight would work. TSA had not provided enough information to evaluate the security of Secure Flight’s computer systems and databases.

The report ends with these recommendations:

Congress should prohibit live testing of Secure Flight until it receives the following from the [Homeland Security Secretary].

First, a written statement of the goals of Secure Flight signed by the Secretary of DHS that only can be changed on the Secretary’s order. Accompanying documentation should include: (1) a description of the technology, policy and processes in place to ensure that the system is only used to achieve the stated goals; (2) a schematic that describes exactly what data is collected, from what entities, and how it flows though the system; (3) rules that describe who has access to the data and under what circumstances; and (4) specific procedures for destruction of the data. There should also be an assurance that someone has been appointed with sufficient independence and power to ensure that the system development and subsequent use follow the documented procedures.

In conclusion, we believe live testing of Secure Flight should not commence until there has been adequate time to review, comment, and conduct a public debate on the additional documentation outlined above.

Speaking for myself, I joined the committee with an open mind. A system along the general lines of Secure Flight might make sense, and might properly balance security with privacy. I wanted to see whether Secure Flight could be justified. I wanted to hear someone make the case for Secure Flight. TSA had said that it was gathering evidence and doing analysis to do so.

In the end, TSA never did make a case for Secure Flight. I still have the same questions I had at the beginning. But now I have less confidence that TSA can successfully run a program like Secure Flight.


Google Print, Damages and Incentives

There’s been lots of discussion online of this week’s lawsuit filed against Google by a group of authors, over the Google Print project. Google Print is scanning in books from four large libraries, indexing the books’ contents, and letting people do Google-style searches on the books’ contents. Search results show short snippets from the books, but won’t let users extract long portions. Google will withdraw any book from the program at the request of the copyright holder. As I understand it, scanning was already underway when the suit was filed.

The authors claim that scanning the books violates their copyright. Google claims the project is fair use. Everybody agrees that Google Print is a cool project that will benefit the public – but it might be illegal anyway.

Expert commentators disagree about the merits of the case. Jonathan Band thinks Google should win. William Patry thinks the authors should win. Who am I to argue with either of them? The bottom line is that nobody knows what will happen.

So Google was taking a risk by starting the project. The risk is larger than you might think, because if Google loses, it won’t just have to reimburse the authors for the economic harm they have suffered. Instead, Google will have to pay statutory damages of up to $30,000 for every book that has been scanned. That adds up quickly! (I don’t know how many books Google has scanned so far, but I assume it’s a nontrivial number.)

You might wonder why copyright law imposes such a high penalty for an act – scanning one book – that causes relatively little harm. It’s a good question. If Google loses, it makes economic sense to make Google pay for the harm it has caused (and to impose an injunction against future scanning). This gives Google the right incentive, to weigh the expected cost of harm to the authors against the project’s overall value.

Imposing statutory damages makes technologists like Google too cautious. Even if a new technology creates great value while doing little harm, and the technologist has a strong (but not slam-dunk) fair use case, the risk of statutory damages may deter the technology’s release. That’s inefficient.
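To make the deterrence argument concrete, here is a rough back-of-the-envelope sketch. Only the $30,000 ceiling comes from the discussion above; the number of books, the project's value, the chance of losing, and the harm per book are all invented for illustration, and the sketch pessimistically assumes a court would award the full ceiling for every book.

```python
# Illustrative only: every number except the $30,000 ceiling is a made-up assumption.
STATUTORY_CAP_PER_WORK = 30_000  # dollars, the "up to" figure cited above


def expected_net_value(project_value, books, p_lose, damages_per_book):
    """Expected payoff of launching: project value minus expected damages."""
    return project_value - p_lose * books * damages_per_book


books = 100_000              # hypothetical count of scanned books
project_value = 50_000_000   # hypothetical value of the project, in dollars
p_lose = 0.3                 # hypothetical chance the fair use defense fails
actual_harm_per_book = 10    # hypothetical economic harm per scanned book

# Worst case: the full ceiling is awarded for every scanned book.
max_exposure = books * STATUTORY_CAP_PER_WORK

with_actual = expected_net_value(project_value, books, p_lose, actual_harm_per_book)
with_statutory = expected_net_value(project_value, books, p_lose, STATUTORY_CAP_PER_WORK)

print(f"maximum statutory exposure: ${max_exposure:,}")
print(f"expected net value, actual damages:    ${with_actual:,.0f}")
print(f"expected net value, statutory damages: ${with_statutory:,.0f}")
```

Under these assumed numbers the project is comfortably worth launching if the loser pays only for actual harm, and badly underwater if the loser pays the statutory ceiling, even though the project and the harm it causes are identical in both cases.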

Some iffy technologies should be deterred, if they create relatively little value for the harm they do, or if the technologist has a weak fair use case. But statutory damages deter too many new technologies.

[Law and economics mavens may object that under some conditions it is efficient to impose higher damages. That’s true, but I don’t think those conditions apply here. I don’t have space to address this point further, but please feel free to discuss it in the comments.]

In light of the risk Google is facing, it’s surprising that Google went ahead with the project. Maybe Google will decide now that discretion is the better part of valor, and will settle the case, stopping Google Print in exchange for the withdrawal of the lawsuit.

The good news, in the long run at least, is that this case will remind policymakers of the value of a robust fair use privilege.


Who Is An ISP?

There’s talk in Washington about a major new telecommunications bill, to update the Telecom Act of 1996. A discussion draft of the bill is floating around.

The bill defines three types of services: Internet service (called “Broadband Internet Transmission Service” or BITS for short); VoIP; and broadband television. It lays down specific regulations for each type of service, and delegates regulatory power to the FCC.

In bills like this, much of the action is in the definitions. How you’re regulated depends on which of the definitions you satisfy, if any. The definitions essentially define the markets in which companies can compete.

Here’s how the Internet service market is defined:

The term “BITS” or “broadband Internet transmission service” –
(A) means a packet-switched service that is offered to the public, or [effectively offered to the public], with or without a fee, and that, regardless of the facilities used –
(i) is transmitted in a packed-based protocol, including TCP/IP or a successor protocol; and
(ii) provides to subscribers the capability to send and receive packetized information; …

The term “BITS provider” means any person who provides or offers to provide BITS, either directly or through an affiliate.

The term “packet-switched service” means a service that routes or forwards packets, frames, cells, or other data units based on the identification, address, or other routing information contained in the packets, frames, cells, or other data units.

The definition of BITS includes ordinary Internet Service Providers, as we would expect. But that’s not all. It seems to include public chat servers, which deliver discrete messages to specified destination users. It seems to include overlay networks like Tor, which provide anonymous communication over the Internet using a packet-based protocol. As Susan Crawford observes, it seems to cover nodes in ad hoc mesh networks. It even seems to include anybody running an open WiFi access point.

What happens to you if you’re a BITS provider? You have to register with the FCC and hope your registration is approved; you have to comply with consumer protection requirements (“including service appointments and responses to service interruptions and outages”); and you have to comply with privacy regulations which, ironically, require you to keep track of who your users are so you can send them annual notices telling them that you are not storing personal information about them.

I doubt the bill’s drafters meant to include chat or Tor as BITS providers. The definition can probably be rewritten to exclude cases like these.

A more interesting question is whether they meant to include open access points. It’s hard to justify applying heavyweight regulation to the individuals or small businesses who run access points. And it seems likely that many would ignore the regulations anyway, just as most consumers seem to ignore the existing rules that require an FCC license to use the neighborhood-range walkie-talkies sold at Wal-Mart.

The root of the problem is the assumption that Internet connectivity will be provided only by large institutions that can amortize regulatory compliance costs over a large subscriber base. If this bill passes, that will be a self-fulfilling prophecy – only large institutions will be able to offer Internet service.


Movie Studios Form DRM Lab

Hollywood argues – or at least strongly implies – that technology companies could stop copyright infringement if they wanted to, but have chosen not to do so. I have often wondered whether Hollywood really believes this, or whether the claim is just a ploy to gain political advantage.

Such a ploy might be very effective if it worked. Imagine that you somehow convinced policymakers that the auto industry could make cars that operated with no energy source at all. You could then demand that the auto industry make all sorts of concessions in energy policy, and you could continue to criticize them for foot-dragging no matter how much they did.

If you were using this ploy, the dumbest thing you could do is to set up your own “Perpetual Motion Labs” to develop no-energy-source cars. Your lab would fail, of course, and its failure would demonstrate that your argument was bogus all along. You would only set up the lab if you thought that perpetual-motion cars were pretty easy to build.

Which brings us to the movie industry’s announcement, yesterday, that they will set up “MovieLabs”, a $30 million research effort to develop effective anti-copying technologies. The only sensible explanation for this move is that Hollywood really believes that there are easily-discovered anti-copying technologies that the technology industry has failed to find.

So Hollywood is still in denial about digital copying.

The pressure will be on MovieLabs to find strong anti-copying technologies, because a failure by MovieLabs can’t be blamed on the tech industry. Failure will show, instead, that stopping digital copying is much harder than Hollywood thought. And MovieLabs will fail, just as Perpetual Motion Labs would.

When MovieLabs fails, expect the spinners to emerge again, telling us that MovieLabs has a great technology that it can’t tell us about, or that there’s a great technology that isn’t quite finished, or that the goal all along was not to stop P2P copying but only to reduce some narrow, insignificant form of copying. Expect, most of all, that MovieLabs will go to almost any length to avoid independent evaluation of its technologies.

This is a chance for Hollywood to learn what the rest of us already know – that cheap and easy copying is an unavoidable side-effect of the digital revolution.


P2P Still Growing; Traffic Shifts to eDonkey

CacheLogic has released a new report presentation on peer-to-peer traffic trends, based on measurement of networks worldwide. (The interesting part starts at slide 5.)

P2P traffic continued to grow in 2005. As expected, there was no dropoff after the Grokster decision.

Traffic continues to shift away from the FastTrack network (used by Kazaa and others), mostly toward eDonkey. BitTorrent is still quite popular but has lost some usage share. Gnutella showed unexpected growth in the U.S., though its share is still small.

CacheLogic speculates, plausibly, that these trends reflect a usage shift away from systems that faced heavier legal attacks. FastTrack saw several legal attacks, including the Grokster litigation, along with many lawsuits against individual users. BitTorrent itself didn’t come under legal attack, but some sites offering directories of (mostly) infringing BitTorrent traffic were shut down. eDonkey came in for fewer legal attacks, and the lawyers mostly ignored Gnutella as insignificant; these systems grew in popularity. So far in 2005, legal attacks have shifted users from one system to another, but they haven’t reduced overall P2P activity.

Another factor in the data, which CacheLogic doesn’t say as much about, is a possible shift toward distribution of larger files. The CacheLogic traffic data count the total number of bytes transferred, so large files are weighted much more heavily than small files. This factor will tend to inflate the apparent importance of BitTorrent and eDonkey, which transfer large files efficiently, at the expense of FastTrack and Gnutella, which don’t cope as well with large files. Video files, which tend to be large, are more common on BitTorrent and eDonkey. Overall, video accounted for about 61% of P2P traffic, and audio for 11%. Given the size disparity between video and audio, it seems likely that the majority of content (measured by number of files, or by dollar value, or by minutes of video/audio content) was still audio.
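Here is a quick sketch of why byte counts and file counts can tell such different stories. The 61% and 11% byte shares are the figures quoted above; the average file sizes are my own guesses, chosen only to illustrate the size disparity, and the remaining 28% of traffic is left out.

```python
# Byte shares are the figures quoted above; average file sizes are assumptions.
byte_share = {"video": 0.61, "audio": 0.11}
assumed_avg_size_mb = {"video": 700.0, "audio": 5.0}  # hypothetical sizes

# Relative number of files = share of bytes / average size per file.
relative_files = {
    kind: byte_share[kind] / assumed_avg_size_mb[kind] for kind in byte_share
}

# Normalize so the two categories sum to 1 (other traffic is ignored).
total = sum(relative_files.values())
file_share = {kind: count / total for kind, count in relative_files.items()}

for kind, share in file_share.items():
    print(f"{kind}: {share:.1%} of video+audio files")
```

With these assumed sizes, audio makes up the large majority of files even though video dominates the byte count. Different size guesses would move the exact split, but the ranking only flips if the average video file is less than about six times the size of the average audio file.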

The report closes by predicting the continued growth of P2P, which seems like a pretty safe bet. It notes that copyright owners are now jumping on the P2P bandwagon, having learned the lesson of BitTorrent, which is that P2P is a very efficient way to distribute files, especially large files. As for users,

End users love P2P as it gives them access to the media they want, when they want it and at high speed …

Will the copyright owners’ authorized P2P systems give users the access and flexibility they have come to expect? If not, users will stick with other P2P systems that do.


Secrecy in Science

There’s an interesting dispute between astronomers about who deserves credit for discovering a solar system object called 2003EL61. Its existence was first announced by Spanish astronomers, but another team in the U.S. believes that the Spaniards may have learned about the object due to an information leak from the U.S. team.

The U.S. team’s account appears on their web page and was in yesterday’s NY Times. The short version is that the U.S. team published an advance abstract about their paper, which called the object by a temporary name that encoded the date it had been discovered. They later realized that an obscure website contained a full activity log for the telescope they had used, which allowed anybody with a web browser to learn exactly where the telescope had been pointing on the date of the discovery. This in turn allowed the object’s orbit to be calculated, enabling anybody to point their telescope at the object and “discover” it. Just after the abstract was released, the Spanish team apparently visited the telescope log website; and a few days later the Spanish team announced that they had discovered the object.

If this account is true, it’s clearly a breach of scientific ethics by the Spaniards. The seriousness of the breach depends on other circumstances which we don’t know, such as the possibility that the Spaniards had already discovered the object independently and were merely checking whether the Americans’ object was the same one. (If so, their announcement should have said that the American team had discovered the object independently.)

[UPDATE (Sept. 15): The Spanish team has now released their version of the story. They say they discovered the object on their own. When the U.S. group’s abstract, containing a name for the object, appeared on the Net, the Spaniards did a Google search for the object name. The search showed a bunch of sky coordinates. They tried to figure out whether any of those coordinates corresponded to the object they had seen, but they were unable to tell one way or the other. So they went ahead with their own announcement as planned.

This is not inconsistent with the U.S. team’s story, so it seems most likely to me that both stories are true. If so, then I was too hasty in inferring a breach of ethics, for which I apologize. I should have realized that the Spanish team might have been unable to tell whether the objects were the same.]

When this happened, the American team hastily went public with another discovery, of an object called 2003UB313 which may be the tenth planet in our solar system. This raised the obvious question of why the team had withheld the announcement of this new object for as long as they did. The team’s website has an impassioned defense of the delay:

Good science is a careful and deliberate process. The time from discovery to announcement in a scientific paper can be a couple of years. For all of our past discoveries, we have described the objects in scientific papers before publicly announcing the objects’ existence, and we have made that announcement in under nine months…. Our intent in all cases is to go from discovery to announcement in under nine months. We think that is a pretty fast pace.

One could object to the above by noting that the existence of these objects is never in doubt, so why not just announce the existence immediately upon discovery and continue observing to learn more? This way other astronomers could also study the new object. There are two reasons we don’t do this. First, we have dedicated a substantial part of our careers to this survey precisely so that we can discover and have the first crack at studying the large objects in the outer solar system. The discovery itself contains little of scientific interest. Almost all of the science that we are interested in doing comes from studying the object in detail after discovery. Announcing the existence of the objects and letting other astronomers get the first detailed observations of these objects would ruin the entire scientific point of spending so much effort on our survey. Some have argued that doing things this way “harms science” by not letting others make observations of the objects that we find. It is difficult to understand how a nine month delay in studying an object that no one would even know existed otherwise is in any way harmful to science!

Many other types of astronomical surveys are done for precisely the same reasons. Astronomers survey the skies looking for ever higher redshift galaxies. When they find them they study them and write a scientific paper. When the paper comes out other astronomers learn of the distant galaxy and they too study it. Other astronomers cull large databases such as the 2MASS infrared survey to find rare objects like brown dwarves. When they find them they study them and write a scientific paper. When the paper comes out other astronomers learn of the brown dwarves and they study them in perhaps different ways. Still other astronomers look around nearby stars for the elusive signs of directly detectable extrasolar planets. When they find one they study it and write a scientific paper….. You get the point. This is the way that the entire field of astronomy – and probably all of science – works. It’s a very effective system; people who put in the tremendous effort to find these rare objects are rewarded with getting to be the first to study them scientifically. Astronomers who are unwilling or unable to put in the effort to search for the objects still get to study them after a small delay.

This describes an interesting dynamic that seems to occur in all scientific fields – I have seen it plenty of times in computer science – where researchers withhold results from their colleagues for a while, to ensure that they get a headstart on the followup research. That’s basically what happens when an astronomer delays announcing the discovery of an object, in order to do followup analyses of the object for publication.

The argument against this secrecy is pretty simple: announcing the first result would let more people do followup work, making the followup work both quicker and more complete on average. Scientific discovery would benefit.

The argument for this kind of secrecy is more subtle. The amount of credit one gets for a scientific result doesn’t always correlate with the difficulty of getting the result. If a result is difficult to get but doesn’t bring much credit to the discoverer, then there is an insufficient incentive to look for that result. The incentive is boosted if the discoverer gets an advantage in doing followup work, for example by keeping the original result secret for a while. So secrecy may increase the incentive to do certain kinds of research.

Note that there isn’t much incentive to keep low-effort / high-credit research secret, because there are probably plenty of competing scientists who are racing to do such work and announce it first. The incentive to keep secrets is biggest for high-effort / low-credit research which enables low-effort / high-credit followup work. And this is exactly the case where incentives most need to be boosted.

Michael Madison compares the astronomers’ tradeoff between publication and secrecy to the tradeoff an inventor faces between keeping an invention secret, and filing for a patent. As a matter of law, discovered scientific facts are not patentable, and that’s a good thing.

As Madison notes, science does have its own sort of “intellectual property” system that tries to align incentives for the public good. There is a general incentive to publish results for the public good – scientific credit goes to those who publish. Secrecy is sometimes accepted in cases where secret-keeping is needed to boost incentives, but the system is designed to limit this secrecy to cases where it is really needed.

But this system isn’t perfect. As the astronomers note, the price of secrecy is that followup work by others is delayed. Sometimes the delay isn’t too serious – 2003UB313 will still be plodding along in its orbit and there will be plenty of time to study it later. But sometimes delay is a bigger deal, as when an astronomical object is short-lived and cannot be studied at all later. Another example, which arises more often in computer security, is when the discovery is about an ongoing risk to the public which can be mitigated more quickly if it is more widely known. Scientific ethics tend to require at least partial publication in cases like these.

What’s most notable about the scientific system is that it works pretty well, at least within the subject matter of science, and it does so without much involvement by laws or lawyers.


RIAA, MPAA Join Internet2 Consortium

RIAA and MPAA, trade associations that include the major U.S. record and movie companies, joined the Internet2 consortium on Friday, according to a joint press release. I’ve heard some alarm about this, suggesting that this will allow the AAs to control how the next generation Internet is built. But once we strip away the hype, there’s not much to worry about in this announcement.

Despite its grand name, Internet2 is not a new network. Its main purpose has been to add some fast links to today’s Internet, to connect bandwidth-hungry universities, e.g., so that researchers at one university can explore the results of climate simulations done at a peer university. The Internet2 links carry traffic of all sorts and they use the same protocols as the rest of the Internet.

A lesser function of Internet2 is to host discussions among researchers studying specific topics. It’s good when people studying similar problems can talk to each other, as long as one group isn’t put in charge of what the other groups do. And as I understand it, the Internet2 discussions are just that – discussions – and not a top-down management structure. So it doesn’t look to me like Internet2, as a corporate body, could do much to divert the natural course of research, even if it wanted to.

Finally, Internet2 is not in a position to dictate what technology gets deployed in the future Internet. Internet2 may give birth to ideas that are then adopted by the industry; but those ideas will only be deployed if market pressures drive the industry to build them. If the AAs think that they can sit down with Internet2 and negotiate the future of the Internet, they’re sadly mistaken. But I very much doubt that that’s what they think.

So why are the AAs joining Internet2? My guess is that they joined for mostly the same reasons that other non-IT-industry corporate members did. Why did Johnson and Johnson join? Why did Ford join? Because their business strategies depend on the future of high-performance networks. The same is true of the record and movie companies. Their business models will one day center on online, digital distribution of content. It’s best for them, and probably for everybody else too, if they face that future squarely, right away. I hope their presence in Internet2 will help them see what is coming, and figure out how to adapt to it.


Acoustic Snooping on Typed Information

Li Zhuang, Feng Zhou, and Doug Tygar have an interesting new paper showing that if you have an audio recording of somebody typing on an ordinary computer keyboard for fifteen minutes or so, you can figure out everything they typed. The idea is that different keys tend to make slightly different sounds, and although you don’t know in advance which keys make which sounds, you can use machine learning to figure that out, assuming that the person is mostly typing English text. (Presumably it would work for other languages too.)

Asonov and Agrawal had a similar result previously, but they had to assume (unrealistically) that you started out with a recording of the person typing a known training text on the target keyboard. The new method eliminates that requirement, and so appears to be viable in practice.

The algorithm works in three basic stages. First, it isolates the sound of each individual keystroke. Second, it takes all of the recorded keystrokes and puts them into about fifty categories, where the keystrokes within each category sound very similar. Third, it uses fancy machine learning methods to recover the sequence of characters typed, under the assumption that the sequence has the statistical characteristics of English text.
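To make the first stage concrete, here is a minimal sketch of one way to isolate keystrokes. It is an assumption on my part, not the method from the paper: it treats any sample louder than a threshold as part of a keystroke and reports where each burst starts. The function name `find_keystrokes`, the `threshold` and `min_gap` parameters, and the toy sample values are all hypothetical.

```python
def find_keystrokes(samples, threshold=0.5, min_gap=3):
    """Return the index where each loud burst begins.

    Illustrative assumption only: a keystroke is a run of samples whose
    absolute amplitude is at least `threshold`, and two bursts are separate
    keystrokes if at least `min_gap` samples lie between their loud samples.
    """
    onsets = []
    last_loud = -min_gap  # index of the most recent loud sample seen
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if i - last_loud >= min_gap:
                onsets.append(i)  # first loud sample after a quiet stretch
            last_loud = i
    return onsets


# Toy "recording": two short bursts separated by near-silence.
recording = [0, 0, 0.9, 0.8, 0.1, 0, 0, 0, 0.7, 0.6, 0]
onsets = find_keystrokes(recording)
print(onsets)
```

A real recording would need windowed energy and some noise handling, but the output has the same shape: a list of positions, one per keystroke, that the second stage can then sort into categories.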

The third stage is the hardest one. You start out with the keystrokes put into categories, so that the sequence of keystrokes has been reduced to a sequence of category-identifiers – something like this:

35, 12, 8, 14, 17, 35, 6, 44, …

(This means that the first keystroke is in category 35, the second is in category 12, and so on. Remember that keystrokes in the same category sound alike.) At this point you assume that each key on the keyboard usually (but not always) generates a particular category, but you don’t know which key generates which category. Sometimes two keys will tend to generate the same category, so that you can’t tell them apart except by context. And some keystrokes generate a category that doesn’t seem to match the character in the original text, because the key happened to sound different that time, or because the categorization algorithm isn’t perfect, or because the typist made a mistake and typed a garbbge charaacter.
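The setup can be written down as a small simulation. This is only a sketch under assumptions of my own: the mapping `key_to_category`, the `noise` probability, and the function name `emit_categories` are hypothetical, and real keystroke sounds are surely not this tidy. It does show the two difficulties: keys that share a category, and keystrokes that occasionally land in the wrong one.

```python
import random


def emit_categories(text, key_to_category, noise=0.05, n_categories=50, rng=None):
    """Turn typed text into a sequence of category numbers.

    Assumed model: each key usually produces its own category, but with
    probability `noise` a keystroke lands in a randomly chosen category.
    """
    rng = rng or random.Random(0)
    sequence = []
    for ch in text:
        if rng.random() < noise:
            sequence.append(rng.randrange(n_categories))  # sounded different this time
        else:
            sequence.append(key_to_category[ch])
    return sequence


# Hypothetical mapping in which "e" and "r" sound alike and share category 8.
key_to_category = {"t": 35, "h": 12, "e": 8, " ": 14, "r": 8}

clean = emit_categories("the three", key_to_category, noise=0.0)
noisy = emit_categories("the three", key_to_category, noise=0.3)
print(clean)
print(noisy)
```

The attacker sees only a sequence like `noisy`, without `key_to_category`, and has to work backwards to the text.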

The only advantage you have is that English text has persistent regularities. For example, the two-letter sequence “th” is much more common than “rq”, and the word “the” is much more common than “xprld”. This turns out to be enough for modern machine learning methods to do the job, despite the difficulties I described above. The recovered text gets about 95% of the characters right, and about 90% of the words. It’s quite readable.
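Here is a rough sketch of how such regularities can be turned into a score. It is a simplified stand-in rather than the paper's actual model: it counts two-letter sequences in a sample of English and then rates candidate decodings by how plausible their letter pairs are. The names `bigram_counts` and `score`, the add-one smoothing, and the tiny sample text are my assumptions; a real attack would train on a far larger corpus.

```python
import math
from collections import Counter


def bigram_counts(text):
    """Count adjacent letter pairs, ignoring pairs that span a space."""
    s = "".join(c for c in text.lower() if c.isalpha() or c == " ")
    return Counter(a + b for a, b in zip(s, s[1:]) if a != " " and b != " ")


def score(candidate, counts):
    """Log-likelihood of a candidate string under add-one-smoothed bigram counts."""
    total = sum(counts.values())
    vocab = 26 * 26  # number of possible letter pairs
    return sum(
        math.log((counts[a + b] + 1) / (total + vocab))
        for a, b in zip(candidate, candidate[1:])
    )


sample = "the only advantage you have is that english text has persistent regularities"
counts = bigram_counts(sample)

print(counts["th"], counts["rq"])
print(score("the", counts) > score("rqe", counts))
```

Even with this tiny sample, a decoding containing "th" scores higher than one containing "rq", which is the kind of signal the learning stage uses to choose among possible key assignments.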

[Exercise for geeky readers: Assume that there is a one-to-one mapping between characters and categories, and that each character in the (unknown) input text is translated infallibly into the corresponding category. Assume also that the input is typical English text. Given the output category-sequence, how would you recover the input text? About how long would the input have to be to make this feasible?]

If the user typed a password, that can be recovered too. Although passwords don’t have the same statistical properties as ordinary text (unless they’re chosen badly), this doesn’t pose a problem as long as the password-typing is accompanied by enough English-typing. The algorithm doesn’t always recover the exact password, but it can come up with a short list of possible passwords, and the real password is almost always on this list.
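One way to picture the short list: suppose that, after learning from the English typing, the attacker has a few candidate characters with estimated probabilities for each keystroke of the password. The sketch below ranks whole-password guesses by the product of those probabilities. It is an illustration under my own assumptions, not the paper's procedure: the function `shortlist` and the probabilities in `guesses` are hypothetical.

```python
import heapq
from itertools import product
from math import prod


def shortlist(per_key_candidates, n=5):
    """Return the n most probable passwords.

    `per_key_candidates` is a list with one dict per keystroke, mapping each
    candidate character to its estimated probability. Keystrokes are treated
    as independent, which is a simplifying assumption.
    """
    scored = (
        (prod(p for _, p in combo), "".join(c for c, _ in combo))
        for combo in product(*(d.items() for d in per_key_candidates))
    )
    return [password for _, password in heapq.nlargest(n, scored)]


# Hypothetical estimates for a three-character password.
guesses = [
    {"p": 0.9, "o": 0.1},
    {"a": 0.6, "s": 0.4},
    {"s": 0.7, "d": 0.3},
]
top = shortlist(guesses, n=3)
print(top)
```

This brute-force version enumerates every combination, so it only suits short passwords with a few candidates per keystroke; a longer password would call for a smarter search over the same scores.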

This is yet another reminder of how much computer security depends on controlling physical access to the computer. We’ve always known that anybody who can open up a computer and work on it with tools can control what it does. Results like this new one show that getting close to a machine with sensors (such as microphones, cameras, power monitors) may compromise the machine’s secrecy.

There are even some preliminary results showing that computers make slightly different noises depending on what computations they are doing, and that it might be possible to recover encryption keys if you have an audio recording of the computer doing decryption operations.

I think I’ll go shut my office door now.