January 20, 2025

U.S. Computer Science Malaise

There’s a debate going on now among U.S. computer science researchers and educators, about whether the U.S. as a nation is serious about maintaining its lead in computer science. We have been the envy of the world, drawing most of the worlds’ best and brightest in the field to our country, and laying the foundations of a huge industry that has fostered wealth and national power. But there is a growing sense within the field that all of this may be changing. This sense of malaise is a common topic around faculty water coolers across the country, and in speeches by industry figures like Bill Gates and Vint Cerf.

Whatever the cause – and more on that below – there two main symptoms. First is a sharp decrease in funding for computer science research, especially in strategic areas such as cybersecurity. For example, DARPA, the Defense Department research agency that funded the early Internet and other breakthroughs, has cut its support for university computer science research by more than 40% in the last three years, and has redirected the remaining funding toward short-term advanced development efforts. Corporate research is not picking up the slack.

The second symptom, which in my view is more worrisome, is the sharp decrease in the number of students majoring in computer science. One reputable survey found a 60% drop in the last four years. One would have expected a drop after the dotcom crash – computer science enrollments have historically tracked industry business cycles – but this is a big drop! (At Princeton, we’ve been working hard to make our program more compelling, so we have seen a much smaller decrease.)

All this despite fundamentals that seem sound. Our research ideas seem as strong as ever (though research is inherently a hit-and-miss affair), and the job market for our graduates is still very strong, though not as overheated as a few years ago. Our curricula aren’t perfect but are better than ever. So what’s the problem?

The consensus seems to be that computer science has gotten a bad rap as a haven for antisocial, twinkie-fed nerds who spend their nights alone in cubicles wordlessly writing code, and their days snoring and drooling on office couches. Who would want to be one of them? Those of us in the field know that this stereotype is silly; that computer scientists do many things beyond coding; that we work in groups and like to have fun; and that nowadays computer science plays a role in almost every field of human endeavor.

Proposed remedies abound, most of them attempts to show people who computer scientists really are and what we really do. Stereotypes take a long time to overcome, but there’s no better time than the present to get started.

UPDATE (July 28): My colleagues Sanjeev Arora and Bernard Chazelle have a thoughtful essay on this issue in the August issue of Communications of the ACM.

Privacy, Price Discrimination, and Identification

Recently it was reported that Disney World is fingerprinting its customers. This raised obvious privacy concerns. People wondered why Disney would need that information, and what they were going to do with it.

As Eric Rescorla noted, the answer is almost surely price discrimination. Disney sells multi-day tickets at a discount. They don’t want people to buy (say) a ten-day ticket, use it for two days, and then resell the ticket to somebody else. Disney makes about $200 more by selling five separate two-day tickets than by selling a single ten-day ticket. To stop this, they fingerprint the users of such tickets and verify that the fingerprint associated with a ticket doesn’t change from day to day.

Price discrimination often leads to privacy worries, because some price discrimination strategies rely on the ability to identify individual customers so the seller knows what price to charge them. Such privacy worries seem to be intensifying as technology advances, since it is becoming easier to keep records about individual customers, easier to get information about customers from outside sources, and easier to design and manage complex price discrimination strategies.

On the other hand, some forms of price discrimination don’t depend on identifying customers. For example, early-bird discounts at restaurants cause customers to self-select into categories based on willingness to pay (those willing to come at an inconvenient time to get a lower price vs. those not willing) without needing to identify individuals.

Disney’s type of price discrimination falls into a middle ground. They don’t need to know who you are; all they need to know is that you are the same person who used the ticket yesterday. I think it’s possible to build a fingerprint-based system that stores just enough information to verify that a newly-presented fingerprint is the same one seen before, but without storing the fingerprint itself or even information useful in reconstructing or forging it. That would let Disney get what it needs to prevent ticket resale, without compromising customers’ fingerprints.

If this is possible, why isn’t Disney doing it? I can only guess, but I can think of two reasons. First, in designing identity-based systems, people seem to gravitate to designs that try to extract a “true identity”, despite the fact that this is more privacy-compromising and is often unnecessary. Second, if Disney sees customer privacy mainly as a public-relations issue, then they don’t have much incentive to design a more privacy-protective system, when ordinary customers can’t easily tell the difference.

Researchers have been saying for years that identification technologies can be designed cleverly to minimize unneeded information flows; but this suggestion hasn’t had much effect. Perhaps bad publicity over information leaks will cause companies to be more careful.

Thee and Ay

It’s not often that you learn something about yourself from a stranger’s blog. But that’s what happened to me on Friday. I was sifting through a list of new links to this blog (thanks to Technorati), and I found an entry on a blog called Serendipity, about the way I pronounce the word “the”. It turns out that my pronunciation of “the” is inconsistent, in an interesting way. In fact, in a single eight-minute public talk, I pronounce “the” in four different ways.

(Could there possibly be a less enticing premise for a blog entry than how the blog’s author pronounces the word “the”? Well, I think the details turn out to be interesting. And it’s my blog.)

Here’s the background. The article “the” in English is pronounced in two different ways, unreduced (“thee”), and reduced (“thuh”). The standard is to use the unreduced form when the next word starts with a vowel sound (“thee elephant”), and the reduced form when the next word starts with a consonant sound (“thuh dog”).

After Mark Liberman discussed this on the Language Log, readers pointed out that George W. Bush sometimes pronounces ‘a’ as the unreduced “ay” before a consonant. Bush did this a few times in his speech nominating John Roberts to the Supreme Court. Roberts also used one “thee” and one “ay” before consonants in the ensuing Q&A session.

Then Chris Waigl remembered, somehow, that she had heard me do something similar in a recorded talk. So she dug up an eight-minute recording of me speaking at the 2002 Berkeley DRM conference, and analyzed each use of “a” and “the”. She even color-coded the transcript.

It turns out that I pronounced “the” before a consonant four different ways. Sometimes I used “thee”, sometimes I used “thuh”, sometimes I used “thee” and corrected myself to “thuh”, and sometimes I used “thuh” and corrected myself to “thee”.

Why do I do this? I have no idea. I have been listening to myself ever since I read this, and I do indeed mix reduced and unreduced “the” and “a” before consonants. I haven’t caught myself correcting one to the other, but then again I probably wouldn’t notice if I did.

And now I’m listening to every speaker I hear, to see whether they do it too. Do you?

Harry Potter and the Half-Baked Plan

Despite J.K. Rowling’s decision not to offer the new Harry Potter book in e-book format, it took less than a day for fans to scan the book and assemble an unauthorized electronic version, which is reportedly circulating on the Internet.

If Rowling thought that her decision against e-book release would prevent infringement, then she needs to learn more about Muggle technology. (It’s not certain that her e-book decision was driven by infringement worries. Kids’ books apparently sell much worse as e-books than comparable adult books do, so she might have thought there would be insufficient demand for the e-book. But really – insufficient demand for Harry Potter this week? Not likely.)

It’s a common mistake to think that digital distribution leads to infringement, so that one can prevent infringement by sticking with analog distribution. Hollywood made this argument in the broadcast flag proceeding, saying that the switch to digital broadcasting of television would make the infringement problem so much worse – and the FCC even bought it.

As Harry Potter teaches us, what enables online infringement is not digital release of the work, but digital redistribution by users. And a work can be redistributed digitally, regardless of whether it was originally released in digital or analog form. Analog books can be scanned digitally; analog audio can be recorded digitally; analog video can be camcorded digitally. The resulting digital copies can be redistributed.

(This phenomenon is sometimes called the “analog hole”, but that term is misleading because the copyability of analog information is not an exception to the normal rule but a continuation of it. Objects made of copper are subject to gravity, but we don’t call that fact the “copper hole”. We just call it gravity, and we know that all objects are subject to it. Similarly, analog information is subject to digital copying because all information is subject to digital copying.)

If anything, releasing a work a work in digital form will reduce online infringement, by giving people who want a digital copy a way to pay for it. Having analog and digital versions that offer different value propositions to customers also enables tricky pricing strategies that can capture more revenue. Copyright owners can lead the digital parade or sit on the sidelines and watch it go by; but one way or another, there is going to be a parade.

Who'll Stop the Spam-Bots?

The FTC has initiated Operation Spam Zombies, a program that asks ISPs to work harder to detect and isolate spam-bots on their customers’ computers. Randy Picker has a good discussion of this.

A bot is a malicious, long-lived software agent that sits on a computer and carries out commands at the behest of a remote badguy. (Bots are sometimes called zombies. This makes for more colorful headlines, but the cognoscenti prefer “bot”.) Bots are surprisingly common; perhaps 1% of computers on the Internet are infected by bots.

Like any successful parasite, a bot tries to limit its impact on its host. A bot that uses too many resources, or that too obviously destabilizes its host system, is more likely to be detected and eradicated by the user. So a clever bot tries to be unobtrusive.

One of the main uses of bots is for sending spam. Bot-initiated spam comes from ordinary users’ machines, with only a modest volume coming from each machine; so it is difficult to stop. Nowadays the majority of spam probably comes from bots.

Spam-bots exhibit the classic economic externality of Internet security. A bot on your machine doesn’t bother you much. It mostly harms other people, most of whom you don’t know; so you lack a sufficient incentive to find and remove bots on your system.

What the FTC hopes is that ISPs will be willing to do what users aren’t. The FTC is urging ISPs to monitor their networks for telltale spam-bot activity, and then to take action, up to and including quarantining infected machines (i.e., cutting off or reducing their network connectivity).

It would be good if ISPs did more about the spam-bot problem. But unfortunately, the same externality applies to ISPs as to users. If an ISP’s customer hosts a spam-bot, most the spam sent by the bot goes to other ISPs, so the harm from that spam-bot falls mostly on others. ISPs will have an insufficient incentive to fight bots, just as users do.

A really clever spam-bot could make this externality worse, by making sure not to direct any spam to the local ISP. That would reduce the local ISP’s incentive to stop the bot to almost zero. Indeed, it would give the ISP a disincentive to remove the bot, since removing the bot would lower costs for the ISP’s competitors, leading to tougher price competition and lower profits for the ISP.

That said, there is some hope for ISP-based steps against bot-spam. There aren’t too many big ISPs, so they may be able to agree to take steps against bot-spam. And voluntary steps may help to stave off unpleasant government regulation, which is also in the interest of the big ISPs.

There are interesting technical issues here too. If ISPs start monitoring aggressively for bots, the bots will get stealthier, kicking off an interesting arms race. But that’s a topic for another day.