January 22, 2022

Archives for July 2005

Thee and Ay

It’s not often that you learn something about yourself from a stranger’s blog. But that’s what happened to me on Friday. I was sifting through a list of new links to this blog (thanks to Technorati), and I found an entry on a blog called Serendipity, about the way I pronounce the word “the”. It turns out that my pronunciation of “the” is inconsistent, in an interesting way. In fact, in a single eight-minute public talk, I pronounce “the” in four different ways.

(Could there possibly be a less enticing premise for a blog entry than how the blog’s author pronounces the word “the”? Well, I think the details turn out to be interesting. And it’s my blog.)

Here’s the background. The article “the” in English is pronounced in two different ways, unreduced (“thee”), and reduced (“thuh”). The standard is to use the unreduced form when the next word starts with a vowel sound (“thee elephant”), and the reduced form when the next word starts with a consonant sound (“thuh dog”).

After Mark Liberman discussed this on the Language Log, readers pointed out that George W. Bush sometimes pronounces “a” as the unreduced “ay” before a consonant. Bush did this a few times in his speech nominating John Roberts to the Supreme Court. Roberts also used one “thee” and one “ay” before consonants in the ensuing Q&A session.

Then Chris Waigl remembered, somehow, that she had heard me do something similar in a recorded talk. So she dug up an eight-minute recording of me speaking at the 2002 Berkeley DRM conference, and analyzed each use of “a” and “the”. She even color-coded the transcript.

It turns out that I pronounced “the” before a consonant four different ways. Sometimes I used “thee”, sometimes I used “thuh”, sometimes I used “thee” and corrected myself to “thuh”, and sometimes I used “thuh” and corrected myself to “thee”.

Why do I do this? I have no idea. I have been listening to myself ever since I read this, and I do indeed mix reduced and unreduced “the” and “a” before consonants. I haven’t caught myself correcting one to the other, but then again I probably wouldn’t notice if I did.

And now I’m listening to every speaker I hear, to see whether they do it too. Do you?

Harry Potter and the Half-Baked Plan

Despite J.K. Rowling’s decision not to offer the new Harry Potter book in e-book format, it took less than a day for fans to scan the book and assemble an unauthorized electronic version, which is reportedly circulating on the Internet.

If Rowling thought that her decision against e-book release would prevent infringement, then she needs to learn more about Muggle technology. (It’s not certain that her e-book decision was driven by infringement worries. Kids’ books apparently sell much worse as e-books than comparable adult books do, so she might have thought there would be insufficient demand for the e-book. But really – insufficient demand for Harry Potter this week? Not likely.)

It’s a common mistake to think that digital distribution leads to infringement, so that one can prevent infringement by sticking with analog distribution. Hollywood made this argument in the broadcast flag proceeding, saying that the switch to digital broadcasting of television would make the infringement problem so much worse – and the FCC even bought it.

As Harry Potter teaches us, what enables online infringement is not digital release of the work, but digital redistribution by users. And a work can be redistributed digitally, regardless of whether it was originally released in digital or analog form. Analog books can be scanned digitally; analog audio can be recorded digitally; analog video can be camcorded digitally. The resulting digital copies can be redistributed.

(This phenomenon is sometimes called the “analog hole”, but that term is misleading because the copyability of analog information is not an exception to the normal rule but a continuation of it. Objects made of copper are subject to gravity, but we don’t call that fact the “copper hole”. We just call it gravity, and we know that all objects are subject to it. Similarly, analog information is subject to digital copying because all information is subject to digital copying.)

If anything, releasing a work in digital form will reduce online infringement, by giving people who want a digital copy a way to pay for it. Having analog and digital versions that offer different value propositions to customers also enables tricky pricing strategies that can capture more revenue. Copyright owners can lead the digital parade or sit on the sidelines and watch it go by; but one way or another, there is going to be a parade.

Who'll Stop the Spam-Bots?

The FTC has initiated Operation Spam Zombies, a program that asks ISPs to work harder to detect and isolate spam-bots on their customers’ computers. Randy Picker has a good discussion of this.

A bot is a malicious, long-lived software agent that sits on a computer and carries out commands at the behest of a remote badguy. (Bots are sometimes called zombies. This makes for more colorful headlines, but the cognoscenti prefer “bot”.) Bots are surprisingly common; perhaps 1% of computers on the Internet are infected by bots.

Like any successful parasite, a bot tries to limit its impact on its host. A bot that uses too many resources, or that too obviously destabilizes its host system, is more likely to be detected and eradicated by the user. So a clever bot tries to be unobtrusive.

One of the main uses of bots is for sending spam. Bot-initiated spam comes from ordinary users’ machines, with only a modest volume coming from each machine; so it is difficult to stop. Nowadays the majority of spam probably comes from bots.

Spam-bots exhibit the classic economic externality of Internet security. A bot on your machine doesn’t bother you much. It mostly harms other people, most of whom you don’t know; so you lack a sufficient incentive to find and remove bots on your system.

What the FTC hopes is that ISPs will be willing to do what users aren’t. The FTC is urging ISPs to monitor their networks for telltale spam-bot activity, and then to take action, up to and including quarantining infected machines (i.e., cutting off or reducing their network connectivity).
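To make the monitoring idea concrete, here is a rough sketch of one kind of check an ISP might run. Everything specific in it is my own assumption, not something the FTC has proposed: the log format (tuples of customer address, destination address, destination port), the function name flag_suspected_bots, the choice of "many distinct mail servers contacted on port 25" as the telltale sign, and the threshold of 20.

```python
from collections import defaultdict


def flag_suspected_bots(connections, max_mail_servers=20):
    """Return customer IPs that contacted unusually many mail servers.

    connections: iterable of (customer_ip, dest_ip, dest_port) tuples.
    This log format and the threshold are hypothetical, for illustration.
    """
    servers_contacted = defaultdict(set)
    for customer_ip, dest_ip, dest_port in connections:
        if dest_port == 25:  # SMTP, the standard mail-delivery port
            servers_contacted[customer_ip].add(dest_ip)
    # Flag customers whose count of distinct mail servers exceeds the threshold.
    return sorted(
        ip
        for ip, servers in servers_contacted.items()
        if len(servers) > max_mail_servers
    )


# A made-up log: one machine talks to 30 mail servers, another to 2.
log = [("10.0.0.1", f"192.0.2.{i}", 25) for i in range(30)]
log += [("10.0.0.2", "192.0.2.1", 25), ("10.0.0.2", "192.0.2.2", 25)]
log += [("10.0.0.2", "198.51.100.7", 443)]  # ordinary web traffic, ignored
suspects = flag_suspected_bots(log)
```

A fixed threshold like this is easy for a bot to evade by sending slowly, which fits the point that a clever bot tries to be unobtrusive.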

It would be good if ISPs did more about the spam-bot problem. But unfortunately, the same externality applies to ISPs as to users. If an ISP’s customer hosts a spam-bot, most of the spam sent by the bot goes to other ISPs, so the harm from that spam-bot falls mostly on others. ISPs will have an insufficient incentive to fight bots, just as users do.

A really clever spam-bot could make this externality worse, by making sure not to direct any spam to the local ISP. That would reduce the local ISP’s incentive to stop the bot to almost zero. Indeed, it would give the ISP a disincentive to remove the bot, since removing the bot would lower costs for the ISP’s competitors, leading to tougher price competition and lower profits for the ISP.
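A little arithmetic shows how lopsided the ISP's incentive is. The sketch below rests on a simplifying assumption of mine: that a bot's spam is spread across ISPs in proportion to their market share, so the hosting ISP absorbs a share of the harm equal to its own market share. The function name, the harm figure, and the market share are all made up for illustration.

```python
def harm_split(total_harm, market_share, avoids_local_isp=False):
    """Split a bot's spam harm between the hosting ISP and everyone else.

    Assumes (hypothetically) that spam lands on ISPs in proportion to
    market share, unless the bot deliberately skips the local ISP.
    Returns (harm to hosting ISP, harm to all other ISPs).
    """
    local_fraction = 0.0 if avoids_local_isp else market_share
    own_harm = total_harm * local_fraction
    return own_harm, total_harm - own_harm


# An ISP with a quarter of the market, hosting a bot that does 1000 units of harm.
ordinary_bot = harm_split(1000.0, 0.25)
clever_bot = harm_split(1000.0, 0.25, avoids_local_isp=True)
```

With the ordinary bot, the hosting ISP bears 250 units of harm and its competitors bear 750. With the clever bot, the hosting ISP bears nothing and its competitors bear all 1000, so removing the bot helps only the competition.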

That said, there is some hope for ISP-based steps against bot-spam. There aren’t too many big ISPs, so they may be able to agree to take steps against bot-spam. And voluntary steps may help to stave off unpleasant government regulation, which is also in the interest of the big ISPs.

There are interesting technical issues here too. If ISPs start monitoring aggressively for bots, the bots will get stealthier, kicking off an interesting arms race. But that’s a topic for another day.