December 23, 2024

Archives for 2006

How I Spent My Summer Vacation

Ah, summer, when a man’s thoughts turn to … ski jumping?

On Sunday I had the chance to try ski jumping, at the Swiss national team’s training center at Einsiedeln. My companions and I – or at least the ones foolish enough to try, which of course included me – donned thick neoprene bodysuits, gloves, helmets, and ski boots. We clumped up a short path then strapped on our skis and set off down the jumping track. Having skied only twice before – and that twenty-five years ago – I found this a real adventure, about which more below.

Ski jumping, it turns out, is a popular summer sport, probably more popular in the summer than the winter. I can understand why the spectators would prefer summer weather, but the jumpers’ gear felt better suited for winter than for the 25-degree Centigrade (80-degree Fahrenheit) day we had. The bodysuits are so hot, in fact, that every male jumper, whether expert or rookie, unzipped his suit and peeled it back to the waist when not jumping, exposing a tanned muscular torso or a flabby white one.

Skiing in summer requires a special surface. On the jumping ramp, the skis ride in two parallel tracks made of a ceramic material speckled with pea-sized bumps sticking upward so that the skis touch only the bumps. The landing area is thatched with countless thin, green plastic sticks about 20 cm (eight inches) long, anchored on one end with the other end pointing downhill. These are hosed down with water to reduce friction. At the bottom of the hill is a stopping area covered with wood chips.

After our own attempts at jumping (be patient; I’ll tell all below), we watched some expert Swiss jumpers practicing, from several vantage points. (video (8Meg AVI); another YouTube video) From the top where the jumpers start, the ramp seems impossibly steep and stretches far below. It must take considerable courage to look down the ramp and then let oneself go. As the coach told us, once you start, there’s no turning back – you’re going to end up at the bottom of the hill one way or another. You have to be fearless. Unsurprisingly, the jumpers we saw all looked like seventeen-year-old young men.

Though the jumpers go fast, they never get very high above the ground. At the jump point, the ramp is 1.5 meters (five feet) above the ground. Below the jump point, the hillside curves down parabolically, parallel to the flight path that a jumper would follow in the absence of air resistance. So a jumper who doesn’t spring upward is never more than 1.5 meters off the ground. A skilled jumper might reach three meters (ten feet) above the ground. But then again he’ll be going very fast and will fly more than 100 meters (330 feet) down the hill.

The best view is from the takeoff point. You see the jumper start, far above. As he accelerates down the ramp toward you, his body locked in a compact crouch, you hear a ferocious clattering as his skis drum over the bumps in the ceramic tracks. Suddenly the ramp vanishes beneath him as he springs upward. At that moment he flashes past you, and the clattering is replaced all at once with an insistent whooshing, hissing sound as the jumper floats down the hill, his ski tips spread, his body leaning forward like a more aerodynamic Superman. In a second he vanishes behind the downward curve of the hill, and you wait for the faint but firm slapping noise of a safe landing. He returns to view at the base of the hill as he coasts to a stop on a flat grassy area. Nonchalantly, he removes his helmet, gloves, and skis, and half-unzips his body suit, as the next jumper begins.

The jump used by our group of rookies was maybe one-tenth the size of the big jump. Still, it looked distressingly long and steep from the top. On our first trip down the hill, we started just below the jump point, to practice skiing down the landing ramp. The trick is to maintain your balance on the strange green-stick surface as the hill curves sharply downward and then levels out again. Without ski poles, and with your ankles held rigid by the ski boots, you have little leverage to shift your weight forward or backward. Side to side balance is easy, but that’s not the problem.

As I awaited my first run, one of my companions jokingly (in German) asked the onsite medical expert how far it was to the hospital. Not to worry, she answered, the hospital was practically at the bottom of the hill. With that reassurance, I levered my skis out onto the top of the landing ramp and faced down the hill, as the coach held me motionless.

And then I was moving jerkily downhill as my skis fought the initial friction of the green sticks. I recovered my balance and picked up speed as the hill reached what seemed like a 45-degree angle. For a moment I tried to remember whether skiing on snow felt like this, but the thought was thrust aside by a rush of adrenaline and the sense that I was starting to lose my balance. As I reached the bottom of the hill any illusion of balance was gone, and I tumbled to a stop on the wood chips. I lay on my side, my skis angled awkwardly behind me, still attached to my feet. But I was unhurt and decided to try again.

On my second run, things went awry almost immediately. As I started I sensed that I was leaning ever so slightly backward. The downward curve of the hill started to tilt me backward even more. I struggled to get my center of gravity over my feet but it was hopeless. I ended up laying myself down on one hip and sliding comfortably down the last part of the hill. I was none the worse for wear, but my bodysuit was drenched from sliding over recently-watered green sticks.

I felt fine but decided not to try again. Two falls is enough for a guy of my age. I cheered on my companions as some of them managed to make short jumps and reach the bottom of the hill gracefully. Then we headed off for a delightful lunch and a tour of the Einsiedeln monastery and a stunningly restored church crowded, for some reason, with thousands of visiting Sri Lankans.

Having learned firsthand how hard ski jumping is, I was even more amazed to see the skilled jumpers casually launch themselves off the big jump. Next Winter Olympics, ski jumping will be a must-watch.

(My companions are invited to identify themselves and/or tell their own stories, in the comments.)

Syndromic Surveillance: 21st Century Data Harvesting

[This article was written by a pseudonymous reader who calls him/herself Enigma Foundry. I’m publishing it here because I think other readers would find it interesting. – Ed Felten]

The recent posts about 21st Century Wiretapping described a government program which captured, stored, filtered and analyzed large quantities of information, information which the government had not previously had access to without special court permission. On reading these posts, it had struck me that there were other government programs that are in the process of being implemented that will also capture, store, filter and analyze large quantities of information that had not been previously available to governmental authorities.

In contrast to the NSA wiretap program described in previous posts, the program I am going to describe has not yet generated any significant amount of public controversy, although its development has taken place in nearly full public view for the past decade. Also, unlike the NSA program, this program is still hypothetical, although a pilot project is underway.

The systems that have been used to detect disease outbreaks to date primarily rely on the recognition and reporting of health statistics that fit recognized disease patterns. (See, e.g., the summary for the CDC’s Morbidity and Mortality weekly Report.) These disease surveillance systems works well enough for outbreaks of recognized and ‘reportable’ diseases which, by virtue of having a long clinically described history, have distinct and well-known symptoms and, in almost all cases, definitive tests exist for their diagnosis. But what if an emerging infectious disease or a bio-terrorist attack used an agent that did not fit a recognized pattern, and therefore there existed no well-defined set of symptoms, let alone a clinically meaningful test for identifying it?

If the initial symptoms are severe enough, as in the case of S.A.R.S., the disease will quickly come to light. (Although it is important to note that that did not happen in China, where the press was tightly controlled) If the initial symptoms are not severe, however, the recognition that an attack has even occurred may be delayed many months (or using certain types of agents, conceivably even years) after the event had occurred. To give Health Authorities the ability to see events that are outside the set of diseases that are required to be reported, the creation of a large database, which would collate information such as: workplace and school absenteeism, prescription and OTC (over the counter) medicine sales, symptoms reported at schools, numbers of doctor and Emergency Department visits, even weather patterns and veterinary conditions reported could serve a very useful function in identifying a disease outbreak, and bringing it to the attention of Public Health Authorities. Such a data monitoring system has been given the name ‘Syndromic Surveillance,’ to separate it from the traditional ‘Disease Surveillance’ programs.

You don’t need to invoke the specter of bioterrorism to make a strong case for the value of such a system. The example frequently cited is a 1993 outbreak in Milwaukee of cryptosporidium (an intestinal parasite) which eventually affected over 400,000 people. In that case, sales of anti-diarrhea medicines spiked some three weeks before officials became aware of the outbreak. If the sales of OTC medications had been monitored, perhaps officials could have been alerted to the outbreak earlier.

Note that this system, as currently proposed does not necessarily create or require records that can be tied to particular individuals, although certain data about each individual such as place of work and residence, occupation, recent travel are all of interest. The data would probably tie individual reports to census tract, or perhaps census block. So the concerns about individual privacy being violated seem to be less then in the case of the NSA data mining of telephone records, since the information is not tied to an individual and the type of information is very different from that harvested by the NSA program.

There are three interesting problems created by the database used by a Syndromic Surveillance system: (1) The problem of False Positives, (2) Issues relating to access to and control of the data base & (3) What to do if the Syndromic Surveillance system actually works.

First with regard to the false positives, even a very minor rate error rate can lead to many false alarms, and the consequences of a false alarm are much greater than in the case of the NSA data filtering program:

For instance, thousands of syndromic surveillance systems soon will be running simultaneously in cities and counties throughout the United States. Each might analyze data from 10 or more data series—symptom categories, separate hospitals, OTC sales, and so on. Imagine if every county in the United States had in place a single syndromic surveillance system with a 0.1 percent false-positive rate; that is, the alarm goes off inappropriately only once in a thousand days. Because there are about 3,000 counties in the United States, on average three counties a day would have a false-positive alarm. The costs of excessive false alarms are both monetary, in terms of resources needed to respond to phantom events, and operational, because too many false events desensitize responders to real events….

There are obviously many issues relating to public policy regarding to access and dissemination of information generated by such a public health database, but there are two particular items providing contradictory information which I’d like to present, and hear your reactions and thoughts:

Livingston, NJ -When news of former President Bill Clinton’s experience with chest pains and his impending cardiac bypass surgery hit the streets, hospital emergency departments and urgent care centers in the Northeast reportedly had an increase in cardiac patients. Referred to as “the Bill Clinton Effect,” the talked-about increase in cardiac patients seeking care has now been substantiated by Emergency Medical Associates’ (EMA) bio-surveillance system.

Reports of Clinton’s health woes were first reported on September 3rd, with newspaper accounts appearing nationally in September 4th editions. On September 6th, EMA’s bio-surveillance noted an 11% increase in emergency department visits with patients complaining of chest pain (over the historical average for that date), followed by a 76% increase in chest pain visits on September 7th, and a 53% increase in chest pain visits on September 8th.

The second story has to do with my own personal experience and observation of the Public Health authorities’ actions in Warsaw immediately following the Chernobyl accident. In Warsaw, the authorities had prepared for the event, and children were immediately given iodine to prevent the uptake of radioactive iodine. This has been widely credited with preventing many deaths due to cancer. In Warsaw, the Public Health Authorities also very promptly informed the public about the level of ambient radiation. Certainly, there was great concern among the populace but panic was largely averted. My empirical evidence is of course limited, but my gut feeling is that much dislocation was averted by (1) the obvious signs of organized preparation for such an event, and (2) the transparency with which data concerning public health were disseminated.

Links:
article summarizing ‘Syndromic Surveillance’
CDC article
epi-x, CDC’s epidemic monitoring program

The Last Mile Bottleneck and Net Neutrality

When thinking about the performance of any computer system or network, the first question to ask is “Where is the bottleneck?” As demand grows, one part of the system reaches its capacity first, and limits performance. That’s the bottleneck. If you want to improve performance, often the only real options are to use the bottleneck more efficiently or to increase the bottleneck’s capacity. Fiddling around with the rest of the system won’t make much difference.

For a typical home broadband user, the bottleneck for Internet access today is the “last mile” wire or fiber connecting their home to their Internet Service Provider’s (ISP’s) network. This is true today, and I’m going to assume from here on that it will continue to be true in the future. I should admit up front that this assumption could turn out to be wrong – but if it’s right, it has interesting implications for the network neutrality debate.

Two of the arguments against net neutrality regulation are that (a) ISPs need to manage their networks to optimize performance, and (b) ISPs need to monetize their networks in every way possible so they can get enough revenue to upgrade the last mile connections. Let’s consider how the last mile bottleneck affects each of these arguments.

The first argument says that customers can get better performance if ISPs (and not just customers) have more freedom to manage their networks. If the last mile is the bottleneck, then the most important management question is which packets get to use the last mile link. But this is something that each customer can feasibly manage. What the customer sends is, of course, under the customer’s control – and software on the customer’s computer or in the customer’s router can prioritize outgoing traffic in whatever way best serves that customer. Although it’s less obvious to nonexperts, the customer’s equipment can also control how the link is allocated among incoming data flows. (For network geeks: the customer’s equipment can control the TCP window size on connections that have incoming data.) And of course the customer knows better than the ISP which packets can best serve the customer’s needs.

Another way to look at this is that every customer has their own last mile link, and if that link is not shared then different customers’ links can be optimized separately. The kind of global optimization that only an ISP can do – and that might be required to ensure fairness among customers – just won’t matter much if the last mile is the bottleneck. No matter which way you look at it, there isn’t much ISPs can do to optimize performance, so we should be skeptical of ISPs’ claims that their network management will make a big difference for users. (All of this assumes, remember, that the last mile will continue to be the bottleneck.)

The second argument against net neutrality regulation is that ISPs need to be able to charge everybody fees for everything, so there is maximum incentive for ISPs to build their next-generation networks. If the last mile is the bottleneck, then building new last-mile infrastructure is one of the most important steps that can be taken to improve the Net, and so paying off the ISPs to build that infrastructure might seem like a good deal. Giving them monopoly rents could be good policy, if that’s what it takes to get a faster Net built – or so the argument goes.

It seems to me, though, that if we accept this last argument then we have decided that the residential ISP business is naturally not very competitive. (Otherwise competition will erode those monopoly rents.) And if the market is not going to be competitive, then our policy discussion will have to go beyond the simple “let the market decide” arguments that we hear from some quarters. Naturally noncompetitive communications markets have long posed difficult policy questions, and this one looks like no exception. We can only hope that we have learned from the regulatory mistakes of the past.

Lets hope that the residential ISP business turns out instead to be competitive. If technologies like WiMax or powerline networking turn out to be practical, this could happen. A competitive market is the best outcome for everybody, letting the government safely keeps its hands off the Internet, if it can.

The Exxon Valdez of Privacy

Recently I moderated a panel discussion, at Princeton Reunions, about “Privacy and Security in the Digital Age”. When the discussion turned to public awareness of privacy and data leaks, one of the panelists said that the public knows about this issue but isn’t really mobilized, because we haven’t yet seen “the Exxon Valdez of privacy” – the singular, dramatic event that turns a known area of concern into a national priority.

Scott Craver has an interesting response:

An audience member asked what could possibly comprise such a monumental disaster. One panelist said, “Have you ever been a victim of credit card fraud? Well, multiply that by 500,000 people.”

This is very corporate thinking: take a loss and multiply it by a huge number. Sure that’s a nightmare scenario for a bank, but is that really a national crisis that will enrage the public? Especially since cardholders are somewhat sheltered from fraud. Also consider how many people are already victims of identity theft, and how much money it already costs. I don’t see any torches and pitchforks yet.

Here’s what I think: the “Exxon Valdez” of privacy won’t be $100 of credit card fraud multiplied by a half million people. It will instead be the worst possible privacy disruption that can befall a single individual, and it doesn’t have to happen to a half million people, or even ten thousand. The number doesn’t matter, as long as it’s big enough to be reported on CNN …

[…]

So back to the question: what is the worst, the most sensational privacy disaster that can befall an individual – that in a batch of, oh say 500-5,000 people, will terrify the general public? I’m not thinking of a disaster that is tangentially aided by a privacy loss, like a killer reading my credit card statement to find out what cafe I hang out at. I’m talking about a direct abuse of the private information being the disaster itself.

What would be the Exxon Valdez of privacy? I’m not sure. I don’t think it will just be a loss of money – Scott explained why it won’t be many small losses, and it’s hard to imagine a large loss where the privacy harm doesn’t seem incidental. So it will have to be a leak of information so sensitive as to be life-shattering. I’m not sure exactly what that is.

What do you think?

Twenty-First Century Wiretapping: False Positives

Lately I’ve been writing about the policy issues surrounding government wiretapping programs that algorithmically analyze large amounts of communication data to identify messages to be shown to human analysts. (Past posts in the series: 1; 2; 3; 4; 5; 6; 7.) One of the most frequent arguments against such programs is that there will be too many false positives – too many innocent conversations misidentified as suspicious.

Suppose we have an algorithm that looks at a set of intercepted messages and classifies each message as either suspicious or innocuous. Let’s assume that every message has a true state that is either criminal (i.e., actually part of a criminal or terrorist conspiracy) or innocent. The problem is that the true state is not known. A perfect, but unattainable, classifier would label a message as suspicious if and only if it was criminal. In practice a classifier will make false positive errors (mistakenly classifying an innocent message as suspicious) and false negative errors (mistakenly classifying a criminal message as innocuous).

To illustrate the false positive problem, let’s do an example. Suppose we intercept a million messages, of which ten are criminal. And suppose that the classifier correctly labels 99.9% of the innocent messages. This means that 1000 innocent messages (0.1% of one million) will be misclassified as suspicious. All told, there will be 1010 suspicious messages, of which only ten – about 1% – will actually be criminal. The vast majority of messages labeled as suspicious will actually be innocent. And if the classifier is less accurate on innocent messages, the imbalance will be even more extreme.

This argument has some power, but I don’t think it’s fatal to the idea of algorithmically classifying intercepts. I say this for three reasons.

First, even if the majority of labeled-as-suspicous messages are innocent, this doesn’t necessarily mean that listening to those messages is unjustified. Letting the police listen to, say, ten innocent conversations is a good tradeoff if the eleventh conversation is a criminal one whose interception can stop a serious crime. (I’m assuming that the ten innocent conversations are chosen by some known, well-intentioned algorithmic process, rather than being chosen by potentially corrupt government agents.) This only goes so far, of course – if there are too many innocent conversations or the crime is not very serious, then this type of wiretapping will not be justified. My point is merely that it’s not enough to argue that most of the labeled-as-suspcious messages will be innocent.

Second, we can learn by experience what the false positive rate is. By monitoring the operation of the system, we can see learn how many messages are labeled as suspicious and how many of those are actually innocent. If there is a warrant for the wiretapping (as I have argued there should be), the warrant can require this sort of monitoring, and can require the wiretapping to be stopped or narrowed if the false positive rate is too high.

Third, classification algorithms have (or can be made to have) an adjustable sensitivity setting. Think of it as a control knob that can be moved continuously between two extremes, where one extreme is labeled “avoid false positives” and the other is labeled “avoid false negatives”. Adjusting the knob trades off one kind of error for the other.

We can always make the false positive rate as low as we like, by turning the knob far enough toward “avoid false positives”. Doing this has a price, because turning the knob in that direction also increases the number of false negatives, that is, it causes some criminal messages to be missed. If we turn the knob all the way to the “avoid false positives” end, then there will be no false positives at all, but there might be many false negatives. Indeed, we might find that when the knob is turned to that end, all messages, whether criminal or not, are classified as innocuous.

So the question is not whether we can reduce false positives – we know we can do that – but whether there is anywhere we can set the knob that gives us an acceptably low false positive rate yet still manages to flag some messages that are criminal.

Whether there is an acceptable setting depends on the details of the classification algorithm. If you forced me to guess, I’d say that for algorithms based on today’s voice recognition or speech transcription technology, there probably isn’t an acceptable setting – to catch any appreciable number of criminal conversations, we’d have to accept huge numbers of false positives. But I’m not certain of that result, and it could change as the algorithms get better.

The most important thing to say about this is that it’s an empirical question, which means that it’s possible to gather evidence to learn whether a particular algorithm offers an acceptable tradeoff. For example, if we had a candidate classification algorithm, we could run it on a large number of real-world messages and, without recording any of those messages, simply count how many messages the algorithm would have labeled as suspicious. If that number were huge, we would know we had a false positive problem. We could do this for different settings of the knob, to see where we had to get an acceptable false positive rate. Then we could apply the algorithm with that knob setting to a predetermined set of known-to-be-criminal messages, to see how many it flagged.

If governments are using algorithmic classifiers – and the U.S. government may be doing so – then they can do these types of experiments. Perhaps they have. It doesn’t seem too much to ask for them to report on their false positive rates.