May 30, 2024

Syndromic Surveillance: 21st Century Data Harvesting

[This article was written by a pseudonymous reader who calls him/herself Enigma Foundry. I’m publishing it here because I think other readers would find it interesting. – Ed Felten]

The recent posts about 21st Century Wiretapping described a government program which captured, stored, filtered and analyzed large quantities of information that the government had not previously had access to without special court permission. On reading these posts, it struck me that other government programs now being implemented will also capture, store, filter and analyze large quantities of information that had not previously been available to governmental authorities.

In contrast to the NSA wiretap program described in previous posts, the program I am going to describe has not yet generated any significant amount of public controversy, although its development has taken place in nearly full public view for the past decade. Also, unlike the NSA program, this program is still hypothetical, although a pilot project is underway.

The systems that have been used to detect disease outbreaks to date primarily rely on the recognition and reporting of health statistics that fit recognized disease patterns. (See, e.g., the summary for the CDC’s Morbidity and Mortality Weekly Report.) These disease surveillance systems work well enough for outbreaks of recognized and ‘reportable’ diseases which, by virtue of having a long clinically described history, have distinct and well-known symptoms and, in almost all cases, definitive diagnostic tests. But what if an emerging infectious disease or a bio-terrorist attack used an agent that did not fit a recognized pattern, so that there existed no well-defined set of symptoms, let alone a clinically meaningful test for identifying it?

If the initial symptoms are severe enough, as in the case of SARS, the disease will quickly come to light. (It is important to note, though, that this did not happen in China, where the press was tightly controlled.) If the initial symptoms are not severe, however, the recognition that an attack has even occurred may be delayed many months (or, with certain types of agents, conceivably even years) after the event. To give health authorities the ability to see events that are outside the set of diseases that are required to be reported, a large database could be created to collate information such as workplace and school absenteeism, prescription and OTC (over-the-counter) medicine sales, symptoms reported at schools, numbers of doctor and Emergency Department visits, and even weather patterns and reported veterinary conditions. Such a database could serve a very useful function in identifying a disease outbreak and bringing it to the attention of public health authorities. This kind of data monitoring system has been given the name ‘Syndromic Surveillance,’ to separate it from the traditional ‘Disease Surveillance’ programs.

You don’t need to invoke the specter of bioterrorism to make a strong case for the value of such a system. The example frequently cited is a 1993 outbreak in Milwaukee of cryptosporidium (an intestinal parasite) which eventually affected over 400,000 people. In that case, sales of anti-diarrhea medicines spiked some three weeks before officials became aware of the outbreak. If the sales of OTC medications had been monitored, perhaps officials could have been alerted to the outbreak earlier.
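To make the idea of monitoring OTC sales concrete, here is a minimal sketch of one simple way a spike might be flagged: compare each day's sales against the mean and standard deviation of the preceding week. This is an illustration only, not how any actual syndromic surveillance system is known to work. The function name, the seven-day baseline, the three-standard-deviation threshold and the sales figures are all assumptions invented for the example.

```python
from statistics import mean, stdev

def flag_spikes(daily_sales, baseline_days=7, threshold_sd=3.0):
    """Return the indices of days whose sales exceed the trailing baseline.

    Hypothetical rule: a day is flagged when its sales are more than
    `threshold_sd` standard deviations above the mean of the previous
    `baseline_days` days.
    """
    flagged = []
    for day in range(baseline_days, len(daily_sales)):
        baseline = daily_sales[day - baseline_days:day]
        mu = mean(baseline)
        # Floor the spread at 1.0 so a perfectly flat baseline
        # does not turn every tiny change into an alarm.
        sigma = max(stdev(baseline), 1.0)
        if daily_sales[day] > mu + threshold_sd * sigma:
            flagged.append(day)
    return flagged

# Made-up daily unit sales of anti-diarrhea medicine at one store.
sales = [100, 104, 98, 101, 97, 103, 99, 102, 100, 160, 175, 190]
flagged = flag_spikes(sales)
print("Flagged day indices:", flagged)
```

In this made-up series the first alarm comes on the day sales jump to 160. Note that once a spike enters the trailing window it inflates its own baseline, so a rule this naive can miss the later days of a sustained rise; a real system would presumably need something more robust.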

Note that this system, as currently proposed, does not necessarily create or require records that can be tied to particular individuals, although certain data about each individual, such as place of work and residence, occupation, and recent travel, are all of interest. The data would probably tie individual reports to census tract, or perhaps census block. So the concerns about individual privacy being violated seem to be less than in the case of the NSA data mining of telephone records, since the information is not tied to an individual and the type of information is very different from that harvested by the NSA program.

There are three interesting problems created by the database used by a Syndromic Surveillance system: (1) the problem of false positives, (2) issues relating to access to and control of the database, and (3) what to do if the Syndromic Surveillance system actually works.

First, with regard to false positives, even a very small error rate can lead to many false alarms, and the consequences of a false alarm are much greater than in the case of the NSA data filtering program:

For instance, thousands of syndromic surveillance systems soon will be running simultaneously in cities and counties throughout the United States. Each might analyze data from 10 or more data series—symptom categories, separate hospitals, OTC sales, and so on. Imagine if every county in the United States had in place a single syndromic surveillance system with a 0.1 percent false-positive rate; that is, the alarm goes off inappropriately only once in a thousand days. Because there are about 3,000 counties in the United States, on average three counties a day would have a false-positive alarm. The costs of excessive false alarms are both monetary, in terms of resources needed to respond to phantom events, and operational, because too many false events desensitize responders to real events….
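The arithmetic in the passage above is easy to check. The sketch below reproduces the "three counties a day" figure from the quoted numbers (about 3,000 counties, a 0.1 percent daily false-positive rate). It also adds two things the passage does not state, both labeled as assumptions: the chance that at least one county raises a false alarm on a given day if the systems err independently, and what happens if each of 10 data series alarms separately at that same rate.

```python
def expected_false_alarms(num_systems, daily_false_positive_rate):
    """Expected number of false alarms per day across all systems."""
    return num_systems * daily_false_positive_rate

def prob_at_least_one_false_alarm(num_systems, daily_false_positive_rate):
    """Chance that at least one system false-alarms on a given day.

    Assumption (not from the quoted passage): systems err independently.
    """
    return 1.0 - (1.0 - daily_false_positive_rate) ** num_systems

COUNTIES = 3000           # "about 3,000 counties" from the passage
RATE = 0.001              # 0.1 percent: one false alarm per thousand days
SERIES_PER_SYSTEM = 10    # "10 or more data series" from the passage

per_day = expected_false_alarms(COUNTIES, RATE)
any_alarm = prob_at_least_one_false_alarm(COUNTIES, RATE)
# Assumption: every data series alarms separately at the same rate.
per_day_by_series = expected_false_alarms(COUNTIES * SERIES_PER_SYSTEM, RATE)

print(f"Expected false alarms per day: {per_day:.1f}")
print(f"Chance of at least one false alarm per day: {any_alarm:.1%}")
print(f"Expected per day if each series alarms separately: {per_day_by_series:.1f}")
```

Under the independence assumption, the chance that some county somewhere has a false alarm on any given day comes out at roughly 95 percent, which is the practical meaning of "three counties a day."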

There are obviously many public policy issues regarding access to and dissemination of the information generated by such a public health database, but there are two particular items, pointing in contradictory directions, which I’d like to present in order to hear your reactions and thoughts:

Livingston, NJ -When news of former President Bill Clinton’s experience with chest pains and his impending cardiac bypass surgery hit the streets, hospital emergency departments and urgent care centers in the Northeast reportedly had an increase in cardiac patients. Referred to as “the Bill Clinton Effect,” the talked-about increase in cardiac patients seeking care has now been substantiated by Emergency Medical Associates’ (EMA) bio-surveillance system.

Reports of Clinton’s health woes were first reported on September 3rd, with newspaper accounts appearing nationally in September 4th editions. On September 6th, EMA’s bio-surveillance noted an 11% increase in emergency department visits with patients complaining of chest pain (over the historical average for that date), followed by a 76% increase in chest pain visits on September 7th, and a 53% increase in chest pain visits on September 8th.

The second story has to do with my own personal experience and observation of the Public Health authorities’ actions in Warsaw immediately following the Chernobyl accident. In Warsaw, the authorities had prepared for the event, and children were immediately given iodine to prevent the uptake of radioactive iodine. This has been widely credited with preventing many deaths due to cancer. In Warsaw, the Public Health Authorities also very promptly informed the public about the level of ambient radiation. Certainly, there was great concern among the populace but panic was largely averted. My empirical evidence is of course limited, but my gut feeling is that much dislocation was averted by (1) the obvious signs of organized preparation for such an event, and (2) the transparency with which data concerning public health were disseminated.

article summarizing ‘Syndromic Surveillance’
CDC article
epi-x, CDC’s epidemic monitoring program


  1. enigma_foundry says

    Edward Kuns noted above:

    Assuming the CDC program you are talking about actually achieves its primary goal (which is a possibility), how much risk to individual liberty and loss of privacy would we willingly accept as a society? Let’s say that once every 50 years a disease event occurs with the risk of harming millions of people, and that once every five years a disease event occurs in some locality with the risk of harming tens or small hundreds of thousands.

    Well, all indications are that there will be many, many more emerging infectious diseases than the numbers Edward Kuns gives. This is a very unfortunate fact. A paper I’d read recently indicated that the historical rate of zoonotic disease being transferred to humans was about one new disease every century. For a variety of reasons (chiefly more humans, living in closer proximity to animals and moving around quickly) it appears that since about 1980, one disease has been transferred to humans from animals each year, a hundred-fold increase. So putting together a cohesive disease monitoring system should be a priority. It isn’t right now.

    Also, I am very disturbed that the public health community seems to be in the process of being removed from the management of these systems, and is being replaced by the national security community, to wit:

    Privacy advocates are worried that the CIA has invested in a company commonly used to help manage health IT records in the United States and Canada.
    In-Q-Tel, a private venture group established by the Central Intelligence Agency, led a Series E investment round in Initiate Systems earlier this year. The group is charged with providing solutions to the CIA and the greater intelligence community.

    Initiate Systems’ IdentityHub software uses a variety of identification protocols to determine whether records stored under similar names in different databases refer to the same or different patients. It also uses such demographic information as birthdays and address to match records to people who have used different names.

    The software helps companies find stored information about clients or patients in real time, and it also helps to identify and delete duplicate records. It has also been used to quickly find prescription information when patients enter the emergency department.


    On the one hand, we should be happy that the importance of public health is being recognized. On the other hand we should be very concerned that public health authorities are being removed from the management of these systems.

    As someone noted above, “We are watching for a reason,” which I agree with wholeheartedly. However, the National Security apparatus does not have a history of managing disease outbreaks, and in countries with a strong national security network, disease outbreak management has been abysmal (think China and SARS). I want CDC in charge of a disease outbreak, not the CIA. I want the information out there, so the press can do its job of pressing for answers and explanations. In the case of a large disease outbreak, silence=death. Yes, there exist cases where premature release of information could conceivably cause mass panic. But those are events that can be managed and learnt from. The wrong disease on the loose, without adequate information or early Public Health involvement, could be deadly.

  2. enigma_foundry says

    Protections are in place. No need to fear. We are watching for a reason.

    I feel good about these systems, and hope they are developed further. My point really was: why is an important, relatively non-controversial program like syndromic surveillance so slow to get off the ground, whereas a controversial system like the NSA program, of relatively small public benefit, is so quick to be funded?

    Infectious diseases, their surveillance and management will become a major issue, IMHO. SARS was, unfortunately, just the first of many new diseases.

    I think it is very interesting that we’ve had lawyers and epidemiologists comment on a blog run by a computer scientist, with a guest contributor (me) who is an architect. Much more of this cross-discipline discussion is needed.

  3. Big Brother says

    hmmm. interesting. i’m a public health epidemiologist in a large US city. I pursued and acquired 1 of our 2 syndromic surveillance systems. We use surveillance data from 1) the National Retail Data Monitor (sales of cough syrup, Pedialyte, etc., over the counter only, not prescription) and 2) local hospital ER admissions chief complaint data. The first is aggregate and broadly available (for anyone who can afford the data), the second is composed of individual admission records sent directly to us, processed and analyzed directly on our servers, and is not available to anyone but me and my staff by law. Protections are in place. No need to fear. We are watching for a reason.

  4. Just so long as the data is aggregate statistics, rather than a list of who bought cough syrup (or AZT) when and where…

  5. enigma_foundry says

    I am not at all a critic of the program – I think most of the data they collect is not a danger to privacy. If anything I believe the program has clearly suffered from neglect and it deserves to have a much higher funding and operational priority than it has had in recent years.

    My intention with this post was two fold:

    1. To draw attention to the fact that this non-controversial program, essential to the continued safety and health of our society, was not yet implemented, in contrast to the NSA program, which seems (to me) not essential to the health or safety of society to the degree the Syndromic Surveillance System could be.

    2. Raise the policy issues for this program, which I believe center around the issue of how to operate and control the data once collected. My belief is the data should be made broadly available, and that that would represent the ‘Public Health’ approach to this database. To restrict access risks making the data less useful to the Public Health Community…

  6. Edward Kuns says

    Assuming the CDC program you are talking about actually achieves its primary goal (which is a possibility), how much risk to individual liberty and loss of privacy would we willingly accept as a society? Let’s say that once every 50 years a disease event occurs with the risk of harming millions of people, and that once every five years a disease event occurs in some locality with the risk of harming tens or small hundreds of thousands. Let’s say that this program effectively mitigates harm by providing early notice.

    With that assumption, how much data do we let them collect and what controls do we put on that data?

    I am actually in favor of this CDC program, but I also observe it with mild concern. Look at just the recent government database losses — and here is some concentrated data of great use to hostile governments and other hostile entities. (Perhaps not of such great use to people seeking identity theft.)

  7. Eh. I meant to say the yellowish ones could all be combined, not eliminated. Although they could probably be eliminated. The US could probably have done without doomsday clocks and the five or six DEFCON statuses during the Cold War too, given that nothing actually happened except that the USSR blew up a spy plane here, snuck operatives into some secret research institution there, and eventually shot itself in the foot and died.

  8. Considering that it has a 90% mortality rate (like Ebola) and is potentially able to become easy to transmit (unlike Ebola) the media attention seems justified. Unlike, say, the various colored “terror alerts” announced weekly in the US with nothing having actually happened since 9/11 worthy of anything but “green”. If you ask me, the terrorists they need to worry about terrorizing their population these days are all working for DHS, mostly as media liaisons. 😛 And how many colors do they need, anyway? The ones I’ve heard mentioned cover pretty much the whole spectrum with five or six different levels. Green and blue could probably be eliminated anyway, and all of the yellow, orange, amber, whatnot, leaving Star Trek’s venerable “yellow” and “red”. One for elevated risk based on solid intelligence and the other for definite plan discovered with definite date and targets. Keeping in mind that the US got along for over 200 years without any national alert levels, through two world wars, a civil war, and yes, numerous terrorist attacks. Including 9/11.

  9. The question is how useful information can be distributed without triggering the “Bill Clinton Effect”, even with preparation and transparency. In the US today there seems to be a glut of medical information, without much of either. Look at the whole media circus around avian flu.

  10. Nice post. I think your Warsaw example is instructive.

  11. enigma_foundry says

    “The risks of dying in Warsaw because of the Chernobyl incident were (extremely) low. It was a low-impact incident for most of Europe.”

    Mathfox: Yes, we know that NOW. Believe me, being in Warsaw in late April, 1986, this was not known at that time.

  12. “My empirical evidence is of course limited, but my gut feeling is that much dislocation was averted by (1) the obvious signs of organized preparation for such an event, and (2) the transparency with which data concerning public health were disseminated.”

    The risks of dying in Warsaw because of the Chernobyl incident were (extremely) low. It was a low-impact incident for most of Europe. I wonder what would have happened in the case of a high risk incident. Would politics have allowed the same amount of openness? Would the population believe that the disaster response teams give correct information?
    It will take a few more disasters before we know whether the Warsaw team is acting effectively, but indications are good.

  13. this is what i was referring to in my comment to the other article.
    it’s especially interesting as our government (i’m from austria) decided to “reform” health insurance and medical records. every insured person is now getting an “e-card” that holds his medical records, medical details and his insurance state. of course the information is encrypted, but seeing that the company developing the system seems to be a bit of a slacker, i’m not too sure that this encryption will hold for long.

    so i am really afraid that with only a chipcard reader, and a software set, you can access not only medical data, but also employment statistics of any person here.

    now imagine this system being upgraded with rfid for easier use at the doctor, or the information being backed up on central servers for redundancy. applying for a job would be MUCH harder if the company can access this data and pick applicants by their records. after all, all you need to identify them is their social security id.