November 21, 2024

Can Google Flu Trends Be Manipulated?

Last week researchers from Google and the Centers for Disease Control unveiled a cool new research result, showing that they could gauge the level of influenza infections in a region of the U.S. by seeing how often people in that region did Google searches for certain terms related to the flu and flu symptoms. The search-based predictions correlate remarkably well with the medical data on flu rates: not everyone who searches for “cough medicine” has the flu, but enough do that an increase in flu cases correlates with an increase in searches for “cough medicine” and similar terms. The system is called Google Flu Trends.
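To make "correlates remarkably well" concrete, here is a minimal sketch of the kind of comparison involved. It computes a plain Pearson correlation between a weekly share of flu-related searches and a weekly reported flu rate. The numbers are invented for illustration, the variable names are my own, and the paper's actual statistical model may differ from this.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly numbers for one region (invented, not real data).
flu_search_share = [0.010, 0.012, 0.018, 0.025, 0.031, 0.027, 0.016]  # fraction of all searches
reported_flu_rate = [1.1, 1.3, 2.0, 2.9, 3.4, 3.0, 1.8]               # percent of doctor visits

r = pearson(flu_search_share, reported_flu_rate)
print(f"correlation: {r:.3f}")
```

A value of r near 1 means the two series rise and fall together. It says nothing about why any individual searched, which is exactly why the signal can be both useful and fragile.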

Privacy groups have complained, but this use of search data seems benign — indeed, this level of flu detection requires only that search data be recorded per region, not per individual user. The legitimate privacy worry here is not about the flu project as it stands today but about other uses that Google or the government might find for search data later.

My concern today is whether Flu Trends can be manipulated. The system makes inferences from how people search, but people can change their search behavior. What if a person or a small group set out to convince Flu Trends that there was a flu outbreak this week?

An obvious approach would be for the conspirators to do lots of searches for likely flu-related terms, to inflate the count of flu-related searches. If all the searches came from a few computers, Flu Trends could presumably detect the anomalous pattern and the algorithm could reduce the influence of these few computers. Perhaps this is already being done; but I don’t think the research paper mentions it.
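One way such a defense might look, as a rough sketch: cap how many flu-related searches any single source can contribute to the count. The term list, the `(source_id, search_text)` record shape, and the cap of 3 are all assumptions of mine; nothing here is taken from the paper or from Google's actual system.

```python
from collections import Counter

# Hypothetical term list; the real terms are not published.
FLU_TERMS = {"flu", "cough medicine"}

def flu_counts(queries, cap_per_source=3):
    """Return (raw, capped) counts of flu-related searches.

    queries: iterable of (source_id, search_text) pairs.
    The capped count includes at most cap_per_source flu-related
    searches from any one source.
    """
    per_source = Counter(
        source for source, text in queries if text.lower() in FLU_TERMS
    )
    raw = sum(per_source.values())
    capped = sum(min(n, cap_per_source) for n in per_source.values())
    return raw, capped

# 50 ordinary users search once each; 2 conspirators search 500 times each.
organic = [(f"user{i}", "flu") for i in range(50)]
flood = [(f"attacker{i}", "cough medicine") for i in range(2) for _ in range(500)]

raw, capped = flu_counts(organic + flood)
print(raw, capped)  # 1050 56
```

The cap shrinks the two conspirators' 1,000 searches to 6. The same 1,000 searches spread across 1,000 different sources would pass through untouched, so a per-source cap only helps against a concentrated flood.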

A more effective approach to spoofing Flu Trends would be to use a botnet (a large collection of hijacked computers) to send flu-related searches to Google from a larger number of computers. If the added searches were diffuse and well-randomized, they would be very hard to distinguish from legitimate searches, and Flu Trends would probably be fooled.

This possibility is not discussed in the Flu Trends research paper. The paper conspicuously fails to identify any of the search terms that the system is looking for. Normally a paper would list the terms, or at least give examples, but none of the terms appear in the paper, and the Flu Trends web site gives only “flu” as an example search term. They might be withholding the search terms to make manipulation harder, but more likely they’re withholding the search terms for business reasons, perhaps because the terms have value in placing or selling ads.

Why would anyone want to manipulate Flu Trends? If flu rates affect the financial markets by moving the stock prices of certain drug or healthcare companies, then a manipulator can profit by sending false signals about flu rates.

The most interesting question about Flu Trends, though, is what other trends might be identifiable via search terms. Government might use similar methods to look for outbreaks of more virulent diseases, and businesses might look for cultural trends. In all of these cases, manipulation will be a risk.

There’s an interesting analogy to web linking behavior. When the web was young, people put links in their sites to point readers to other interesting sites. But when Google started inferring sites’ importance from their incoming links, manipulators started creating links purely for their effect on Google’s rankings. The result was an ongoing cat-and-mouse game between search engines and manipulators. The more search behavior takes on commercial value, the more manipulators will want to change search behavior for commercial or cultural advantage.

Anything that is valuable to measure is probably, to someone, valuable to manipulate.

Comments

  1. I don’t think people in general will be stupid enough to believe everything Google says. Besides, Google is not the center of the universe at all so why bother ourselves with the Google Flu Trend?

  2. The media can have a considerable effect on the topic of flu or any other malady simply by sending email offering free info and assistance in locating a solution for the problem. Media influence is a serious problem when using Google search results refined under a specific search pattern.
    In other words, everybody should be ready to take Google search results with a grain of salt, because there are far too many ways to game the results in favor of one group or another. To make things worse, pharmaceutical companies could benefit by simply advertising via email and directing people to their web site for a solution. This wouldn’t necessarily be false advertising or spam, but it could still manipulate the Google results.
    Be wary of any corporation offering gifts of free trend reports. Somebody will pay in the end.

  3. Ed, thanks for picking this up! I mused about it over at The Noisy Channel (http://thenoisychannel.com/2008/11/11/big-google-can-be-benign/) but figured the bigger story was the privacy backlash (http://thenoisychannel.com/2008/11/16/google-flu-trends-the-privacy-backlash-begins/).

  4. Just using a botnet probably isn’t enough (or if it is, I think it could be mitigated pretty easily). If you just blindly sent fake queries to Google, they’d see flu activity increase in many regions simultaneously. I don’t think you’d ever see a simultaneous increase like that in actual flu activity. So, it’s possible they could (automatically?) readjust the baseline to counter a botnet.

    Of course, a more sophisticated attack would be sent only from bots in the target region. This might be sufficient to evade detection of tampering on a global scale, but there are other things Google could do. For example, they could count queries only from browsers with long-standing Google cookies. Bots would then have to infiltrate your browser’s cookie jar to manipulate the results.

  5. I’d think that false alarms would be an expected part of the model once it was thoroughly, um, debugged. Not only flu shots and side effects, but local reporting on flu.

    The way to manipulate the market, I’d think, would be to show upticks in flu in areas where your competitor (be it vaccine or Tamiflu or something else) had a higher rate of penetration. “Everybody knows” that actual flu numbers are underreported, and away you go.

    The notion that anything that’s valuable to measure is valuable to manipulate can be taken even further: if there’s something you can do to change the facts being predicted, the value of the thing being measured becomes problematic. Effective antiflu measures that used Google Flu Trends to target intervention could kill much of the statistical association they rely on. Perhaps one of the standard examples of this is that “leading economic indicators” tend over the years not to predict expansion or recession so much as to predict economic policymakers’ interventions.

  6. At the end of the paper there is a graph that compares the model with the CDC reported flu rate. The model consistently predicts a slightly higher rate than is actually reported, for most of the period covered. And there is a larger over-prediction hump around the end of 2007/beginning of 2008 which I think would just about cover the time that people were getting their flu shots or experiencing side effects from them. (This year, for example, National Influenza Vaccination Week is Dec. 8-14, according to the CDC.)

  7. I would like to think that anyone using the Flu Trends data wouldn’t be relying solely on this data, but would rather use it as a sort of heads up. I can’t see anyone deciding to invest in a Pharma that produces flu medications, for example, simply because Flu Trends noticed an uptick in flu occurrence in certain geographic regions. I’d think an investor would want some corroborating evidence, such as news reports on outbreaks.
    From this perspective, I wouldn’t think it would be worth much to game Flu Trends.

    On the other hand, I do see a lot of potential for false alarms, which could be costly. For example, when CNN or some other national network goes through its annual ritual of warning about the flu season, I can see thousands of towns showing up as at-risk all at the same time. Or when some local community organization decides to run a massive flu shot drive, I can see that town showing up as at-risk when people go out and use the internet to figure out whether they should bother to get vaccinated. If this triggers investigative action by local or national health organizations, based on Flu-Trends alarms, it could be counter-productive.