April 20, 2014

The Silver Effect: What We Can Learn from Poll Aggregators

For those who now think Nate Silver is god, here’s a question: Can Nate Silver make a prediction so accurate that Nate Silver himself doesn’t believe it?

Yes, he can–and he did. Silver famously predicted the results of Election 2012 correctly in every state. Yet while his per-state predictions added up to the 332 electoral votes that Obama won, Silver himself predicted that Obama’s expected electoral vote total was only 313. Why? Because Silver predicted that Silver would get some states wrong. Unpacking this (pseudo-)paradox can help us understand what we can and can’t learn from the performance of poll aggregators like Nate Silver and Princeton’s Sam Wang in this election.

(I mention Silver more often than Wang because Silver is more famous–though I would bet on Wang if the two disagreed.)

Silver’s biggest innovation was to introduce modern quantitative thinking to the world of political punditry. Silver’s predictions come with confidence estimates. Unlike traditional pundits, who either claim to be absolutely certain, or say that they have no idea what will happen, Silver might say that he believes something with 70% confidence. If he makes ten predictions, each with 70% confidence, then on average three of those predictions will turn out to be wrong. He knows that some of them will be wrong, but he doesn’t know which ones will turn out to be wrong. The same basic argument applied to his state-by-state predictions–they were made with less than 100% confidence, so he knew that some of them were likely to be wrong. That explains why he predicted 313 total electoral votes for Obama, while at the same time making individual state predictions adding up to 332 electoral votes–because he expected some of his state predictions would turn out to be wrong. Silver’s own numbers imply an 88% confidence that he would get at least one state wrong.
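The gap between the "called" total and the expected total is easy to reproduce on a toy map. What follows is a minimal sketch, not Silver's model: the five states, their electoral votes, and their probabilities are all made up for illustration, and the miss probability assumes the states are independent.

```python
from math import prod

# Hypothetical toy map: (state, electoral votes, probability candidate X wins).
# Illustrative numbers only; these are NOT Silver's 2012 estimates.
STATES = [
    ("A", 20, 0.95),
    ("B", 15, 0.80),
    ("C", 10, 0.70),
    ("D", 30, 0.55),
    ("E", 25, 0.10),
]

def called_total(states):
    """Electoral votes for X if every state goes to its favorite."""
    return sum(ev for _, ev, p in states if p > 0.5)

def expected_total(states):
    """Probability-weighted electoral votes for X."""
    return sum(ev * p for _, ev, p in states)

def expected_misses(states):
    """Average number of state calls that turn out wrong."""
    return sum(min(p, 1 - p) for _, _, p in states)

def prob_any_miss(states):
    """Chance that at least one call is wrong, ASSUMING independent states."""
    return 1 - prod(max(p, 1 - p) for _, _, p in states)

if __name__ == "__main__":
    print("called total:  ", called_total(STATES))
    print("expected total:", f"{expected_total(STATES):.1f}")
    print("expected misses:", f"{expected_misses(STATES):.2f}")
    print("P(at least one miss):", f"{prob_any_miss(STATES):.3f}")
```

On this toy map the called total is 75 electoral votes while the expected total is 57, for the same reason that 332 and 313 can both be honest answers: the called total quietly assumes every call comes through.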

In short, Nate Silver got lucky, and his good luck will lead some of his less numerate readers to misunderstand why he was right. To see why, imagine a counterfactual world in which Silver’s three lowest-confidence state predictions had gone the other way, and Romney had won Florida, Virginia, and Colorado. Obama would still have won the election–as Silver predicted with 91% confidence–but Silver would have gotten fewer kudos. Yet this scenario would have better illustrated the value of statistical thinking, by showing how statistical reasoning can get the big picture right even if its detailed predictions are only right most of the time.

In fact, the improbable match between Silver’s state-by-state prediction and the actual results is an argument against the correctness of Silver’s methodology, because it implies that his state-by-state confidence estimates might have been too low. We can’t say for sure, based on only one election, but what evidence there is points toward the conclusion that Silver’s confidence estimates were off. It’s also interesting that Sam Wang’s methodology–which I prefer slightly over Silver’s–led to higher confidence levels. (Sam predicted an Obama victory with essentially 100% confidence.)
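How improbable a clean sweep is depends on an assumption that is easy to leave implicit: whether polling errors in different states are independent or move together. The sketch below compares the two cases. Everything in it is hypothetical: the six poll margins, the 3-point error size, the normal error model, and the `shared_fraction` parameter (the share of error variance common to all states) are assumptions chosen for illustration, not anything taken from Silver's or Wang's models.

```python
import random
from math import erf, sqrt

# Hypothetical poll leads (in points) for the favorite in six close states.
MARGINS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
TOTAL_SD = 3.0  # assumed standard deviation of the polling error, in points

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def sweep_prob_independent(margins, sd):
    """Exact chance that every favorite wins when state errors are independent."""
    p = 1.0
    for m in margins:
        p *= normal_cdf(m / sd)
    return p

def sweep_prob_simulated(margins, sd, shared_fraction, trials=20000, seed=0):
    """Estimate the sweep chance when part of the error is shared nationally.

    shared_fraction is the share of error variance common to all states
    (0.0 means fully independent errors; values near 1.0 mean the states
    move almost in lockstep).
    """
    rng = random.Random(seed)
    shared_sd = sd * sqrt(shared_fraction)
    local_sd = sd * sqrt(1.0 - shared_fraction)
    sweeps = 0
    for _ in range(trials):
        shared = rng.gauss(0.0, shared_sd)
        if all(m + shared + rng.gauss(0.0, local_sd) > 0 for m in margins):
            sweeps += 1
    return sweeps / trials

if __name__ == "__main__":
    print("independent (exact):", round(sweep_prob_independent(MARGINS, TOTAL_SD), 3))
    print("80% shared error:   ", sweep_prob_simulated(MARGINS, TOTAL_SD, 0.8))
```

Each state's individual win probability is the same in both runs; only the correlation changes. With a large shared error the sweep becomes noticeably more likely, so a perfect map is weaker evidence of underconfidence than it would be if the states were independent.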

In the next election, statistical analysis will be much more central to the discussion. We can already see the start of the kind of “Moneyball war” that we saw in baseball, where cigar-chomping oldtimers scoffed that mere number crunching could never substitute for gut feeling–and meanwhile the smarter oldtimers were figuring out how to integrate statistical thinking into their organizations, and thriving as a result. In 2016, we can expect a cadre of upstart analysts, each with their own “secret sauce”, who claim to have access to deeper truth than the mainstream analysts have. But unlike in baseball, where a hitter comes to the plate hundreds of times in a season, allowing statistical prediction methods to be tested on large data sets, presidential elections are rare, and there are only a handful of historical elections that had good polling data. So we won’t be able to tell the good analysts from the bad by looking at their track records–we’ll have to rely on quantitative reasoning to see whose methodology is better.

When Nate Silver is less lucky next time–and we can predict with fairly high confidence that he will be–please don’t abandon him. The good analysts, like Sam Wang and Nate Silver, are better than traditional pundits not because they are always right but because, unlike traditional pundits, they tell you how much confidence you should have in what they say.

Comments

  1. Ewan says:

    Any chance of a simple explanation of what it is in Wang’s methodology that you prefer over Silver’s? Thanks :) .

    • Ed Felten says:

      Sam Wang posted an interesting comparison of the various poll aggregation sites, which I recommend. It’s at http://election.princeton.edu/2012/11/04/comparisons-among-aggregators-and-modelers/

      The short answer to why I prefer Sam Wang’s methodology is that his model is simpler, more sensitive to poll movements, and the details are more defensible. For example, Sam uses the median of all reputable polls, rather than trying to weight polls based on accuracy. He also does a more precise computation to combine the state-by-state probabilities, using dynamic programming rather than the Monte Carlo (simulate a lot of random trials) approach that Nate Silver seems to use. This last point doesn’t make a big difference in the result, but to me it is a signal of methodological craftsmanship.

  2. Andrew Douglass says:

    Ditto. “I would bet on Wang if the two disagreed” essentially eliminates Silver as someone to rely on. You *did* couch it as a matter of probability (“bet”). :)

    • BanFrenchRoast says:

      Hmm…you aren’t supposed to do a probability of a probability. “I would bet on Wang if the two disagreed” is not a probability. It is a 100% certain statement. No distribution (and derived confidence level) is given for the statement “Wang is right when Silver is wrong”. The statement as given is not statistical analysis, it is pure punditry from an Old(ish) White Guy. Watch out, we are a grumpy bunch these days. :-)

      • Andrew Douglass says:

        Lol, well, point taken. Note that “pure punditry” is currently synonymous with “pure BS.” Anyway it’s a heckuva endorsement. As noted, the reason for it is the interesting thing!

  3. Starling says:

    > In fact, the improbable match between Silver’s state-by-state prediction and the actual results is an argument
    > against the correctness of Silver’s methodology, because it implies that his state-by-state confidence estimates
    > might have been too low.

    I don’t think it necessarily implies this and it may not even be improbable, e.g., if the state-by-state predictions were very highly correlated (which seems plausible).

  4. 1 says:

    1

  5. Mitch Golden says:

    Much less attention is paid to the fact that Wang got all the Senate seats right when Silver got two of them wrong. In fact, Silver made two rather strong, wrong predictions: He said that Montana was 2:1 for the Republican, and North Dakota was 11:1 for the Republican, but both were carried by Democrats.

    I actually don’t understand what happened in these, especially the latter, as it seems more or less that Silver may simply have missed some polls. However, it’s also true (as Wang emphasized) that the “state fundamentals” secret sauce that Silver adds probably just confuses matters without adding anything – or even makes stuff worse.
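Ed Felten's reply under comment 1 contrasts two ways of combining state-by-state probabilities into a national result: dynamic programming and Monte Carlo simulation. Here is a minimal sketch of both on a toy map, assuming independent states. The four states, their electoral votes and probabilities, and the 26-vote threshold are hypothetical, and the code is not taken from either analyst's site.

```python
import random

# Hypothetical (electoral votes, win probability) pairs; 50 votes in total.
STATES = [(10, 0.9), (20, 0.6), (15, 0.5), (5, 0.3)]
NEEDED = 26  # a majority of this toy map's 50 votes

def ev_distribution(states):
    """Exact distribution of the candidate's electoral votes.

    dist[k] is the probability of winning exactly k votes. States are folded
    in one at a time, which is the dynamic-programming step.
    """
    dist = [1.0]  # before any state is counted: 0 votes with certainty
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for total, prob in enumerate(dist):
            new[total] += prob * (1.0 - p)   # candidate loses this state
            new[total + ev] += prob * p      # candidate wins this state
        dist = new
    return dist

def win_prob_exact(states, needed):
    """Probability of reaching at least `needed` votes, computed exactly."""
    return sum(ev_distribution(states)[needed:])

def win_prob_monte_carlo(states, needed, trials=20000, seed=0):
    """Estimate the same probability by simulating many random elections."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        total = sum(ev for ev, p in states if rng.random() < p)
        if total >= needed:
            wins += 1
    return wins / trials

if __name__ == "__main__":
    print("exact:      ", round(win_prob_exact(STATES, NEEDED), 4))
    print("monte carlo:", win_prob_monte_carlo(STATES, NEEDED))
```

The two numbers agree to within sampling noise, which fits the remark that the choice doesn't make a big difference in the result. The dynamic-programming version simply has no sampling noise to begin with, and its cost grows with the number of states times the number of electoral votes rather than with the number of trials.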