An essay in the new Wired, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete,” argues that we won’t need scientific theories any more, now that we have so much stored information and such great tools for analyzing it. Wired has never been the best source for accurate technology information, but this has to be a new low point.
Here’s the core of the essay’s argument:
[…] The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.
Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.
But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the “beautiful story” phase of a discipline starved of data) is that we don’t know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.
There are several errors here, but the biggest one is about correlation and causation. It’s true that correlation does not imply causation. But the reason is not that the correlation might have arisen by chance – that possibility can be eliminated given enough data. The problem is that we need to know what kind of causation is operating.
To take a simple example, suppose we discover a correlation between eating spinach and having strong muscles. Does this mean that eating spinach will make you stronger? Not necessarily; this will only be true if spinach causes strength. But maybe people in poor health, who tend to have weaker muscles, have an aversion to spinach. Maybe this aversion is a good thing because spinach is actually harmful to people in poor health. If that is true, then telling everybody to eat more spinach would be harmful. Maybe some common syndrome causes both weak muscles and aversion to spinach. In that case, the next step would be to study that syndrome. I could go on, but the point should be clear. Correlations are interesting, but if we want a guide to action – even if all we want to know is what question to ask next – we need models and experimentation. We need the scientific method.
Indeed, in a world with more and more data, and better and better tools for finding correlations, we need the scientific method more than ever. This is confirmed by the essay’s physics story, in which physics theory (supposedly) went off the rails due to a lack of experimental data. Physics theory would be more useful if there were more data. And the same is true of scientific theory in general: theory and experiment advance in tandem, with advances in one creating opportunities for the other. In the coming age, theory will not wither away. Instead, it will be the greatest era ever for theory, and for experiment.