
The End of Theory? Not Likely

An essay in the new Wired, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete,” argues that we won’t need scientific theories any more, now that we have so much stored information and such great tools for analyzing it. Wired has never been the best source for accurate technology information, but this has to be a new low point.

Here’s the core of the essay’s argument:

[…] The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the “beautiful story” phase of a discipline starved of data) is that we don’t know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.

There are several errors here, but the biggest one is about correlation and causation. It’s true that correlation does not imply causation. But the reason is not that the correlation might have arisen by chance – that possibility can be eliminated given enough data. The problem is that we need to know what kind of causation is operating.

To take a simple example, suppose we discover a correlation between eating spinach and having strong muscles. Does this mean that eating spinach will make you stronger? Not necessarily; this will only be true if spinach causes strength. But maybe people in poor health, who tend to have weaker muscles, have an aversion to spinach. Maybe this aversion is a good thing because spinach is actually harmful to people in poor health. If that is true, then telling everybody to eat more spinach would be harmful. Maybe some common syndrome causes both weak muscles and aversion to spinach. In that case, the next step would be to study that syndrome. I could go on, but the point should be clear. Correlations are interesting, but if we want a guide to action – even if all we want to know is what question to ask next – we need models and experimentation. We need the scientific method.
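
To make the spinach example concrete, here is a toy simulation (the variable names and numbers are invented for illustration): a hidden syndrome both discourages spinach eating and weakens muscles, so the data show a clear correlation between spinach and strength even though spinach has no causal effect at all.

    # Toy simulation (hypothetical numbers): a hidden "syndrome" both reduces
    # spinach eating and reduces muscle strength, so the two end up correlated
    # even though spinach has no causal effect on strength.
    import random

    random.seed(0)
    n = 100_000
    spinach, strength = [], []

    for _ in range(n):
        syndrome = random.random() < 0.3                     # hidden common cause
        eats_spinach = random.random() < (0.2 if syndrome else 0.6)
        muscle = random.gauss(40 if syndrome else 60, 10)    # spinach never enters here
        spinach.append(1.0 if eats_spinach else 0.0)
        strength.append(muscle)

    def corr(xs, ys):
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
        sx = (sum((x - mx) ** 2 for x in xs) / len(xs)) ** 0.5
        sy = (sum((y - my) ** 2 for y in ys) / len(ys)) ** 0.5
        return cov / (sx * sy)

    print("correlation(spinach, strength):", round(corr(spinach, strength), 2))
    # Prints a clearly positive correlation, yet an "eat more spinach" policy
    # would accomplish nothing, because strength depends only on the syndrome.

The data alone cannot distinguish this world from one in which spinach really works; only a model, and an experiment that intervenes on spinach, can tell the two apart.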

Indeed, in a world with more and more data, and better and better tools for finding correlations, we need the scientific method more than ever. This is confirmed by the essay’s physics story, in which physics theory (supposedly) went off the rails due to a lack of experimental data. Physics theory would be more useful if there were more data. And the same is true of scientific theory in general: theory and experiment advance in tandem, with advances in one creating opportunities for the other. In the coming age, theory will not wither away. Instead, it will be the greatest era ever for theory, and for experiment.

Government Data and the Invisible Hand

David Robinson, Harlan Yu, Bill Zeller, and I have a new paper about how to use infotech to make government more transparent. We make specific suggestions, some of them counter-intuitive, about how to make this happen. The final version of our paper will appear in the Fall issue of the Yale Journal of Law and Technology. The best way to summarize it is to quote the introduction:

If the next Presidential administration really wants to embrace the potential of Internet-enabled government transparency, it should follow a counter-intuitive but ultimately compelling strategy: reduce the federal role in presenting important government information to citizens. Today, government bodies consider their own websites to be a higher priority than technical infrastructures that open up their data for others to use. We argue that this understanding is a mistake. It would be preferable for government to understand providing reusable data, rather than providing websites, as the core of its online publishing responsibility.

In the current Presidential cycle, all three candidates have indicated that they think the federal government could make better use of the Internet. Barack Obama’s platform explicitly endorses “making government data available online in universally accessible formats.” Hillary Clinton, meanwhile, remarked that she wants to see much more government information online. John McCain, although expressing excitement about the Internet, has allowed that he would like to delegate the issue, possibly to a vice-president.

But the situation to which these candidates are responding – the wide gap between the exciting uses of Internet technology by private parties, on the one hand, and the government’s lagging technical infrastructure on the other – is not new. The federal government has shown itself consistently unable to keep pace with the fast-evolving power of the Internet.

In order for public data to benefit from the same innovation and dynamism that characterize private parties’ use of the Internet, the federal government must reimagine its role as an information provider. Rather than struggling, as it currently does, to design sites that meet each end-user need, it should focus on creating a simple, reliable and publicly accessible infrastructure that “exposes” the underlying data. Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens and can constantly create and reshape the tools individuals use to find and leverage public data. The best way to ensure that the government allows private parties to compete on equal terms in the provision of government data is to require that federal websites themselves use the same open systems for accessing the underlying data as they make available to the public at large.

Our approach follows the engineering principle of separating data from interaction, which is commonly used in constructing websites. Government must provide data, but we argue that websites that provide interactive access for the public can best be built by private parties. This approach is especially important given recent advances in interaction, which go far beyond merely offering data for viewing, to offer services such as advanced search, automated content analysis, cross-indexing with other data sources, and data visualization tools. These tools are promising but it is far from obvious how best to combine them to maximize the public value of government data. Given this uncertainty, the best policy is not to hope government will choose the one best way, but to rely on private parties with their vibrant marketplace of engineering ideas to discover what works.

To read more, see our preprint on SSRN.
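
To make the “data, not websites” idea concrete, here is a toy sketch (the agencies, records, and function names are invented): the government layer’s only job is to publish structured, machine-readable records, and independent private tools consume that same feed to build whatever presentations they like.

    # Toy sketch of "data, not websites" (agencies, records, and names invented).
    # The government layer exposes raw, structured records; any number of
    # independent presentation layers consume the same feed.
    import json

    def government_data_feed():
        """The government's core job under this proposal: reusable, structured data."""
        records = [
            {"agency": "EPA", "title": "Air quality report", "year": 2008},
            {"agency": "DOT", "title": "Bridge inspection data", "year": 2008},
        ]
        return json.dumps(records)        # machine-readable, identical for everyone

    def simple_listing(feed):
        """One private presentation layer: a plain listing."""
        return [f"{r['agency']}: {r['title']}" for r in json.loads(feed)]

    def search_tool(feed, keyword):
        """Another private layer: search, which government never had to build."""
        return [r for r in json.loads(feed) if keyword.lower() in r["title"].lower()]

    feed = government_data_feed()
    print(simple_listing(feed))
    print(search_tool(feed, "bridge"))

The point of the sketch is the boundary: government’s responsibility ends at the feed, and competition among private tools happens entirely on the presentation side.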

The Microsoft Case, Ten Years Later

Sunday was the tenth anniversary of the government filing its antitrust case against Microsoft. The date passed almost unnoticed, though echoes of the case continue to reverberate. This week I want to reflect on the case, with the benefit of ten years’ hindsight. I’ll write at least three posts: today, on the overall legacy of the case; Wednesday, on how the case affected the public view of Microsoft and software companies generally; and Friday, on how the government’s theory of the software market (which the courts accepted) looks in hindsight.

(Before starting, I should clarify that although I worked with the DoJ trial team through virtually the entire case – from before the case was filed, through the negotiation of the final settlement – I can’t say anything about what happened behind closed doors. My opinion is informed by everything I saw and heard, but unfortunately some of the most interesting details have to stay secret.)

Today I want to consider the overall legacy of the case. The purpose of antitrust law is to protect market competition, for the good of consumers. Thus Microsoft’s ultimate success in crushing Netscape and blunting the effect of Java only matters to the extent that it might have harmed consumers. The relevant questions are these: (1) Are the markets for operating systems and browsers healthier and more competitive than they would have been had the case not been brought? (2) Are consumers better off than they would have been had the case not been brought?

I see the case as a success by these standards, not so much because of the settlement, which most people saw as weak, but because the case taught Microsoft that ignoring antitrust concerns can be dangerous. Microsoft was routed in court and faced the possibility (though never the likelihood) of a court-ordered break-up, but the company managed to negotiate a favorable settlement when the government was distracted after the 9/11 attacks. Apparently worried that it might not be so lucky the next time, the company has moderated its behavior. It still dominates the operating system and browser markets, and it remains a fierce technical competitor, but its business and legal behavior is more moderate.

This kinder, gentler Microsoft is one of the two main legacies of the case. The other is the consensus that antitrust laws do in fact apply to high-tech companies. Though the law moves slowly – and sometimes can only deter via the possibility of after-the-fact sanctions – companies are not immune to its discipline just because they are in high-tech markets. Other powerful companies, such as Intel and Google, have learned this lesson too.

Tomorrow: how the case affected the public view of Microsoft and the software industry.

Stupidest Infotech Policy Contest

James Fallows at the Atlantic recently ran a reader contest to nominate the worst public policy decision of the past fifty years. (The winner? Ethanol subsidies. See http://jamesfallows.theatlantic.com/archives/2008/05/stupidest_policy_ever_contest_1.php.) I’d like to do the same for technology policy.

Readers, please submit your suggestions for the stupidest infotech policy ever. An ideal submission is an infotech policy that (1) was established by a government, (2) did serious damage, (3) had wide support across the political spectrum, (4) failed for reasons that should have been obvious at the time, (5) failed even by the standards of its own supporters. It’s not enough that you would have chosen differently, or that you would have weighed competing public goods differently – we’re looking for a policy that no reasonable person, with the benefit of hindsight, would support.

Submit your suggestions in the comments. Once the discussion has died down, I’ll choose a winner. If this contest is successful, we’ll follow it up with a best policy contest.

Comcast and BitTorrent: Why You Can't Negotiate with a Protocol

The big tech policy news yesterday was Comcast’s announcement that it will stop impeding BitTorrent traffic, but instead will respond to network congestion by slowing traffic from the highest-volume users, regardless of what those users are doing. Comcast also announced a deal with BitTorrent, aimed at developing more effective ways of channeling peer-to-peer traffic through networks.
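
Comcast’s announcement doesn’t spell out the mechanism, but protocol-agnostic congestion management of the kind described might look roughly like the following sketch (the thresholds and user names are invented): when a link is congested, slow the heaviest recent senders, without looking at what protocol they are using.

    # Hypothetical sketch of protocol-agnostic congestion management: when a link
    # is congested, deprioritize the heaviest recent senders without inspecting
    # what their traffic is. Thresholds and data are invented for illustration.
    CONGESTION_THRESHOLD = 0.90   # fraction of link capacity in use
    HEAVY_USER_FRACTION = 0.05    # top 5% of users by recent volume

    def users_to_deprioritize(link_utilization, bytes_by_user):
        """Return the set of users whose traffic should be temporarily slowed."""
        if link_utilization < CONGESTION_THRESHOLD:
            return set()                              # no congestion: touch nobody
        ranked = sorted(bytes_by_user, key=bytes_by_user.get, reverse=True)
        k = max(1, int(len(ranked) * HEAVY_USER_FRACTION))
        return set(ranked[:k])                        # heaviest users, whatever they run

    recent_usage = {"alice": 9_000_000, "bob": 150_000, "carol": 200_000}
    print(users_to_deprioritize(0.95, recent_usage))  # {'alice'}
    print(users_to_deprioritize(0.50, recent_usage))  # set()

Nothing in this logic mentions BitTorrent, which is the point: the policy keys on volume, not on protocol.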

It may seem natural to respond to a network issue involving BitTorrent by making a deal with BitTorrent – and much of the reporting and commentary has taken that line – but there is something odd about the BitTorrent deal, which only becomes clear when we unpack the difference between the BitTorrent protocol and the BitTorrent company. The BitTorrent protocol is a set of technical rules used by desktop software programs to coordinate the peer-to-peer distribution of files. The company BitTorrent Inc. is just one maker of software that uses the protocol – indeed, it’s a relatively minor player in that market. Most people who use the BitTorrent protocol don’t use software from BitTorrent Inc.

What this means is that changes in BitTorrent Inc’s products won’t have much effect on Comcast’s network. What Comcast needs, if it wants to change conditions in its network, is to change the BitTorrent protocol.

The problem is that you can’t negotiate with a protocol, for the same reason that you can’t negotiate with (say) the English language. You can use the language to negotiate with someone, but you can’t have a negotiation where the other party is the language. You can negotiate with the Queen of England, or the English Department at Princeton, or the people who publish the most popular dictionary. But the language itself just isn’t the kind of entity that can make an agreement or have an intention.

This property of protocols – that you can’t get a meeting with them, convince them to change their behavior, or make a deal with them – seems especially challenging to some Washington policymakers. If, as they do, you live in a world driven by meetings and deal-making, a world where problem-solving means convincing someone to change something, then it’s natural to think that every protocol, and every piece of technology, must be owned and managed by some entity.

Engineers sometimes make a similar mistake in thinking about technology markets. We like to think that technologies are designed by engineers, but often it’s more accurate to say that some technology was designed by a market. And where the market is in charge, there is nobody to call when the technology needs to be changed.

Will Comcast and BitTorrent Inc. succeed in improving the BitTorrent protocol? Maybe. But it won’t be enough simply to have a better protocol. They’ll also have to convince the population of BitTorrent users to switch.

UPDATE (April 2): A reader points out that BitTorrent Inc bought uTorrent, one of the popular client programs implementing the BitTorrent protocol. This means that BitTorrent Inc has more leverage to force adoption of new protocol versions than I had thought. Still, I stand by the basic point of the post, that BitTorrent Inc doesn’t have unilateral power to change the protocol.