February 28, 2017

Archives for June 2008

Newspapers' Problem: Trouble Targeting Ads

Richard Posner has written a characteristically thoughtful blog entry about the uncertain future of newspapers. He renders widespread journalistic concern about the unwieldy character of newspapers into the crisp economic language of “bundling”:

Bundling is efficient if the cost to the consumer of the bundled products that he doesn’t want is less than the cost saving from bundling. A particular newspaper reader might want just the sports section and the classified ads, but if for example delivery costs are high, the price of separate sports and classified-ad “newspapers” might exceed that of a newspaper that contained both those and other sections as well, even though this reader was not interested in the other sections.

With the Internet’s dramatic reductions in distribution costs, the gains from bundling are decreased, and readers are less likely to prefer bundled products. I agree with Posner that this is an important insight about the behavior of readers, but would argue that reader behavior is only a secondary problem for newspapers. The product that newspaper publishers sell—the dominant source of their revenues—is not newspapers, but audiences.

Toward the end of his post, Posner acknowledges that papers have trouble selling ads because it has gotten easier to reach niche audiences. That seems to me to be the real story: Even if newspapers had undiminished audiences today, they’d still be struggling because, on a per capita basis, they are a much clumsier way of reaching readers. There are some populations, such as the elderly and people who are too poor to get online, who may be reachable through newspapers and unreachable through online ads. But the fact that today’s elderly are disproportionately offline is an artifact of the Internet’s novelty (they didn’t grow up with it), not a persistent feature of the marektplace. Posner acknoweldges that the preference of today’s young for online sources “will not change as they get older,” but goes on to suggest incongruously that printed papers might plausibly survive as “a retirement service, like Elderhostel.” I’m currently 26, and if I make it to 80, I very strongly doubt I’ll be subscribing to printed papers. More to the point, my increasing age over time doesn’t imply a growing preference for print; if anything, age is anticorrelated with change in one’s daily habits.

As for the claim that poor or disadvantaged communities are more easily reached offline than on, it still faces the objection that television is a much more efficient way of reaching large audiences than newsprint. There’s also the question of how much revenue can realistically be generated by building an audience of people defined by their relatively low level of purchasing power. If newsprint does survive at all, I might expect to see it as a nonprofit service directed at the least advantaged. Then again, if C. K. Prahalad is correct that businesses have neglected a “fortune at the bottom of the pyramid” that can be gathered by aggregating the small purchases of large numbers of poor people, we may yet see papers survive in the developing world. The greater relative importance of cell phones there, as opposed to larger screens, could augur favorably for the survival of newsprint. But phones in the developing world are advancing quickly, and may yet emerge as a better-than-newsprint way of reading the news.

The End of Theory? Not Likely

An essay in the new Wired, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete,” argues that we won’t need scientific theories any more, now that we have so much stored information and such great tools for analyzing it. Wired has never been the best source for accurate technology information, but this has to be a new low point.

Here’s the core of the essay’s argument:

[…] The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the “beautiful story” phase of a discipline starved of data) is that we don’t know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.

There are several errors here, but the biggest one is about correlation and causation. It’s true that correlation does not imply causation. But the reason is not that the correlation might have arisen by chance – that possibility can be eliminated given enough data. The problem is that we need to know what kind of causation is operating.

To take a simple example, suppose we discover a correlation between eating spinach and having strong muscles. Does this mean that eating spinach will make you stronger? Not necessarily; this will only be true if spinach causes strength. But maybe people in poor health, who tend to have weaker muscles, have an aversion to spinach. Maybe this aversion is a good thing because spinach is actually harmful to people in poor health. If that is true, then telling everybody to eat more spinach would be harmful. Maybe some common syndrome causes both weak muscles and aversion to spinach. In that case, the next step would be to study that syndrome. I could go on, but the point should be clear. Correlations are interesting, but if we want a guide to action – even if all we want to know is what question to ask next – we need models and experimentation. We need the scientific method.

Indeed, in a world with more and more data, and better and better tools for finding correlations, we need the scientific method more than ever. This is confirmed by the essay’s physics story, in which physics theory (supposedly) went off the rails due to a lack of experimental data. Physics theory would be more useful if there were more data. And the same is true of scientific theory in general: theory and experiment advance in tandem, with advances in one creating opportunities for the other. In the coming age, theory will not wither away. Instead, it will be the greatest era ever for theory, and for experiment.

Copyright, Technology, and Access to the Law

James Grimmelmann has an interesting new essay, “Copyright, Technology, and Access to the Law,” on the challenges of ensuring that the public has effective knowledge of the laws. This might sound like an easy problem, but Grimmelmann combines history and explanation to show why it can be difficult. The law – which includes both legislators’ statutes and judges’ decisions – is large, complex, and ever-changing.

Suppose I gave you a big stack of paper containing all of the laws ever passed by Congress (and signed by the President). This wouldn’t be very useful, if what you wanted was to know whether some action you were contemplating would violate the law. How would you find the laws bearing on that action? And if you did find such a law, how would you determine whether it had been repealed or amended later, or how courts had interpreted it?

Making the law accessible in practice, and not just in theory, requires a lot of work. You need reliable summaries, topic-based indices, reverse-citation indices (to help you find later documents that might affect the meaning of earlier ones), and so on. In the old days of paper media, all of this had to be printed and distributed in large books, and updated editions had to be published regularly. How to make this happen was an interesting public policy problem.

The traditional answer has been copyright. Generally, the laws themselves (statutes and court opinions) are not copyrightable, but extra-value content such as summaries and indices can be copyrighted. The usual theory of copyright applies: give the creators of extra-value content some exclusive rights, and the profit motive will ensure that good content is created.

This has some similarity to our Princeton model for government transparency, which urges government to publish information in simple open formats, and leave it to private parties to organize and present the information to the public. Here government was creating the basic information (statutes and court opinions) and private parties were adding value. It wasn’t exactly our model, as government was not taking care to publish information in the form that best facilitated private re-use, but it was at least evidence for our assertion that, given data, private parties will step in and add value.

All of this changed with the advent of computers and the Internet, which made many of the previously difficult steps cheaper and easier. For example, it’s much easier to keep a website up to date than to deliver updates to the owners of paper books. Computers can easily construct citation indices, and a search engine provides much of the value of a printed index. Access to the laws can be cheaper and easier now.

What does this mean for public policy? First, we can expect more competition to deliver legal information to the public, thanks to the reduced barriers to entry. Second, as competition drives down prices we’ll see fewer entities that are solely in the business of providing access to laws; instead we’ll see more non-profits, along with businesses providing free access. More competition and lower prices will mean better and more effective access to the law for citizens. Third, copyright will still play a role by supporting the steps that remain costly, such as the writing of summaries.

Finally, it will matter more than ever exactly how government provides access to the raw information. If, as sometimes happens now, government provides the raw information in an awkward or difficult-to-use form, private actors must invest in converting it into a more usable form. These investments might not have mattered much in the past when the rest of the process was already expensive; but in the Internet age they can make a big difference. Given access to the right information in the right format, one person can produce a useful mashup or visualization tool with a few weeks of spare-time work. Government, by getting the details of data publication right, can enable a flood of private innovation, not to mention a better public debate.