May 30, 2024

When Wikipedia Converges

Many readers, responding to my recent quality-check on Wikipedia, have argued that over time the entries in question will improve, so that in the long run Wikipedia will outpace conventional encyclopedias like Britannica. It seems to me that this is the most important claim made by Wikipedia boosters.

If a Wikipedia entry gets enough attention, then it will likely change over time. When the entry is new, it will almost certainly improve by adding more detail. But once it matures, it seems likely that it will reach some level of quality and then level off, executing a quality-neutral random walk, with the changes reflecting nothing more than minor substitutions of one contributor’s style or viewpoint for another.
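As a rough illustration of that story, here is a toy model of a single entry's quality. Everything in it is an assumption made for the sake of the sketch, not a measurement of Wikipedia: the function name `simulate_entry`, the 0-to-1 quality scale, and the `plateau`, `gain`, and `noise` parameters are all hypothetical. Each edit closes a fraction of the remaining gap to the plateau and also adds a small zero-mean change standing in for one contributor's style or viewpoint replacing another's. Strictly speaking, this gives noise around a plateau rather than a pure random walk, but the qualitative picture is the same.

```python
import random


def simulate_entry(num_edits, plateau=0.8, gain=0.1, noise=0.02, seed=0):
    """Toy model: return the quality of one entry after each edit.

    Quality lives on a made-up 0-to-1 scale. All parameters are
    illustrative assumptions, not values fitted to real data.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same path
    quality = 0.0
    history = [quality]
    for _ in range(num_edits):
        # Adding detail: each edit closes a fraction of the gap to the plateau.
        improvement = gain * (plateau - quality)
        # Style/viewpoint substitution: a small change with zero mean.
        drift = rng.uniform(-noise, noise)
        # Keep quality inside the 0-to-1 scale.
        quality = min(1.0, max(0.0, quality + improvement + drift))
        history.append(quality)
    return history


if __name__ == "__main__":
    path = simulate_entry(200)
    early_rate = (path[10] - path[0]) / 10
    late_rate = (path[-1] - path[-51]) / 50
    print(f"average change per edit, first 10 edits: {early_rate:.4f}")
    print(f"average change per edit, last 50 edits:  {late_rate:.4f}")
```

In this sketch the early edits move quality quickly, while the late edits leave it hovering near the plateau. The real questions are what value the plateau takes and how many edits it takes to get there, and the model assumes both rather than answering either.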

I’d expect a similar story for Wikipedia as a whole, with early effort spent mostly on expanding the scope of the site, and later effort spent more on improving (or at least changing) existing entries. Given enough effort spent on the site, more and more entries should approach maturity, and the rate of improvement in Wikipedia as a whole should approach zero.

This leaves us with two questions: (1) Will enough effort be spent on Wikipedia to cause it to reach the quality plateau? (2) How high is the quality plateau anyway?

We can shed light on both questions by studying the evolution of individual entries over time. Such a study is possible today, since Wikipedia tracks the history of every entry. I would like to see the results of such a study, but unfortunately I don’t have time to do it myself.

Wikipedia vs. Britannica Smackdown

On Friday I wrote about my spot-check of the accuracy of Wikipedia, in which I checked Wikipedia’s entries for six topics I knew well. I was generally impressed, except for one entry that went badly wrong.

Adam Shostack pointed out, correctly, that I had left the job half done, and I needed to compare to the entries for the same six topics in a traditional encyclopedia. So here’s my Wikipedia vs. Britannica Online comparison, for the six topics I wrote about on Friday.

Princeton University: Both entries are accurate and reasonably well written. Wikipedia has more information. Verdict: small advantage to Wikipedia.

Princeton Township: Britannica has a single entry for Princeton Township and Princeton Borough, while Wikipedia has separate entries. Both entries are good, but Wikipedia has more information (including demographics). Also, Britannica makes an error in saying that Morven is the Governor’s Residence for the state of New Jersey; Wikipedia correctly labels Morven as the former Governor’s Residence and Drumthwacket as the current one. Verdict: advantage to Wikipedia.

Me: Wikipedia has a short but decent entry; Britannica, unsurprisingly, has nothing. Verdict: advantage Wikipedia.

Virtual memory: Wikipedia has a pretty good entry; Britannica has no entry for virtual memory, and doesn’t appear to discuss the concept elsewhere, either. Verdict: advantage Wikipedia.

Public-key cryptography: Good, accurate entries in both. Verdict: toss-up.

Microsoft antitrust case: Britannica has only two sentences, saying that Judge Jackson ruled against Microsoft and ordered a breakup, and that the Court of Appeals overturned the breakup but agreed that Microsoft had broken the law. That’s correct, but it leaves out the settlement. Wikipedia’s entry is much longer but error-prone. Verdict: big advantage to Britannica.

Overall verdict: Wikipedia’s advantage is in having more, longer, and more current entries. If it weren’t for the Microsoft-case entry, Wikipedia would have been the winner hands down. Britannica’s advantage is in having lower variance in the quality of its entries.

Wikipedia Quality Check

There’s been an interesting debate lately about the quality of Wikipedia, the free online encyclopedia that anyone can edit.

Critics say that Wikipedia can’t be trusted because any fool can edit it, and because nobody is being paid to do quality control. Advocates say that Wikipedia allows domain experts to write entries, and that quality control is good because anybody who spots an error can correct it.

The whole flap was started by a minor newspaper column. The column, like much of the debate, ignores the best evidence in the Wikipedia-quality debate: the content of Wikipedia. Rather than debating, in the abstract, whether Wikipedia would be accurate, why don’t we look at Wikipedia and see?

I decided to take a look and see how accurate Wikipedia is. I looked at its entries on things I know very well: Princeton University, Princeton Township, myself, virtual memory (a standard but hard-to-explain bit of operating-system technology), public-key cryptography, and the Microsoft antitrust case.

The entries for Princeton University and Princeton Township were excellent.

The entry on me was accurate, but might be criticized for its choice of what to emphasize. When I first encountered the entry, my year of birth was listed as “[1964 ?]”. I replaced it with the correct year (1963). It felt a bit odd to be editing an encyclopedia entry on myself, but I managed to limit myself to a strictly factual correction.

The technical entries, on virtual memory and public-key cryptography, were certainly accurate, which is a real achievement. Both are backed by detailed technical information that probably would not be available at all in a conventional encyclopedia. My only criticism of these entries is that they could do more to make the concepts accessible to non-experts. But that’s a quibble; these entries are certainly up to the standard of typical encyclopedia writing about technical topics.

So far, so good. But now we come to the entry on the Microsoft case, which was riddled with errors. For starters, it got the formal name of the case (U.S. v. Microsoft) wrong. It badly mischaracterized my testimony, it got the timeline of Judge Jackson’s rulings wrong, and it made terminological errors such as referring to the DOJ as “the prosecution” rather than “the plaintiff”. I corrected two of these errors (the name of the case, and the description of my testimony), but fixing the whole thing was too big an effort.

Until I read the Microsoft-case page, I was ready to declare Wikipedia a clear success. Now I’m not so sure. Yes, that page will improve over time; but new pages will be added. If the present state of Wikipedia is any indication, most of them will be very good; but a few will lead high-school report writers astray.

More Journal Editors Have Declared Independence

In response to my previous post about the revolt by the editors of the Journal of Algorithms, Peter Suber points out that journal editors have “declared independence” before, at least twelve times. Peter’s blog, Open Access News, is a great source for news about the trend toward open access to scholarly publications.

Journal of Algorithms Editorial Board Revolts

The editorial board of the Journal of Algorithms has resigned en masse, to protest what they call price-gouging by Elsevier, the company that publishes the journal. The journal’s annual subscription price had risen to $700, which is beyond the reach of many libraries, not to mention individuals.

The resigning board includes very distinguished computer scientists such as Donald Knuth. They have announced their intention to work on a new journal, Transactions on Algorithms, to be published by ACM, the leading professional society for computer scientists.

It’s surprising that this sort of thing doesn’t happen more often. The value of a journal comes from the quality of articles in it; and this quality derives mostly from the reputations of the editorial board members and the work they do in choosing and editing articles. If a journal’s management takes a direction that the scientists on the editorial board don’t like, there is something they can do about it!

Elsevier says they will find a new board and continue publishing the journal, but it’s hard to imagine that anybody in the field will take it seriously anymore.

Computer scientists are lucky, in that most of our best journals and conference proceedings are published by our professional societies at reasonable prices and terms. The new Transactions on Algorithms will be yet another example.