August 26, 2016


Developing Texts Like We Develop Software

Recently I was asked to speak at a conference for university librarians, about how the future of academic publication looks to me as a computer scientist. It’s an interesting question. What do computer scientists have to teach humanists about how to write? Surely not our elegant prose style.

There is something distinctive about how computer scientists write: we tend to use software development tools to “develop” our texts. This seems natural to us. A software program, after all, is just a big text, and the software developers are the authors of the text. If a tool is good for developing the large, complex, finicky text that is a program, why not use it for more traditional texts as well?

Like software developers, computer scientist writers tend to use version control systems. These are software tools that track and manage different versions of a text. What makes them valuable is not just the ability to “roll back” to old versions — you can get that (albeit awkwardly) by keeping multiple copies of a file. The big win with version control tools is the level of control they give you. Who wrote this line? What did Joe write last Tuesday? Notify me every time section 4 changes. Undo the changes Fred made last Wednesday, but leave all subsequent changes in place. And so on. Version control systems are a much more powerful relative of the “track changes” and “review” features of standard word processors.
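As a concrete sketch, here is how queries like those look in git, one popular version control system. (The scratch directory, file name, and commit messages are all hypothetical, invented for illustration.)

```shell
# Build a scratch repository with two revisions of a hypothetical paper.
rm -rf /tmp/demo-paper && mkdir /tmp/demo-paper && cd /tmp/demo-paper
git init -q
git config user.email "author@example.com"
git config user.name "Author"
echo "Section 4: first draft" > paper.txt
git add paper.txt
git commit -qm "draft section 4"
echo "Section 4: revised" > paper.txt
git commit -qam "revise section 4"

# The queries from the text, as git commands:
git blame paper.txt               # who last wrote each line, and when?
git log --oneline -- paper.txt    # every change ever made to this file
git revert --no-edit HEAD         # undo the latest change; history is kept
```

Note that `git revert` doesn't erase history: it adds a new change that undoes an old one, so the record of what happened — and of the undoing itself — survives.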

Another big advantage of advanced version control is that it enables parallel development, a style of operation in which multiple people can work on the text, separately, at the same time. Of course, it’s easy to work in parallel. What’s hard is to merge the parallel changes into a coherent final product — which is a huge pain in the neck with traditional editing tools, but is easy and natural with a good version control system. Parallel development lets you turn out a high-quality product faster — it’s a necessity when you have hundreds or thousands of programmers working on the same product — and it vastly reduces the amount of human effort spent on coordination. You still need coordination, of course, but you can focus it where it matters, on the conceptual clarity of the document, without getting distracted by version-wrangling.
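A minimal sketch of parallel development in git (the branch names, authors, and edits are hypothetical): two writers revise different parts of the same file on separate branches, and the tool merges their changes automatically.

```shell
# Set up a scratch repository with a skeleton document.
rm -rf /tmp/demo-merge && mkdir /tmp/demo-merge && cd /tmp/demo-merge
git init -q -b main
git config user.email "editor@example.com"
git config user.name "Editor"
printf "intro\n\nconclusion\n" > paper.txt
git add paper.txt && git commit -qm "skeleton"

git checkout -qb alice                 # Alice expands the intro on her branch...
printf "intro (expanded)\n\nconclusion\n" > paper.txt
git commit -qam "expand intro"

git checkout -q main
git checkout -qb bob                   # ...while Bob, separately, polishes the conclusion.
printf "intro\n\nconclusion (polished)\n" > paper.txt
git commit -qam "polish conclusion"

git checkout -q main
git merge -q --no-edit alice           # fold in Alice's work,
git merge -q --no-edit bob             # then Bob's; the edits combine cleanly
cat paper.txt                          # the merged file contains both changes
```

Because the two edits touched different parts of the file, the merge needs no human intervention; only genuinely overlapping edits would be flagged as a conflict for a person to resolve.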

Interestingly, version control and parallel development turn out to be useful even for single-author works. Version control lets you undo your mistakes and reconstruct the history of a problematic section. Parallel development is useful if you want to try an experiment — what happens if I swap sections 3 and 4? — pursuing the new approach for a while yet retaining the ability to accept or reject the experiment as a whole. These tools are so useful that experienced computer scientists tend to use them to write almost anything longer than a blog post.

While version control and parallel development have become standard in computer science writing, other software development practices are only starting to cross over into CS writing: issue tracking and the “release early and often” strategy.

Issue tracking systems are used to keep track of problems, bugs, and other issues that need to be addressed in a text. As with version control, you can do this manually, or rely on a simple to-do list, but specialized tools are more powerful and give you better control and better visibility into the past. As with software, issues can range from small problems (our terminology for X is confusing) to larger challenges (it would be nice if our dataset were bigger).

“Release early and often” is a strategy for rapidly improving a text by making it available to users (or readers), getting feedback, and rapidly turning out a new version that addresses the feedback. Users’ critiques become issues in the issue tracking system; authors modify the text to address the most urgent issues; and a new version is released as soon as the text stabilizes. The result is rapid improvement, aligned with the true desires of users. This approach requires the right attitude from users, who need to be willing to tolerate problems, in exchange for a promise that their critiques will be addressed promptly.

What does all of this mean for writers who are not computer scientists? I won’t be so bold as to say that the future of writing will be just exactly like software development. But I do think that the tools and techniques of software development, which are already widely used by computer scientist writers, will diffuse into more common usage. It will be hard to retrofit them into today’s large, well-established editing software, but as writing tools move into the cloud, I wouldn’t be surprised to see them take on more of the attributes of today’s software development tools.

One consequence of using these tools is that you end up with a fairly complete record of how the text developed over time, and why. Imagine having a record like that for the great works of the past. We could know what the writer did every hour, every day while writing. We could know which issues and problems the author perceived in earlier versions of the text, and how these were addressed. We could know which issues the author saw as still unfixed in the final published text. This kind of visibility will be available into our future writing — assuming we produce works that are worthy of study.


Lessons from Amazon's 1984 Moment

Amazon got some well-deserved criticism for yanking copies of Orwell’s 1984 from customers’ Kindles last week. Let me spare you the copycat criticism of Amazon — and the obvious 1984-themed jokes — and jump right to the most interesting question: What does this incident teach us?

Human error was clearly part of the problem. Somebody at Amazon decided that repossessing purchased copies of 1984 would be a good idea. They were wrong about this, as both the public reaction and the company’s later backtracking confirm. But the fault lies not just with the decision-maker, but also with the factors that made the decision more likely, including some aspects of the technology itself.

Some put the blame on DRM, but that’s not the problem here. Even if the Kindle used open formats and let you export and back up your books, Amazon could still have made 1984 disappear from your Kindle. Yes, some users might have had backups of 1984 stored elsewhere, but most users would have lost their only copy.

Some blame cloud computing, but that’s not precisely right either. The Kindle isn’t really a cloud device — the primary storage, computing and user interface for your purchased books are provided by your own local Kindle device, not by some server at Amazon. You can disconnect your Kindle from the network forever (by flipping off the wireless network switch on the back), and it will work just fine.

Some blame the fact that Amazon controls everything about the Kindle’s software, which is a better argument but still not quite right. Most PCs are controlled by a single company, in the sense that that company (Microsoft or Apple) can make arbitrary changes to the software on the PC, including (in principle) deleting files or forcibly removing software programs.

The problem, more than anything else, is a lack of transparency. If customers had known that this sort of thing was possible, they would have spoken up against it — but Amazon had not disclosed it, and the company generally does not offer clear descriptions of how the product works or what kinds of control it retains over users’ devices.

Why has Amazon been less transparent than other vendors? I’m not sure, but let me offer two conjectures. It might be because Amazon controls the whole system. Systems that can run third-party software have to be more open, in the sense that they have to tell the third-party developers how the system works, and they face some pressure to avoid gratuitous changes that might conflict with third-party applications. Alternatively, the lack of transparency might be because the Kindle offers less functionality than (say) a PC. Less functionality means fewer security risks, so customers don’t need as much information to protect themselves.

Going forward, Amazon will face more pressure to be transparent about the Kindle technology and the company’s relationship with Kindle buyers. It seems that e-books really are more complicated than dead-tree books.


Thoughtcrime Experiments

Cosmic rays can flip bits in memory cells or processor datapaths. Once upon a time, Sudhakar and I asked the question, “Can an attacker exploit rare and random bit flips to bypass a programming language’s type protections and thereby break out of the Java sandbox?”


A recently published science-fiction anthology, Thoughtcrime Experiments, contains a story, “Single-Bit Error,” inspired by our research paper. What if you could use cosmic-ray bit flips in neurons to bypass the “type protections” of human rationality?

In addition to 9 stories and 6 original illustrations, the anthology is interesting for another reason. It’s an experiment in do-it-yourself paying-the-artists high-editorial-standards open-source Creative-Commons print-on-demand publishing. Theorists such as Yochai Benkler have explained that the Internet has driven the communication and coordination costs of production down into the noise, and that this enables “peer production” that was not possible back in the 19th and 20th centuries. Now the Appendix to Thoughtcrime Experiments explains how to edit and produce your own anthology, complete with a sample publication contract.

It’s not all honey and roses, of course. The authors got paid, but the editors didn’t! The Appendix presents data on how many hours they spent “for free”. In addition, if you look closely, you’ll see that the way the authors got paid is that the editors spent their own money.

Still, part of the new theory of open-source peer-production asks questions like, “What motivates people to produce technical or artistic works? What mechanisms do they use to organize this work? What is the quality of the work produced, and how does it contribute to society? What are the legal frameworks that will encourage such work?” This anthology and its appendix provide an interesting datapoint for the theorists.


The future of high school yearbooks

The Dallas Morning News recently ran a piece about how kids these days aren’t interested in buying physical, printed yearbooks. (Hat tip to my high school’s journalism teacher, who linked to it from our journalism alumni Facebook group.) Why spend $60 on a dead-trees yearbook when you can get everything you need on Facebook? My 20th high school reunion is coming up this fall, and I was the “head” photographer for my high school’s yearbook and newspaper, so this is a topic near and dear to my heart.

Let’s break down everything that a yearbook actually is and then think about how these features can and cannot be replicated in the digital world. A yearbook has:

  • higher-than-normal photographic quality (yearbook photographers hopefully own better camera equipment and know how to use their gear properly)
  • editors who do all kinds of useful things (sending photographers to events they want covered, selecting the best pictures for publication, captioning them, and indexing the people in them)
  • a physical artifact that people can pass around to their friends to mark up and personalize, and which will still be around years later

If you get rid of the physical yearbook, you’ve got all kinds of issues. Permanence is the big one. There’s nothing that my high school can do to delete my yearbook after it’s been published. Conversely, if high schools host their yearbooks on school-owned equipment, those systems can and will fail over time. (Yes, I know you could run a crawler and make a copy, but I wouldn’t trust a typical high school’s IT department to build a site that will be around decades later.) To pick one example, my high school’s web site, when it first went online, had a nice alumni registry. Within a few years, it unceremoniously went away without warning.

Okay, what about Facebook? At this point, almost a third of my graduating class is on Facebook, and I’m sure the numbers are much higher for more recent classes. Some of my classmates are digging up old pictures, posting them, and tagging each other. With social networking as part of the yearbook process from the start, you can get some serious traction in replacing physical yearbooks. Yearbook editors and photography staff can still cover events, select good pictures, caption them, and index them. The social networking aspect covers some of the personalization and markup that we got by writing in each others’ yearbooks. That’s fun, but please somebody convince me that Facebook will be here ten or twenty years from now. Any business that doesn’t make money will eventually go out of business, and Facebook is no exception.

Aside from the permanence issue, is anything else lost by going to a Web 2.0 social networking non-printed yearbook? Censorship-happy high schools (and we all know what a problem that can be) will never allow a social network site that they control to host students’ genuine expressions of distaste for all the things that rebellious youth like to complain about. Never mind that the school has a responsibility to maintain some measure of student privacy. Consequently, no high school would endorse the use of a social network that it couldn’t control and censor. I’m sure several of the people who wrote in my yearbook could have gotten in trouble if the things they wrote had been brought before the school administration, yet those comments are the best part of my yearbook. Nothing takes you back quite as much as off-color commentary.

One significant lever that high school yearbooks have, which commercial publications like newspapers generally lack, is that they’re non-profit. If the yearbook financially breaks even, the staff is doing a good job. (And, in the digital universe, the costs are perhaps lower. I personally shot hundreds of rolls of black-and-white film, processed them, and printed them, and we had many more photographers on our staff. My high school paid for all the film, paper, and photo-chemistry that we used. Now they just need computers, although those aren’t exactly cheap, either.) So what if they don’t print so many physical yearbooks? Sure, the yearbook staff can do a short vanity-press run, so they can enter competitions and maybe win something, but otherwise they can put out a PDF or pickle the bowdlerized social network’s contents down to a DVD-ROM and call it a day. That hopefully creates enough permanence. What about uncensored commentary? That’s probably going to have to happen outside of the yearbook context. Any high school student can sign up for a webmail account and keep all their email for years to come. (Unlike Facebook, the webmail companies seem to be making money.) Similarly, the ubiquity of digital point-and-shoot cameras ensures that students will have uncensored, personal, off-color memories.

[Sidebar: There’s a reality show on TV called “High School Reunion.” Last year, they reunited some people from my school’s class of 1987. I was in the class of 1989. Prior to the show airing, I was contacted by one of the producers, who wanted to use some of my photographs in the show. She sent me a waiver that basically had me indemnifying them for their use of my work; of course, they weren’t offering to pay me anything. Really? No thanks. One of the interesting questions was whether my photos were even “my property” that I could give them permission to use. There were no contracts of any kind when I signed up to work on the yearbook. You could argue that the school retains an interest in the pictures, never mind the original subjects from whom we never got model releases. Our final contract said, in effect, that I represented that I took the pictures and had no problem with them using them, but I made no claims as to ownership, and they indemnified me against any issues that might arise.

Question for the legal minds here: I have three binders full of negatives from my high school years. I could well invest a week of my time, borrow a good scanner, and get the whole collection online and post it online, either on my own web site or on Facebook. Should I? Am I opening myself to legal liability?]


Is the New York Times a Confused Company?

Over lunch I did something old-fashioned—I picked up and read a print copy of the New York Times. I was startled to find, on the front of the business section, a large, colorfully decorated feature headlined “Is Google a Media Company?” The graphic accompanying the story shows a newspaper masthead titled “Google Today,” followed by a list of current and imagined future offerings, from Google Maps and Google Earth to Google Drink and Google Pancake. Citing the new, Wikipedia-esque service Knol, and using the example of that service’s wonderful entry on buttermilk pancakes, the Times story argues that Knol’s launch has “rekindled fears among some media companies that Google is increasingly becoming a competitor. They foresee Google’s becoming a powerful rival that not only owns a growing number of content properties, including YouTube, the top online video site, and Blogger, a leading blogging service, but also holds the keys to directing users around the Web.”

I hope the Times’s internal business staff is better grounded than its reporters and editors appear to be—otherwise, the Times is in even deeper trouble than its flagging performance suggests. Google isn’t becoming a media company—it is one now and always has been. From the beginning, it has sold the same thing that the Times and other media outlets do: audiences. Unlike the traditional media outlets, though, online media firms like Google and Yahoo have decoupled content production from audience sales. Whether selling ads alongside search results, or alongside user-generated content on Knol or YouTube, or displaying ads on a third-party blog or even a traditional media web site, Google acts as a broker, selling audiences that others have worked to attract. In so doing, it has thrown the competition for ad dollars wide open, allowing any blog to sap revenue (in proportion to its audience share) from the big guys. The whole infrastructure is self-service and scales down to be economical for any publisher, no matter how small. It’s a far cry from an advertising marketplace that relies, as the newspaper business traditionally has, on human ad sales. In the new environment, it’s a buyer’s market for audiences, and nobody is likely to make the kinds of killings that newspapers once did. As I’ve argued before, the worrying and plausible future for high-cost outlets like the Times is a death of a thousand cuts as revenues get fractured among content sources.

One might argue that sites like Knol or Blogger are a competitive threat to established media outlets because they draw users away from those outlets. But Google’s decision to add these sites hurts its media partners only to the (small) extent that the new sites increase the total amount of competing ad inventory on the web—that is, the supply of people-reading-things to whom advertisements can be displayed. To top it all off, Knol lets authors, including any participating old-media producers, capture revenue from the eyeballs they draw. The revenues in settings like these are slimmer because they are shared with Google, as opposed to the ads being sold directly by the Times or some other established media outlet. And it’s hard to judge whether the Knol payment would be higher or lower than the equivalent payment if an ad were displayed on the established outlet’s own site, since Google does not disclose the fraction of ad revenue it shares with publishers in either case. But the addition of one more user-generated content site, whether from Google or anyone else, is at most a footnote to the media industry trend: Google’s revenues come from ads, and that makes it a media company, pure and simple.


Newspapers' Problem: Trouble Targeting Ads

Richard Posner has written a characteristically thoughtful blog entry about the uncertain future of newspapers. He renders widespread journalistic concern about the unwieldy character of newspapers into the crisp economic language of “bundling”:

Bundling is efficient if the cost to the consumer of the bundled products that he doesn’t want is less than the cost saving from bundling. A particular newspaper reader might want just the sports section and the classified ads, but if for example delivery costs are high, the price of separate sports and classified-ad “newspapers” might exceed that of a newspaper that contained both those and other sections as well, even though this reader was not interested in the other sections.

With the Internet’s dramatic reductions in distribution costs, the gains from bundling are decreased, and readers are less likely to prefer bundled products. I agree with Posner that this is an important insight about the behavior of readers, but would argue that reader behavior is only a secondary problem for newspapers. The product that newspaper publishers sell—the dominant source of their revenues—is not newspapers, but audiences.

Toward the end of his post, Posner acknowledges that papers have trouble selling ads because it has gotten easier to reach niche audiences. That seems to me to be the real story: Even if newspapers had undiminished audiences today, they’d still be struggling because, on a per capita basis, they are a much clumsier way of reaching readers. There are some populations, such as the elderly and people who are too poor to get online, who may be reachable through newspapers and unreachable through online ads. But the fact that today’s elderly are disproportionately offline is an artifact of the Internet’s novelty (they didn’t grow up with it), not a persistent feature of the marketplace. Posner acknowledges that the preference of today’s young for online sources “will not change as they get older,” but goes on to suggest incongruously that printed papers might plausibly survive as “a retirement service, like Elderhostel.” I’m currently 26, and if I make it to 80, I very strongly doubt I’ll be subscribing to printed papers. More to the point, my increasing age over time doesn’t imply a growing preference for print; if anything, age is anticorrelated with change in one’s daily habits.

As for the claim that poor or disadvantaged communities are more easily reached offline than on, it still faces the objection that television is a much more efficient way of reaching large audiences than newsprint. There’s also the question of how much revenue can realistically be generated by building an audience of people defined by their relatively low level of purchasing power. If newsprint does survive at all, I might expect to see it as a nonprofit service directed at the least advantaged. Then again, if C. K. Prahalad is correct that businesses have neglected a “fortune at the bottom of the pyramid” that can be gathered by aggregating the small purchases of large numbers of poor people, we may yet see papers survive in the developing world. The greater relative importance of cell phones there, as opposed to larger screens, could augur favorably for the survival of newsprint. But phones in the developing world are advancing quickly, and may yet emerge as a better-than-newsprint way of reading the news.


Live Webcast: Future of News, May 14-15

We’re going to do a live webcast of our workshop on “The Future of News”, which will be held tomorrow and Thursday (May 14-15) in Princeton. Attending the workshop (free registration) gives you access to the speakers and other attendees over lunch and between sessions, but if that isn’t practical, the webcast is available.

Here are the links you need:

  • Live video streaming
  • Live chat facility for remote participants
  • To ask the speaker a question, email

Sessions are scheduled for 10:45-noon and 1:30-5:00 on Wed., May 14; and 9:30-12:30 and 1:30-3:15 on Thur., May 15.


Future of News Workshop, May 14-15 in Princeton

We’ve got a great lineup of speakers for our upcoming “Future of News” workshop. It’s May 14-15 in Princeton. It’s free, and if you register we’ll feed you lunch.


Wednesday, May 14, 2008

9:30 – 10:45 Registration
10:45 – 11:00 Welcoming Remarks
11:00 – 12:00 Keynote talk by Paul Starr
12:00 – 1:30 Lunch, Convocation Room
1:30 – 3:00 Panel 1: The People Formerly Known as the Audience
3:00 – 3:30 Break
3:30 – 5:00 Panel 2: Economics of News
5:00 – 6:00 Reception

Thursday, May 15, 2008

8:15 – 9:30 Continental Breakfast
9:30 – 10:30 Featured talk by David Robinson
10:30 – 11:00 Break
11:00 – 12:30 Panel 3: Data Mining, Interactivity and Visualization
12:30 – 1:30 Lunch, Convocation Room
1:30 – 3:00 Panel 4: The Medium’s New Message
3:00 – 3:15 Closing Remarks


Panel 1: The People Formerly Known as the Audience:

How effectively can users collectively create and filter the stream of news information? How much of journalism can or will be “devolved” from professionals to networks of amateurs? What new challenges do these collective modes of news production create? Could informal flows of information in online social networks challenge the idea of “news” as we know it?

Panel 2: Economics of News:

How will technology-driven changes in advertising markets reshape the news media landscape? Can traditional, high-cost methods of newsgathering support themselves through other means? To what extent will action-guiding business intelligence and other “private journalism”, designed to create information asymmetries among news consumers, supplant or merge with globally accessible news?

  • Gordon Crovitz, former publisher, The Wall Street Journal
  • Mark Davis, Vice President for Strategy, San Diego Union Tribune
  • Eric Alterman, Distinguished Professor of English, Brooklyn College, City University of New York, and Professor of Journalism at the CUNY Graduate School of Journalism

Panel 3: Data Mining, Visualization, and Interactivity:

To what extent will new tools for visualizing and artfully presenting large data sets reduce the need for human intermediaries between facts and news consumers? How can news be presented via simulation and interactive tools? What new kinds of questions can professional journalists ask and answer using digital technologies?

Panel 4: The Medium’s New Message:

What are the effects of changing news consumption on political behavior? What does a public life populated by social media “producers” look like? How will people cope with the new information glut?

  • Clay Shirky, Adjunct Professor at NYU and author of Here Comes Everybody: The Power of Organizing Without Organizations.
  • Markus Prior, Assistant Professor of Politics and Public Affairs in the Woodrow Wilson School and the Department of Politics at Princeton University.
  • JD Lasica, writer and consultant, co-founder and editorial director of, president of the Social Media Group.

Panelists’ bios.

For more information, including (free) registration, see the main workshop page.


Online Symposium: Future of Scholarly Communication

Today we’re kicking off an online symposium on The Future of Scholarly Communication, run by the Center for Information Technology Policy at Princeton. An “online symposium” is a kind of short-term group blog, focusing on a specific topic. Panelists (besides me) include Ira Fuchs, Paul DiMaggio, Peter Suber, Stan Katz, and David Robinson. (See the symposium site for more information on the panelists.)

I started the symposium with an introductory post. Peter Suber has already chimed in, and we’re looking forward to contributions from the other panelists.

We’ll be running more online symposia on various topics in the future, so this might be a good time to bookmark the symposium site, or subscribe to its RSS feed.


Judge Strikes Down COPA

Last week a Federal judge struck down COPA, a law requiring adult websites to use age verification technology. The ruling by Senior Judge Lowell A. Reed Jr. held COPA unconstitutional because it is more restrictive of speech (but no more effective) than the alternative of allowing private parties to use filtering software.

This is the end of a long legal process that started with the passage of COPA in 1998. The ACLU, along with various authors and publishers, immediately filed suit challenging COPA, and Judge Reed struck down the law. The case was appealed up to the Supreme Court, which generally supported Judge Reed’s ruling but remanded the case back to him for further proceedings because enough time had passed that the technological facts might have changed. Judge Reed held another trial last fall, at which I testified. Now he has ruled, again, that COPA is unconstitutional.

The policy issue behind COPA is how to keep kids from seeing harmful-to-minors (HTM) material. Some speech is legally obscene, which means it is so icky that it does not qualify for First Amendment free speech protection. HTM material is not obscene – adults have a legally protected right to read it – but is icky enough that kids don’t have a right to see it. In other words, there is a First Amendment right to transmit HTM material to adults but not to kids.

Congress has tried more than once to pass laws keeping kids away from HTM material online. The first attempt, the Communications Decency Act (CDA), was struck down by the Supreme Court in 1997. When Congress responded by passing COPA in 1998, it used the Court’s CDA ruling as a roadmap in writing the new law, in the hope that doing so would make COPA consistent with free speech.

Unlike the previous CDA ruling, Judge Reed’s new COPA ruling doesn’t seem to give Congress a roadmap for creating a new statute that would pass constitutional muster. COPA required sites publishing HTM material to use age screening technology to try to keep kids out. The judge compared COPA’s approach to an alternative in which individual computer owners had the option of using content filtering software. He found that COPA’s approach was more restrictive of protected speech and less effective in keeping kids away from HTM material. That was enough to make COPA, as a content-based restriction on speech, unconstitutional.

Two things make the judge’s ruling relatively roadmap-free. First, it is based heavily on factual findings that Congress cannot change – things like the relative effectiveness of filtering and the amount of HTM material that originates overseas beyond the effective reach of U.S. law. (Filtering operates on all material, while COPA’s requirements could have been ignored by many overseas sites.) Second, the alternative it offers requires only voluntary private action, not legislation.

Congress has already passed laws requiring schools and libraries to use content filters, as a condition of getting Federal funding and with certain safeguards that are supposed to protect adult access. The courts have upheld such laws. It’s not clear what more Congress can do. Judge Reed’s filtering alternative is less restrictive because it is voluntary, so that computers that aren’t used by kids, or on which parents have other ways of protecting kids against HTM material, can get unfiltered access. An adult who wants to get HTM material will be able to get it.

Doubtless Congress will make noise about this issue in the upcoming election year. Protecting kids from the nasty Internet is too attractive politically to pass up. Expect hearings to be held and bills to be introduced; but the odds that we’ll get a new law that makes much difference seem pretty low.