October 23, 2017

Archives for May 2010

Developing Texts Like We Develop Software

Recently I was asked to speak at a conference for university librarians, about how the future of academic publication looks to me as a computer scientist. It’s an interesting question. What do computer scientists have to teach humanists about how to write? Surely not our elegant prose style.

There is something distinctive about how computer scientists write: we tend to use software development tools to “develop” our texts. This seems natural to us. A software program, after all, is just a big text, and the software developers are the authors of the text. If a tool is good for developing the large, complex, finicky text that is a program, why not use it for more traditional texts as well?

Like software developers, computer scientist writers tend to use version control systems. These are software tools that track and manage different versions of a text. What makes them valuable is not just the ability to “roll back” to old versions — you can get that (albeit awkwardly) by keeping multiple copies of a file. The big win with version control tools is the level of control they give you. Who wrote this line? What did Joe write last Tuesday? Notify me every time section 4 changes. Undo the changes Fred made last Wednesday, but leave all subsequent changes in place. And so on. Version control systems are a much more powerful relative of the “track changes” and “review” features of standard word processors.

Another big advantage of advanced version control is that it enables parallel development, a style of operation in which multiple people can work on the text, separately, at the same time. Of course, it’s easy to work in parallel. What’s hard is to merge the parallel changes into a coherent final product — which is a huge pain in the neck with traditional editing tools, but is easy and natural with a good version control system. Parallel development lets you turn out a high-quality product faster — it’s a necessity when you have hundred or thousands of programmers working on the same product — and it vastly reduces the amount of human effort spent on coordination. You still need coordination, of course, but you can focus it where it matters, on the conceptual clarity of the document, without getting distracted by version-wrangling.

Interestingly, version control and parallel development turn out to be useful even for single-author works. Version control lets you undo your mistakes, and to reconstruct the history of a problematic section. Parallel development is useful if you want to try an experiment — what happens if I swap sections 3 and 4? — and try out this new approach for a while yet retain the ability to accept or reject the experiment as a whole. These tools are so useful that experienced computer scientists tend to use them to write almost anything longer than a blog post.

While version control and parallel development have become standard in computer science writing, there are other software development practices that are only starting to cross the line into CS writing: issue tracking and the release early and often strategy.

Issue tracking systems are used to keep track of problems, bugs, and other issues that need to be addressed in a text. As with version control, you can do this manually, or rely on a simple to-do list, but specialized tools are more powerful and give you better control and better visibility into the past. As with software, issues can range from small problems (our terminology for X is confusing) to larger challenges (it would be nice if our dataset were bigger).

“Release early and often” is a strategy for rapidly improving a text by making it available to users (or readers), getting feedback, and rapidly turning out a new version that addresses the feedback. Users’ critiques become issues in the issue tracking system; authors modify the text to address the most urgent issues; and a new version is released as soon as the text stabilizes. The result is rapid improvement, aligned with the true desires of users. This approach requires the right attitude from users, who need to be willing to tolerate problems, in exchange for a promise that their critiques will be addressed promptly.

What does all of this mean for writers who are not computer scientists? I won’t be so bold as to say that the future of writing will be just exactly like software development. But I do think that the tools and techniques of software development, which are already widely used by computer scientist writers, will diffuse into more common usage. It will be hard to retrofit them into today’s large, well-established editing software, but as writing tools move into the cloud, I wouldn’t be surprised to see them take on more of the attributes of today’s software development tools.

One consequence of using these tools is that you end up with a fairly complete record of how the text developed over time, and why. Imagine having a record like that for the great works of the past. We could know what the writer did every hour, every day while writing. We could know which issues and problems the author perceived in earlier versions of the text, and how these were addressed. We could know which issues the author saw as still unfixed in the final published text. This kind of visibility will be available into our future writing — assuming we produce works that are worthy of study.