November 21, 2024

Rebooting the CS Publication Process

The job of an academic is to conduct research, and that means publishing manuscripts for the world to read. Computer science is somewhat unusual, among the other disciplines in science and engineering, in that our primary research output goes to highly competitive conferences rather than journals. Acceptance rates at the “top” conferences are often 15% or lower, and the process of accepting those papers and rejecting the rest is famously problematic, particularly for the papers on the bubble.

Consequently, a number of computer scientists have been writing about making changes to the way we do what we do. Some changes may be fairly modest, like increasing acceptance rates by fiat, and eliminating printed paper proceedings to save costs. Other changes would be more invasive and require more coordination.

If we wanted to make a concerted effort to really overhaul the process, what would we do? If we can legitimately concern ourselves with “clean slate” redesign of the Internet as an academic discipline, why not look at our own processes in the same light? I raised this during the rump session of the last HotOS Workshop and it seemed to really get the room talking. The discipline of computer science is clearly ready to have this discussion.

Over the past few months, I’ve been working on and off to flesh out how a clean-slate publishing process might work, taking advantage of our ability to build sophisticated tools to manage the process, and including a story for how we might get from here to there. I’ve written this up as a manuscript and I’d like to invite our blog readers, academic or otherwise, to read it over and offer their feedback. At some point, I’ll probably compress this down to fit the tight word limit of a CACM article, but first things first.

Have a look. Post your feedback here on Freedom to Tinker or send me an email and I’ll followup, no doubt with a newer draft of my manuscript.

Developing Texts Like We Develop Software

Recently I was asked to speak at a conference for university librarians, about how the future of academic publication looks to me as a computer scientist. It’s an interesting question. What do computer scientists have to teach humanists about how to write? Surely not our elegant prose style.

There is something distinctive about how computer scientists write: we tend to use software development tools to “develop” our texts. This seems natural to us. A software program, after all, is just a big text, and the software developers are the authors of the text. If a tool is good for developing the large, complex, finicky text that is a program, why not use it for more traditional texts as well?

Like software developers, computer scientist writers tend to use version control systems. These are software tools that track and manage different versions of a text. What makes them valuable is not just the ability to “roll back” to old versions — you can get that (albeit awkwardly) by keeping multiple copies of a file. The big win with version control tools is the level of control they give you. Who wrote this line? What did Joe write last Tuesday? Notify me every time section 4 changes. Undo the changes Fred made last Wednesday, but leave all subsequent changes in place. And so on. Version control systems are a much more powerful relative of the “track changes” and “review” features of standard word processors.

Another big advantage of advanced version control is that it enables parallel development, a style of operation in which multiple people can work on the text, separately, at the same time. Of course, it’s easy to work in parallel. What’s hard is to merge the parallel changes into a coherent final product — which is a huge pain in the neck with traditional editing tools, but is easy and natural with a good version control system. Parallel development lets you turn out a high-quality product faster — it’s a necessity when you have hundred or thousands of programmers working on the same product — and it vastly reduces the amount of human effort spent on coordination. You still need coordination, of course, but you can focus it where it matters, on the conceptual clarity of the document, without getting distracted by version-wrangling.

Interestingly, version control and parallel development turn out to be useful even for single-author works. Version control lets you undo your mistakes, and to reconstruct the history of a problematic section. Parallel development is useful if you want to try an experiment — what happens if I swap sections 3 and 4? — and try out this new approach for a while yet retain the ability to accept or reject the experiment as a whole. These tools are so useful that experienced computer scientists tend to use them to write almost anything longer than a blog post.

While version control and parallel development have become standard in computer science writing, there are other software development practices that are only starting to cross the line into CS writing: issue tracking and the release early and often strategy.

Issue tracking systems are used to keep track of problems, bugs, and other issues that need to be addressed in a text. As with version control, you can do this manually, or rely on a simple to-do list, but specialized tools are more powerful and give you better control and better visibility into the past. As with software, issues can range from small problems (our terminology for X is confusing) to larger challenges (it would be nice if our dataset were bigger).

“Release early and often” is a strategy for rapidly improving a text by making it available to users (or readers), getting feedback, and rapidly turning out a new version that addresses the feedback. Users’ critiques become issues in the issue tracking system; authors modify the text to address the most urgent issues; and a new version is released as soon as the text stabilizes. The result is rapid improvement, aligned with the true desires of users. This approach requires the right attitude from users, who need to be willing to tolerate problems, in exchange for a promise that their critiques will be addressed promptly.

What does all of this mean for writers who are not computer scientists? I won’t be so bold as to say that the future of writing will be just exactly like software development. But I do think that the tools and techniques of software development, which are already widely used by computer scientist writers, will diffuse into more common usage. It will be hard to retrofit them into today’s large, well-established editing software, but as writing tools move into the cloud, I wouldn’t be surprised to see them take on more of the attributes of today’s software development tools.

One consequence of using these tools is that you end up with a fairly complete record of how the text developed over time, and why. Imagine having a record like that for the great works of the past. We could know what the writer did every hour, every day while writing. We could know which issues and problems the author perceived in earlier versions of the text, and how these were addressed. We could know which issues the author saw as still unfixed in the final published text. This kind of visibility will be available into our future writing — assuming we produce works that are worthy of study.

Lessons from Amazon's 1984 Moment

Amazon got some well-deserved criticism for yanking copies of Orwell’s 1984 from customers’ Kindles last week. Let me spare you the copycat criticism of Amazon — and the obvious 1984-themed jokes — and jump right to the most interesting question: What does this incident teach us?

Human error was clearly part of the problem. Somebody at Amazon decided that repossessing purchased copies of 1984 would be a good idea. They were wrong about this, as both the public reaction and the company’s later backtracking confirm. But the fault lies not just with the decision-maker, but also with the factors that made the decision more likely, including some aspects of the technology itself.

Some put the blame on DRM, but that’s not the problem here. Even if the Kindle used open formats and let you export and back up your books, Amazon could still have made 1984 disappear from your Kindle. Yes, some users might have had backups of 1984 stored elsewhere, but most users would have lost their only copy.

Some blame cloud computing, but that’s not precisely right either. The Kindle isn’t really a cloud device — the primary storage, computing and user interface for your purchased books are provided by your own local Kindle device, not by some server at Amazon. You can disconnect your Kindle from the network forever (by flipping off the wireless network switch on the back), and it will work just fine.

Some blame the fact that Amazon controls everything about the Kindle’s software, which is a better argument but still not quite right. Most PCs are controlled by a single company, in the sense that that company (Microsoft or Apple) can make arbitrary changes to the software on the PC, including (in principle) deleting files or forcibly removing software programs.

The problem, more than anything else, is a lack of transparency. If customers had known that this sort of thing were possible, they would have spoken up against it — but Amazon had not disclosed it and generally does offer clear descriptions of how the product works or what kinds of control the company retains over users’ devices.

Why has Amazon been less transparent than other vendors? I’m not sure, but let me offer two conjectures. It might be because Amazon controls the whole system. Systems that can run third-party software have to be more open, in the sense that they have to tell the third-party developers how the system works, and they face some pressure to avoid gratuitous changes that might conflict with third-party applications. Alternatively, the lack of transparency might be because the Kindle offers less functionality than (say) a PC. Less functionality means fewer security risks, so customers don’t need as much information to protect themselves.

Going forward, Amazon will face more pressure to be transparent about the Kindle technology and the company’s relationship with Kindle buyers. It seems that e-books really are more complicated than dead-tree books.

Thoughtcrime Experiments

Cosmic rays can flip bits in memory cells or processor datapaths. Once upon a time, Sudhakar and I asked the question, “can an attacker exploit rare and random bit-flips to bypass a programming-language’s type protections and thereby break out of the Java sandbox?

Thoughtcrime Experiments

A recently published science-fiction anthology Thoughtcrime Experiments contains a story, “Single-Bit Error” inspired by our research paper. What if you could use cosmic-ray bit flips in neurons to bypass the “type protections” of human rationality?

In addition to 9 stories and 6 original illustrations, the anthology is interesting for another reason. It’s an experiment in do-it-yourself paying-the-artists high-editorial-standards open-source Creative-Commons print-on-demand publishing. Theorists like Yochai Benkler and others have explained that production costs attributable to communications and coordination have been reduced down into the noise by the Internet, and that this enables “peer production” that was not possible back in the 19th and 20th centuries. Now the Appendix to Thoughtcrime Experiments explains how to edit and produce your own anthology, complete with a sample publication contract.

It’s not all honey and roses, of course. The authors got paid, but the editors didn’t! The Appendix presents data on how many hours they spent “for free”. In addition, if you look closely, you’ll see that the way the authors got paid is that the editors spent their own money.

Still, part of the new theory of open-source peer-production asks questions like, “What motivates people to produce technical or artistic works? What mechanisms do they use to organize this work? What is the quality of the work produced, and how does it contribute to society? What are the legal frameworks that will encourage such work?” This anthology and its appendix provide an interesting datapoint for the theorists.

The future of high school yearbooks

The Dallas Morning News recently ran a piece about how kids these days aren’t interested in buying physical, printed yearbooks. (Hat tip to my high school’s journalism teacher, who linked to it from our journalism alumni Facebook group.) Why spend $60 on a dead-trees yearbook when you can get everything you need on Facebook? My 20th high school reunion is coming up this fall, and I was the “head” photographer for my high school’s yearbook and newspaper, so this is a topic near and dear to my heart.

Let’s break down everything that a yearbook actually is and then think about how these features can and cannot be replicated in the digital world. A yearbook has:

  • higher-than-normal photographic quality (yearbook photographers hopefully own better camera equipment and know how to use their gear properly)
  • editors who do all kinds of useful things (sending photographers to events they want covered, selecting the best pictures for publication, captioning them, and indexing the people in them)
  • a physical artifact that people can pass around to their friends to mark up and personalize, and which will still be around years later

If you get rid of the physical yearbook, you’ve got all kinds of issues. Permanence is the big one. There’s nothing that my high school can do to delete my yearbook after it’s been published. Conversely, if high schools host their yearbooks on school-owned equipment, then they can and will fail over time. (Yes, I know you could run a crawler and make a copy, but I wouldn’t trust a typical high school’s IT department to build a site that will be around decades later.) To pick one example, my high school’s web site, when it first went online, had a nice alumni registry. Within a few years, it unceremoniously went away without warning.

Okay, what about Facebook? At this point, almost a third of my graduating class is on Facebook, and I’m sure the numbers are much higher for more recent classes. Some of my classmates are digging up old pictures, posting them, and tagging each other. With social networking as part of the yearbook process from the start, you can get some serious traction in replacing physical yearbooks. Yearbook editors and photography staff can still cover events, select good pictures, caption them, and index them. The social networking aspect covers some of the personalization and markup that we got by writing in each others’ yearbooks. That’s fun, but please somebody convince me that Facebook will be here ten or twenty years from now. Any business that doesn’t make money will eventually go out of business, and Facebook is no exception.

Aside from the permanence issue, is anything else lost by going to a Web 2.0 social networking non-printed yearbook? Censorship-happy high schools (and we all know what a problem that can be) will never allow a social network site that they control to have students’ genuine expressions of their distaste for all the things that rebellious youth like to complain about. Never mind that the school has a responsibility to maintain some measure of student privacy. Consequently, no high school would endorse the use of a social network that they couldn’t control and censor. I’m sure several of the people who wrote in my yearbook could have gotten in trouble if the things they wrote there were to have been raised before the school administration, yet those comments are the best part of my yearbook. Nothing takes you back quite as much as off-color commentary.

One significant lever that high school yearbooks have, which commercial publications like newspapers generally lack, is that they’re non-profit. If the yearbook financially breaks even, they’re doing a good job. (And, in the digital universe, the costs are perhaps lower. I personally shot hundreds of rolls of black&white film, processed them, and printed them, and we had many more photographers on our staff. My high school paid for all the film, paper, and photo-chemistry that we used. Now they just need computers, although those aren’t exactly cheap, either.) So what if they don’t print so many physical yearbooks? Sure, the yearbook staff can do a short, vanity press run, so they can enter competitions and maybe win something, but otherwise they can put out a PDF or pickle the bowdlerized social network’s contents down to a DVD-ROM and call it a day. That hopefully creates enough permanence. What about uncensored commentary? That’s probably going to have to happen outside of the yearbook context. Any high school student can sign up for a webmail account and keep all their email for years to come. (Unlike Facebook, the webmail companies seem to be making money.) Similarly, the ubiquity of digital point-and-shoot cameras ensures that students will have uncensored, personal, off-color memories.

[Sidebar: There’s a reality show on TV called “High School Reunion.” Last year, they reunited some people from my school’s class of 1987. I was in the class of 1989. Prior to the show airing, I was contacted by one of the producers, wanting to use some of my photographs in the show. She sent me a waiver that basically had me indemnifying them for their use of my work; of course, they weren’t offering to pay me anything. Really? No thanks. One of the interesting questions was whether my photos were even “my property” to which I could even give them permission to use. There were no contracts of any kind when I signed up to work on the yearbook. You could argue that the school retains an interest in the pictures, never mind the original subjects from whom we never got model releases. Our final contract said, in effect, that I represented that I took the pictures and had no problem with them using them, but I made no claims as to ownership, and they indemnified me against any issues that might arise.

Question for the legal minds here: I have three binders full of negatives from my high school years. I could well invest a week of my time, borrow a good scanner, and get the whole collection online and post it online, either on my own web site or on Facebook. Should I? Am I opening myself to legal liability?]