April 21, 2014

Rebooting the CS Publication Process

The job of an academic is to conduct research, and that means publishing manuscripts for the world to read. Computer science is somewhat unusual, among the other disciplines in science and engineering, in that our primary research output goes to highly competitive conferences rather than journals. Acceptance rates at the “top” conferences are often 15% or lower, and the process of accepting those papers and rejecting the rest is famously problematic, particularly for the papers on the bubble.

Consequently, a number of computer scientists have been writing about making changes to the way we do what we do. Some changes may be fairly modest, like increasing acceptance rates by fiat, and eliminating printed paper proceedings to save costs. Other changes would be more invasive and require more coordination.

If we wanted to make a concerted effort to really overhaul the process, what would we do? If we can legitimately concern ourselves with “clean slate” redesign of the Internet as an academic discipline, why not look at our own processes in the same light? I raised this during the rump session of the last HotOS Workshop and it seemed to really get the room talking. The discipline of computer science is clearly ready to have this discussion.

Over the past few months, I’ve been working on and off to flesh out how a clean-slate publishing process might work, taking advantage of our ability to build sophisticated tools to manage the process, and including a story for how we might get from here to there. I’ve written this up as a manuscript and I’d like to invite our blog readers, academic or otherwise, to read it over and offer their feedback. At some point, I’ll probably compress this down to fit the tight word limit of a CACM article, but first things first.

Have a look. Post your feedback here on Freedom to Tinker or send me an email and I’ll follow up, no doubt with a newer draft of my manuscript.

Comments

  1. Michael says:

    I’ve seen you post on this issue before, and I am glad to see that you are doing something to address the issue. However, there is a related problem that should be considered, as well: post-Ph.D. employment.

    Looking at the Taulbee Survey (http://www.cra.org/resources/taulbee/), you can see that, in 2007-2008, the U.S. and Canada produced 1877 new Ph.D.s. In that same time frame, there were only 140 new tenure-track faculty positions at research universities. To make matters worse, a number of people are taking positions at research labs solely to bide their time until they apply again. So there is a backlog of people trying for these positions.

    This clearly has an impact on the publication problem. The field is so competitive that you only have a chance at these lucrative spots if you’ve got 10 or more conference publications as a grad student. But does trying to publish that much really encourage good research?

    I’ve got one paper that was recently rejected for the fourth time. Nearly every time I’ve submitted it, the reviews came back as three “weak accepts.” None of the reviews has pointed out any major flaws with the work. And yet, it has gotten repeatedly rejected.

    I’ve long since given up on the idea of going to a research university. It’s not that I don’t like research. It’s not that I don’t care about security. I actually have a very large number of what I think are good ideas. I think, given enough time and cultivation, these ideas could eventually have a real impact. But I don’t have a chance at even applying with my publication record.

    At this point, I’m just hoping that I can get a job at a teaching college, preferably teaching material that is more complex than Visual Basic macros. Sour grapes? Perhaps.

    • dwallach says:

      Market forces work against you, and many others just like you. I’d like to reengineer the incentives such that you need to produce maybe two blockbusters rather than ten whatever papers. If you and everybody else could focus your energy on those instead, then maybe you’d come out better.

      On the flip side, CS is catching up to the rest of academia in terms of how much it sucks if you want an academic job. In much of the rest of science, you go through the hell of years of post-doc work before you have a hope of a faculty position. In the humanities, there will be hundreds of people chasing after literally only a handful of open slots, nationwide. At least in CS there’s a large industry that’s eager to hire talented PhD graduates. Many students who could do very well on the academic job market instead go do startups, go work for large firms like Google or Microsoft, or jump to other fields eager to hire them, like Wall Street. If it weren’t for that, your odds of getting an academic job would be even worse.

      • Michael says:

        I didn’t mention it before, but I’m glad you’re persisting with this topic. I’ve read earlier posts (well, maybe it was only one…) by you along the same lines. And I agree with you that it could certainly be worse.

        The availability of non-academic jobs is actually a blessing and a curse. CS is unique, in that you can get a job that pays well and offers great advancement opportunities with just a bachelor’s. In many (most?) of the sciences, you have to have a graduate degree to do serious work in the field. The humanities…well, I don’t see a lot of market demand for philosophers. But in CS, you can get a B.S., go to work for the right company, and end up working on challenging, interesting work…which is what I did.

        Like many others, I came back to school to get my Ph.D. because I wanted to teach. If I had just wanted to learn more about security and continue to work in the field, I could have done that in my old job. So for me, I’m not really interested in going back to industry. That’s one of the reasons why my inability to end up at a prestigious university is a bitter pill to swallow. It feels like a bit of failure.

        The other reason is the pressure that comes at a major research university. When people here talk about “teaching colleges,” you can hear the disdain dripping from their voices. And that’s unfortunate. Given my original motivation, I think I would be very happy at a teaching college. I just hope that, wherever I end up, the program is sufficiently strong that I will get to teach security, crypto, etc.

        I’ll post another response relating to your original topic rather than my personal career search. That way people can skip my gripes.

        • dwallach says:

          Market forces have a lot more impact on your ultimate job selection than anything to do with the quality of your work. When I completed my PhD in 1998, it was still the original dot-com boom, and companies were throwing around significant hiring bonuses, while academic CS departments were growing across the country. Today, we have even more professors producing even more students, with fewer jobs to go around.

          Certainly, if your goal is to teach, then please go ahead and take a job at a teaching college, or as a professor of the practice / lecturer at a research school or even consider high schools, private professional training, etc. Do what you love, and understand that the PhD after your name will still be worth something, and the experience you picked up in grad school will still be worth something, whatever you end up doing.

  2. John Clements says:

    I can read your proposal in one of two ways. Either
    a) you want to leave current conference structures alone, and reform journals, or
    b) you want to completely overhaul how conferences work.

    If (a), then it’s not clear to me how the proposed system would address the problems with our current system. That is, the paper/no-paper publication system does not seem to be at the heart of the problems you list with conferences.

    If (b), then… (I claim) you’re not providing details.

    John

    • dwallach says:

      I’d say I want to modestly reform conferences and journals but really overhaul the way we think about and use tech reports. If we did that, it would fix many of the problems we currently have with conferences and journals.

      What I most want to obliterate is the submit/reject/revise/repeat cycle. It creates tons of redundant work for conference program committees and it disheartens would-be authors. My thought is that a wholesale migration from our current system to something more akin to arXiv.org (mashed up with DBLP and HotCRP) would get us from here to there. The key insights are:

      - If a publication that “only” appears as a tech report in this system is still “real,” in the sense that it can contribute usefully toward a tenure case, then all the papers that don’t get accepted the first time around have a home, and their authors feel less need to get on the resubmission treadmill.

      - Building a ranking system more sophisticated than just h-indices or citation counts is essential to making those papers “real”.

      - Once online, unrefereed publications are socially accepted and academically rewarded (to the extent that they’re cited and gain ranking), the reviewing load on our conferences and journals may well drop radically and the quality would go up.
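
For reference, the h-index baseline named in the list above is simple to compute, which is part of why it is a blunt instrument. The following is a minimal Python sketch. The citation counts are made-up numbers for illustration only, and the function name is hypothetical rather than part of any proposed system.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


# Hypothetical citation counts for two authors (invented for this sketch).
many_modest_papers = [6, 6, 5, 5, 5, 4, 3, 2, 1, 1]
two_blockbusters = [400, 250]

for label, counts in [("ten modest papers", many_modest_papers),
                      ("two blockbusters", two_blockbusters)]:
    print(label, "h-index:", h_index(counts), "total citations:", sum(counts))
```

Under these invented numbers, the author with ten modestly cited papers gets a higher h-index than the author with two heavily cited ones, even though the second author has far more total citations.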

  3. Michael says:

    I like a number of your suggestions. Having a CSPub system like arXiv would be very nice. I especially like the “accept without presentation” option for conferences. This would eliminate the problem of tossing out papers solely because of time constraints.

    The one element that I think is still needed (unless I missed it) is author rebuttals. I know some conferences (VLDB and this year’s CCS, for example) have started experimenting with these types of systems. Reviewers are human. Often, they are overworked grad students who don’t yet have full mastery of the field. In short, they make mistakes. However, the acceptance/rejection decision is always made before the author is aware of these mistakes.

    Give the authors a better fighting chance to counter the criticisms. When the reviews have been done, send the critique (not the scoring) to the authors with the opportunity to submit a rebuttal within a week. Then, the program committee could consider this statement in their decision.

    Perhaps it won’t make much of a difference, but I would feel much better if I had been able to argue back against some of the reviewers I’ve had.

  4. Robbie says:

    The arXiv is great, and has totally changed the way people do research in physics. It doesn’t replace the journals (or the conferences), but supplements them. The results get out there in the fastest possible way, and then the slower process of deciding the merit of the work happens at its own rate.

    One of the biggest gains to a system like this is that it shortens the annoying gap between finishing the research and feeling free to discuss it publicly. Especially in some fields, there can be a real fear that discussing a work before it is published will result in that research getting “stolen”. The arXiv (almost) eliminates this problem.

  5. Perry Lorier says:

    One thing that we were talking about recently was the idea that when you submit your paper, you get sent, say, 5 other papers to review. You have an incentive to review them (if you don’t then your submission gets dropped), and instead of giving them pass/fail (since ideally you’d want to fail them all to give your paper a better chance), you rank them from best to worst.

    You provide feedback to the authors on what you think can be done to improve the paper, which then gets sent back to the original authors. The original authors can take that feedback, optionally resubmit an improved version of the paper, and provide a “rebuttal” to the feedback they got from other authors.

    Then all of this is sent to the program committee: they get 5 sets of rankings for a paper, 5 comments (and rebuttals) on the paper, and the (hopefully improved) paper itself.

    This helps the program committee by getting a lot of the crap that’s submitted filtered through a first layer of critique and improvements (and potentially some papers pulling out and deciding to do more work before resubmitting). It helps people by exposing them to what the program committee has to put up with, and helping them understand what features of their papers are likely to make program committees reject their papers, and it helps researchers who end up reading a lot of papers in related, but not identical areas.

    It’s not a silver bullet for improving the situation, but I’d like to think it could help.
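
The scheme in the comment above can be sketched in a few lines of Python. Two pieces are assumptions, not part of the comment: the circular assignment (each submitter reviews the next five papers in the list) and the use of mean rank to combine the five rankings, since the comment leaves the aggregation to the program committee. All names here are hypothetical.

```python
from collections import defaultdict


def assign_reviews(paper_ids, per_reviewer=5):
    """Give each submitter the next `per_reviewer` papers in circular order.

    Nobody reviews their own paper, and every paper receives exactly
    `per_reviewer` reviews. Needs more papers than `per_reviewer`.
    """
    n = len(paper_ids)
    if n <= per_reviewer:
        raise ValueError("need more submissions than reviews per submitter")
    return {
        paper_ids[i]: [paper_ids[(i + k) % n] for k in range(1, per_reviewer + 1)]
        for i in range(n)
    }


def mean_ranks(rankings):
    """rankings maps reviewer -> papers ordered best first; returns paper -> mean rank."""
    positions = defaultdict(list)
    for ordered in rankings.values():
        for rank, paper in enumerate(ordered, start=1):
            positions[paper].append(rank)
    return {paper: sum(ranks) / len(ranks) for paper, ranks in positions.items()}


# Toy run: 8 submissions, each submitter identified by their own paper id.
papers = [f"P{i}" for i in range(8)]
assignment = assign_reviews(papers)

# Stand-in for honest reviewing: a made-up quality order, lower is better.
quality = {paper: i for i, paper in enumerate(papers)}
rankings = {
    reviewer: sorted(assigned, key=quality.get)
    for reviewer, assigned in assignment.items()
}

scores = mean_ranks(rankings)
for paper in sorted(scores, key=scores.get):
    print(paper, scores[paper])
```

Mean rank is only one possible aggregation, and it does nothing to stop a reviewer from submitting a dishonest ordering.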

    • Anonymous says:

      “One thing that we were talking about recently was the idea that when you submit your paper, you get sent, say, 5 other papers to review. You have an incentive to review them (if you don’t then your submission gets dropped), and instead of giving them pass/fail (since ideally you’d want to fail them all to give your paper a better chance), you rank them from best to worst.”

      This can be gamed too: rank them in opposite order to what you would honestly give them. The good papers get the worst ratings at this step and get bounced; the bad ones get bounced a bit later in the process; out goes the competition.

      Fixing this requires making the ones you rate not be in competition with your own submission at all. That is, you submit a paper that will vie to be in issue #n+1 and review five papers that are vying to be in issue #n, rather than all six being aimed at the same issue and thus competing for the same limited space.

      Better still, ditch the whole “limited space” thing and go long tail. Go paperless. Articles (in all branches of academia) are published online. None are rejected, at least not per se. But they do compete for reputation and attention. This still has to be watched for any attempts to game the system (e.g. with aggressive SEO) but it gets rid of the artificial scarcity of space in the journals AND the artificial scarcity of copies of the journals, leveling the playing field and raising efficiency.

      In the longer run, the whole culture needs to change of course. “Publish or perish” promotes quantity over quality, and is thus also wasteful and inefficient in the long term.

      • Michael says:

        We currently do have a system in which all articles are published online. They’re called technical reports, and they’re essentially ignored when it comes to hiring and tenure.

        The limited space issue isn’t really about paper. For conferences, it’s about time and physical space. If a conference runs 3 days (3 sessions a day with 3 papers each) and is held in a space with 2 rooms, you get a clear formula for 54 paper acceptances (3 × 3 × 3 × 2). But you just got 300 paper submissions.

        Conferences and journals can’t (and shouldn’t) just publish all papers received. I’ve been a reviewer, and I’ve seen some pretty horrendous submissions. The problem is that you want to accept 150 of those 300 papers, but scheduling limits you to 54. The current system rejects those 96 other papers, which then get re-submitted to the next conference.
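
The arithmetic in the comment above can be written out as a short Python sketch. It assumes the parenthetical means 3 sessions per day, which is the reading that yields 54. The variable names are hypothetical.

```python
# Scheduling numbers taken from the comment above.
days = 3
sessions_per_day = 3      # assumed reading of "3 sessions with 3 papers each"
papers_per_session = 3
rooms = 2

slots = days * sessions_per_day * papers_per_session * rooms

submissions = 300
worthy = 150              # papers the committee would like to accept

bubble = worthy - slots   # worthy papers rejected only for lack of slots
acceptance_rate = slots / submissions

print("presentation slots:", slots)
print("worthy papers without a slot:", bubble)
print(f"acceptance rate: {acceptance_rate:.0%}")
```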

        I agree that the culture of “publish or perish” induces perverse incentives to create the LPU (least publishable unit). But you have to look at the big picture to understand how that system developed. Why is publication so important? Money.

        State budgets are getting slashed everywhere. To make up for the gap, schools have to raise a lot more money from research grants. So you look at the funding process. One of the most important elements of a proposal is a justification of why you think your team will succeed in the proposed research. How do you prove that? You point to previous successes (i.e., publications in prestigious conferences and journals).

        So if you’re only focused on the technical issues, yes, PoP is inefficient and discourages taking big risks for potentially huge gains. But as funding becomes increasingly scarce, PoP is very efficient at identifying the one or two projects that will get money from any single program.

        • dwallach says:

          With a suitable ranking system, the papers that get “accepted” will continue to get accepted and those authors will continue to have their usual advantages, including improved strength of graduate recruiting, fundraising, and so forth. The problem I really want to solve is the fate of the “bubble” papers. As you said, maybe 150 of those 300 were “worthy” of accepting, but there wasn’t physical space at the conference. That’s the problem I want to solve. I want to create a space where those papers can have a lesser form of acceptance (maybe you call it “accepted without publication”) that nonetheless is a real publication and frees the authors to move on and do something else rather than endlessly revising and resubmitting their old work.

          My goal is to use electronic publication to eliminate scarcity as much as possible. If a conference wants to take 150 out of 300 submitted papers, great, then do it. Want to then invite 30 of those to make presentations? Great, then do it. Clearly, the latter category should reflect better on the author than the former, in terms of a paper’s initial stature, but both papers will be “published” and both may then collect citations, which are a stronger arbiter of a paper’s ultimate importance.

  6. Anonymous says:

    Everyone proposes to “fix” the publication process by allowing more papers to be published in some way or another. It seems to me that we are missing the big(ger) picture here, where the goal of conferences/journals is to timestamp and make available interesting research that advances the state of the art. The acceptance rates might be too low, but is it only because there is an artificial scarcity in place? Or could it be because there are too many researchers, not all of them advancing the state of the art, but all of them trying to publish?

    • dwallach says:

      You’re right that conferences and journals serve an important function, by directing our limited attention at more important work. Whatever we do, we don’t want to eliminate the peer review process! I’m agnostic on whether conferences and journals should raise their acceptance rates, but I want those decisions to be made without being concerned about artificial scarcities (either space in a physical proceedings, or presentation slots at a physical conference), and I want the rejections to have somewhere legitimate to go that’s better than cycling around and around and clogging up the works.