March 29, 2024

How to constructively review a research paper

Any piece of research can be evaluated on three axes:

  • Correctness/validity — are the claims justified by evidence?
  • Impact/significance — how will the findings affect the research field (and the world)?
  • Novelty/originality — how big a leap are the ideas, especially the methods, compared to what was already known?

There are additional considerations such as the clarity of the presentation and appropriate citations of prior work, but in this post I’ll focus on the three primary criteria above. How should reviewers weigh these three components relative to each other? There’s no single right answer, but I’ll lay out some suggestions.

First, note that the three criteria differ greatly in terms of reviewers’ ability to judge them:

  • Correctness can be evaluated at review time, at least in principle.
  • Impact can at best be predicted at review time. In retrospect (say, 10 years after publication), informed peers will probably agree with each other about a paper’s impact.
  • Novelty, in contrast to the other two criteria, seems to be a fundamentally subjective notion.

We can all agree that incorrect papers should not be accepted. Peer review would lose its meaning without that requirement. In practice, there are complications ranging from the difficulty of verifying mathematical proofs to the statistical nature of research claims; the latter has led to replication crises in many fields. But as a principle, it’s clear that reviewers shouldn’t compromise on correctness.

Should reviewers even care about impact or novelty?

It’s less obvious why peer review should uphold standards of (predicted) impact or (perceived) novelty. If papers weren’t filtered for impact, readers would presumably be burdened with figuring out which papers deserve their attention. So peer reviewers perform a service to readers by rejecting low-impact papers, but this type of gatekeeping does collateral damage: many world-changing discoveries were initially rejected as insignificant.

The argument for novelty of ideas and methods as a review criterion is different: we want to encourage papers that make contributions beyond their immediate findings, that is, papers that introduce methods that will allow other researchers to make new discoveries in the future.

In practice, novelty is often a euphemism for cleverness, which is a perversion of the intent. Readers aren’t served by needlessly clever papers. Who cares about cleverness? People who are evaluating researchers: hiring and promotion committees. Thus, publishing in a venue that emphasizes novelty becomes a badge of merit for researchers to highlight in their CVs. In turn, forums that publish such papers are seen as prestigious.

Because of this self-serving aspect, today’s peer review over-emphasizes novelty. Sure, we need occasional breakthroughs, but mostly science progresses in a careful, methodical way, and papers that do this important work are undervalued. In many fields of study, publishing is at risk of devolving into a contest where academics impress each other with their cleverness.

There is at least one prominent journal, PLoS One, whose peer reviewers are tasked with checking only correctness; impact and novelty are left to be sorted out post-publication. But for most journals and peer-reviewed conferences, the limited number of publication slots means that there will inevitably be gatekeeping based on impact and/or novelty.

Suggestions for reviewers

Given this reality, here are four suggestions for reviewers. This list is far from comprehensive, and narrowly focused on the question of weighing the three criteria.

  1. Be explicit about how you rate the paper on correctness, impact, and novelty (and any other factors such as clarity of the writing). Ideally, review forms should insist on separate ratings for the criteria. This makes your review much more actionable for the authors: should they address flaws in the work, try harder to convince the world of its importance, or abandon it entirely?
  2. Learn to recognize your own biases in assessing impact and novelty, and accept that these assessments might be wrong or subjective. Be open to a discussion with other reviewers that might change your mind.
  3. Not every paper needs to maximize all three criteria. Consider accepting papers with important results even if they aren’t highly novel, and conversely, papers that are judged to be innovative even if the potential impact isn’t immediately clear. But don’t reward cleverness for the sake of cleverness; that’s not what novelty is supposed to be about.
  4. Above all, be supportive of authors. If you rated a paper low on impact or novelty, do your best to explain why.

Conclusion

Over the last 150 years, peer review has evolved to be more and more of a competition. There are some advantages to this model, but it makes it easy for reviewers to lose touch with the purpose of peer review and basic norms of civility. Once in a while, we need to ask ourselves critical questions about what we’re doing and how best to do it. I hope this post was useful for such a reflection.


Thanks to Ed Felten and Marshini Chetty for feedback on a draft.


Comments

  1. Paulo Borba says

    Thanks! This is very good.

    I’m also disappointed by how little is done by PC chairs and journal editors to fix this problem and better educate reviewers.

  2. Dean C. Rowan says

    I’m not coming from a perspective even remotely connected to CS research, but I tend to think of novelty in negative terms, i.e., it signifies the absence of plagiarism. The “how big a leap” criterion seems as relevant to impact/significance as to novelty/originality, though I can see how a clearly superior methodological innovation stands a chance of demonstrating its impact on the field, if not the world, sooner than ten years. The criterion also poses a challenge for replication, for which incremental methodological adjustments might have more utility than a vast leap.

    • Arvind Narayanan says

      Yes, novelty/originality is used in a strong sense (as I used it in the post) and a weak sense (as you point out).

      As far as I can tell, computer scientists value capital-N novelty more than researchers in other fields do, and papers without a claim to methodological innovation stand little chance of publication in the most prestigious venues. My post is an attempt to push back against this a bit.

      I agree with your comment about replication as well. I’m embarrassed by how little most CS communities value replication.

  3. Robert Harper says

    Thoughtful and helpful, and timely.

    A point I would suggest as a coda to your essay is that scientific debates are to be held in the open air of the meeting or publication, not fought in the darkness of anonymous, unaccountable reviews. The benefit of the doubt should go to the authors; it is their right, not privilege, to present controversial, contrarian, or possibly even incorrect claims, and to defend them in public. Sure, if their Theorem 3 can be definitively refuted, the paper should be rejected. Or if there is clearly insufficient data to support an empirical claim. On the other hand, the very idea of shepherding is contrary to the scientific process. If you as reviewer don’t like the way something is said or spun, buck up and argue your case at the venue. It’s neither the right nor the responsibility of a reviewer to try to influence a priori what is said or how it is said.

    This is hardly all that is seriously wrong with CS review and publication practices, but one point at a time.

  4. John Field says

    Thanks. This was well stated, and useful.

    There may be a small copy editing error in point 3 of the 4 suggestions.

    “that’s not what novelty is supposed to (be) about.”

    Thanks, again.