One of my least favorite tasks as a professor is grading papers. So there’s good news – of a sort – in J. Greg Phelan’s New York Times article from last week, about the use of computer programs to grade essays.
The computers are surprisingly good at grading – essentially as accurate as human graders, where an “accurate” grade is defined as one that correlates with the grade given by another human. To put it another way, the variance between a human grader and a computer is no greater than that between two human graders.
Eric Rescorla offers typically interesting commentary on this. He points out, first, that the lesson here might not be that computers are good at grading, but that human graders are surprisingly bad. I know how hard it is to give the thirtieth essay in the stack the careful reading it deserves. If the grader’s brain is on autopilot, you’ll get the kind of formulaic grading that a computer might be able to handle.
Another possibility, which Eric also discusses, is that there is something simple – I’ll call it the X-factor – about an essay’s language or structure that happens to correlate very well with good writing. If this is true, then a computer program that looks only for the X-factor will give “accurate” grades that correlate well with the grades assigned by a human reader who actually understands the essays. The computer’s grade will be “accurate” even though the computer doesn’t really understand what the student is trying to say.
The article even gives hints about the nature of the X-factor:
For example, a high-scoring essay almost always contains topically relevant vocabulary, a variety of sentence structures, and the use of cue terms like “in summary,” “for example,” and “because” to organize an argument. By analyzing 50 of these features in a sampling of essays on a particular topic that were scored by human beings, the system can accurately predict how the same human readers would grade additional essays on the same topic.
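To make the mechanism concrete, here is a toy sketch of the feature-based approach the article describes: extract a few surface features from an essay, then combine them linearly into a score. The real system reportedly uses around 50 features and learns its weights from a sample of human-scored essays; the three features and the fixed weights below are my own illustrative assumptions, not the actual system’s.

```python
# Toy feature-based essay scorer. The specific features and weights are
# illustrative assumptions, not the real system's.
import re
from statistics import pstdev

# Cue terms of the kind the article mentions.
CUE_TERMS = ("in summary", "for example", "because")

def features(essay, topic_words):
    """Return [cue-term count, topical-word count, sentence-length variety]."""
    text = essay.lower()
    words = re.findall(r"[a-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return [
        sum(text.count(t) for t in CUE_TERMS),          # cue-term usage
        sum(1 for w in words if w in topic_words),      # topical vocabulary
        pstdev(lengths) if len(lengths) > 1 else 0.0,   # sentence variety
    ]

def grade(essay, topic_words, weights=(1.0, 0.5, 0.8)):
    # The weights here are made up; a real grader would fit them by
    # regression against human scores on essays about the same topic.
    return sum(w * x for w, x in zip(weights, features(essay, topic_words)))
```

The point of the sketch is how little it looks at: nothing here understands the essay, yet a score built this way can still correlate with human grades – which is exactly the X-factor worry.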
This is all very interesting, but the game will be up as soon as students and their counselors figure out what the X-factor is and how to maximize it. Then the SAT-prep companies will teach students how to crank out X-factor-maximizing essays, in some horrendous stilted writing style that only a computerized grader could love. The correlation between good writing and the X-factor will be lost, and we’ll have to switch back to human graders – or move on to the next generation of computerized graders, looking for a new improved X-factor.