The gymnastics scoring in this year’s Olympics has generated some controversy, as usual. Some of the controversy feel manufactured: NBC tried to create a hubbub over Nastia Liukin losing the uneven bars gold medal on the Nth tiebreaker; but top-level sporting events whose rules do not admit ties must sometimes decide contests by tiny margins.
A more interesting discussion relates to a change in the scoring system, moving from the old 0.0 to 10.0 scale, to a new scale that adds together an “A score” measuring the difficulty of the athlete’s moves and a “B score” measuring how well the moves were performed. The B score is on the old 0-10 scale, but the A score is on an open-ended scale with fixed scores for each constituent move and bonuses for continuously connecting a series of moves.
One consequence of the new system is that there is no predetermined maximum score. The old system had a maximum score, the legendary “perfect 10”, whose demise is mourned old-school gymnastics gurus like Bela Karolyi. But of course the perfect 10 wasn’t really perfect, at least not in the sense that a 10.0 performance was unsurpassable. No matter how flawless a gymnast’s performance, it is always possible, at least in principle, to do better, by performing just as flawlessly while adding one more flip or twist to one of the moves. The perfect 10 was in some sense a myth.
What killed the perfect 10, as Jordan Ellenberg explained in Slate, was a steady improvement in gymnastic performance that led to a kind of grade inflation in which the system lost its ability to reward innovators for doing the latest, greatest moves. If a very difficult routine, performed flawlessly, rates 10.0, how can you reward an astonishingly difficult routine, performed just as flawlessly? You have to change the scale somehow. The gymnastics authorities decided to remove the fixed 10.0 limit by creating an open-ended difficulty scale.
There’s an interesting analogy to the “grade inflation” debate in universities. Students’ grades and GPAs have increased slowly over time, and though this is not universally accepted, there is plausible evidence that today’s students are doing better work than past students did. (At the very least, today’s student bodies at top universities are drawn from a much larger pool of applicants than before.) If you want a 3.8 GPA to denote the same absolute level of performance that it denoted in the past, and if you also want to reward the unprecendented performance of today’s very best students, then you have to expand the scale at the top somehow.
But maybe the analogy from gymnastics scores to grades is imperfect. The only purpose of gymnastics scores is to compare athletes, to choose a winner. Grades have other purposes, such as motivating students to pay attention in class, or rewarding students for working hard. Not all of these purposes require consistency in grading over time, or even consistency within a single class. Which grading policy is best depends on which goals we have in mind.
One thing is clear: any discussion of gymnastics scoring or university grading will inevitably be colored by nostalgic attachment to the artists or students of the past.