February 24, 2018

Singularity Skepticism 4: The Value of Avoiding Errors

[This is the fourth in a series of posts. The other posts in the series are here: 1 2 3.]

In the previous post, we did a deep dive into chess ratings, as an example of a system to measure a certain type of intelligence. One of the takeaways was that the process of numerically measuring intelligence, in order to support claims such as “intelligence is increasing exponentially”, is fraught with complexity.

Today I want to wrap up the discussion of quantifying AI intelligence by turning to a broad class of AI systems whose performance is measured as an error rate, that is, the percentage of examples from a population for which the system gives a wrong answer. These applications include facial recognition, image recognition, and so on.

For these sorts of problems, the error rate tends to change over time as shown on this graph:

The human error rate doesn’t change, but the error rate for the AI system tends to fall exponentially, crossing the human error rate at a time we’ll call t*, and continuing to fall after that.

How does this reduction in error rate translate into outcomes? We can get a feel for this using a simple model, where a wrong answer is worth W and a right answer is worth R, with R>W, naturally.

In this model, the value created per decision changes over time as shown in this graph:

Before t*, humans perform better, and the value is unchanged. At t*, AI becomes better and the graph takes a sharp turn upward. After that, the growth slows as the value approaches its asymptote of R.

This graph has several interesting attributes. First, AI doesn’t help at all until t*, when it catches up with people. Second, the growth rate of value (i.e., the slope of the curve) is zero while humans are better, then it lurches upward at t*, then the growth rate falls exponentially back to zero. And third, most of the improvement that AI can provide will be realized in a fairly short period after t*.

Viewed over a long time-frame, this graph looks a lot like a step function: the effect of AI is a sudden step up in the value created for this task. The step happens in a brief interval after AI passes human performance. Before and after that interval, the value doesn’t change much at all.
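
To make the model concrete, here is a minimal Python sketch. The exponential form of the AI error curve and every number in it (a 50% starting error, a decay rate of 0.5 per time unit, a 5% human error rate, R = 1, W = 0) are assumptions chosen for illustration, not measurements of any real system. The only ingredients taken from the model are that the better of the two decision-makers is used, and that a decision is worth R when right and W when wrong.

```python
import math

# All constants below are illustrative assumptions.
AI_START_ERROR = 0.5   # AI error rate at t = 0
AI_DECAY = 0.5         # exponential decay rate of the AI error
HUMAN_ERROR = 0.05     # constant human error rate
R, W = 1.0, 0.0        # value of a right and a wrong answer, with R > W

def ai_error(t):
    """Hypothetical AI error rate, falling exponentially over time."""
    return AI_START_ERROR * math.exp(-AI_DECAY * t)

def value_per_decision(t):
    """Expected value of one decision when the better performer decides."""
    error = min(HUMAN_ERROR, ai_error(t))
    return (1.0 - error) * R + error * W

# t* is the time at which the AI error rate equals the human error rate.
T_STAR = math.log(AI_START_ERROR / HUMAN_ERROR) / AI_DECAY

for t in [0.0, 2.0, T_STAR, T_STAR + 2, T_STAR + 4, 20.0]:
    print(f"t = {t:5.2f}   value = {value_per_decision(t):.4f}")
```

Under these assumptions the value is flat until t*, and most of the remaining gap to R closes within a handful of time units after t*, which is the step-like shape described above.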

Of course, this simple model can’t be the whole story. Perhaps a better solution to this task enables other tasks to be done more effectively, multiplying the improvement. Perhaps people consume more of this task’s output because it is better. For these and other reasons, things will probably be somewhat better than this model predicts. But the model is still a long way from establishing that any kind of intelligence explosion or Singularity is going to happen.

Next time, we’ll dive into the question of how different AI tasks are connected, and how to think about the Singularity in a world where task-specific AI is all we have.

Singularity Skepticism 3: How to Measure AI Performance

[This is the third post in a series. The other posts are here: 1 2 4]

On Thursday I wrote about progress in computer chess, and how a graph of Elo rating (which I called the natural measure of playing skill) versus time showed remarkably consistent linear improvement over several decades. I used this to argue that sometimes exponential improvements in the inputs to AI systems (computer speed and algorithms) lead to less-than-exponential improvement in AI performance.

Readers had various objections to this. Some said that linear improvements in Elo rating should really be seen as exponential improvements in quality; and some said that the arrival of the new AI program AlphaZero (which did not appear in my graph and was not discussed in my post) is a game-changer that invalidates my argument.  I’ll address those objections in this post.

First, let’s talk about how we measure AI performance. For chess, I used Elo rating, which is defined so that if Player A has a rating 100 points higher than Player B, we should expect A to collect 64% of the points when playing B. (Winning a game is one point, a drawn game is half a point for each player, and losing gets you zero points.)
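
The 64% figure can be checked against the usual logistic form of the Elo expected-score formula. The sketch below assumes that form (base 10, scale 400), which is the convention in common use; the function name is illustrative.

```python
def expected_score(rating_a, rating_b):
    """Expected fraction of points A collects against B (logistic Elo form)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

print(round(expected_score(1600, 1500), 2))  # 0.64
print(round(expected_score(1500, 1500), 2))  # 0.5
```

Only the difference between the two ratings matters: any pair of players 100 points apart gives the same 64%.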

There is an alternative rating system, which I’ll call ExpElo, which turns out to be equivalent to Elo in its predictions.  Your ExpElo rating is determined by exponentiating your Elo rating. Where Elo uses the difference of two players’ ratings to predict win percentage, ExpElo uses the ratio of the ratings. Elo and ExpElo are equally compelling from an abstract mathematical standpoint, and they are entirely equivalent in their predictions.  But where a graph of improvement in Elo is linear, a graph of improvement in ExpElo would be exponential. So is the growth in chess performance linear or exponential?

Before addressing that question, let’s stop to consider that this situation is not unique to chess. Any linearly growing metric can be rescaled (by exponentiating the metric) to get a new metric that grows exponentially. And any exponentially growing metric can be rescaled (by taking the logarithm) to get a new metric that grows linearly.  So for any quantity that is improving, we will always be able to choose between a metric that grows linearly and one that grows exponentially.
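
Here is a short sketch of the rescaling. It assumes ExpElo is defined as 10^(Elo/400); that particular base and scale are an assumption chosen to match the usual logistic Elo formula, and the argument only requires some exponential. All function names are illustrative.

```python
def exp_elo(elo):
    """Hypothetical ExpElo rating: an exponential rescaling of Elo."""
    return 10.0 ** (elo / 400.0)

def score_from_elo(elo_a, elo_b):
    """Expected score from the rating difference (logistic Elo form)."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))

def score_from_exp_elo(x_a, x_b):
    """Expected score from ExpElo ratings; depends only on the ratio x_a / x_b."""
    return x_a / (x_a + x_b)

# The two scales make the same prediction for the same pair of players.
a, b = 3400, 3300
print(score_from_elo(a, b), score_from_exp_elo(exp_elo(a), exp_elo(b)))

# Linear growth in Elo (50 points per step) is constant-ratio growth in ExpElo.
elos = [2000 + 50 * n for n in range(5)]
xs = [exp_elo(e) for e in elos]
ratios = [xs[i + 1] / xs[i] for i in range(len(xs) - 1)]
print([round(r, 4) for r in ratios])
```

The same series of ratings therefore looks like a straight line on one scale and an exponential curve on the other, with no change in what either scale predicts about games.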

The key question for thinking about AI is: which metric is the most natural measure of what we mean by intelligence on this particular task? For chess, I argue that this is Elo (and not ExpElo).  Long before this AI debate, Arpad Elo proposed the Elo system and that was the one adopted by chess officials.  The U.S. Chess Federation divides players into skill classes (master, expert, A, B, C, and so on) that are evenly spaced, 200 Elo points wide. For classifying human chess performance, Elo was chosen. So why should we switch to a different metric for thinking about AI?

Now here’s the plot twist: the growth in computer chess rating, whether Elo or ExpElo, is likely to level off soon, because the best computers seem to be approaching perfect play, and you can’t get better than perfect.

In every chess position, there is some move (or moves) that is optimal, in the sense of leading to the best possible game outcome.  For an extremely strong player, we might ask what that player’s error rate is: in high-level play, for what fraction of the positions it encounters will it make a non-optimal move?

Suppose a player, Alice, has an error rate of 1%, and suppose (again to simplify the explanation) that a chess game lasts fifty moves for each player. Then in the long run Alice will make a non-optimal move about once every two games, so in roughly half of her games she will play optimally.  This implies that if Alice plays a chess match against God (who always makes optimal moves), Alice will get at least 25% of the points, because she will play God evenly in the half of games where she makes all optimal moves, and (worst case) she will lose the games where she errs.  And if Alice can score at least 25% against God, then Alice’s Elo rating is no more than 200 points below God’s. The upshot is that there is some rating (the “Rating of God”) that cannot be exceeded, and that is true in both Elo and ExpElo systems.
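
The arithmetic behind the 25% and 200-point figures can be sketched as follows. The code assumes the logistic Elo formula and, in its second half, treats Alice's errors as independent from move to move; both are assumptions layered on top of the simplified argument, and the function names are illustrative.

```python
import math

def perfect_game_probability(error_rate, moves):
    """Chance of making no non-optimal move in a game, assuming independent errors."""
    return (1.0 - error_rate) ** moves

def elo_gap_for_score(score):
    """Rating deficit implied by an expected score (logistic Elo form)."""
    return 400.0 * math.log10((1.0 - score) / score)

# The text's simplification: Alice is perfect in half of her games, splits the
# points in those games, and (worst case) loses all the others.
score_floor = 0.5 * 0.5
print(score_floor, round(elo_gap_for_score(score_floor)))  # 0.25 191

# With independent errors, the share of perfect games comes out somewhat above
# one half, so the floor on Alice's score only rises.
p = perfect_game_probability(0.01, 50)
print(round(p, 3), round(0.5 * p, 3))
```

A 25% score corresponds to a deficit of about 191 points, which is where the "no more than 200 points" bound comes from.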

Clever research by Ken Regan and others has shown that the best chess programs today have fairly low error rates and therefore are approaching the Rating of God.  Regan’s research suggests that the RoG is around 3600, which is notable because the best program on my graph, Stockfish, is around 3400, and AlphaZero, the new AI chess player from Google’s DeepMind, may be around 3500. If Regan’s estimate is right, then AlphaZero is playing the majority of its games optimally and would score about 36% against God.  The historical growth rate of AI Elo ratings has been about 50 points per year, so it would appear that growth can continue for only a couple of years before leveling off. Whether the growth in chess performance has been linear or exponential so far, it seems likely to flatline within a few years.
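
A quick check of those numbers, assuming the logistic Elo formula and taking the 3600, 3500, 3400 and 50-points-per-year figures at face value:

```python
def expected_score(rating_a, rating_b):
    """Expected fraction of points A collects against B (logistic Elo form)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

RATING_OF_GOD = 3600    # Regan's estimate, as quoted in the text
GROWTH_PER_YEAR = 50    # historical Elo gain per year, as quoted in the text

for name, rating in [("Stockfish", 3400), ("AlphaZero", 3500)]:
    score = expected_score(rating, RATING_OF_GOD)
    years_left = (RATING_OF_GOD - rating) / GROWTH_PER_YEAR
    print(f"{name}: score vs. ceiling {score:.2f}, years of headroom {years_left:.0f}")
```

On these figures the headroom is about two years measured from AlphaZero and about four measured from Stockfish.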



Singularity Skepticism 2: Why Self-Improvement Isn’t Enough

[This is the second post in a series. The other posts are here: 1 3 4]

Yesterday, I wrote about the AI Singularity, and why it won’t be a literal singularity, that is, why the growth rate won’t literally become infinite. So if the Singularity won’t be a literal singularity, what will it be?

Recall that the Singularity theory is basically a claim about the growth rate of machine intelligence. Having ruled out the possibility of faster-than-exponential growth, the obvious hypothesis is exponential growth.

Exponential growth doesn’t imply that any “explosion” will occur. For example, my notional savings account paying 1% interest will grow exponentially but I will not experience a “wealth explosion” that suddenly makes me unimaginably rich.

But what if the growth rate of the exponential is much higher? Will that lead to an explosion?

The best historical analogy we have is Moore’s Law. Over the past several decades computing power has grown exponentially at a 60% annual rate (a doubling time of about 18 months), leading to a roughly ten-billion-fold improvement. That has been a big deal, but it has not fundamentally changed the nature of human existence. The effect of that growth on society and the economy has been more gradual.
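
The relationship between an annual growth rate and a doubling time is easy to check. A minimal sketch, assuming steady compounding once per year (the function names are illustrative):

```python
import math

def doubling_time_years(annual_rate):
    """Years needed to double at a fixed annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)

def years_to_multiply(annual_rate, factor):
    """Years needed to grow by the given factor at a fixed annual rate."""
    return math.log(factor) / math.log(1.0 + annual_rate)

print(round(doubling_time_years(0.60), 2))   # 1.47
print(round(doubling_time_years(0.01), 1))   # 69.7
print(round(years_to_multiply(0.60, 1e10)))  # 49
```

So a 60% rate doubles in about a year and a half and reaches ten-billion-fold in roughly five decades, while the 1% savings account needs about seventy years for a single doubling. Both are exponential; only the rate differs.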

The reason that a ten-billion-fold improvement in computing has not made us ten billion times happier is obvious: computing power is not something we value deeply for its own sake. For computing power to make us happier, we have to find ways to use computing to improve the things we do care most deeply about, and that isn’t easy.

More to the point, efforts to turn computing power into happiness all seem to have sharply diminishing returns. For example, each new doubling in computing power can be used to improve human health, by finding new drugs, better evaluating medical treatments, or applying health interventions more efficiently, but each doubling buys only a modest further gain. The net result is that health improvement is more like my savings account than like Moore’s Law.

Here’s an example from AI. The graph below shows improvement in computer chess performance from the 1980s up to the present. The vertical axis shows Elo rating, the natural measure of chess-playing skill, which is defined so that if A is 100 Elo points above B, then A is expected to score 64% of the points against B. (source: EFF)

The result is remarkably linear over more than 30 years, despite exponential growth in underlying computing capacity and similar exponential growth in algorithm performance. Apparently, rapid exponential improvements in the inputs to AI chess-playing lead to merely linear improvement in the natural measure of output.

What does this imply for the Singularity theory? Consider the core of the intelligence explosion claim. Quoting Good’s classic paper:

… an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ …

What if “designing even better machines” is like chess, in that exponential improvements in the input (intelligence of a machine) lead to merely linear improvements in the output (that machine’s performance at designing other machines)? If that were the case, there would be no intelligence explosion. Indeed, the growth of machine intelligence would be barely more than linear.  (For the mathematically inclined: if we assume the derivative of intelligence is proportional to log(intelligence), then intelligence at time T will grow like T log(T), barely more than linear in T.)
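
The parenthetical claim can be explored numerically. The sketch below integrates dI/dt = k log(I) with a simple forward-Euler step; the starting value, the constant k = 1, and the step size are arbitrary choices for illustration. The starting value must exceed 1 so that log(I) is positive.

```python
import math

def simulate(t_end, i0=2.0, k=1.0, dt=0.01):
    """Forward-Euler integration of dI/dt = k * log(I) from t = 0 to t_end."""
    steps = int(round(t_end / dt))
    intelligence = i0
    for _ in range(steps):
        intelligence += k * math.log(intelligence) * dt
    return intelligence

for t in (10.0, 100.0, 1000.0):
    simulated = simulate(t)
    reference = t * math.log(t)
    print(f"T = {t:6.0f}   I(T) = {simulated:10.1f}   "
          f"T log T = {reference:10.1f}   ratio = {simulated / reference:.3f}")
```

Over this range the ratio of I(T) to T log(T) stays within a small constant factor of 1. Under exponential growth the same ratio would increase without bound.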

Is designing new machines like chess in this way? We can’t know for sure. It’s a question in computational complexity theory, which is basically the study of how much more of some goal can be achieved as computational resources increase. Having studied complexity theory more deeply than most humans, I find it very plausible that machine design will exhibit the kind of diminishing returns we see in chess.  Regardless, this possibility does cast real doubt on Good’s claim that self-improvement leads “unquestionably” to explosion.

So Singularity theorists have the burden of proof to explain why machine design can exhibit the kind of feedback loop that would be needed to cause an intelligence explosion.

In the next post, we’ll look at another challenge faced by Singularity theorists: they have to explain, consistently with their other claims, why the Singularity hasn’t happened already.

[Update (Jan. 8, 2018): The next post responds to some of the comments on this one, and gives more detail on how to measure intelligence in chess and other domains.  I’ll get to that other challenge to Singularity theorists in a subsequent post.]