September 18, 2019

Singularity Skepticism 3: How to Measure AI Performance

[This is the third post in a series. The other posts are here: 1 2 4]

On Thursday I wrote about progress in computer chess, and how a graph of Elo rating (which I called the natural measure of playing skill) versus time showed remarkably consistent linear improvement over several decades. I used this to argue that sometimes exponential improvements in the inputs to AI systems (computer speed and algorithms) lead to less-than-exponential improvement in AI performance.

Readers had various objections to this. Some said that linear improvements in Elo rating should really be seen as exponential improvements in quality; and some said that the arrival of the new AI program AlphaZero (which did not appear in my graph and was not discussed in my post) is a game-changer that invalidates my argument.  I’ll address those objections in this post.

First, let’s talk about how we measure AI performance. For chess, I used Elo rating, which is defined so that if Player A has a rating 100 points higher than Player B, we should expect A to collect 64% of the points when playing B. (Winning a game is one point, a drawn game is half a point for each player, and losing gets you zero points.)

There is an alternative rating system, which I’ll call ExpElo, which turns out to be equivalent to Elo in its predictions.  Your ExpElo rating is determined by exponentiating your Elo rating. Where Elo uses the difference of two player’s ratings to predict win percentage, ExpElo uses a ratio of the ratings. Both Elo and ExpElo are equally compelling from an abstract mathematical standpoint, and they are entirely equivalent in their predictions.  But where a graph of improvement in Elo is linear, a graph of improvement in ExpElo would be exponential. So is the growth in chess performance linear or exponential?

Before addressing that question, let’s stop to consider that this situation is not unique to chess. Any linearly growing metric can be rescaled (by exponentiating the metric) to get a new metric that grows exponentially. And any exponentially growing metric can be rescaled (by taking the logarithm) to get a new metric that grows linearly.  So for any quantity that is improving, we will always be able to choose between a metric that grows linearly and one that grows exponentially.

The key question for thinking about AI is: which metric is the most natural measure of what we mean by intelligence on this particular task? For chess, I argue that this is Elo (and not ExpElo).  Long before this AI debate, Arpad Elo proposed the Elo system and that was the one adopted by chess officials.  The U.S. Chess Federation divides players into skill classes (master, expert, A, B, C, and so on) that are evenly spaced, 200 Elo points wide. For classifying human chess performance, Elo was chosen. So why should we switch to a different metric for thinking about AI?

Now here’s the plot twist: the growth in computer chess rating, whether Elo or ExpElo, is likely to level off soon, because the best computers seem to be approaching perfect play, and you can’t get better than perfect.

In every chess position, there is some move (or moves) that is optimal, in the sense of leading to the best possible game outcome.  For an extremely strong player, we might ask what that player’s error rate is: in high-level play, for what fraction of the positions it encounters will it make a non-optimal move?

Suppose a player, Alice, has an error rate of 1%, and suppose (again to simplify the explanation) that a chess game lasts fifty moves for each player. Then in the long run Alice will make a non-optimal move once every two games–in half of the games she will play optimally.  This implies that if Alice plays a chess match against God (who always makes optimal moves), Alice will get at least 25% of the points, because she will play God evenly in the half of games where she makes all optimal moves, and (worst case) she will lose the games where she errs.  And if Alice can score at least 25% against God, then Alice’s Elo rating is no more than 200 points below God’s. The upshot is that there is some rating–the “Rating of God”–that cannot be exceeded, and that is true in both Elo and ExpElo systems.

Clever research by Ken Regan and others has shown that the best chess programs today have fairly low error rates and therefore are approaching the Rating of God.  Regan’s research suggests that the RoG is around 3600, which is notable because the best program on my graph, Stockfish, is around 3400, and AlphaZero, the new AI chess player from Google’s DeepMind, may be around 3500. If Regan’s estimate is right, then AlphaZero is playing the majority of its games optimally and would score about 36% against God.  The historical growth rate of AI Elo ratings has been about 50 points per year, so it would appear that growth can continue for only a couple of years before leveling off. Whether the growth in chess performance has been linear or exponential so far, it seems likely to flatline within a few years.



Singularity Skepticism 2: Why Self-Improvement Isn’t Enough

[This is the second post in a series. The other posts are here: 1 3 4]

Yesterday, I wrote about the AI Singularity, and why it won’t be a literal singularity, that is, why the growth rate won’t literally become infinite. So if the Singularity won’t be a literal singularity, what will it be?

Recall that the Singularity theory is basically a claim about the growth rate of machine intelligence. Having ruled out the possibility of faster-than-exponential growth, the obvious hypothesis is exponential growth.

Exponential growth doesn’t imply that any “explosion” will occur. For example, my notional savings account paying 1% interest will grow exponentially but I will not experience a “wealth explosion” that suddenly makes me unimaginably rich.

But what if the growth rate of the exponential is much higher? Will that lead to an explosion?

The best historical analogy we have is Moore’s Law. Over the past several decades computing power has growth exponentially at a 60% annual rate–or a doubling time of 18 months–leading to a roughly ten-billion-fold improvement. That has been a big deal, but it has not fundamentally changed the nature of human existence. The effect of that growth on society and the economy has been more gradual.

The reason that a ten-billion-fold improvement in computing has not made us ten billion times happier is obvious: computing power is not something we value deeply for its own sake. For computing power to make us happier, we have to find ways to use computing to improve the things we do care mostly deeply about–and that isn’t easy.

More to the point, efforts to turn computing power into happiness all seem to have sharply diminishing returns. For example, each new doubling in computing power can be used to improve human health, by finding new drugs, better evaluating medical treatments, or applying health interventions more efficiently. The net result is that health improvement is more like my savings account than like Moore’s Law.

Here’s an example from AI. The graph below shows improvement in computer chess performance from the 1980s up to the present. The vertical axis shows Elo rating, the natural measure of chess-playing skill, which is defined so that if A is 100 Elo points above B, then A is expected to beat B 64% of the time. (source: EFF)

The result is remarkably linear over more than 30 years, despite exponential growth in underlying computing capacity and similar exponential growth in algorithm performance. Apparently, rapid exponential improvements in the inputs to AI chess-playing lead to merely linear improvement in the natural measure of output.

What does this imply for the Singularity theory? Consider the core of the intelligence explosion claim. Quoting Good’s classic paper:

… an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ …

What if “designing even better machines” is like chess, in that exponential improvements in the input (intelligence of a machine) lead to merely linear improvements in the output (that machine’s performance at designing other machines)? If that were the case, there would be no intelligence explosion. Indeed, the growth of machine intelligence would be barely more than linear.  (For the mathematically inclined: if we assume the derivative of intelligence is proportional to log(intelligence), then intelligence at time T will grow like T log(T), barely more than linear in T.)

Is designing new machines like chess in this way? We can’t know for sure. It’s a question in computational complexity theory, which is basically the study of how much more of some goal can be achieved as computational resources increase. Having studied complexity theory more deeply than most humans, I find it very plausible that machine design will exhibit the kind of diminishing returns we see in chess.  Regardless, this possibility does cast real doubt on Good’s claim that self-improvement leads “unquestionably” to explosion.

So Singularity theorists have the burden of proof to explain why machine design can exhibit the kind of feedback loop that would be needed to cause an intelligence explosion.

In the next post, we’ll look at another challenge faced by Singularity theorists: they have to explain, consistently with their other claims, why the Singularity hasn’t happened already.

[Update (Jan. 8, 2018): The next post responds to some of the comments on this one, and gives more detail on how to measure intelligence in chess and other domains.  I’ll get to that other challenge to Singularity theorists in a subsequent post.]

Why the Singularity is Not a Singularity

This is the first in a series of posts about the Singularity, that notional future time when machine intelligence explodes in capability, changing human life forever. Like many computer scientists, I’m a Singularity skeptic. In this series I’ll be trying to express the reasons for my skepticism–and workshopping ideas for an essay on the topic that I’m working on. Your comments and feedback are even more welcome that usual!

[Later installments in the series are here: 2 3 4]

What is the Singularity?  It is a notional future moment when technological change will be so rapid that we have no hope of understanding its implications. The Singularity is seen as a cultural event horizon beyond which humanity will become … something else that we cannot hope to predict. Singularity talk often blends into theories about future superintelligence posing an existential risk to humanity.

The essence of Singularity theory was summarized in an early (1965) paper by the British mathematician I.J. Good:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.

Vernor Vinge was the first to describe this as a “singularity”, adopting a term from mathematics that applies when the growth rate of a quantity goes to infinity. The term was further popularized by Ray Kurzweil’s book, “The Singularity is Near.”

Exponential Growth

The Singularity theory is fundamentally a claim about the future growth rate of machine intelligence. Before evaluating that claim, let’s first review some concepts useful for thinking about growth rates.

A key concept is exponential growth, which means simply that the increase in something is proportional to how big that thing already is. For example, if my bank account grows at 1% annually, this means that the every year the bank will add to my account 1% of the current balance. That’s exponential growth.

Exponential growth can happen at different speeds. There are two natural ways to characterize the speed of exponential growth. The first is a growth rate, typically stated as a percentage per some time unit. For example, my notional bank account has a growth rate of 1% per year. The second natural measure is the doubling time–how long it will take the quantity to double. For my bank account, that works out to about 70 years.  

A good way to tell if a quantity is growing exponentially is to look at how its growth is measured. If the natural measure is a growth rate in percent per time, or a doubling time, then that quantity is growing exponentially. For example, economic growth in most countries is measured as a percent increase in (say) GDP, which tells us that GDP tends to grow exponentially over the long term–with short-term ups and downs, of course. If a country’s GDP is growing at 3% per year, that corresponds to a doubling time of about 23 years.

Exponential growth is very common in nature and in human society. So the fact that a quantity is growing exponentially does not in itself make that quantity special nor does it give that quantity unusual, counterintuitive dynamics.

The speed and capacity of computers has grown exponentially, which is not remarkable. What is remarkable is the growth rate in computing capacity. A rule of thumb called “Moore’s Law” states that the speed and capacity of computers will have a doubling time of 18 months, which corresponds to a growth rate of 60% per year.  Moore’s Law has held true for roughly fifty years–that’s 33 doublings, or roughly a ten-billion-fold increase in capacity.

The Singularity is Not a Literal Singularity

As a first step in considering the plausibility of the Singularity hypothesis, let’s consider the prospect of a literal singularity–where the rate of improvement in machine intelligence literally becomes infinite at some point in the future. This requires that machine intelligence grows faster than any exponential, so that the doubling time gets smaller and smaller, and eventually goes to zero.

I don’t know of any theoretical basis for expecting a literal singularity.  There is virtually nothing in the natural or human world that grows super-exponentially over time–and even “ordinary” super-exponential growth does not yield a literal singularity. In short, it’s hard to see how the AI “Singularity” could possibly be a true mathematical singularity.

So if the Singularity is not literally a singularity, what is it?  The next post will start with that question.