
Archives for January 2018

Roundup: My First Semester as a Post-Doc at Princeton

As Princeton thaws from under last week’s snow hurricane, I’m taking a moment to reflect on my first four months in the place I now call home.

This roundup post shares highlights from my first semester as a post-doc in Psychology, CITP, and Sociology.

Here in Princeton, I’m surviving winter in the best way I know how 🙂

So far, I have had an amazing experience:

  • The Paluck Lab (Psychology) and the Center for IT Policy, my main anchor points at Princeton, have been welcoming and supportive. When colleagues from both departments showed up at my IgNobel Prize viewing party in my first month, I knew I had found a good home 🙂
  • The Paluck Lab has become a wonderful research family, and they even did the LEGO duck challenge together with me!
    • Weekly lab meetings with the Paluck Lab have been a master-class in thinking about the relationship between research design and theory in the social sciences. I am so grateful to observe and participate in these conversations, since so much about research is unspoken, tacit knowledge.
    • With the help of my new colleagues, I’ve started to learn how to write papers for general science journals. I’ve also learned more about publishing in the field of psychology.
  • At CITP, I’ve learned much about thinking simultaneously as a regulator and computer scientist.
  • I’ve loved the conversations at the Kahneman-Treisman Center for Behavioral Policy, where I am now an affiliated postdoc.
  • I’m looking forward to meeting more of my colleagues in Sociology this spring, now that I’ll be physically based in Princeton more consistently.

Travel and Speaking


View of the French Alps from Lausanne


I’m so glad that I can scale down my travel this spring, phew!

A flock of birds takes flight in Antigua, Guatemala


Writing and Research

Princeton Life

Rockefeller College Cloisters, Princeton. On evenings when I have dinner here, I walk through these cloisters on my way home.


Singularity Skepticism 4: The Value of Avoiding Errors

[This is the fourth in a series of posts. The other posts in the series are here: 1 2 3.]

In the previous post, we did a deep dive into chess ratings, as an example of a system to measure a certain type of intelligence. One of the takeaways was that the process of numerically measuring intelligence, in order to support claims such as “intelligence is increasing exponentially”, is fraught with complexity.

Today I want to wrap up the discussion of quantifying AI intelligence by turning to a broad class of AI systems whose performance is measured as an error rate, that is, the percentage of examples from a population for which the system gives a wrong answer. These applications include facial recognition, image recognition, and so on.

For these sorts of problems, the error rate tends to change over time as shown in this graph:

The human error rate doesn’t change, but the error rate for the AI system tends to fall exponentially, crossing the human error rate at a time we’ll call t*, and continuing to fall after that.

How does this reduction in error rate translate into outcomes? We can get a feel for this using a simple model, where a wrong answer is worth W and a right answer is worth R, with R>W, naturally.

In this model, the value created per decision changes over time as shown in this graph:

Before t*, humans perform better, and the value is unchanged. At t*, AI becomes better and the graph takes a sharp turn upward. After that, the growth slows as the value approaches its asymptote of R.

This graph has several interesting attributes. First, AI doesn’t help at all until t*, when it catches up with people. Second, the growth rate of value (i.e., the slope of the curve) is zero while humans are better, then it lurches upward at t*, then the growth rate falls exponentially back to zero. And third, most of the improvement that AI can provide will be realized in a fairly short period after t*.

Viewed over a long time-frame, this graph looks a lot like a step function: the effect of AI is a sudden step up in the value created for this task. The step happens in a brief interval after AI passes human performance. Before and after that interval, the value doesn’t change much at all.
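To make this concrete, here is a minimal sketch of the model in Python. All of the specific numbers (R, W, the human error rate, the AI's starting error rate and halving time) are illustrative assumptions, not data from any real system:

```python
R, W = 1.0, 0.0        # value of a right / wrong answer (illustrative units)
HUMAN_ERROR = 0.05     # constant human error rate (assumed)
AI_ERROR_0 = 0.50      # AI error rate at t = 0 (assumed)
HALF_LIFE = 1.0        # time for the AI error rate to halve (assumed)

def value_per_decision(t):
    """Expected value per decision, using whichever of human or AI errs less."""
    ai_error = AI_ERROR_0 * 0.5 ** (t / HALF_LIFE)
    err = min(HUMAN_ERROR, ai_error)
    return (1 - err) * R + err * W

for t in [0.5 * i for i in range(21)]:   # t = 0.0, 0.5, ..., 10.0
    print(f"t = {t:4.1f}   value per decision = {value_per_decision(t):.4f}")
```

With these numbers, t* lands around 3.3: the printed values sit flat at 0.95 before that, jump over the next couple of time units, and then creep toward the asymptote of R = 1.0.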

Of course, this simple model can’t be the whole story. Perhaps a better solution to this task enables other tasks to be done more effectively, multiplying the improvement. Perhaps people consume more of this task’s output because it is better. For these and other reasons, things will probably be somewhat better than this model predicts. But the model is still a long way from establishing that any kind of intelligence explosion or Singularity is going to happen.

Next time, we’ll dive into the question of how different AI tasks are connected, and how to think about the Singularity in a world where task-specific AI is all we have.

Singularity Skepticism 3: How to Measure AI Performance

[This is the third post in a series. The other posts are here: 1 2 4]

On Thursday I wrote about progress in computer chess, and how a graph of Elo rating (which I called the natural measure of playing skill) versus time showed remarkably consistent linear improvement over several decades. I used this to argue that sometimes exponential improvements in the inputs to AI systems (computer speed and algorithms) lead to less-than-exponential improvement in AI performance.

Readers had various objections to this. Some said that linear improvements in Elo rating should really be seen as exponential improvements in quality; and some said that the arrival of the new AI program AlphaZero (which did not appear in my graph and was not discussed in my post) is a game-changer that invalidates my argument.  I’ll address those objections in this post.

First, let’s talk about how we measure AI performance. For chess, I used Elo rating, which is defined so that if Player A has a rating 100 points higher than Player B, we should expect A to collect 64% of the points when playing B. (Winning a game is one point, a drawn game is half a point for each player, and losing gets you zero points.)
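In code, that expectation rule is the standard Elo formula (a quick sketch; the 400-point scale constant is part of the standard Elo definition):

```python
def expected_score(elo_a, elo_b):
    """Expected fraction of points for player A when playing B, per the Elo model."""
    return 1 / (1 + 10 ** ((elo_b - elo_a) / 400))

print(expected_score(1600, 1500))  # ~0.64: a 100-point edge yields about 64% of the points
```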

There is an alternative rating system, which I’ll call ExpElo, that turns out to be equivalent to Elo in its predictions.  Your ExpElo rating is determined by exponentiating your Elo rating. Where Elo uses the difference of two players’ ratings to predict win percentage, ExpElo uses a ratio of the ratings. Both Elo and ExpElo are equally compelling from an abstract mathematical standpoint, and they are entirely equivalent in their predictions.  But where a graph of improvement in Elo is linear, a graph of improvement in ExpElo would be exponential. So is the growth in chess performance linear or exponential?
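To see the equivalence concretely, here is a sketch; since the post doesn’t pin down the exponentiation, the base-10, 400-point scaling below is one natural choice that makes the two forms match exactly:

```python
def expelo(elo):
    """Exponentiate an Elo rating (one natural choice of scaling)."""
    return 10 ** (elo / 400)

def expected_score_ratio(elo_a, elo_b):
    """Same prediction as the Elo difference formula, written as a ratio of ExpElo ratings."""
    ra, rb = expelo(elo_a), expelo(elo_b)
    return ra / (ra + rb)

print(expected_score_ratio(1600, 1500))  # ~0.64, identical to the difference form
```

Note that a rating climbing by a constant number of Elo points per year climbs by a constant factor per year in ExpElo: linear in one scale, exponential in the other.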

Before addressing that question, let’s stop to consider that this situation is not unique to chess. Any linearly growing metric can be rescaled (by exponentiating the metric) to get a new metric that grows exponentially. And any exponentially growing metric can be rescaled (by taking the logarithm) to get a new metric that grows linearly.  So for any quantity that is improving, we will always be able to choose between a metric that grows linearly and one that grows exponentially.

The key question for thinking about AI is: which metric is the most natural measure of what we mean by intelligence on this particular task? For chess, I argue that this is Elo (and not ExpElo).  Long before this AI debate, Arpad Elo proposed the Elo system, and that was the one adopted by chess officials.  The U.S. Chess Federation divides players into skill classes (master, expert, A, B, C, and so on) that are evenly spaced, 200 Elo points wide. For classifying human chess performance, Elo was chosen. So why should we switch to a different metric for thinking about AI?

Now here’s the plot twist: the growth in computer chess rating, whether Elo or ExpElo, is likely to level off soon, because the best computers seem to be approaching perfect play, and you can’t get better than perfect.

In every chess position, there is some move (or moves) that is optimal, in the sense of leading to the best possible game outcome.  For an extremely strong player, we might ask what that player’s error rate is: in high-level play, for what fraction of the positions encountered will the player make a non-optimal move?

Suppose a player, Alice, has an error rate of 1%, and suppose (again to simplify the explanation) that a chess game lasts fifty moves for each player. Then in the long run Alice will make a non-optimal move once every two games; in half of the games she will play optimally.  This implies that if Alice plays a chess match against God (who always makes optimal moves), Alice will get at least 25% of the points, because she will play God evenly in the half of games where she makes all optimal moves, and (worst case) she will lose the games where she errs.  And if Alice can score at least 25% against God, then Alice’s Elo rating is no more than 200 points below God’s. The upshot is that there is some rating, the “Rating of God”, that cannot be exceeded, and that is true in both Elo and ExpElo systems.
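Here is that arithmetic as a short sketch, following the post’s simplifying assumptions (Alice plays an error-free game half the time, and in the worst case loses every game in which she errs):

```python
import math

p_optimal_game = 0.5                  # post's simplification: error-free game half the time
score_vs_god = p_optimal_game * 0.5   # even score in those games, zero otherwise: >= 25%

# Invert the Elo expectation formula to bound the rating gap.
elo_gap = 400 * math.log10(1 / score_vs_god - 1)
print(f"Alice is at most {elo_gap:.0f} Elo points below God")  # ~191, under 200
```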

Clever research by Ken Regan and others has shown that the best chess programs today have fairly low error rates and therefore are approaching the Rating of God.  Regan’s research suggests that the RoG is around 3600, which is notable because the best program on my graph, Stockfish, is around 3400, and AlphaZero, the new AI chess player from Google’s DeepMind, may be around 3500. If Regan’s estimate is right, then AlphaZero is playing the majority of its games optimally and would score about 36% against God.  The historical growth rate of AI Elo ratings has been about 50 points per year, so it would appear that growth can continue for only a couple of years before leveling off. Whether the growth in chess performance has been linear or exponential so far, it seems likely to flatline within a few years.
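That 36% figure falls out of the same Elo expectation formula sketched above, taking Regan’s RoG estimate of about 3600 and AlphaZero at roughly 3500 (both, as noted, estimates):

```python
def expected_score(elo_a, elo_b):
    """Expected fraction of points for player A when playing B, per the Elo model."""
    return 1 / (1 + 10 ** ((elo_b - elo_a) / 400))

print(expected_score(3500, 3600))  # ~0.36: about 36% of the points against God
```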