May 26, 2018

Blockchain: What is it good for?

Blockchain and cryptocurrencies are surrounded by world-historic levels of hype and snake oil. For people like me who take the old-fashioned view that technical claims should be backed by sound arguments and evidence, it’s easy to fall into the trap of concluding that there is no there there–and that blockchain and cryptocurrencies are fundamentally useless. This post is my attempt to argue that if we strip away the fluff, some valuable computer science ideas remain.

Let’s start by setting aside the currency part, for now, and focusing on blockchains. The core idea goes back to at least the 1990s: replicate a system’s state across a set of machines; use some kind of distributed consensus algorithm to agree on an append-only log of events that change the state; and use cryptographic hash-chaining to make the log tamper-evident. Much of the legitimate excitement about “blockchain” is driven by the use of this approach to enhance transparency and accountability, by making certain types of actions in a system visible. If an action is recorded in your blockchain, everyone can see it. If it is not in your blockchain, it is ignored as invalid.

An example of this basic approach is certificate transparency, in which certificate authorities (“CAs,” which vouch for digital certificates connecting a cryptographic key to the owner of a DNS name) must publish the certificates they issue on a public list, and systems refuse to accept certificates that are not on the list. This ensures that if a CA issues a certificate without permission from a name’s legitimate owner, the bogus certificate cannot be used without publishing it and thereby enabling the legitimate owner to raise an alarm, potentially leading to public consequences for the misbehaving CA.

In today’s world, with so much talk about the policy advantages of technological transparency, the use of blockchains for transparency can an important tool.

What about cryptocurrencies? There is a lot of debate about whether systems like Bitcoin are genuinely useful as a money transfer technology. Bitcoin has many limitations: transactions take a long time to confirm, and the mining-based consensus mechanism burns a lot of energy. Whether and how these limitations can be overcome is a subject of current research.

Cryptocurrencies are most useful when coupled with “smart contracts,” which allow parties to define the behavior of a virtual actor in code, and have the cryptocurrency’s consensus system enforce that the virtual actor behaves according to its code. The name “smart contract” is misleading, because these mechanisms differ significantly from legal contracts.  (A legal contract is an explicit agreement among an enumerated set of parties that constrains the behavior of those parties and is enforced by ex post remedies. A “smart contract” doesn’t require explicit agreement from parties, doesn’t enumerate participating parties, doesn’t constrain behavior of existing parties but instead creates a new virtual party whose behavior is constrained, and is enforced by ex ante prevention of deviations.) It is precisely these differences that make “smart contracts” useful.

From a computer science standpoint, what is exciting about “smart contracts” is that they let us make conditional payments an integral part of the toolbox for designing distributed protocols. A party can be required to escrow a deposit as a condition of participating in some process, and the return of that deposit, in part or in whole, can be conditioned on the party performing arbitrary required steps, as long as compliance can be checked by a computation.

Another way of viewing the value of “smart contracts” is by observing that we often define correctness for a new distributed protocol by postulating a hypothetical trusted third party who “referees” the protocol, and then proving some kind of equivalence between the new referee-free protocol we have designed and the notional refereed protocol. It sure would be nice if we could just turn the notional referee into a smart contract and let the consensus system enforce correctness.

But all of this requires a “smart contract” system that is efficient and scalable–otherwise the cost of using “smart contracts” will be excessive. Existing systems like Ethereum scale poorly. This too is a problem that will need to be overcome by new research. (Spoiler alert: We’ll be writing here about a research solution in the coming months.)

These are not the only things that blockchain and cryptocurrencies are good for. But I hope they are convincing examples. It’s sad that the hype and snake oil has gotten so extreme that it can be hard to see the benefits. The benefits do exist.


Singularity Skepticism 4: The Value of Avoiding Errors

[This is the fourth in a series of posts. The other posts in the series are here: 1 2 3.]

In the previous post, we did a deep dive into chess ratings, as an example of a system to measure a certain type of intelligence. One of the takeaways was that the process of numerically measuring intelligence, in order to support claims such as “intelligence is increasing exponentially”, is fraught with complexity.

Today I want to wrap up the discussion of quantifying AI intelligence by turning to a broad class of AI systems whose performance is measured as an error rate, that is, the percentage of examples from population for which the system gives a wrong answer. These applications include  facial recognition, image recognition, and so on.

For these sorts of problems, the error rate tends to change over time as shown on this graph:

The human error rate doesn’t change, but the error rate for the AI system tends to fall exponentially, crossing the human error rate at a time we’ll call t*, and continuing to fall after that.

How does this reduction in error rate translate into outcomes? We can get a feel for this using a simple model, where a wrong answer is worth W and a right answer is worth R, with R>W, naturally.

In this model, the value created per decision changes over time as shown in this graph:

Before t*, humans perform better, and the value is unchanged. At t*, AI becomes better and the graph takes a sharp turn upward. After that, the growth slows as the value approaches its asymptote of R.

This graph has several interesting attributes. First, AI doesn’t help at all until t*, when it catches up with people. Second, the growth rate of value (i.e., the slope of the curve) is zero while humans are better, then it lurches upward at t*, then the growth rate falls exponentially back to zero. And third, most of the improvement that AI can provide will be realized in a fairly short period after t*.

Viewed over a long time-frame, this graph looks a lot like a step function: the effect of AI is a sudden step up in the value created for this task. The step happens in a brief interval after AI passes human performance. Before and after that interval, the value doesn’t change much at all.

Of course, this simple model can’t be the whole story. Perhaps a better solution to this task enables other tasks to be done more effectively, multiplying the improvement. Perhaps people consume more of this tasks’s output because it is better. For these and other reasons, things will probably be somewhat better than this model predicts. But the model is still a long way from establishing that any kind of intelligence explosion or Singularity is going to happen.

Next time, we’ll dive into the question of how different AI tasks are connected, and how to think about the Singularity in a world where task-specific AI is all we have.

Singularity Skepticism 3: How to Measure AI Performance

[This is the third post in a series. The other posts are here: 1 2 4]

On Thursday I wrote about progress in computer chess, and how a graph of Elo rating (which I called the natural measure of playing skill) versus time showed remarkably consistent linear improvement over several decades. I used this to argue that sometimes exponential improvements in the inputs to AI systems (computer speed and algorithms) lead to less-than-exponential improvement in AI performance.

Readers had various objections to this. Some said that linear improvements in Elo rating should really be seen as exponential improvements in quality; and some said that the arrival of the new AI program AlphaZero (which did not appear in my graph and was not discussed in my post) is a game-changer that invalidates my argument.  I’ll address those objections in this post.

First, let’s talk about how we measure AI performance. For chess, I used Elo rating, which is defined so that if Player A has a rating 100 points higher than Player B, we should expect A to collect 64% of the points when playing B. (Winning a game is one point, a drawn game is half a point for each player, and losing gets you zero points.)

There is an alternative rating system, which I’ll call ExpElo, which turns out to be equivalent to Elo in its predictions.  Your ExpElo rating is determined by exponentiating your Elo rating. Where Elo uses the difference of two player’s ratings to predict win percentage, ExpElo uses a ratio of the ratings. Both Elo and ExpElo are equally compelling from an abstract mathematical standpoint, and they are entirely equivalent in their predictions.  But where a graph of improvement in Elo is linear, a graph of improvement in ExpElo would be exponential. So is the growth in chess performance linear or exponential?

Before addressing that question, let’s stop to consider that this situation is not unique to chess. Any linearly growing metric can be rescaled (by exponentiating the metric) to get a new metric that grows exponentially. And any exponentially growing metric can be rescaled (by taking the logarithm) to get a new metric that grows linearly.  So for any quantity that is improving, we will always be able to choose between a metric that grows linearly and one that grows exponentially.

The key question for thinking about AI is: which metric is the most natural measure of what we mean by intelligence on this particular task? For chess, I argue that this is Elo (and not ExpElo).  Long before this AI debate, Arpad Elo proposed the Elo system and that was the one adopted by chess officials.  The U.S. Chess Federation divides players into skill classes (master, expert, A, B, C, and so on) that are evenly spaced, 200 Elo points wide. For classifying human chess performance, Elo was chosen. So why should we switch to a different metric for thinking about AI?

Now here’s the plot twist: the growth in computer chess rating, whether Elo or ExpElo, is likely to level off soon, because the best computers seem to be approaching perfect play, and you can’t get better than perfect.

In every chess position, there is some move (or moves) that is optimal, in the sense of leading to the best possible game outcome.  For an extremely strong player, we might ask what that player’s error rate is: in high-level play, for what fraction of the positions it encounters will it make a non-optimal move?

Suppose a player, Alice, has an error rate of 1%, and suppose (again to simplify the explanation) that a chess game lasts fifty moves for each player. Then in the long run Alice will make a non-optimal move once every two games–in half of the games she will play optimally.  This implies that if Alice plays a chess match against God (who always makes optimal moves), Alice will get at least 25% of the points, because she will play God evenly in the half of games where she makes all optimal moves, and (worst case) she will lose the games where she errs.  And if Alice can score at least 25% against God, then Alice’s Elo rating is no more than 200 points below God’s. The upshot is that there is some rating–the “Rating of God”–that cannot be exceeded, and that is true in both Elo and ExpElo systems.

Clever research by Ken Regan and others has shown that the best chess programs today have fairly low error rates and therefore are approaching the Rating of God.  Regan’s research suggests that the RoG is around 3600, which is notable because the best program on my graph, Stockfish, is around 3400, and AlphaZero, the new AI chess player from Google’s DeepMind, may be around 3500. If Regan’s estimate is right, then AlphaZero is playing the majority of its games optimally and would score about 36% against God.  The historical growth rate of AI Elo ratings has been about 50 points per year, so it would appear that growth can continue for only a couple of years before leveling off. Whether the growth in chess performance has been linear or exponential so far, it seems likely to flatline within a few years.