June 25, 2019

The Stock-market Flash Crash: Attack, Bug, or Gamesmanship?

Andrew wrote last week about the stock market’s May 6 “flash crash”, and whether it might have been caused by a denial-of-service attack. He points to a detailed analysis by nanex.com that unpacks what happened and postulates a DoS attack as a likely cause. The nanex analysis is interesting and suggestive, but I see the situation as more complicated and even more interesting.

Before diving in, two important caveats: First, I don’t have access to raw data about what happened in the market that day, so I will accept the facts as posited by nanex. If nanex’s description is wrong or incomplete, my analysis won’t be right. Second, I am not a lawyer and am not making any claims about what is lawful or unlawful. With that out of the way …

Here’s a short version of what happened, based on the nanex data:
(1) Some market participants sent a large number of quote requests to the New York Stock Exchange (NYSE) computers.
(2) The NYSE normally puts outgoing price quotes into a queue before they are sent out. Because of the high rate of requests, this queue backed up, so that some quotes took a (relatively) long time to be sent out.
(3) A quote lists a price and a time. The NYSE determined the price at the time the quote was put into the queue, and timestamped each quote at the time it left the queue. When the queues backed up, these quotes would be “stale”, in the sense that they had an old, no-longer-accurate price — but their timestamps made them look like up-to-date quotes.
(4) These anomalous quotes confused other market participants, who falsely concluded that a stock’s price on the NYSE differed from its price on other exchanges. This misinformation destabilized the market.
(5) The faster a stock’s price changed, the more out-of-kilter the NYSE quotes would be. So instability bred more instability, and the market dropped precipitously.

The first thing to notice here is that (assuming nanex has the facts right) there appears to have been a bug in the NYSE’s system. If a quote goes out with price P and time T, recipients will assume that the price was P at time T. But the NYSE system apparently generated the price at one time (on entry to the queue) and the timestamp at another time (on exit from the queue). This is wrong: the timestamp should have been generated at the same time as the price.

But notice that this kind of bug won’t cause much trouble under normal conditions, when the queue is short so that the timestamp discrepancy is small. The problem might not have be noticed in normal operation, and might not be caught in testing, unless the testing procedure takes pains to create a long queue and to check for the consistency of timestamps with prices. This looks like the kind of bug that developers dread, where the problem only manifests under unusual conditions, when the system is under a certain kind of strain. This kind of bug is an accident waiting to happen.

To see how the accident might develop and be exploited, let’s consider the behavior of three imaginary people, Alice, Bob, and Claire.

Alice knows the NYSE has this timestamping bug. She knows that if the bug triggers and the NYSE starts issuing dodgy quotes, she can make a lot of money by exploiting the fact that she is the only market participant who has an accurate view of reality. Exploiting the others’ ignorance of real market conditions—and making a ton of money—is just a matter of technique.

Alice acts to exploit her knowledge, deliberately triggering the NYSE bug by flooding the NYSE with quote requests. The nanex analysis implies that this is probably what happened on May 6. Alice’s behavior is ethically questionable, if not illegal. But, the nanex analysis notwithstanding, deliberate triggering of the bug is not the only possibility.

Bob also knows about the bug, but he doesn’t go as far as Alice. Bob programs his systems to exploit the error condition if it happens, but he does nothing to cause the condition. He just waits. If the error condition happens naturally, he will exploit it, but he’ll take care not to cause it himself. This is ethically superior to a deliberate attack (and might be more defensible legally).

(Exercise for readers: Is it ethical for Bob to deliberately refrain from reporting the bug?)

Claire doesn’t know that the NYSE has a bug, but she is a very careful programmer, so she writes code that watches other systems for anomalous behavior and ignores systems that seem to be misbehaving. When the flash crash occurs, Claire’s code detects the dodgy NYSE quotes and ignores them. Claire makes a lot of money, because she is one of the few market participants who are not fooled by the bad quotes. Claire is ethically blameless — her virtuous programming was rewarded. But Claire’s trading behavior might look a lot like Alice’s and Bob’s, so an investigator might suspect Claire of unethical or illegal behavior.

Notice that even if there are no Alices or Bobs, but only virtuous Claires, the market might still have a flash crash and people might make a lot of money from it, even in the absence of a denial-of-service attack or indeed of any unethical behavior. The flood of quote requests that trigged the queue backup might have been caused by another bug somewhere, or by an unforeseen interaction between different systems. Only careful investigation will be able to untangle the causes and figure out who is to blame.

If the nanex analysis is at all correct, it has sobering implications. Financial markets are complex, and when we inject complex, buggy software into them, problems are likely to result. The May flash crash won’t be the last time a financial market gyrates due to software problems.

How Not to Fix Soccer

With the World Cup comes the quadrennial ritual in which Americans try to redesign and improve the rules of soccer. As usual, it’s a bad idea to redesign something you don’t understand—and indeed, most of the proposed changes would be harmful. What has surprised me, though, is how rarely anyone explains the rationale behind soccer’s rules. Once you understand the rationale, the rules will make a lot more sense.

So here’s the logic underlying soccer’s rules: the game is supposed to scale down, so that an ordinary youth or recreation-league game can be played under the exact same rules used by the pros. This means that the rules must be designed so that the game can be run by a single referee, without any special equipment such as a scoreboard.

Most of the popular American team sports don’t scale down in this way. American football, basketball, and hockey — the most common inspirations for “reformed” soccer rules — all require multiple referees and special equipment. To scale these sports down, you have to change the rules. For example, playground basketball has no shot clock, no counting of fouls, and nonstandard rules for awarding free throws and handling restarts—it’s fun but it’s not the same game the Lakers play. Baseball is the one popular American spectator sport that does scale down.

The scaling principle accounts for soccer’s seemingly odd timekeeping. The clock isn’t stopped and started, because we can’t assume a separate timekeeping official and we don’t want to burden the referee’s attention with a lot of clock management. The time is not displayed to the players, because we can’t assume the availability of a scoreboard. And because the players don’t know the exact remaining time, the referee gives the players some leeway to finish an attack even if the nominal finishing time has been reached. Most of the scalable sports lack a clock — think of baseball and volleyball — but soccer manages to reconcile a clock with scalability. Americans often want to “fix” this by switching to a scheme that requires a scoreboard and timekeeper.

The scaling principle also explains the system of yellow and red cards. A hockey-style penalty box system requires special timing and (realistically) a special referee to manage the penalty box and timer. Basketball-style foul handling allows penalties to mount up as more fouls are committed by the same player or team, which is good, but it requires elaborate bookkeeping to keep track of fouls committed by each player and team fouls per half. We don’t want to make the soccer referee keep such detailed records, so we simply ask him to record yellow and red cards, which are rare. He uses his judgment to decide when repeated fouls merit a yellow card. This may seem arbitrary in particular cases but it does seem fair on average. (There’s a longer essay that could be written applying the theory of efficient liability regimes to the design of sports penalties.)

It’s no accident, I think, that scalable sports such as soccer and baseball/softball are played by many Americans who typically watch non-scalable sports. There’s something satisfying about playing the same game that the pros play. So, my fellow Americans, if you’re going to fix soccer, please keep the game simple enough that the rest of us can still play it.

NJ Voting Machines Left Unattended, Despite Court Opinion

It’s Election Day in New Jersey. Longtime readers know that in advance of elections I visit polling places in Princeton, looking for voting machines left unattended, where they are vulnerable to tampering. In the past I have always found unattended machines in multiple polling places.

I hoped this time would be different, given that Judge Feinberg, in her ruling on the New Jersey voting machine case, urged the state not to leave voting machines unattended in public.

Despite the judge’s ruling, I found voting machines unattended in three of the four Princeton polling places I visited on Sunday and Monday. Here are my photos from three polling places.

This morning I cast my ballot on one of these machines.