January 11, 2025

Did a denial-of-service attack cause the stock-market “flash crash”?

On May 6, 2010, the stock market experienced a “flash crash”; the Dow plunged 998 points (most of the drop coming in just a few minutes) before (mostly) recovering. Nobody was quite sure what caused it. An interesting theory from Nanex.com, based on extensive analysis of the actual electronic stock-quote traffic in the markets that day and on other days, is that the flash crash was caused (perhaps inadvertently) by a kind of denial-of-service attack by a market participant. They write,

While analyzing HFT (High Frequency Trading) quote counts, we were shocked to find cases where one exchange was sending an extremely high number of quotes for one stock in a single second: as high as 5,000 quotes in 1 second! During May 6, there were hundreds of times that a single stock had over 1,000 quotes from one exchange in a single second. Even more disturbing, there doesn’t seem to be any economic justification for this.

They call this practice “quote stuffing”, and they present detailed graphs and statistics to back up their claim.

The consequence of “quote stuffing” is that prices on the New York Stock Exchange (NYSE), which bore the brunt of this bogus quote traffic, lagged behind prices on other exchanges. Thus, when the market started dropping, quotes on the NYSE were higher than on other exchanges, which caused a huge amount of inter-exchange arbitrage, perhaps exacerbating the crash.

Why would someone want to do quote stuffing? The authors write,

After thoughtful analysis, we can only think of one [reason]. Competition between HFT systems today has reached the point where microseconds matter. Any edge one has to process information faster than a competitor makes all the difference in this game. If you could generate a large number of quotes that your competitors have to process, but you can ignore since you generated them, you gain valuable processing time. This is an extremely disturbing development, because as more HFT systems start doing this, it is only a matter of time before quote-stuffing shuts down the entire market from congestion.

The authors propose a “50ms quote expiration rule” that they claim would eliminate quote-stuffing.

I am not an expert on finance, so I cannot completely evaluate whether this article makes sense. Perhaps it is in the category of “interesting if true, and interesting anyway”.

How Not to Fix Soccer

With the World Cup comes the quadrennial ritual in which Americans try to redesign and improve the rules of soccer. As usual, it’s a bad idea to redesign something you don’t understand—and indeed, most of the proposed changes would be harmful. What has surprised me, though, is how rarely anyone explains the rationale behind soccer’s rules. Once you understand the rationale, the rules will make a lot more sense.

So here’s the logic underlying soccer’s rules: the game is supposed to scale down, so that an ordinary youth or recreation-league game can be played under the exact same rules used by the pros. This means that the rules must be designed so that the game can be run by a single referee, without any special equipment such as a scoreboard.

Most of the popular American team sports don’t scale down in this way. American football, basketball, and hockey — the most common inspirations for “reformed” soccer rules — all require multiple referees and special equipment. To scale these sports down, you have to change the rules. For example, playground basketball has no shot clock, no counting of fouls, and nonstandard rules for awarding free throws and handling restarts—it’s fun but it’s not the same game the Lakers play. Baseball is the one popular American spectator sport that does scale down.

The scaling principle accounts for soccer’s seemingly odd timekeeping. The clock isn’t stopped and started, because we can’t assume a separate timekeeping official and we don’t want to burden the referee’s attention with a lot of clock management. The time is not displayed to the players, because we can’t assume the availability of a scoreboard. And because the players don’t know the exact remaining time, the referee gives the players some leeway to finish an attack even if the nominal finishing time has been reached. Most of the scalable sports lack a clock — think of baseball and volleyball — but soccer manages to reconcile a clock with scalability. Americans often want to “fix” this by switching to a scheme that requires a scoreboard and timekeeper.

The scaling principle also explains the system of yellow and red cards. A hockey-style penalty box system requires special timing and (realistically) a special referee to manage the penalty box and timer. Basketball-style foul handling allows penalties to mount up as more fouls are committed by the same player or team, which is good, but it requires elaborate bookkeeping to keep track of fouls committed by each player and team fouls per half. We don’t want to make the soccer referee keep such detailed records, so we simply ask him to record yellow and red cards, which are rare. He uses his judgment to decide when repeated fouls merit a yellow card. This may seem arbitrary in particular cases but it does seem fair on average. (There’s a longer essay that could be written applying the theory of efficient liability regimes to the design of sports penalties.)

It’s no accident, I think, that scalable sports such as soccer and baseball/softball are played by many Americans who typically watch non-scalable sports. There’s something satisfying about playing the same game that the pros play. So, my fellow Americans, if you’re going to fix soccer, please keep the game simple enough that the rest of us can still play it.

Rebooting the CS Publication Process

The job of an academic is to conduct research, and that means publishing manuscripts for the world to read. Computer science is somewhat unusual among the disciplines of science and engineering in that our primary research output goes to highly competitive conferences rather than journals. Acceptance rates at the “top” conferences are often 15% or lower, and the process of accepting those papers and rejecting the rest is famously problematic, particularly for the papers on the bubble.

Consequently, a number of computer scientists have been writing about changing the way we do what we do. Some changes are fairly modest, like increasing acceptance rates by fiat or eliminating printed paper proceedings to save costs. Other changes would be more invasive and require more coordination.

If we wanted to make a concerted effort to really overhaul the process, what would we do? If we can legitimately concern ourselves with “clean slate” redesign of the Internet as an academic discipline, why not look at our own processes in the same light? I raised this during the rump session of the last HotOS Workshop and it seemed to really get the room talking. The discipline of computer science is clearly ready to have this discussion.

Over the past few months, I’ve been working on and off to flesh out how a clean-slate publishing process might work, taking advantage of our ability to build sophisticated tools to manage the process, and including a story for how we might get from here to there. I’ve written this up as a manuscript and I’d like to invite our blog readers, academic or otherwise, to read it over and offer their feedback. At some point, I’ll probably compress this down to fit the tight word limit of a CACM article, but first things first.

Have a look. Post your feedback here on Freedom to Tinker or send me an email, and I’ll follow up, no doubt with a newer draft of my manuscript.

Developing Texts Like We Develop Software

Recently I was asked to speak at a conference for university librarians, about how the future of academic publication looks to me as a computer scientist. It’s an interesting question. What do computer scientists have to teach humanists about how to write? Surely not our elegant prose style.

There is something distinctive about how computer scientists write: we tend to use software development tools to “develop” our texts. This seems natural to us. A software program, after all, is just a big text, and the software developers are the authors of the text. If a tool is good for developing the large, complex, finicky text that is a program, why not use it for more traditional texts as well?

Like software developers, computer scientist writers tend to use version control systems. These are software tools that track and manage different versions of a text. What makes them valuable is not just the ability to “roll back” to old versions — you can get that (albeit awkwardly) by keeping multiple copies of a file. The big win with version control tools is the level of control they give you. Who wrote this line? What did Joe write last Tuesday? Notify me every time section 4 changes. Undo the changes Fred made last Wednesday, but leave all subsequent changes in place. And so on. Version control systems are a much more powerful relative of the “track changes” and “review” features of standard word processors.
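
To make those queries concrete, here is a minimal sketch of what they look like in practice, using Python’s standard subprocess module to drive git, one widely used version control system. The file name, author names, and commit id are purely illustrative assumptions, and other version control tools offer equivalent commands.

```python
# A sketch of the history queries described above, assuming a git
# repository. The file name, author names, and commit id are
# hypothetical placeholders.
import subprocess

def git(*args):
    """Run a git command in the current repository and return its output as text."""
    return subprocess.run(["git", *args], capture_output=True,
                          text=True, check=True).stdout

# Who wrote each line of this section?
print(git("blame", "section4.tex"))

# What did Joe change last Tuesday (and since)?
print(git("log", "--patch", "--author=Joe", "--since=last Tuesday", "--", "section4.tex"))

# Undo the change Fred made in one particular commit, leaving all
# later changes in place.
git("revert", "--no-edit", "abc1234")
```

Each of these is a one-line query or edit against the full history of the document, which is the kind of control that a word processor’s “track changes” feature cannot offer.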

Another big advantage of advanced version control is that it enables parallel development, a style of operation in which multiple people can work on the text, separately, at the same time. Of course, it’s easy to work in parallel. What’s hard is to merge the parallel changes into a coherent final product — which is a huge pain in the neck with traditional editing tools, but is easy and natural with a good version control system. Parallel development lets you turn out a high-quality product faster — it’s a necessity when you have hundreds or thousands of programmers working on the same product — and it vastly reduces the amount of human effort spent on coordination. You still need coordination, of course, but you can focus it where it matters, on the conceptual clarity of the document, without getting distracted by version-wrangling.
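
As a rough sketch of how that parallel workflow looks with a version control tool (again driving git from Python; the author and branch names are made up, and the commits each author would make are omitted):

```python
# A sketch of parallel development: two authors each work on their own
# branch of the same document, and their changes are merged afterward.
# Branch names are hypothetical; the commits on each branch are omitted.
import subprocess

def git(*args):
    return subprocess.run(["git", *args], capture_output=True,
                          text=True, check=True).stdout

git("branch", "alice-introduction")   # Alice revises the introduction on her own branch
git("branch", "bob-related-work")     # Bob reworks the related-work section on his

# ... each author commits changes on their own branch ...

# Merging brings the parallel edits back together. Non-overlapping
# changes are combined automatically; genuine conflicts are flagged
# for a human to resolve.
git("merge", "alice-introduction")
git("merge", "bob-related-work")
```

The coordination cost shows up only at the merge step, and only when two people have actually edited the same passage.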

Interestingly, version control and parallel development turn out to be useful even for single-author works. Version control lets you undo your mistakes and reconstruct the history of a problematic section. Parallel development is useful if you want to try an experiment — what happens if I swap sections 3 and 4? — trying out the new approach for a while yet retaining the ability to accept or reject the experiment as a whole. These tools are so useful that experienced computer scientists tend to use them to write almost anything longer than a blog post.
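
The single-author experiment works the same way: the experimental version lives on a throwaway branch until you decide whether to keep it. A minimal sketch, again assuming git and with hypothetical branch names:

```python
# A sketch of the "what happens if I swap sections 3 and 4?" experiment.
# The branch names are hypothetical; "main" stands for whatever your
# primary branch is called.
import subprocess

def git(*args):
    return subprocess.run(["git", *args], capture_output=True,
                          text=True, check=True).stdout

git("checkout", "-b", "swap-sections-3-4")   # start the experiment on its own branch
# ... rearrange the sections, committing as you go ...

# If the experiment works out, merge it into the main text:
git("checkout", "main")
git("merge", "swap-sections-3-4")

# If it does not, delete the branch and the main text is untouched:
# git("branch", "-D", "swap-sections-3-4")
```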

While version control and parallel development have become standard in computer science writing, there are other software development practices that are only starting to cross over into CS writing: issue tracking and the “release early and often” strategy.

Issue tracking systems are used to keep track of problems, bugs, and other issues that need to be addressed in a text. As with version control, you can do this manually, or rely on a simple to-do list, but specialized tools are more powerful and give you better control and better visibility into the past. As with software, issues can range from small problems (our terminology for X is confusing) to larger challenges (it would be nice if our dataset were bigger).

“Release early and often” is a strategy for rapidly improving a text by making it available to users (or readers), getting feedback, and rapidly turning out a new version that addresses the feedback. Users’ critiques become issues in the issue tracking system; authors modify the text to address the most urgent issues; and a new version is released as soon as the text stabilizes. The result is rapid improvement, aligned with the true desires of users. This approach requires the right attitude from users, who need to be willing to tolerate problems, in exchange for a promise that their critiques will be addressed promptly.

What does all of this mean for writers who are not computer scientists? I won’t be so bold as to say that the future of writing will be just exactly like software development. But I do think that the tools and techniques of software development, which are already widely used by computer scientist writers, will diffuse into more common usage. It will be hard to retrofit them into today’s large, well-established editing software, but as writing tools move into the cloud, I wouldn’t be surprised to see them take on more of the attributes of today’s software development tools.

One consequence of using these tools is that you end up with a fairly complete record of how the text developed over time, and why. Imagine having a record like that for the great works of the past. We could know what the writer did every hour, every day while writing. We could know which issues and problems the author perceived in earlier versions of the text, and how these were addressed. We could know which issues the author saw as still unfixed in the final published text. This kind of visibility will be available for our future writing — assuming we produce works that are worthy of study.

The Gizmodo Warrant: Searching Journalists in the Terabyte Age

Last Friday night, police officers in California used a warrant to search the home of Jason Chen, the Gizmodo blogger who wrote about the iPhone prototype found in a Redwood City bar. Orin Kerr has written an interesting post assessing the legality of the search. I wanted to touch on an important issue he didn’t discuss: Whether the search the police are conducting is unconstitutionally overbroad.

Orin discusses two laws that specifically shield journalists from being the target of a search, the California Reporter’s Shield Law, found jointly at California Penal Code 1524(g) and California Evidence Code 1070, and the federal Privacy Protection Act (PPA), 42 U.S.C. 2000aa. Both laws were written to limit the impact of Zurcher v. Stanford Daily, a U.S. Supreme Court case authorizing the use of a warrant to search a newspaper’s offices. The Supreme Court decided Zurcher in 1978, and Congress enacted the PPA in 1980 (and amended it in unrelated ways in 1996). I’m not sure when the California law was enacted, but I bet it’s of similar vintage. In other words, all of the rules that govern police searches of news offices were created in the age of typewriters, desks, filing cabinets, and stacks of paper.

Now, flash forward thirty years. The police who searched Jason Chen’s home seized the following: a MacBook, an HP server, two Dell desktop computers, an iPad, a ThinkPad, two MacBook Pros, an Iomega NAS, three external hard drives, and three flash drives. They also seized other storage-containing devices, including two digital cameras and two smartphones. If Jason Chen’s computing habits are anything like mine, the police likely seized many terabytes of disk space, storing hundreds of thousands (millions?) of files, containing information stretching back years. And they took all of this information to investigate an alleged crime (the sale of the iPhone prototype) that could not have happened more than 37 days before the search (the iPhone was found on March 18th), which they learned about from a blog post published four days before the search.

I’m deeply concerned about overbreadth as the police begin to search through these terabytes of information. The police now possess, intermingled with the evidence of the alleged crime they are investigating, hundreds of thousands of documents belonging to a journalist/blogger that are utterly irrelevant to their investigation. Jason Chen has been blogging for Gizmodo since 2006, and he’s probably written hundreds of stories. The police likely have thousands of email messages revealing confidential sources, detailing meetings, and trading comments with editors, and thousands of other documents bearing notes from interviews, drafts of articles, and other sensitive information. Because of Chen’s beat, some of these documents probably reveal secrets of great economic and business value in the Silicon Valley. Under traditional, outmoded Fourth Amendment rules, the police can read every single document they possess, so long as they intend only to look for evidence of the crime, and under the “plain view rule,” they can use any evidence they find of other, unrelated crimes in court against Chen or anyone else.

If the California state courts share my concerns about overbreadth, they should consider embracing the very sensible rules for search warrants for computer hard drives (in every case, not just those involving journalists) adopted last year by the Ninth Circuit in United States v. Comprehensive Drug Testing. To paraphrase, in cases involving the search and seizure of computers, the Ninth Circuit requires five things: (1) the government must waive the plain view rule, meaning they must agree not to use evidence of crimes other than the one under investigation that led to the warrant; (2) the government must wall off the forensic experts who search the hard drive from the agents investigating the case; (3) the government must explain the “actual risks of destruction of information” they would face if they weren’t allowed to seize entire computers; (4) the government must use a search protocol to designate what information they can give to the investigating agents; and (5) the government must destroy or return non-responsive data.

These rules are especially needed when the target of a police search is a journalist (in fact, they may not go far enough). And these rules may be required under Zurcher. In justifying the search of the newspaper’s offices in Zurcher, the Supreme Court agreed that when the Fourth Amendment’s search and seizure rules collide with First Amendment values, like freedom of the press, the “Fourth Amendment must be applied with ‘scrupulous exactitude.'” The court went on to explain why ordinary search warrants for news offices (remember, back in the age of paper files) meet this heightened standard:

There is no reason to believe, for example, that magistrates cannot guard against searches of the type, scope, and intrusiveness that would actually interfere with the timely publication of a newspaper. Nor, if the requirements of specificity and reasonableness are properly applied, policed, and observed, will there be any occasion or opportunity for officers to rummage at large in newspaper files or to intrude into or to deter normal editorial and publication decisions.

When the California state courts combine this thirty-year-old statement of the law with the modern realities of terabyte storage devices, they should hold that the Fourth Amendment requires magistrate judges to play an integral and active role in the administration of the search of Jason Chen’s computers and other storage devices. At the very least, the courts should forbid the police from looking at any file timestamped before March 18, 2010, and in addition, they should force the police to comply with the Comprehensive Drug Testing rules. In the terabyte age, these rules are necessary at a minimum to prevent the police from interfering with a free press.