March 26, 2017


Today’s cryptography news is that researchers have discovered a collision in the SHA-1 cryptographic hash function. Though long-expected, this is a notable milestone in the evolution of crypto standards.

Kudos to Marc Stevens, Elie Bursztein, Pierre Karpma, Ange Albertine, and Yarik Markov of CWI Amsterdam and Google Research for their result.

SHA-1 was standardized by NIST in 1995 for use as a cryptographic hash function (or simply “hash”).  Hashes have many uses, most notably as unique short “fingerprints” for data. A secure hash will be collision-resistant, which means it is infeasible to find two files that have the same hash.

One consequence of collision-resistance is that if you want to detect whether anyone has tampered with a file, you can just remember the hash of the file. This is nice because the hash will be small: a SHA-1 hash is only 20 bytes, and other popular hashes are typically 32 bytes. Later, if you want to verify that the file hasn’t changed, you can recompute the hash of the (possibly modified) file , and verify that the result is the same as the hash you remembered. If the hashes of two files are the same, then the files must be the same–otherwise the two files would constitute a collision.

By 2011 it had become clear that SHA-1 was not as strong as expected. Any hash can be defeated by a brute-force search, so hashes are designed so that the cost of brute-force search is too high to be feasible. But methods had been discovered that reduced the cost of finding a collision by a factor of about 100,000 below the cost of a brute-force search.  All was not lost, because even with that cost reduction, defeating SHA-1 was still massively costly by 2011 standards. But the writing was on the wall, and NIST deprecated SHA-1 in 2011, which is standards-speak for advising people to stop using it as soon as practical.

The new result demonstrates a collision in SHA-1. The researchers found two PDF files that have the same hash. This required a lot of computation: 6500 machine-years on standard computers (CPUs), plus 100 machine-years on slightly specialized computers (GPUs).

Today, some systems in the field still rely on SHA-1, despite stronger hashes like SHA-2 getting more use, and the presumably stronger SHA-3 standard was issued in 2015. It is well past time to stop using SHA-1, but the process of phasing it out has taken longer than expected.

There are two lessons here about crypto standards. First, it can take a long time to phase out a standard, even if it is deprecated and known to be vulnerable. Second, the work by NIST and the crypto community to plan ahead, deprecate SHA-1 early, and push forward successor standards, will pay many dividends.

[Post updated (24 Feb) to improve terminology (collision-resistant, rather than collision-free), and to reflect the correct status of the SHA-3 standard.]


  1. Jonathan Frankle says:

    Back-of-the-envelope cost of finding a collision on EC2 according to Google’s numbers: ~$3 million.

  2. When you say researchers “discovered a collision,” do you mean they discovered on “in the wild?”