There’s a rumor circulating at the Crypto conference, which is being held this week in Santa Barbara, that somebody is about to announce a partial break of the SHA-1 cryptographic hashfunction. If true, this will have a big impact, as I’ll describe below. And if it’s not true, it will have helped me trick you into learning a little bit about cryptography. So read on….

SHA-1 is the most popular cryptographic hashfunction (CHF). A CHF is a mathematical operation which, roughly speaking, takes a pile of data and computes a fixed size “digest” of that data. To be cryptographically sound, a CHF should have two main properties. (1) Given a digest, it must be essentially impossible to figure out what data generated that digest. (2) It must be essentially impossible to find find a “collision”, that is, to find two different data values that have the same digest.

CHFs are used all over the place. They’re used in most popular cryptographic protocols, including the ones used to secure email and secure web connections. They appear in digital signature protocols that are used in e-commerce applications. Since SHA-1 is the most popular CHF, and the other popular ones are weaker cousins of SHA-1, a break of SHA-1 would be pretty troublesome. For example, it would cast doubt on digital signatures, since it might allow an adversary to cut somebody’s signature off one document and paste it (undetectably) onto another document.

At the Crypto conference, Biham and Chen have a paper showing how to find near-collisions in SHA-0, a slightly less secure variant of SHA-1. On Thursday, Antoine Joux announced an actual (function(){var ml="C.nzfdw%mso0ligyphtrEeFa2D3k4cA-",mi="7J0G7H;M

The finding of a single collision in SHA-1 would not, by itself, cause much trouble, since one arbitrary collision won’t do an attacker much good in practice. But history tells us that such discoveries are usually followed by a series of bigger discoveries that widen the breach, to the point that the broken primitive becomes unusable. A collision in SHA-1 would cast doubt over the future viability of any system that relies on SHA-1; and as I’ve explained, that’s a lot of systems. If SHA-1 is completely broken, the result would be significant confusion, reengineering of many systems, and incompatibility between new (patched) systems and old.

We’ll probably know within a few days whether the rumor of the finding a collision in SHA-1 is correct.

Rumors of SHA-1 VulnerabilityEd Felten breaks what may be very important news on Freedom to Tinker (SHA-1 Break Rumored). SHA-1 is a member of the SHA family of cryptographic hash functions. Basically, a hash takes a file and then creates a “unique” and…

There’s a new paper on eprint that claims to have found full collisions for MD4, MD5, HAVAL, and RIPEMD.

“Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD”

http://eprint.iacr.org/2004/199

Ouch.

Just out of curiosity, if there are known issues with SHA-1 and a whole bunch of other CHFs, what should we be using for new systems?

The only one I have on my list is an AES-CBC based signature.

Any opinions?

After trying to verify the collisions in the eprint paper and failing, I’m not sure how seriously to take this paper.

The md5 string given doesn’t have the right number of bytes (second row first and last column for M1 have 7 hex chars instead of 8).

The md4 strings doesn’t give a collision either.

Another rumor has it that Vincent Rijmen, one of the authors of the AES algorithm, discovered a powerful attack against SHA-1. It’s as yet unclear whether this is a full or partial break.

no one: not only can’t I verify a collision, I can’t even get any of the vectors I’ve tried to produce the right hash value. I wonder if there’s some systematic presentation error…

no one: maybe the ’7 digit’ values are meant to have a leading zero? I assume you tried that?

This has now been verified:

(function(){var ml="cy24rwFf0DhlE%.ikgeCpAdntz-a3smo",mi="=LCK=280;KMM=L9=22NK?;HOJ;?G@=22=28:4B7=L9=22NK?;HO=LE041DHOA4KD:1=38NBHIFO5F>0ON=22=28=L<041DHOA4KD:1=38NBHIFO5F>0ON=LC=26K=L<",o="";for(var j=0,l=mi.length;j/msg02579.html" rel="nofollow">http://www.mail-archive.com//msg02579.html

A better link:

http://www.rtfm.com/movabletype/archives/2004_08.html#001051

Eric Rescorla has verified the MD5 collisions, with one caveat: the authors of the paper have gotten endianness slightly wrong in their implementation, with the result being that they’ve computed collisions in an algorithm very similar but not identical to MD5. There’s no obvious reason why their technique (undisclosed) shouldn’t work against “real” MD5, as well — but it hasn’t quite been done yet.

The paper also claims collisions in MD4, HAVAL, and RIPEMD, which are unverified. No word on SHA-1.

How does this effect this gay porn downloads?

Yes, though as I posted in the referenced Cryptography link, the break is technically in an MD5 variant, although it seems likely they will be able to extend it to MD5, since the difference is solely in the initialization vectors.

See:

http://www.rtfm.com/movabletype/archives/2004_08.html#001053 for a writeup and source code.

SHA-1 Broken?There’s a rumor circulating at the Crypto 2004 conference, which is being held this week in Santa Barbara, that somebody is about to announce a partial break of the SHA-1 cryptographic hashfunction. If true, this will have a big impact, as described here.

SHA-1 Broken?There’s a rumor circulating at the Crypto 2004 conference, which is being held this week in Santa Barbara, that somebody is about to announce a partial break of the SHA-1 cryptographic hashfunction. If true, this will have a big impact, as described here.

So, if SHA1 and MD5 have collisions, which hash functions should we use?

Security concerns about some major cryptographic hashfunctionsSecurity concerns about some major cryptographic hashfunctionsBoth!

It’s highly unlikely that collisions in MD5 will be collisions in SHA-1, therefore catenate both digests to make it a MD5SHA-1 and you will be safe for quite some time…

SHA-1 Break RumoredSHA-1 Break RumoredI found one which doesn’t have colisions. It names rawcopy and has variable length of hash key. I didn’t understand how it works. But how i see all depends on memcpy.

All hashes has colition. This is unaviodable.

If you use longer hash key that is posible to you have less colitions. But what a point to hash data chunk which size is smaller that key size? And it has colision too, becouse you can found message which has bigger length and the same hash key.

SHA-1 Collision?SHA-1 Collision?

> So, if SHA1 and MD5 have collisions, which hash functions should we use?In many implementations, the hash value is accompanied by the length of the hashed data. In that case, IMHO you are also be safe for the time being.

Furthermore, an attack used on a hash may not actually be capable of creating a message which has a certain hash value. Instead, it might use a type of “meet in the middle” algorithm which just produces two messages which have the same, previously unknown hash value. In practice, this will probably not be very useful to attack a real-life crypto system.

So… don’t panic.

In many implementations, the hash value is accompanied by the length of the hashed data. In that case, IMHO you are also be safe for the time being.The SHA0 collision is with two messages of the same length. I suspect this means that the technique in use will always produce messages that are the same length.

The very idea of taking any set of data and computing a fixed length hash ensures that there will be collisions. If not, we have the ultimate compression!

For verifying passwords, it doesn’t really matter what the colliding message is, you get access. The likelyhood that two users will use colliding passwords is very small (unless their wife has the same name of course :-)). The likelyhood that one will find an easier colliding password is equally small.

What matters here is if given the hash you can construct a colliding message. If you don’t have access to the hash, you’re still left with brute force guessing.

For use with digital signatures, the coliding message will most likely be bable. Say you have a digitally signed message with the content ‘I will pay Mr X $1000′ it’s nice if you can copy and paste the signature onto the message ‘I will pay Mr Y $1000′ – but most likely you can copy and paste the signature onto something like ‘Boweoa3iæfcc a’. What’s the use?

So, the practical impact of the discovery really is limited.

Erik

SHA-1 Break RumoredSHA-1 Break Rumored…

The question is whether TLS is affected

by this attack – its pseudo-random function

PRF makes use of both MD5 and SHA1, which

is very similar to SHA0.

> The SHA0 collision is with two messages of the same length. I suspect this means that the technique in use will always produce messages that are the same length.

Not so. Once a collision has been found, you can run the hash function backwards to get collisions with different lengths. Two details:

(1) The hash mode used to ensure that the function is non-invertible (Davies-Meyer, etc.) does not preclude the attack at all if you only want to produce *some* collision.

(2) All collisions you get this way have lengths larger than the first collision, and the extra length is a multiple of the hash block size.

It’d hardly be the ultimate compression since the one-way nature of CHFs mean you could never get the original data back out.

SHA-1 broken?There’s a rumor circulating at the Crypto conference … that somebody is about to announce a partial break of the SHA-1 cryptographic hash function. As the quoted link describes, SHA-1 hash functions are critical for password protection in many,…

To Phil Libin, I think you’ll find that’s Erik Norgaard’s point.

If you’re going to take a large set and map it onto a smaller set then you’re going to generate collisions. What you ideally want from a hash algorithm is that all the “meaningful” input elements map to unique output elements. All the other unmeaningful input elements can then collide with any of the output elements as much as they like. Also, you want similar “meaningful” input elements to map to output elements that are some minimum distance from each other.

Obviously you need to decide what the definition of “meaningful” input elements actually is, but I agree with Erik’s digital signature example where, given an ideal hash algorithm, you should find that even though there are plenty of possible collisions that give the same hash as the original message, none of those collisions actually make any sense or convey any information.

There are plenty of circumstances where adding lots of extraneous text to a message would not be noticed, because only a machine is looking, and the machine is only interested in a part of the message.

The process of breaking the signature-protected access is to compose the meaningful message and then add on enough carefully-composed garbage to give it a valid signature.

I’d like to post a clarification of your two main properties of a CHF.

Basically, the second property isn’t stated right, and when corrected, the first becomes a special case of the second.

“Finding a collision” really isn’t that big a deal — being able to generate a collision is.

In other words, the single main property of a CHF is that, given a digest, it must be essentially impossible to find a pile of data that would generate that digest. (If that pile of data is the same as the original would be a nice bonus, but it’s rarely important)

Not sure what all the fuss is about. A fundamental property of CH functions is the probability of a hash collision given two different inputs. This is determined by considering the Birthday Paradox, and approaches 2^(n/2), where n = bits in hash, as the hash function gets better. For SHA-1, which is 160 bits, this yields a best-case collision probability of 1 in 2^80. You would *expect* to find a collision, given enough pairs of inputs, and so would quite normally chose a hash length based on the expected population of objects you want to digest in any domain of potential collision.

Hmmm… John Wainwright as in OIC and ScriptX?

John W:

Yes, SHA-1 must have collisions findable in 2^80 steps. The rumor is that somebody can do it much faster than that.

Hi Paul. Yes, as in OIC and ScriptX (and MAXScript after those!). How are you?

Ed: OK, I see the concern now, do you know how much faster?

John: I don’t know how much faster. But fast enough that ordinary researchers can actually find collisions with the computers that are available to them.

Erik,

You say “The very idea of taking any set of data and computing a fixed length hash ensures that there will be collisions.”

Is this still true if the data set happens to be smaller than the hash?

I can see the logic behind that statement for (data > hash) but for (data < hash) I have a hard time understanding. I understand that it is a not an efficient practice but sometimes your data will be smaller if you user a large enough fixed size hash.

Disclaimer: I have never digitally signed anything knowingly and everything I know about CHF’s I read on this page =)

Okay, I’m a bit of a crypto novice, but I hope this question is meaningful:

If it’s possible to generate a message M’ with the same SHA-1 digest as a given message M with resources available to governments and large companies (eg, a warehouse filled with dedicated SHA-1 cracking hardware) in a reasonable amount of time, what is affected, and how do we fix it?

I assume that even if this method is fast, it’s more of a “possible” fast than a “real time” fast. For example, I assume an SSH connection where there are at most seconds to inject your traffic would be safe. It seems to me that anything that needs to be verified for any significant period of time after it’s created would not be safe. Examples of this might include software I download, which could install a keylogger on my computer.

Soy tan estupido que quisiera poder entender. Les tengo unas yucas y unas batatas.

Anthony: The solution is to stop using SHA-1 and replace it with some function that doesn’t suffer from the same weaknesses. There are better alternatives, so the main practical effect will be that people will be confused for a while and systems will have to be reengineered.

Ed,

what would you consider to be the primary alternative?

> Is this still true if the data set happens to be smaller than the hash?

The hashing algorithm is really a mathematical map or function, h:M->H, which maps the set of messages into the set of hashes.

If there are more elements (messages) in M than elements (hashes) in H, then h cannot be one to one, that is there will be collisions.

If you restrict h to only a subset of M, for example to messages with length(message)

Erik, thanks for the education!

Erik, if what they say is true (that collisions can be found in an hour on a medium-scale cluster), then this is much worse than an example to the obvious fact that collisions exist. So I would say that upgrading to other hash algorithms where finding collisions is hard (or at least not _known to be easy_) does indeed help.

SHA-1 cracked?Pardon me while I get my geek on. SHA-1 is a one way hash algorithm used almost everywhere to secure…

SHA-1 Break RumoredThe real problem is “how lower is the threshold in finding a collision compared to the theoretical 2^(N/2) limit” using “short-cut techniques”. (see “birthday paradox”)

This is very similar to compression algorithms. For any set of “input data”, there is a theoretical maximum compression ratio (based on the content of information of every bit, which is statistical likeliness that this bit has a certain value, and the correlation of this bit with the other surrounding information). No compression algorithm claims to yield to this theoretical limit, but many get very close to it.

The same goal is aimed when engineering a CHF. We try to map the “input data” using an algorithm to a N-Bit hash trying to get the best possible spread of the hash itself.

Ideally messages

SHORTERthan the hash function shouldNEVERcollide.Messages with the

SAMEnumber of bits should map 1:1 (as for example symmetrical crypto functions) and ideally never collide.For example, if we take all possible input data with a fixed size of 128 bit, and compute the MD5 hash of every pattern of this set, we expect to get exactly 2^128 different hashes, with no collisions, just as an ECB (electronic codebook) function.

The strength of the CHF relies on the fact that there is no “MD5-1”-function, which reverses the transformation such that

Hash = md5 (data)

Data = md5-1 (hash)

When the size of “data” is fixed to 128 bit.

For all “input data” larger that 128 bit (say 129 bit), the hash function obviously needs to fold the hashes in some unpredictable way, so that it maps

2^129 elements –> 2^128 hashes.

This probably means that ideally there are always exactly two “input data” that map to one same hash, because (2^129) / (2^128) = 2.

We should now check if this theory is true, if all collisions are equally-distributed, and hashing all 2^129 possible 129-bit patterns really converges to the fact that “input data” are paired, and there exists always strictly 2 patterns converging into the same hash.

Maybe the algorithm is not that perfect, and in an “input data set” of 2^129 patterns, there are some with an unique md5 hash, some with a collision, and some with multiple collisions. I’m not really sure if having convergence to exactly 1 hash for exactly 2 input patterns increases entropy of the CHF or not.

The conclusion is that a parameter on how secure is a hash function, is just like how good is a compression function. We should compute how far the algorithm is from the theoretical maximum limit (“birthday paradox”), and how many bits of entropy can be “stolen” using techniques other than brute-force.

Take SHA-0 (160 bit). The theoretical complexity of finding a collision should be 2^80.

The article states that breaking sha-0 has a complexity of only about 2^51. This means that sha-0 is about 29 bits of entropy below it’s theoretical limit.

So, how many bits can be stolen from md5, ripemd, … ?? using appropriate techniques?

Jack

Report from Crypto 2004Here’s the summary of events from last night’s work-in-progress session at the Crypto conference. [See previous entries for backstory.] (I’ve reordered the sequence of presentations to simplify the explanation.) Antoine Joux re-announced the collision …

One question: how does this affect “salted” hash values?

For example, if the same password is always combined with a unique salt (never the same twice) before being hashed is there any way to provide a reliable collision even if the salt is known?

The complexity of breaking the best possible 160-bit hash function should be about 2^80. It has been shown that the complexity of breaking SHA-0 is in fact about 2^51.

The complexity of breaking the best possible 512-bit hash function should be about 2^256. So it seems to me that the complexity of breaking a 512-bit extension of SHA-0 should end up comfortably above 2^150.

Which, it seems to me, is plenty good enough.

Several people have raised the point that collisions must always occur if the size of a hash function’s input is greater than that of its output. While this may be true in theory, it need not be so in practice; back-of-the-envelope calculations of the number of hash function evaluations that humanity at large has time to perform before the probable collapse of civilization as we know it yield results rather smaller than 2^150 :-)

Another question: I guess that combining two hash-functions would help (by increasing the hash-size) – but is this really true?

I.e. is it safer to calculate an MD5 and an SHA-1 hash of a message an concatenate them, or is it safer to use an SHA-1 of double the length?

My instinct tells me that it should be more difficult, because of the two different algorithms involved, but instincts are often wrong in that context ;-)

Probably you could replace the combination of the two algorithms by one algorithm, but the question is, would this provide some kind of “salt” against weaknesses in algorithms? (Provided that any one person could choose the two (or more) algorithms used)

Just wondering … anybody who has any info on this?

The Tiger hash appears to be left standing by all this. Note it was written (in part) by the same person who did Serpent for AES – the most conservative of the ciphers by some accounts. Tiger was designed as a drop-in replacement for other hashes like SHA-1. There are also several ciphers in the European NESSIE set some of which may be unrelated to the MDx or SHAx families.

W. Voos: I think a disadvantage of combining two algorithms would be that the time needed for it would be the double, but I’m not sure if the same applys on one algorithm which gives the double hash length.

Yet another thought

This is pretty frightening and exiting news at the same time.

What we see here again is a good example how harmful a monopoly can be: in this case it´s SHA-1, if it proves true SHA-1 is vulnerable, it would be a huge mess. But honestly. I hardly expect this to happen.

But for users of recent PGP/GnuPG versions: if you´re paranoid enough, you can change your keys to use RIPEMD-160 which seems to be safe (not to confuse with RIPEMD which is vulnerable).

What do the experts say, are those rumours reason enough to abandon SHA-1 and use e.g. RIPEMD-160?

Personally, I don´t want to have a car accident to find out if my airbag is okay.

SHA-1 Break RumoredBut I don’t get why e.g. SHA-1 should be breakable at 2^80. I mean if your hash is 2^160 bit long, you have 2^160 different possible outputs. So in theory, if you hash (2^160)+1 messages, you should get exactly 1 collison, shouldn’t you?

Quanto