Let’s talk about hashing passwords

July 12, 2011

I read an article last night (this one in fact) which included the following sentence:

“That said, it is no longer secure to hash your passwords with MD5, much less when it is unsalted.”

I cringed. I understand this sentence comes from a common misunderstanding of what security a cryptographic hash brings to your password protection scheme. Today, we’re going to try and understand it. A most basic understanding of password cracking is assumed.

We know that we do not want attackers to discover our passwords (here forth known as PLAINTEXTs). Discovery of the PLAINTEXT is bad. We need a method to store this password in a manner that, when discovered by an attacker, does not reveal the PLAINTEXT. We need an intractable equation. Our intractable equation will take an input PLAINTEXT and produce a result CIPHERTEXT, such that we cannot feasibly discover PLAINTEXT from CIPHERTEXT. The most commonly used type of intractable equation for password storage is a cryptographic hash.

Cryptographic hashes have, by definition, numerous properties. The property we are interested in for storing our passwords is known as pre-image resistance. More specifically, we want our cryptographic hash to withstand a first pre-image attack. Let’s take a look at the three basic attacks all cryptographic hashes should resist:

  • Collision It is infeasible to find any two different and arbitrary PLAINTEXTs which will hash to the same arbitrary CIPHERTEXT
  • First Pre-Image It is infeasible, given a CIPHERTEXT, to find a PLAINTEXT which will hash to the same CIPHERTEXT
  • Second Pre-Image It is infeasible, given a PLAINTEXT_1 which hashes to CIPHERTEXT, to find another PLAINTEXT_2 which will hash to CIPHERTEXT

It is naturally easier to find a collision for a cryptographic hashing algorithm because of the Birthday Attack. A cryptographic hash with 128 bits, given a pure birthday attack, would require a complexity of roughly 2^64 (to reach a 50% success rate) to find a collision. To brute force for a pre-image, however, would take a complexity of 2^127. That’s 2^127 times we would need to run our hashing algorithm to have a 50% chance of finding a pre-image. Much more difficult. The academic research and vetting process behind these cryptographic hashing algorithms uses properties of these equations in an attempt to reduce that complexity. As breakthroughs in cryptography take place, we learn more about the hashes and can begin to reduce the complexity needed for these attacks.

MD5 is a 128-bit cryptographic hashing algorithm. It “compresses” 512-bit blocks to 128-bits of output. Its collision resistance is currently 2^20, AKA its collision resistance is broken. A modern laptop can run 2^20 iterations of the MD5 hash “quickly” (Wikipedia has a good, quick comparison of attack resistances for popular cryptographic hashes). However, the best known pre-image attack against MD5 reduces those 127 bits to 123.5. After 20 years, MD5′s pre-image resistance is still very strong.

So how do we “crack” these passwords when we can’t conduct a mathematical pre-image attack against them? We use pre-existing knowledge about the PLAINTEXT to reduce the complexity. For example, if we are executing a simple brute-force password attack, eight lowercase letters, the complexity is 26^8. At (27^8)/2 we have a 50% chance to find our plaintext. So how many times smaller is (26^8)/2 than 2^127? 1,629,493,608,081,135,236,420,508,748 times. (No logs, I wanted the full effect). That’s how many times stronger the pre-image resistance of MD5 is when compared to a typical brute-force attack of all eight lowercase letters.

So now we are beginning to see that when attackers crack a MD5 password, they don’t even bother with the mathematical or cryptological properties of the hash. Those properties still keep finding a suitable PLAINTEXT an infeasible task. Instead, we use known information about the plaintext (these are your dictionaries, mangling rules, markov models, etc) to reduce the possibilities we need to attempt. As long as the pre-image resistance of MD5 is greater than the complexity of the typical password attacks, MD5 will retain its viability as a hash to secure passwords.

The fact that someone cracked a bunch of MD5 hashes means nothing about MD5′s suitability as a cryptographic hash function to intractably hash your plaintexts. People aren’t attacking the properties of cryptographic hashes, they are attacking statistical weaknesses in the methods people choose their passwords.

posted in Uncategorized by endeavormac

Follow comments via the RSS Feed | Leave a comment | Trackback URL

3 Comments to "Let’s talk about hashing passwords"

  1. Justin Bailey wrote:

    Thanks for this post – your explanation of “pre-image” resistance is startling and enlightening. Google tells me it would still take a billion years to brute-force a PLAINTEXT, given its CIPHERTEXT.

    However, does that matter with the collision resistance being so much smaller? Most password schemes perform a one-way hash. If an attacker gets the password file and can find a collision, then they have effectively found the password (assuming the application doesn’t compare PLAINTEXT to the actual password somehow). Is MD5 any good in the end?

  2. endeavormac wrote:

    @JustinBailey A collision allows you to create two plaintexts with the same hash, but you do not get to choose what the resulting hash is. So, if I have a fictional password hash abcd8765ef01, and I want to find a plaintext which will hash to this value, I need to be able to choose the resulting hash (IE I need my resulting hash to be abcd8765ef01).

    The Birthday Attack (or Birthday Paradox) doesn’t let you choose the birthday, it just says you will find two people with the same birthday. Because your requirements, that the person be born on, perhaps, the 7th of March, or the resulting hash be abcd8765ef01, means collisions are not useful to you.

    An example where a collision may matter? (I’m skipping some detail, but you’ll get the gist) You create two versions of a file, be it a contract, software, you name it. One version is innocent, and the other is harmful. Now you generate a collision. Your collision will give you values that you place after the changes in the bad/harmful file. The two files will have the same resulting cryptographic hash. You vet the good file (for example have someone cryptographically sign it (see the last paragraph in A Basic Introduction to Communicating Securely with PGP to understand how hashes are used in that process)), and once the good file is signed you can swap it for the bad file. The signatures will still match, and people will transfer their trust in the signature (originally used to sign the good file) to the bad file.

  3. Justin wrote:

    Thanks for the follow up. I was temporarily confused in thinking you could generate all possible MD5 hashes in 2^20 iterations. Your explanation helped a lot!

Leave Your Comment


9 × = twenty seven

 
Powered by Wordpress and MySQL. Theme by Shlomi Noach, openark.org