Thursday 13 December 2007

Don't use passwords. Use passphrases.

We still use passwords everywhere, and they're usually stored as hash values in the database of the service that we log into. I ran into this story about a guy who looked up an md5 hash value on Google and in this way recovered the password behind it. His conclusion is that you should not use a password that anybody else on this planet may have used.

The reason this is a problem is that many users use the same password in multiple places, so if you know their password for one service, you can probably log into other services with it. If you store all passwords as hash values, and those hash values fall into the hands of people who may abuse them, it is important that the original passwords cannot be recovered from them. There are many ways to crack passwords, and lostpassword.com is a good site to visit if you want to see how easy it is.

But how fast can md5 hashes be cracked? Let's imagine that we produce all possible passwords, generate their md5 hashes, and then use the resulting list as a lookup table, sorted by md5 hash. Let's make a few assumptions:
  • The password is only using lowercase letters and digits, 36 different characters in total.
  • It is totally random.
Let's say the password has length n. Stored as hexadecimal text, the md5 hash is 32 bytes, so each lookup item has size 32+n bytes. There will be 36^n records, using (32+n)×36^n bytes of space. How long would it take to find the password for an md5 value? A binary search would need c = log2(36^n) = n×log2(36) ≈ n×5 lookups. This is the space needed for various values of n, and the crack time assuming that a lookup takes 20ms:
  • n=5 uses 2GB, crack time: 500ms
  • n=6 uses 82GB, crack time: 600ms
  • n=7 uses 3TB, crack time: 700ms
  • n=8 uses 112TB, crack time: 800ms
  • n=9 uses 4163TB, crack time: 900ms
  • n=10 uses 1.5×10^17 bytes
  • n=15 uses 1×10^25 bytes
  • n=20 uses 7×10^32 bytes
You can buy 1TB drives today, so these amounts of storage are realistic up to about n=10. If you want a good password, you should therefore make sure that it has at least 10 characters, and if you want to stay well protected in the future, go for at least 15 characters.
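The estimates above can be reproduced with a few lines of Python. This is only a sketch under the same assumptions as the list: 36 possible characters, 32 bytes per stored hash, and 20ms per lookup. The function names are made up for this illustration, and it uses the exact value of log2(36) (about 5.17) instead of the rounded 5, so the lookup counts come out slightly higher.

```python
import math

ALPHABET_SIZE = 36      # lowercase letters and digits
HASH_BYTES = 32         # md5 stored as 32 hexadecimal characters
LOOKUP_SECONDS = 0.020  # assumed cost of one lookup step (20ms)

def table_bytes(n: int) -> int:
    """Size of a table holding every n-character password with its hash."""
    return (HASH_BYTES + n) * ALPHABET_SIZE ** n

def binary_search_steps(n: int) -> int:
    """Approximate worst-case binary search steps over 36^n sorted records."""
    return math.ceil(n * math.log2(ALPHABET_SIZE))

# Print size, step count and estimated lookup time for each password length.
for n in (5, 6, 7, 8, 9, 10, 15, 20):
    steps = binary_search_steps(n)
    print(f"n={n:2d}  {table_bytes(n):.2e} bytes  "
          f"{steps} steps  {steps * LOOKUP_SECONDS * 1000:.0f}ms")
```

The point of the calculation is that the lookup time grows only linearly with n, while the storage grows by a factor of 36 for every extra character, so storage is what limits this kind of attack.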

As you can see, these are bad passwords:
  • j4fsk2
  • this is fun
  • my dog ate my homework (somebody else probably used that, too)
These are good passwords:
  • slashdot8fischk (15 characters, spelling errors etc.)
  • roskilde/1997/annie (19 characters, but who is Annie and why Roskilde?)
It helps if a long password can be typed quickly, so it usually needs to contain some real-life words, but make sure to pick words that others wouldn't use.

As a programmer, you can help your users make better passwords by providing more space to type the password. Usability research has shown that this actually helps, although I cannot remember the source for that information. Some systems also use the word "passphrase" instead of "password" in order to encourage users to type more characters.

4 comments:

Anonymous said...

All in all, it works wonderfully until you realize that it often boils down to an md5 hash, making all passwords EXACTLY the same length. The more you screw around, the more likely it is that your passphrase will have the same hash as something simple like "abc"

Lars D said...

An md5 hash sum has 128 bits, which is equivalent to a password with 128/5 ≈ 25 characters, if each character is chosen randomly from a set of 36 characters (about 5 bits per character). It doesn't make sense to make md5 checksums from purely random passwords longer than that, but it does make sense to make passwords longer than 25 characters if it makes them easier to remember.
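The comparison can be checked with a short calculation. This is a sketch that assumes each character is drawn uniformly at random from 36 symbols; the function name is invented for the example.

```python
import math

def random_chars_for_bits(bits: int, alphabet_size: int = 36) -> float:
    """Number of uniformly random characters that carry `bits` bits."""
    return bits / math.log2(alphabet_size)

# An md5 hash has 128 bits; each random character from a set of 36
# contributes log2(36), a little over 5 bits.
print(round(random_chars_for_bits(128), 1))  # prints 24.8
```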

Anonymous said...

Every coder knows (or should know) that you must salt a password before hashing it to defeat lookup tables. Using MD5 to store passwords these days is almost as bad as storing them in plain text.
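A rough illustration of the point about salting, as a sketch rather than a recommended implementation. It builds a small sorted lookup table of md5 hashes for all 2-character passwords, finds an unsalted hash in it by binary search, and then shows that the same password hashed with a random salt is not in the table. The helper names are hypothetical, and PBKDF2 with 100,000 iterations is an assumption chosen for the example, not something taken from the post.

```python
import bisect
import hashlib
import itertools
import os
import string

ALPHABET = string.ascii_lowercase + string.digits  # 36 characters

def md5_hex(text: str) -> str:
    """Unsalted md5 of a password, as 32 hexadecimal characters."""
    return hashlib.md5(text.encode()).hexdigest()

# Precompute (hash, password) pairs for every 2-character password,
# sorted by hash so that the table can be binary searched.
table = sorted((md5_hex("".join(p)), "".join(p))
               for p in itertools.product(ALPHABET, repeat=2))
hashes = [h for h, _ in table]

def crack(md5_hash: str):
    """Binary search the table; return the password, or None if absent."""
    i = bisect.bisect_left(hashes, md5_hash)
    if i < len(hashes) and hashes[i] == md5_hash:
        return table[i][1]
    return None

def hash_with_salt(password: str, salt: bytes) -> bytes:
    """Salted and deliberately slow hash (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# The unsalted hash is found in the precomputed table.
print(crack(md5_hex("x7")))

# With a random per-user salt, the precomputed table no longer matches.
salt = os.urandom(16)
salted_md5 = hashlib.md5(salt + b"x7").hexdigest()
print(crack(salted_md5))

# What a service could store instead: the salt together with a slow hash.
stored = hash_with_salt("x7", salt)
```

The salt does not need to be secret. It only has to differ per user, so that an attacker would need a separate table for every salt value.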

Anonymous said...

Jeff at codinghorror also has some interesting posts on this topic. The last one was in October.