Whenever I hear about users' credential data being stolen from servers, I wonder how well (or badly) it was stored in terms of security. There are several layers of security I apply to passwords. This article explains these layers and my reasons for them, starting with the worst example I've ever seen in production.
Starting off really bad
I was once assigned to work on a web application that contained the following table:
This was a table storing the plain-text password along with the username and the email address. This is the absolute worst-case scenario, because users tend to reuse the same username, email and password combinations. It is even likely that the password for the email account itself is the same as the one stored in the database. If an attacker got access to this database, it would harm not only the application the password was stolen from, but possibly the user's whole digital identity. If the same credentials were used for an online store, the attacker could possibly get access to credit card numbers and similar sensitive information.
Another very critical point is that the password may contain personal information (such as religious views or sexual orientation) that you are not allowed to store without the user's explicit permission. I'm not aware of any lawsuit regarding this topic, but I wouldn't want to be the first.
To improve this, it is wise never to store the user's password, only a hash of it. I rely on hashing algorithms natively supported by the technology stack used when developing an application. In the past I commonly used MD5, but nowadays I tend to use SHA-1. Feel free to pick whatever hashing algorithm suits your needs or that you are familiar with. The only restriction is that you should never use an algorithm known to be weak or one that has already been broken.
Important note: Some might argue SHA-1 is already broken. This is correct: collision attacks against it have been demonstrated in practice (the first public collision was published in 2017). On the other hand, it is an algorithm supported by many layers of the technology stack (e.g. MariaDB and Java). I would no longer use it if I were starting a new application, but rather SHA-256.
Now you have stored the password with only a little bit of security applied.
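To make the idea of storing only a hash concrete, here is a minimal Python sketch using the standard hashlib module. The function names hash_password and verify_password are my own for illustration, and SHA-256 is chosen only because the note above mentions it. This is the unsalted first step, not a finished scheme.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Return the SHA-256 hex digest of the password (no salt yet)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def verify_password(candidate: str, stored_hash: str) -> bool:
    """Hash the login attempt and compare it with the stored hash."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(hash_password(candidate), stored_hash)


# Only this value would go into the database, never the password itself.
stored = hash_password("123456")
```

At login you would call verify_password with the submitted password and the value read from the database. The plain-text password never needs to be stored.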
Protecting users with weak passwords
It is hard and really expensive to determine the plain-text password from the stored hashes. An attacker can use brute force, calculating the hash for lots and lots of passwords until a match is found. But the greatest threat to hashed passwords is rainbow tables. A rainbow table is basically a really large precomputed table that maps hashes back to their inputs. The table is then used to look up a hash and quickly get the input. Common passwords, like '123456', will be broken within a fraction of a second. To protect users with weak passwords you need to use 'salting'. This basically means you prepend a constant random string to the password prior to calculating the hash. If the user's password was 123456 and our salt was atho3aifeiR2, the string sent to the hash calculation would be atho3aifeiR2123456. If you choose your random string wisely, it's unlikely rainbow tables will contain this. In addition, most rainbow tables were built using only short passwords. If the salt itself is lengthy (e.g. 12 characters or more), it provides an additional chance of the input not being in the rainbow table. So never use short and easy salts like '1', because they do not provide any security at all.
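The salting step can be sketched in Python as follows. The salt value is the example from the text. The names SALT, generate_salt and salted_hash are hypothetical, and SHA-256 is an assumption (any hash from the previous section would work the same way).

```python
import hashlib
import secrets

# Example salt from the text; a real application would generate its own once
# and keep it in the application configuration.
SALT = "atho3aifeiR2"


def generate_salt() -> str:
    """Create a random salt of at least 12 characters (hypothetical helper)."""
    return secrets.token_urlsafe(12)


def salted_hash(password: str, salt: str = SALT) -> str:
    """Put the salt in front of the password, then hash the combined string."""
    salted = salt + password  # "atho3aifeiR2" + "123456" -> "atho3aifeiR2123456"
    return hashlib.sha256(salted.encode("utf-8")).hexdigest()
```

A table precomputed for plain '123456' no longer matches, because the value that was actually hashed is 'atho3aifeiR2123456'.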
Double hashing passwords
Now your passwords are stored pretty securely. But if the attacker was able to obtain your salt along with the hashed passwords, he could still build a rainbow table for all passwords starting with your salt. For simple passwords this would still harm security. I use a chain of hash calls to work around this. Basically you salt the password and hash it as before, but after that you go ahead and calculate the hash of the hash. You can repeat this step several times, like
md5(md5(md5('atho3aifeiR2' + password))). This way a possible attacker cannot use any existing rainbow table and has to rely on brute-force attacks. The great advantage of this approach is that the attacker needs to know your implementation to actually produce usable results. If he was able to break into your database server but not your application server, your users' passwords are safe.
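The nested md5 call above could look like this in Python. This is a sketch under one assumption: each round hashes the hexadecimal string produced by the previous round, which is how the expression behaves when the hash functions return hex strings, as the SQL functions of MariaDB do. The names md5_hex and chained_md5 are hypothetical.

```python
import hashlib

SALT = "atho3aifeiR2"  # example salt from the text


def md5_hex(text: str) -> str:
    """MD5 of a string, returned as a lowercase hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def chained_md5(password: str, rounds: int = 3, salt: str = SALT) -> str:
    """Hash salt + password, then hash the resulting hex digest again."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    digest = md5_hex(salt + password)
    for _ in range(rounds - 1):
        digest = md5_hex(digest)  # hash of the previous hash
    return digest
```

With the default of three rounds, chained_md5(password) corresponds to md5(md5(md5('atho3aifeiR2' + password))).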
Protecting against weak algorithms
If you used a weak algorithm (or one that was broken after you introduced it), your users could still be at risk. If you use a chain of different algorithms, this threat is reduced. So our pseudo-code looks like this
SHA2(sha1(md5('atho3aifeiR2' + password)), 256). In case all of these were broken, you can still wrap the whole chain in a stronger algorithm, like
SHA2(SHA2(sha1(md5('atho3aifeiR2' + password)), 256), 512), and apply the outermost hash function to your existing data, e.g.
UPDATE user SET password = SHA2(password, 512).
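The same chain and the later upgrade can be sketched in Python. As before, this assumes every step hashes the hex string returned by the previous step, matching the SQL functions used in the pseudo-code. The names legacy_chain, upgrade_stored_hash and current_chain are hypothetical.

```python
import hashlib

SALT = "atho3aifeiR2"  # example salt from the text


def _hex(algorithm: str, text: str) -> str:
    """Hash a string with the named algorithm and return the hex digest."""
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


def legacy_chain(password: str) -> str:
    """SHA2(sha1(md5(salt + password)), 256) from the pseudo-code."""
    return _hex("sha256", _hex("sha1", _hex("md5", SALT + password)))


def upgrade_stored_hash(stored_hash: str) -> str:
    """Wrap an already stored hash in SHA-512, like the UPDATE statement."""
    return _hex("sha512", stored_hash)


def current_chain(password: str) -> str:
    """Full chain used to check logins after the stored data was upgraded."""
    return upgrade_stored_hash(legacy_chain(password))
```

The point of the design is that the upgrade needs only the stored hashes, not the original passwords: applying upgrade_stored_hash to every stored value gives the same result as computing current_chain at the next login.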