Password Hashing - the views of a noob

The Intro

Okay! I might know a thing or two about computers, but I'm pretty dull at cryptography. Throughout this post, there's a good chance I'll make a few ludicrous assumptions, so please be kind if you comment.

So here was our predicament. At our startup, we needed a login mechanism of our own. We had previously used Facebook authentication, and that worked well, but it severely limits our audience. "Throw common sense out the window" pretty much describes our startup. So, with common sense duly thrown out the window, I set out to build a login mechanism myself.

While looking at how password mechanisms are usually implemented, I stumbled upon the usual Stack Overflow links. By then I had built a pretty mundane system that took the user's login credentials from a webpage and matched them against the DB. What I found startled me: I could see the password in plain text in the admin panel. I was sure this wasn't supposed to happen, but it did! I kept checking whether I had coded it wrong, and whether the webpage was supposed to 'encrypt' (pardon me if I use the wrong words in the wrong place) the credentials before passing them across.

What struck me was that every time we enter our passwords on any website, we cannot be sure whether they are being handled in plain text. Sure, almost all websites these days have SSL turned on, and that (I think) protects the password from a Man-In-The-Middle (MITM) attack while it travels. But the server decrypts the traffic when it arrives, so the application code still sees the plain-text password. What if an admin who doesn't have your best interests at heart stares at the plain-text passwords arriving one after another? Well, you could argue that the passwords are immediately hashed, matched in the DB and thrown away. But what if someone places a print statement in the server application code just before the passwords are matched? Doesn't that mean anybody with access to the command line could watch those passwords being printed? Look at this server-side code:

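import tornado.websocket

class WS2Handler(tornado.websocket.WebSocketHandler):
    def on_message(self, message):
        # Logs every incoming message, including any login credentials it carries
        print("Received message %s" % message)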

I'm using Tornado's WebSockets to receive the login credentials, and that print statement scared me. If, let's just say if, somebody got hold of the root password, logged into the server and modified the code to send the credentials to some remote address, we wouldn't even know it had happened, since our code would still work awfully well. Why would we even look at code that's working well? (This is why I said I have no clue about security.)

The solution, or at least something I can live with

What if we used a not-very-complicated hashing mechanism at the browser level itself? Consider this example:

Case 1:

At the browser, without hashing:  
Password: aWeakPassword

At the server, just when data is received:  
Password: aWeakPassword  

The password can then be hashed and either checked for a match or stored for a new user. Sure, the password gets hashed, but what if an attacker modifies the code and does something nefarious with it before that happens? I suspect that might be a frequent occurrence too. So isn't the onus on us, as developers, to take such "frequent occurrences" into account?
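For reference, Case 1 (the plain text arrives and is hashed right away) could look roughly like this. It's a minimal sketch assuming Python's standard-library PBKDF2 (`hashlib.pbkdf2_hmac`) with a random per-user salt; the function names and the iteration count are illustrative, not what any particular server uses. Note that the submitted password still sits in memory as plain text until `hash_password` runs, which is exactly the window a sneaky print statement exploits.

```python
import hashlib
import hmac
import os

# Illustrative work factor; a real server would tune this by timing.
ITERATIONS = 200_000

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a slow, salted hash from the plain-text password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def register(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store for a new user; the plain text is not stored."""
    salt = os.urandom(16)  # fresh random salt per user
    return salt, hash_password(password, salt)

def check(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the submitted password and compare in constant time."""
    return hmac.compare_digest(hash_password(password, salt), stored_digest)

salt, digest = register("aWeakPassword")
print(check("aWeakPassword", salt, digest))   # True
print(check("aWrongPassword", salt, digest))  # False
```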

Case 2:

At the browser, with hashing:  
Password: aWeakPassword

Immediately before transmission, the password is salted and hashed into, let's say, '1udkkhbc897!8*7639230hccjojnnHjIO'

At the server, just when data is received:  
Password: 1udkkhbc897!8*7639230hccjojnnHjIO  

The password received here is not the original password, and by the very definition of a (cryptographic) hash, the plain-text value cannot be computed back from it. Now, even if some not-so-cool attacker modifies the code, all that he/she (yes, feminism!) would have access to is the hashed password. What could possibly be done with that? Well, the attacker could still try guessing: hash candidate passwords, or build something like Rainbow Tables, and look for a match. But with the right choice of browser-side hash algorithm (a slow one, with a salt), I think we can make that guessing expensive enough to be impractical for decent passwords. Also, we'd be leveraging the user's computing power, so that's another benefit. Every password received at the server would additionally look like a strong one, though sort of only on the surface: its real strength is still that of what the user typed.
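To make the guessing point concrete: a hash can't be inverted, but if the salt is known (salts aren't meant to be private) and the password is weak, a captured digest can be matched by hashing candidates. A minimal sketch, assuming a hypothetical client hash of plain SHA-256 over salt, username and password, and a tiny made-up wordlist:

```python
import hashlib

def client_hash(username: str, password: str, salt: str) -> str:
    # Hypothetical client-side hash: one round of SHA-256 over salt, username, password.
    data = f"{salt}:{username}:{password}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def guess_password(captured: str, username: str, salt: str, wordlist):
    """Try each candidate until its digest equals the captured one."""
    for candidate in wordlist:
        if client_hash(username, candidate, salt) == captured:
            return candidate
    return None

captured = client_hash("alice", "aWeakPassword", "public-salt")
words = ["password", "123456", "letmein", "aWeakPassword"]
print(guess_password(captured, "alice", "public-salt", words))  # aWeakPassword
```

A slow algorithm (bcrypt, scrypt, PBKDF2) doesn't stop this loop; it only makes each guess more expensive, which is why the choice of browser-side algorithm matters.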

The architecture

The browser takes the username, password and salt and computes a hash. But an attacker might view the JS source code and figure out the parameters of the hash algorithm. To thwart that, we could either generate random values for the hash function's parameters and transmit them along with the username and password, or transmit them via a WebSocket at runtime. (Hey, I'm gonna flog this incredible horse called WebSockets.) Either option means there's no way to find out the parameter values just by reading the client-side source code. One catch: the values must stay the same for a given user, i.e. generated once at registration and sent back at every login, or the digest would come out different each time and never match what the server stored.
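Here is a rough sketch of the runtime option: per-user parameters drawn once and handed back to the browser on every login. It assumes a hypothetical in-memory `USER_PARAMS` store and a JSON message shape made up for illustration; in our setup this payload would travel over the WebSocket.

```python
import json
import secrets

# Hypothetical in-memory store: username -> client-side hash parameters.
USER_PARAMS = {}

def create_params(username: str) -> dict:
    """At registration: draw random parameters once and remember them."""
    params = {"salt_1": secrets.token_hex(16), "iterations": 100_000}
    USER_PARAMS[username] = params
    return params

def params_message(username: str) -> str:
    """At login: send the same stored parameters back to the browser."""
    params = USER_PARAMS[username]
    return json.dumps({"type": "hash_params", **params})

create_params("alice")
first = params_message("alice")
second = params_message("alice")
print(first == second)  # True: every login gets identical parameters
```

These values are hidden from the source code, not from anyone who asks for them, so they act like salts rather than secrets.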

Client_Hash_Digest = ClientSideHashFunction(USERNAME, PASSWORD, SALT_1)  
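The browser would run this in JavaScript (the Web Crypto API's `crypto.subtle.deriveBits` supports PBKDF2); to keep the examples in one language, here is a Python stand-in for `ClientSideHashFunction`. It's a sketch assuming PBKDF2-HMAC-SHA256 with the username folded into the salt; the parameter names mirror the formula above and the iteration count is made up.

```python
import hashlib

def client_side_hash_function(username: str, password: str, salt_1: str,
                              iterations: int = 100_000) -> str:
    """Stand-in for the browser's hash: PBKDF2-HMAC-SHA256, hex-encoded."""
    # Mixing the username into the salt keeps two users with the same
    # password (and the same salt_1) from getting the same digest.
    salt = f"{username}:{salt_1}".encode("utf-8")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return digest.hex()

client_hash_digest = client_side_hash_function("alice", "aWeakPassword", "example-salt-1")
print(len(client_hash_digest))  # 64
```

The server only ever sees this 64-character hex string, never the text typed into the password box.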

Communication happens via HTTPS.


The browser then transmits the computed hash, and the server hashes it again with a stronger (slower) algorithm and its own salt.

Server_Hash_Digest = ServerSideHashFunction(Client_Hash_Digest, SALT_2)  

The server code could then use Server_Hash_Digest for further processing.
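"Further processing" mostly means storing the digest at registration and comparing against it at login. A minimal sketch, assuming PBKDF2-HMAC-SHA256 from Python's standard library plays the role of `ServerSideHashFunction` and a hypothetical in-memory `USERS` table stands in for the DB; bcrypt or scrypt would slot in the same way.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # hypothetical work factor for the server-side hash

def server_side_hash_function(client_hash_digest: str, salt_2: bytes) -> bytes:
    """Second, slow hash applied to whatever digest the browser sent."""
    return hashlib.pbkdf2_hmac("sha256", client_hash_digest.encode("utf-8"),
                               salt_2, ITERATIONS)

# Hypothetical user table: username -> (salt_2, server_hash_digest)
USERS = {}

def register(username: str, client_hash_digest: str) -> None:
    salt_2 = os.urandom(16)  # fresh random salt per user
    USERS[username] = (salt_2, server_side_hash_function(client_hash_digest, salt_2))

def login(username: str, client_hash_digest: str) -> bool:
    if username not in USERS:
        return False
    salt_2, stored = USERS[username]
    candidate = server_side_hash_function(client_hash_digest, salt_2)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison

register("alice", "9c1e" * 16)      # 64 hex chars standing in for the browser's digest
print(login("alice", "9c1e" * 16))  # True
print(login("alice", "0" * 64))     # False
```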

Can things still go horribly wrong?

Let's look at the various pieces that could possibly be compromised.

  • Password Compromise: If the password itself is compromised, I'm pretty sure there's not much we as developers can do. By this, I mean that someone figures out the user's password as he/she types it into the browser. The best we can do is limit the damage, for example with two-step verification.

  • Salt-1 Compromise: Well, salts aren't meant to be private, so this compromise might not really matter (or does it? Like I said, I'm a noob).

  • Client-Hash-Digest Compromise: The hash is computed just before transmission, and since the connection goes over a secure channel (HTTPS), I doubt it can even be captured in transit. Over a plain HTTP channel, though, a captured hash is a real problem. There's still no way to compute the original password back from it (short of guessing), but the attacker doesn't need to: this hash is exactly what the server receives as the password, so it can simply be sent as-is. Hence, I cannot stress this enough: always use HTTPS and sessions (explained a little below) for any sort of data transmission.

  • Received-Password Compromise: The password received at the server is the hashed version of the actual password. Hence, it is pretty much useless in the login form, since typing it into the password textbox would make the browser hash it again into a different value. But this hash could still be injected into the system directly, and the entire security mechanism I have been calling ultra safe would fall flat on its face. I think this could be mitigated by establishing a session every time a client connects to the server, and accepting a login only over a session the server itself set up. That way a captured login message can't simply be replayed, though I suspect an attacker who opens a session of their own could still submit a stolen hash.

  • Salt-2 Compromise: Again, just as with Salt-1, the salt really isn't private. So....

  • Server-Hash-Digest Compromise: If this hash is compromised (say, the DB leaks), there's "NO" way to compute the client hash back from it, let alone the plain-text password. Because every digest has its own salt, a precomputed rainbow table won't work, and a slow algorithm such as bcrypt, PBKDF2 or scrypt makes guessing passwords one at a time very expensive.
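The session idea from the point about the password received at the server could be sketched as one-time login tokens: the server issues a random token when a client connects and accepts a login only with a known, unexpired, unused token. This is an assumption-laden sketch (hypothetical `ISSUED` store and `TOKEN_TTL`), not a full session system. It stops blind replays of a captured login message, but a client that opens its own session gets its own token, so keeping the digest itself secret (HTTPS) still matters.

```python
import secrets
import time

TOKEN_TTL = 60  # hypothetical: seconds a login token stays valid

# Hypothetical store: token -> monotonic time it was issued
ISSUED = {}

def open_session() -> str:
    """Called when a client connects: issue a fresh, unguessable token."""
    token = secrets.token_urlsafe(32)
    ISSUED[token] = time.monotonic()
    return token

def accept_login(token: str) -> bool:
    """Accept a login message only with a known, unexpired, unused token."""
    issued_at = ISSUED.pop(token, None)  # pop: each token works at most once
    if issued_at is None:
        return False
    return time.monotonic() - issued_at <= TOKEN_TTL

t = open_session()
print(accept_login(t))          # True
print(accept_login(t))          # False: already used
print(accept_login("made-up"))  # False: never issued
```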

So all in all, this architecture (I think) would suffice. Another thing I'd like to mention is that all this communication needs to happen over HTTPS (i.e., over SSL/TLS), and we did that even when we used Facebook login. I'm not saying that browser-side hashing would make a plain HTTP client-server channel safe. This post is about my concern with an admin or attacker looking at our users' passwords on the server, and I think this solves that problem.