Luke a Pro

Luke Sun

Developer & Marketer

Password Storage: Why You Should Never Encrypt Passwords

15 minutes reading.

1. Why Should You Care?

You’re building an application with user accounts. Users enter passwords, and you need to store something in the database to verify them later.

If your database gets breached (and assume it will), what happens to your users’ passwords?

Bad password storage has led to billions of leaked credentials. Let’s understand what “good” looks like.

2. Why Not Encryption?

The Problem with Encrypting Passwords

If you encrypt passwords:

Storage: AES-GCM(key, "password123") → ciphertext
Verify:  AES-GCM-decrypt(key, ciphertext) → "password123"

Problems:
1. You have the decryption key
2. Anyone who gets the key gets ALL passwords
3. You can see users' actual passwords
4. Key management becomes critical weakness

This is WRONG. You should never be able to recover passwords.

What We Actually Need

Requirements for password storage:
1. Verify: Can check if entered password is correct
2. One-way: Cannot recover original password from storage
3. Unique: Same password → different storage values (per user)
4. Slow: Expensive to compute (resists brute force)
5. Future-proof: Can increase difficulty over time

3. Why Not Plain Hash?

The Naive Approach (Very Broken)

import hashlib

# WRONG: Plain hash
def store_password(password):
    return hashlib.sha256(password.encode()).hexdigest()

def verify_password(password, stored):
    return hashlib.sha256(password.encode()).hexdigest() == stored

# Problem: Same password = same hash
store_password("password123")  # Always same output!

Attack 1: Rainbow Tables

Pre-compute hashes for common passwords:

Rainbow Table:
password123  → ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
123456       → 8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92
qwerty       → 65e84be33532fb784c48129675f9eff3a682b27168c0ea744b2cf58ee02337c5
...millions more...

Attack: Look up hash in table → instant password recovery
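
The lookup attack can be sketched in a few lines. This is only an illustration: the three-word list is a hypothetical stand-in for a real table with millions of entries, and `build_table` and `crack` are names made up for this example. A true rainbow table compresses the data with hash chains, but a plain dictionary shows the same effect.

```python
import hashlib

def sha256_hex(password: str) -> str:
    """Unsalted SHA-256, exactly as in the naive approach."""
    return hashlib.sha256(password.encode()).hexdigest()

def build_table(wordlist):
    """Precompute hash -> password once; reuse it against every leaked database."""
    return {sha256_hex(word): word for word in wordlist}

def crack(table, leaked_hash):
    """Return the password for a leaked hash, or None if it is not in the table."""
    return table.get(leaked_hash)

# Hypothetical three-entry word list; real tables hold millions of entries.
TABLE = build_table(["password123", "123456", "qwerty"])

leaked = "ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f"
print(crack(TABLE, leaked))
```

The expensive part (hashing the word list) happens once, before any breach. After that, each leaked hash costs a single dictionary lookup.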

Attack 2: Same Hash = Same Password

Database leak:
user1: ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
user2: ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
user3: 8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92

Attacker sees: user1 and user2 have the same password!
Crack one, get both.
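
Spotting shared passwords takes no cracking at all. A minimal sketch, using the three leaked rows from the example above (`shared_passwords` is an illustrative name, not part of any library):

```python
from collections import defaultdict

H_A = "ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f"
H_B = "8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92"
leaked = {"user1": H_A, "user2": H_A, "user3": H_B}

def shared_passwords(rows):
    """Group usernames by stored hash; keep only hashes used by more than one user."""
    groups = defaultdict(list)
    for user, digest in rows.items():
        groups[digest].append(user)
    return [users for users in groups.values() if len(users) > 1]

print(shared_passwords(leaked))
```

The largest groups usually correspond to the most common passwords, so they are the natural place for an attacker to start.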

4. Salting: Unique Per User

Adding a Salt

import hashlib
import os

def store_password(password):
    salt = os.urandom(16)  # Random per user
    hash_input = salt + password.encode()
    password_hash = hashlib.sha256(hash_input).hexdigest()
    return salt.hex() + ":" + password_hash

def verify_password(password, stored):
    salt_hex, stored_hash = stored.split(":")
    salt = bytes.fromhex(salt_hex)
    hash_input = salt + password.encode()
    computed_hash = hashlib.sha256(hash_input).hexdigest()
    return computed_hash == stored_hash

# Now same password → different hashes
print(store_password("password123"))  # Different each time!
print(store_password("password123"))  # Different again!

Salt Solves Some Problems

With salt:
✓ Rainbow tables useless (need table per salt)
✓ Same password → different stored values
✓ Can't identify users with same password

Still broken:
✗ SHA-256 is too fast!
✗ GPU can compute billions of hashes/second
✗ Brute force still practical
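
To see why a salt alone is not enough, here is a rough sketch of a dictionary attack against a single salted SHA-256 record in the `salt_hex:hash` format used above. The salt is stored next to the hash, so an attacker who has the database has the salt too. The candidate list is a tiny hypothetical example, and `dictionary_attack` is an illustrative name.

```python
import hashlib
import os

def store_password(password: str) -> str:
    """Salted SHA-256 in the 'salt_hex:hash' format."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    return salt.hex() + ":" + digest

def dictionary_attack(stored: str, candidates):
    """Hash each candidate with the leaked salt; return the match or None."""
    salt_hex, target = stored.split(":")
    salt = bytes.fromhex(salt_hex)
    for guess in candidates:
        if hashlib.sha256(salt + guess.encode()).hexdigest() == target:
            return guess
    return None

record = store_password("qwerty")
print(dictionary_attack(record, ["123456", "password123", "qwerty"]))
```

The salt forces the attacker to repeat the work for each user, but each guess still costs only one SHA-256 call.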

5. The Speed Problem

Modern GPU Attack Speeds

Hashcat on RTX 4090 (approximate):

SHA-256:           22,000,000,000 H/s (22 billion/second)
MD5:               164,000,000,000 H/s

For 8-char lowercase password (26^8 = 208 billion):
SHA-256: 208B / 22B = ~10 seconds
MD5: 208B / 164B = ~1.3 seconds

For 8-char mixed case + digits (62^8 = 218 trillion):
SHA-256: 218T / 22B = ~2.7 hours
MD5: 218T / 164B = ~22 minutes

This is why we need SLOW hash functions!
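
The arithmetic above is easy to reproduce. A small sketch, taking the approximate hash rates quoted above as given (`seconds_to_exhaust` is an illustrative helper):

```python
def seconds_to_exhaust(alphabet_size: int, length: int, hashes_per_second: float) -> float:
    """Worst-case time to try every password of the given length."""
    return alphabet_size ** length / hashes_per_second

# Approximate rates quoted in the text (hashes per second).
RATES = {"SHA-256": 22e9, "MD5": 164e9}

for label, alphabet in [("lowercase", 26), ("mixed case + digits", 62)]:
    for name, rate in RATES.items():
        secs = seconds_to_exhaust(alphabet, 8, rate)
        print(f"8-char {label}, {name}: {secs:,.1f} s ({secs / 3600:.2f} h)")
```

These are worst-case figures; on average an attacker finds the password after searching half the keyspace.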

The Solution: Work Factors

Password hashing algorithms include deliberate slowness:

bcrypt:    Cost factor (2^cost iterations)
scrypt:    CPU cost, memory cost, parallelization
Argon2:    Time cost, memory cost, parallelism

Goal: Make each hash attempt take ~100ms-1s
At 100 ms per attempt, 1 billion attempts take 3+ years on a single core
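
The "3+ years" figure follows from simple multiplication, assuming 100 ms per attempt on a single worker. The sketch below also takes a hypothetical `parallel_workers` argument (an assumption added here, not a figure from the text) to show how the estimate shrinks when the attacker can run many attempts at once, which is the reason memory-hardness matters.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def attack_years(attempts: float, seconds_per_hash: float, parallel_workers: int = 1) -> float:
    """Years needed for the given number of attempts, split evenly across workers."""
    return attempts * seconds_per_hash / parallel_workers / SECONDS_PER_YEAR

single = attack_years(1_000_000_000, 0.1)
farm = attack_years(1_000_000_000, 0.1, parallel_workers=1000)
print(f"one worker: {single:.2f} years")
print(f"1000 workers: {farm * 365.25:.1f} days")
```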

6. bcrypt

How bcrypt Works

bcrypt design:
1. Based on Blowfish cipher
2. Expensive key setup phase
3. Cost factor controls iterations (2^cost)
4. Built-in salt (22 chars)
5. Output: 60 characters

Format: $2b$cost$salt(22)hash(31)
Example: $2b$12$LQv3c1yqBWVHxkd0LHAkCOYz6TtxMQJqhN8/BoIYq6h.Cg0f3Fy/q
         ─┬──┬───────────────────────┬───────────────────────────────
          │  │         salt                      hash
          │  └── cost factor (12 = 2^12 = 4096 iterations)
          └── algorithm version (2b = modern bcrypt)
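
Because the format is fixed-width, the fields can be pulled apart with plain string operations. A minimal sketch (`parse_bcrypt` and `BcryptParts` are illustrative names, not part of the bcrypt library):

```python
from typing import NamedTuple

class BcryptParts(NamedTuple):
    version: str
    cost: int
    salt: str    # 22 characters
    digest: str  # 31 characters

def parse_bcrypt(encoded: str) -> BcryptParts:
    """Split a 60-character bcrypt string into its four fields."""
    if len(encoded) != 60:
        raise ValueError("bcrypt hashes are 60 characters long")
    _, version, cost, rest = encoded.split("$")
    return BcryptParts(version, int(cost), rest[:22], rest[22:])

parts = parse_bcrypt("$2b$12$LQv3c1yqBWVHxkd0LHAkCOYz6TtxMQJqhN8/BoIYq6h.Cg0f3Fy/q")
print(parts.version, parts.cost, 2 ** parts.cost)
```

Since the cost and salt travel inside the stored string, verification needs no extra database columns, and old hashes keep working after you raise the cost for new ones.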

bcrypt in Python

import bcrypt

def hash_password(password: str) -> str:
    """Hash a password for storage"""
    # Generate salt and hash (cost factor 12 is good default)
    password_bytes = password.encode('utf-8')
    salt = bcrypt.gensalt(rounds=12)  # 2^12 = 4096 iterations
    hashed = bcrypt.hashpw(password_bytes, salt)
    return hashed.decode('utf-8')

def verify_password(password: str, hashed: str) -> bool:
    """Verify a password against stored hash"""
    password_bytes = password.encode('utf-8')
    hashed_bytes = hashed.encode('utf-8')
    return bcrypt.checkpw(password_bytes, hashed_bytes)

# Usage
stored = hash_password("my_secure_password")
print(f"Stored: {stored}")
# $2b$12$LQv3c1yqBWVHxkd0LHAkCOYz6TtxMQJqhN8/BoIYq6h.Cg0f3Fy/q

# Verify
print(verify_password("my_secure_password", stored))  # True
print(verify_password("wrong_password", stored))       # False

bcrypt Limitations

bcrypt issues:
- 72-byte password limit (truncates longer passwords)
- Fixed memory usage (not memory-hard)
- Can be accelerated with specialized hardware
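
Note that the limit is 72 bytes, not 72 characters. A quick sketch of a pre-check, assuming passwords are encoded as UTF-8 as in the code above (`exceeds_bcrypt_limit` is an illustrative helper):

```python
BCRYPT_MAX_BYTES = 72

def exceeds_bcrypt_limit(password: str) -> bool:
    """True if the UTF-8 encoding is longer than bcrypt's 72-byte input limit."""
    return len(password.encode("utf-8")) > BCRYPT_MAX_BYTES

ascii_pw = "a" * 72      # 72 characters, 72 bytes
accented_pw = "é" * 40   # 40 characters, 80 bytes (2 bytes each in UTF-8)

print(exceeds_bcrypt_limit(ascii_pw), exceeds_bcrypt_limit(accented_pw))
```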

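Workaround for long passwords:
import base64
import hashlib

def hash_long_password(password: str) -> str:
    # Pre-hash so input of any length fits within bcrypt's 72-byte limit
    pre_hash = hashlib.sha256(password.encode('utf-8')).digest()
    # Base64 keeps the digest free of NUL bytes (result is 44 ASCII characters)
    shortened = base64.b64encode(pre_hash)
    # Verification must apply the same pre-hash before calling verify_password
    return hash_password(shortened.decode('ascii'))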

7. Argon2

Why Argon2?

Argon2 won the Password Hashing Competition (2015):

Three variants:
- Argon2d: Maximum GPU resistance, vulnerable to side-channels
- Argon2i: Side-channel resistant, for password hashing
- Argon2id: Hybrid (recommended), best of both

Features:
- Memory-hard (configurable memory usage)
- Time-configurable (iterations)
- Parallelism-configurable (CPU threads)
- No password length limit

Argon2 in Python

from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

# Create hasher with recommended parameters
ph = PasswordHasher(
    time_cost=3,        # Number of iterations
    memory_cost=65536,  # 64 MB of memory
    parallelism=4,      # 4 parallel threads
    hash_len=32,        # Output length
    salt_len=16         # Salt length
)

def hash_password(password: str) -> str:
    """Hash a password using Argon2id"""
    return ph.hash(password)

def verify_password(password: str, hashed: str) -> bool:
    """Verify a password against stored hash"""
    try:
        ph.verify(hashed, password)
        return True
    except VerifyMismatchError:
        return False

def needs_rehash(hashed: str) -> bool:
    """Check if hash needs to be updated with new parameters"""
    return ph.check_needs_rehash(hashed)

# Usage
stored = hash_password("my_secure_password")
print(f"Stored: {stored}")
# $argon2id$v=19$m=65536,t=3,p=4$c2FsdHNhbHRzYWx0$hash...

print(verify_password("my_secure_password", stored))  # True

# Upgrade parameters over time
if verify_password("my_secure_password", stored) and needs_rehash(stored):
    new_hash = hash_password("my_secure_password")
    # Store new_hash in database
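
The encoded string carries its own parameters, which is what makes upgrades possible. The sketch below only illustrates that idea with string parsing; it is not how argon2-cffi implements `check_needs_rehash`, and `parse_argon2` and `is_outdated` are hypothetical helpers. The sample hash uses a placeholder digest.

```python
def parse_argon2(encoded: str) -> dict:
    """Extract variant, version, and cost parameters from an encoded Argon2 hash."""
    _, variant, version, params, _salt_b64, _hash_b64 = encoded.split("$")
    costs = dict(item.split("=") for item in params.split(","))
    return {
        "variant": variant,
        "version": int(version.split("=")[1]),
        "memory_kib": int(costs["m"]),
        "time_cost": int(costs["t"]),
        "parallelism": int(costs["p"]),
    }

def is_outdated(encoded: str, *, memory_kib: int, time_cost: int, parallelism: int) -> bool:
    """True if the stored parameters differ from the ones currently configured."""
    p = parse_argon2(encoded)
    return (p["memory_kib"], p["time_cost"], p["parallelism"]) != (memory_kib, time_cost, parallelism)

# Placeholder digest ("A" * 43); only the parameter section matters here.
SAMPLE = "$argon2id$v=19$m=65536,t=3,p=4$c2FsdHNhbHRzYWx0$" + "A" * 43

print(parse_argon2(SAMPLE))
print(is_outdated(SAMPLE, memory_kib=262144, time_cost=4, parallelism=8))
```

In real code, prefer the library's own `check_needs_rehash`, which knows about every field in the format.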

Choosing Argon2 Parameters

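Parameter guidance (as of 2024):

Minimum (OWASP Password Storage Cheat Sheet):
- Argon2id
- m=19456 (19 MiB), t=2, p=1

Recommended (RFC 9106 memory-constrained profile):
- Argon2id
- m=65536 (64 MiB), t=3, p=4

High security (heavier settings for servers with RAM to spare):
- Argon2id
- m=262144 (256 MiB), t=4, p=8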

Tuning approach:
1. Set memory to maximum your server can spare
2. Increase time_cost until hash takes ~0.5-1 second
3. Set parallelism to number of available cores
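
Step 2 can be automated with a small timing loop. Argon2 is not in the standard library, so this sketch uses PBKDF2's iteration count as a stand-in cost knob; that substitution is an assumption made to keep the example self-contained. With argon2-cffi you would pass a function that builds a `PasswordHasher` with the given `time_cost` and hashes a sample password. `calibrate` and `pbkdf2_stand_in` are illustrative names.

```python
import hashlib
import time

def time_one_hash(hash_fn, cost: int) -> float:
    """Wall-clock seconds for a single call of hash_fn at the given cost."""
    start = time.perf_counter()
    hash_fn(cost)
    return time.perf_counter() - start

def calibrate(hash_fn, target_seconds: float, start_cost: int = 1, max_cost: int = 1 << 30) -> int:
    """Double the cost until one hash takes at least target_seconds."""
    cost = start_cost
    while cost < max_cost and time_one_hash(hash_fn, cost) < target_seconds:
        cost *= 2
    return cost

def pbkdf2_stand_in(iterations: int) -> bytes:
    """Stand-in workload: PBKDF2-HMAC-SHA256 with a fixed sample password and salt."""
    return hashlib.pbkdf2_hmac("sha256", b"benchmark-password", b"0123456789abcdef", iterations)

# 50 ms target keeps the demo quick; the text suggests ~0.5-1 s for real use.
cost = calibrate(pbkdf2_stand_in, target_seconds=0.05, start_cost=1000)
print(f"about {cost} iterations reach 50 ms on this machine")
```

Run the calibration on the production hardware, under realistic load, since a single timing on a developer laptop can be misleading.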

8. scrypt

When to Use scrypt

scrypt advantages:
- Memory-hard (like Argon2)
- Well-studied since 2009
- Used in some cryptocurrencies

When to use:
- When Argon2 isn't available
- For deriving encryption keys from passwords (PBKDF2-like use cases)
- Compatibility with existing systems

scrypt in Python

from cryptography.exceptions import InvalidKey
from cryptography.hazmat.primitives.kdf.scrypt import Scrypt
import os

def _make_kdf(salt: bytes) -> Scrypt:
    """Build a Scrypt instance (each instance can derive or verify only once)"""
    return Scrypt(
        salt=salt,
        length=32,
        n=2**17,  # CPU/memory cost (must be power of 2)
        r=8,      # Block size
        p=1       # Parallelization
    )

def hash_password_scrypt(password: str) -> tuple[bytes, bytes]:
    """Hash password with scrypt"""
    salt = os.urandom(16)
    key = _make_kdf(salt).derive(password.encode())
    return salt, key

def verify_password_scrypt(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Verify password with scrypt"""
    try:
        # verify() compares in constant time and raises on mismatch
        _make_kdf(salt).verify(password.encode(), stored_key)
        return True
    except InvalidKey:
        return False

# Usage
salt, key = hash_password_scrypt("my_password")
print(verify_password_scrypt("my_password", salt, key))  # True
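
Unlike bcrypt and Argon2, the functions above return a bare salt and key, so the cost parameters live only in the code. One way around that is to store them next to the hash. The sketch below does this with the standard library's `hashlib.scrypt`; the `scrypt$n$r$p$salt$key` record layout is a made-up format for this example, not a standard, and the smaller `n=2**14` default is chosen only to keep the demo light.

```python
import base64
import hashlib
import hmac
import os

def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate working memory scrypt needs: 128 * n * r bytes."""
    return 128 * n * r

def _derive(password: str, salt: bytes, n: int, r: int, p: int, length: int) -> bytes:
    # maxmem is raised above the default so larger n values are accepted.
    return hashlib.scrypt(password.encode(), salt=salt, n=n, r=r, p=p,
                          maxmem=2 * scrypt_memory_bytes(n, r), dklen=length)

def scrypt_hash(password: str, n: int = 2**14, r: int = 8, p: int = 1) -> str:
    """Return a self-describing record: scrypt$n$r$p$salt$key (hypothetical format)."""
    salt = os.urandom(16)
    key = _derive(password, salt, n, r, p, 32)
    encode = lambda raw: base64.b64encode(raw).decode("ascii")
    return f"scrypt${n}${r}${p}${encode(salt)}${encode(key)}"

def scrypt_verify(password: str, stored: str) -> bool:
    """Recompute with the parameters and salt found in the record itself."""
    algo, n, r, p, salt_b64, key_b64 = stored.split("$")
    if algo != "scrypt":
        return False
    expected = base64.b64decode(key_b64)
    actual = _derive(password, base64.b64decode(salt_b64), int(n), int(r), int(p), len(expected))
    return hmac.compare_digest(actual, expected)

record = scrypt_hash("my_password")
print(scrypt_verify("my_password", record), scrypt_verify("wrong", record))
print(scrypt_memory_bytes(2**17, 8) // (1024 * 1024), "MiB for n=2**17, r=8")
```

With the parameters in the record, raising `n` later only affects new hashes; old ones still verify and can be upgraded at the next login.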

9. Comparison

┌─────────────┬──────────┬────────────┬─────────────┬────────────────┐
│ Algorithm   │ Memory   │ Parallelism│ Recommended │ Notes          │
│             │ Hard     │ Resistant  │             │                │
├─────────────┼──────────┼────────────┼─────────────┼────────────────┤
│ Argon2id    │ ✓        │ ✓          │ ✓✓✓         │ Best choice    │
│ scrypt      │ ✓        │ Partial    │ ✓✓          │ Good fallback  │
│ bcrypt      │ ✗        │ Partial    │ ✓           │ Still OK       │
│ PBKDF2      │ ✗        │ ✗          │ Legacy only │ Use 600k iters │
│ SHA-256     │ ✗        │ ✗          │ ✗           │ Never use      │
│ MD5         │ ✗        │ ✗          │ ✗✗✗         │ Never use      │
└─────────────┴──────────┴────────────┴─────────────┴────────────────┘

Memory-hard: Requires significant RAM, harder to parallelize on GPUs
Parallelism resistant: Difficult to speed up with multiple cores/GPUs

10. Complete Implementation

"""
Production-ready password hashing module
"""
from argon2 import PasswordHasher, Type
from argon2.exceptions import VerifyMismatchError, InvalidHashError
import secrets
import hmac

class PasswordManager:
    """Secure password hashing with Argon2id"""

    def __init__(
        self,
        time_cost: int = 3,
        memory_cost: int = 65536,  # 64 MB
        parallelism: int = 4,
        pepper: bytes | None = None  # Server-side secret
    ):
        self.hasher = PasswordHasher(
            time_cost=time_cost,
            memory_cost=memory_cost,
            parallelism=parallelism,
            hash_len=32,
            salt_len=16,
            type=Type.ID  # Argon2id
        )
        self.pepper = pepper

    def _apply_pepper(self, password: str) -> str:
        """Add pepper to password before hashing"""
        if self.pepper:
            # HMAC prevents length extension attacks
            peppered = hmac.new(
                self.pepper,
                password.encode(),
                'sha256'
            ).hexdigest()
            return peppered
        return password

    def hash(self, password: str) -> str:
        """Hash a password for storage"""
        if not password:
            raise ValueError("Password cannot be empty")

        peppered = self._apply_pepper(password)
        return self.hasher.hash(peppered)

    def verify(self, password: str, hash: str) -> bool:
        """Verify a password against a hash"""
        if not password or not hash:
            return False

        peppered = self._apply_pepper(password)

        try:
            self.hasher.verify(hash, peppered)
            return True
        except (VerifyMismatchError, InvalidHashError):
            return False

    def needs_rehash(self, hash: str) -> bool:
        """Check if hash needs updating with new parameters"""
        try:
            return self.hasher.check_needs_rehash(hash)
        except InvalidHashError:
            return True

    def verify_and_rehash(self, password: str, hash: str) -> tuple[bool, str | None]:
        """Verify password and return new hash if parameters changed"""
        if not self.verify(password, hash):
            return False, None

        if self.needs_rehash(hash):
            return True, self.hash(password)

        return True, None


# Usage example
def example_usage():
    # Initialize with optional pepper (store in env var, not code!)
    pepper = secrets.token_bytes(32)  # In production: from environment
    pm = PasswordManager(pepper=pepper)

    # Registration
    password = "user_password_123"
    hashed = pm.hash(password)
    print(f"Stored hash: {hashed[:50]}...")

    # Login
    is_valid = pm.verify(password, hashed)
    print(f"Password valid: {is_valid}")

    # Check for rehash (after upgrading parameters)
    is_valid, new_hash = pm.verify_and_rehash(password, hashed)
    if new_hash:
        print("Hash upgraded, store new_hash in database")


if __name__ == "__main__":
    example_usage()
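
Note that `example_usage` generates a fresh pepper on every run, which is fine for a demo but would make every stored hash unverifiable after a restart. A hedged sketch of loading a stable pepper instead, assuming it is stored base64-encoded in an environment variable (the name `PASSWORD_PEPPER` and the helper `load_pepper` are hypothetical):

```python
import base64
import os

PEPPER_ENV_VAR = "PASSWORD_PEPPER"  # hypothetical variable name

def load_pepper() -> bytes:
    """Read a base64-encoded pepper from the environment; fail loudly if missing or short."""
    encoded = os.environ.get(PEPPER_ENV_VAR)
    if not encoded:
        raise RuntimeError(f"{PEPPER_ENV_VAR} is not set")
    pepper = base64.b64decode(encoded, validate=True)
    if len(pepper) < 32:
        raise ValueError("pepper should be at least 32 bytes")
    return pepper

# Demo only: in production the variable is set by the deployment, not by the app.
os.environ[PEPPER_ENV_VAR] = base64.b64encode(os.urandom(32)).decode("ascii")
print(len(load_pepper()), "byte pepper loaded")
```

Failing at startup when the pepper is missing is deliberate: silently falling back to no pepper would produce hashes that no longer match once the pepper is restored.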

11. Common Mistakes

Mistake 1: Comparing Hashes Insecurely

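import hashlib
import hmac

def derive(password: str, salt: bytes) -> bytes:
    # Recompute with the STORED salt; hashing with a fresh salt would never match
    return hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 600_000)

# WRONG: Timing attack vulnerability
def verify_bad(password, salt, stored_hash):
    computed = derive(password, salt)
    return computed == stored_hash  # Plain comparison can leak timing

# RIGHT: Use constant-time comparison
def verify_good(password, salt, stored_hash):
    computed = derive(password, salt)
    return hmac.compare_digest(computed, stored_hash)

# BEST: Use library's built-in verify function
# (bcrypt.checkpw, argon2.verify already handle this)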

Mistake 2: Hardcoding Parameters

import os
from argon2 import PasswordHasher

# WRONG: Parameters hardcoded at the call site
def hash_password(pwd):
    return PasswordHasher(time_cost=2, memory_cost=32768).hash(pwd)

# RIGHT: Configurable, allows upgrades
class PasswordConfig:
    TIME_COST = int(os.environ.get('ARGON2_TIME_COST', '3'))
    MEMORY_COST = int(os.environ.get('ARGON2_MEMORY_KB', '65536'))
    PARALLELISM = int(os.environ.get('ARGON2_PARALLELISM', '4'))

Mistake 3: Not Handling Upgrades

# Always check if rehashing is needed after successful login
def login(username, password):
    user = get_user(username)
    if user is None:
        return False  # Unknown user: nothing to verify

    if not verify_password(password, user.password_hash):
        return False

    # Upgrade hash if using old parameters
    if needs_rehash(user.password_hash):
        user.password_hash = hash_password(password)
        save_user(user)

    return True

12. Summary

Three things to remember:

  1. Never encrypt passwords, never use plain hashes. Encryption is reversible, plain hashes are too fast. Use purpose-built password hashing algorithms.

  2. Argon2id is the best choice. It’s memory-hard, configurable, and won the Password Hashing Competition. Use bcrypt if Argon2 isn’t available.

  3. Tune parameters for ~0.5-1 second hash time. This makes brute force impractical while keeping login acceptable. Increase parameters over time as hardware improves.

13. What’s Next

We can hash passwords securely. But where do we store the pepper? How do we manage encryption keys? What happens when keys need to be rotated?

In the next article: Key Management, covering how to generate, store, rotate, and destroy cryptographic keys safely.