Luke a Pro

Luke Sun

Developer & Marketer

Hash Functions: Why They're Not Encryption

8 minutes reading.

1. Why Should You Care?

A developer stores user passwords like this:

hashed = sha256(password)
database.store(hashed)

“It’s secure,” they say. “I’m using SHA-256, a strong hash function.”

Then their database leaks. Within hours, attackers have recovered 80% of the passwords.

What went wrong?

The developer confused “hashing” with “secure password storage.” These are not the same thing. SHA-256 is a hash function, not a password storage solution. Using it directly for passwords is like using a hammer as a screwdriver—it’s the wrong tool for the job.

2. Definition

A hash function takes input of any size and produces a fixed-size output (the “hash” or “digest”). It’s designed to be one-way: computing the hash from the input is cheap, but recovering the input from the hash is computationally infeasible.

Key properties of cryptographic hash functions:

  • Deterministic: Same input always produces same output
  • Fixed output size: SHA-256 always outputs 256 bits, regardless of input size
  • One-way: Computationally infeasible to reverse
  • Collision-resistant: Hard to find two different inputs with the same hash
  • Avalanche effect: Small input change creates drastically different output
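Two of these properties, determinism and fixed output size, are easy to check with Python's standard `hashlib` module. The snippet below is a minimal sketch; the helper name `sha256_hex` and the sample inputs are made up for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()

short_input = b"Hello"
long_input = b"A" * 1_000_000  # about one megabyte of the same byte

# Deterministic: hashing the same input twice gives the same digest.
print(sha256_hex(short_input) == sha256_hex(short_input))  # True

# Fixed output size: 64 hex characters (256 bits) regardless of input size.
print(len(sha256_hex(short_input)), len(sha256_hex(long_input)))  # 64 64
```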

3. The Fundamental Difference

Encryption: Two-Way by Design

Plaintext ──[Encrypt with Key]──► Ciphertext ──[Decrypt with Key]──► Plaintext

Encryption is reversible. Given the key, you can always recover the original data.

Hashing: One-Way by Design

Input ──[Hash Function]──► Hash

    (No way back)

Hashing is irreversible. There is no key. There is no decryption. The original data is mathematically destroyed—only a fingerprint remains.

Why the Confusion?

Both produce “garbled output” from readable input. But the purposes are completely different:

| Feature | Encryption | Hashing |
| --- | --- | --- |
| Purpose | Hide data temporarily | Create fingerprint permanently |
| Reversible | Yes (with key) | No |
| Key required | Yes | No |
| Output size | Varies with input | Fixed |
| Use case | Protect data in transit/storage | Verify integrity, store passwords |
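The contrast can be made concrete with a small sketch. To stay within the standard library, it uses a toy one-time pad (XOR with a random key as long as the message) as a stand-in for a real cipher; that choice is an assumption for illustration, not a recommendation for production code. The helper `xor_bytes` and the sample message are hypothetical.

```python
import hashlib
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of `data` with the matching byte of `key`."""
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"Transfer $100 to Alice"

# Toy one-time pad: the key is random and as long as the message.
key = secrets.token_bytes(len(plaintext))
ciphertext = xor_bytes(plaintext, key)
recovered = xor_bytes(ciphertext, key)  # the same key reverses the operation

# Hashing takes no key and offers no inverse function.
digest = hashlib.sha256(plaintext).digest()

print(recovered == plaintext)              # True: reversible with the key
print(len(ciphertext) == len(plaintext))   # True: output size follows the input
print(len(digest))                         # 32 bytes for SHA-256, whatever the input
```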

4. How Hash Functions Work

The High-Level Process

┌─────────────────────────────────────────────────────────────┐
│ 1. Padding                                                  │
│    - Add bits to make input a multiple of block size        │
├─────────────────────────────────────────────────────────────┤
│ 2. Block Processing                                         │
│    - Split into fixed-size blocks                           │
│    - Process each block through compression function        │
│    - Each block's output feeds into next block              │
├─────────────────────────────────────────────────────────────┤
│ 3. Finalization                                             │
│    - Output the final internal state as the hash            │
└─────────────────────────────────────────────────────────────┘
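This block-by-block design is why hash libraries can accept data in pieces: the internal state carries over from one chunk to the next. The sketch below, using Python's `hashlib`, checks that feeding data incrementally gives the same digest as hashing it in one call. The 64 KiB chunk size is an arbitrary choice for the example and is unrelated to SHA-256's internal block size; the library handles buffering and padding.

```python
import hashlib

data = b"x" * 200_000  # stand-in for a file's contents

# One-shot: hash everything at once.
one_shot = hashlib.sha256(data).hexdigest()

# Incremental: feed the same bytes in 64 KiB chunks.
hasher = hashlib.sha256()
chunk_size = 64 * 1024
for start in range(0, len(data), chunk_size):
    hasher.update(data[start:start + chunk_size])
streamed = hasher.hexdigest()

print(one_shot == streamed)  # True
```

The same pattern lets you hash files far larger than available memory by reading them chunk by chunk.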

The Avalanche Effect

This is what makes hashes useful for integrity checking:

import hashlib

text1 = "Hello, World!"
text2 = "Hello, World."  # Just changed ! to .

print(hashlib.sha256(text1.encode()).hexdigest())
# dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f

print(hashlib.sha256(text2.encode()).hexdigest())
# f8c3bf62a9aa3e6fc1619c250e48abe7519373d3edf41be62eb5dc45199af2ef

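One character change → completely different hash. Because similar inputs do not produce similar hashes, an attacker gets no “warmer/colder” signal and cannot inch toward the original input. The only option is to guess a complete candidate and check whether its hash matches.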
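To put a number on “completely different,” you can count how many of the 256 output bits differ between the two digests. For a well-behaved hash the count should land near half of them. The helper `bit_difference` below is a hypothetical name for this sketch.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

d1 = hashlib.sha256(b"Hello, World!").digest()
d2 = hashlib.sha256(b"Hello, World.").digest()

diff = bit_difference(d1, d2)
print(f"{diff} of 256 bits differ")  # expect a value somewhere near 128
```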

5. Why You Can’t “Decrypt” a Hash

Information Loss

A hash function compresses arbitrary-length input into fixed-length output. Information is mathematically lost.

"Hello" (5 bytes)           → 256-bit hash
"War and Peace" (3MB)       → 256-bit hash
Every possible file ever    → 256-bit hash

Infinite inputs map to finite outputs. Multiple inputs will produce the same hash (collisions). You can’t reverse this because you don’t know which of the infinite possible inputs was used.
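The pigeonhole argument is easier to see with a deliberately tiny hash. The sketch below keeps only the first byte of SHA-256, so there are just 256 possible outputs; hashing 257 distinct messages must then produce at least one collision. `tiny_hash` and the message format are invented for this illustration.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Toy 8-bit hash: the first byte of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[0]

seen = {}         # maps each hash value to the first message that produced it
collision = None

# 257 distinct inputs, 256 possible outputs: a collision is guaranteed.
for i in range(257):
    msg = f"message-{i}".encode()
    h = tiny_hash(msg)
    if h in seen:
        collision = (seen[h], msg, h)
        break
    seen[h] = msg

first, second, value = collision
print(f"{first!r} and {second!r} both hash to {value}")
```

Given only the colliding value, nothing distinguishes the first message from the second. The same holds for full SHA-256, except that finding such a pair is computationally out of reach.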

No Key, No Decryption

Encryption without the key is secure because finding the key is computationally infeasible.

Hashing has no key. There’s nothing to find. The “reversal” would require inverting a mathematical function designed to be non-invertible.

What “Breaking” a Hash Means

When we say a hash is “broken,” we mean:

  • Collision attack: Found two different inputs with the same hash
  • Preimage attack: Given a hash, found an input that produces it (not necessarily the original)

Neither means “decryption.” Even a broken hash function doesn’t become reversible.

6. MD5: Why It Won’t Die

MD5 was designed in 1991. It’s been “broken” since 2004. Yet you still see it everywhere.

Why MD5 Is Broken

  1. Collision attacks are practical: You can create two different files with the same MD5 hash
  2. Chosen-prefix attacks work: Given any two prefixes, you can append data to each that results in the same hash
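  3. It’s extremely fast: a single modern GPU computes billions of MD5 hashes per second (on the order of 9.5 billion), which makes brute-forcing MD5-hashed passwords cheap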

Why MD5 Won’t Die

# Still seen in production systems:
file_checksum = hashlib.md5(file_content).hexdigest()  # "Just for integrity"
cache_key = hashlib.md5(query).hexdigest()  # "Just for cache keys"

People argue: “I’m not using it for security, just checksums.”

The problem: Requirements change. Today’s “just a checksum” becomes tomorrow’s security control. And because MD5 collisions are cheap to produce, any check that an attacker can feed input into can be fooled.

When MD5 Is Actually Fine

  • Comparing files you fully control
  • Non-security checksums in closed systems
  • Legacy system compatibility (with full awareness of risks)

When MD5 Is Not Fine

  • Any security-sensitive application
  • User-facing file verification
  • Password hashing (never!)
  • Digital signatures
  • Certificate validation

7. Password Storage: Hashing Done Right

Here’s why sha256(password) fails:

Problem 1: Speed

SHA-256 is designed to be fast. Very fast.

SHA-256:     ~8,500,000,000 hashes/second (GPU)
bcrypt:      ~71,000 hashes/second (same GPU)
Argon2:      ~1,000 hashes/second (same GPU, tuned)

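Fast hashing means fast cracking. An 8-character password drawn from the 95 printable ASCII characters has 95^8 ≈ 6.6 quadrillion possibilities. At 8.5 billion/second, that’s about 9 days to try them all, and on average half that to find any one password.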
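The arithmetic is worth running for all three rates. The sketch below assumes passwords are 8 characters drawn from the 95 printable ASCII symbols (an assumption of this example, giving a keyspace of 95^8 ≈ 6.6 × 10^15) and reuses the guess rates quoted above.

```python
SECONDS_PER_DAY = 86_400

keyspace = 95 ** 8  # 8 characters, 95 printable ASCII symbols each (assumed)

# Guess rates quoted above (hashes per second on one GPU).
rates = {
    "SHA-256": 8_500_000_000,
    "bcrypt": 71_000,
    "Argon2": 1_000,
}

def days_to_exhaust(keyspace: int, rate: int) -> float:
    """Days needed to try every candidate at `rate` guesses per second."""
    return keyspace / rate / SECONDS_PER_DAY

for name, rate in rates.items():
    days = days_to_exhaust(keyspace, rate)
    print(f"{name:8s} {days:,.0f} days ({days / 365:,.0f} years)")
```

Under these assumptions SHA-256 falls in roughly nine days, while the two password hashing functions push the same search into thousands or hundreds of thousands of years.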

Problem 2: No Salt

Without salt, identical passwords have identical hashes.

Database leak:
user1: 5e884898da280471...  ← "password"
user2: 5e884898da280471...  ← Also "password"
user3: 5e884898da280471...  ← Also "password"

Attackers precompute hashes for common passwords (rainbow tables). One lookup, thousands of accounts compromised.
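Here is a small sketch of that lookup, along with the effect of a per-user salt. The user names, the three-word wordlist, and the helper functions are all hypothetical. The salted variant is shown only to demonstrate what a salt changes; salted SHA-256 is still far too fast for real password storage.

```python
import hashlib
import os

def unsalted(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

def salted(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode()).hexdigest()

# Hypothetical leaked table: three users chose the same password.
leaked = {name: unsalted("password") for name in ("user1", "user2", "user3")}

# The attacker hashes a small wordlist once...
wordlist = ["123456", "password", "qwerty"]
lookup = {unsalted(word): word for word in wordlist}

# ...and every matching account falls to a dictionary lookup.
cracked = {name: lookup[h] for name, h in leaked.items() if h in lookup}
print(cracked)

# With a random per-user salt, equal passwords no longer share a hash,
# so one precomputed table cannot cover every user.
salt_a, salt_b = os.urandom(16), os.urandom(16)
print(salted("password", salt_a) == salted("password", salt_b))  # False
```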

Problem 3: Rainbow Tables

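Rainbow tables are precomputed structures that map hashes back to candidate passwords, trading storage for lookup time. Against unsalted SHA-256, a table of around 10 GB can recover most weak passwords almost instantly. A unique salt per user makes this kind of precomputation useless, because the attacker would need a separate table for every salt.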

The Solution: Password Hashing Functions

import bcrypt
import argon2

password = "correct horse battery staple"  # example input

# bcrypt: Proven, widely supported (gensalt() embeds a random salt in the output)
bcrypt_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))

# argon2: Winner of the 2015 Password Hashing Competition
ph = argon2.PasswordHasher(
    time_cost=2,
    memory_cost=102400,  # in KiB, so about 100 MiB
    parallelism=8
)
argon2_hash = ph.hash(password)

These functions are:

  • Deliberately slow: Configurable work factor
  • Memory-hard (Argon2, scrypt): Each guess requires significant RAM, which makes massively parallel GPU and ASIC cracking far more expensive
  • Automatically salted: Each hash is unique even for identical passwords

The Correct Password Storage Flow

Registration:
password → [Salt + Slow Hash] → stored_hash

Verification:
input_password + stored_salt → [Same Slow Hash] → compare with stored_hash
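The flow above can be sketched with nothing but the standard library, using `hashlib.scrypt` as the slow hash. This is a minimal illustration, not a drop-in implementation: the function names `register` and `verify` and the cost parameters are assumptions for the example, and `hashlib.scrypt` is only available when Python is built against a suitable OpenSSL. Libraries such as bcrypt and argon2 handle the salt for you by embedding it in the encoded hash string.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters; tune them for your own hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}

def register(password: str) -> tuple[bytes, bytes]:
    """Return (salt, stored_hash) for a new password."""
    salt = os.urandom(16)  # fresh random salt per user
    stored_hash = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, stored_hash

def verify(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, stored_hash)

salt, stored = register("correct horse battery staple")
print(verify("correct horse battery staple", salt, stored))  # True
print(verify("wrong guess", salt, stored))                   # False
```

The comparison uses `hmac.compare_digest` so that the time taken does not reveal how many leading bytes matched.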

8. Hash Function Selection Guide

| Use Case | Recommended | Avoid |
| --- | --- | --- |
| Password storage | Argon2id, bcrypt, scrypt | SHA-*, MD5 |
| File integrity | SHA-256, SHA-3, BLAKE3 | MD5, SHA-1 |
| Digital signatures | SHA-256, SHA-3 | MD5, SHA-1 |
| HMAC | SHA-256, SHA-3 | MD5 |
| Non-security checksums | CRC32, xxHash | (Any is fine) |
| Content-addressable storage | SHA-256, BLAKE3 | MD5 |

9. Code Example: Proper Password Handling

from argon2 import PasswordHasher, exceptions

# Configuration: Adjust based on your server's capabilities
ph = PasswordHasher(
    time_cost=2,        # Number of iterations (passes over memory)
    memory_cost=65536,  # Memory usage in KiB (64 MiB)
    parallelism=4,      # Number of parallel threads (lanes)
    hash_len=32,        # Output hash length in bytes
    salt_len=16         # Random salt length in bytes
)

def hash_password(password: str) -> str:
    """Hash a password for storage."""
    return ph.hash(password)

def verify_password(stored_hash: str, password: str) -> bool:
    """Verify a password against stored hash."""
    try:
        ph.verify(stored_hash, password)
        return True
    except exceptions.VerifyMismatchError:
        return False
    except exceptions.InvalidHashError:
        # Hash format is invalid
        return False

def needs_rehash(stored_hash: str) -> bool:
    """Check if password needs to be rehashed (parameters changed)."""
    return ph.check_needs_rehash(stored_hash)

# Usage
password = "user_password_here"

# Registration
hashed = hash_password(password)
print(f"Stored hash: {hashed}")
# $argon2id$v=19$m=65536,t=2,p=4$...

# Login
if verify_password(hashed, password):
    print("Login successful")

    # Check if we should upgrade the hash
    if needs_rehash(hashed):
        new_hash = hash_password(password)
        # Update database with new_hash

10. Common Misconceptions

| Misconception | Reality |
| --- | --- |
| “I can decrypt a hash with enough computing power” | No. There is nothing to decrypt. The most an attacker can do is guess inputs and compare hashes. |
| “SHA-256 is good for password storage” | SHA-256 is too fast. Use bcrypt/Argon2. |
| “Longer hash = more secure” | Security depends on the algorithm, not just length. SHA-512 isn’t “twice as secure” as SHA-256. |
| “I’ll hash the password twice for extra security” | Two fast hashes are still fast, so this barely slows an attacker, and it can actually reduce security in some cases. Use a function with a tunable work factor instead. |
| “MD5 is fine for non-security purposes” | Until requirements change. Use SHA-256 even for “just checksums.” |

11. Summary

Three things to remember:

  1. Hashing is not encryption. Hash functions are one-way by design. You cannot and should not expect to “decrypt” a hash. There’s no key, no reversal, just a fingerprint.

  2. Speed is the enemy for password storage. General-purpose hash functions like SHA-256 are designed to be fast. Password hashing functions like Argon2 are designed to be slow. Use the right tool for the job.

  3. MD5 and SHA-1 are deprecated for security. Even for “just checksums,” prefer SHA-256 or BLAKE3. Requirements change, and you don’t want to be caught with broken cryptography when they do.

12. What’s Next

We’ve covered encryption, hashing, and their differences. But we’ve glossed over something critical: where do keys and salts come from?

In the next article, we’ll explore: Random numbers—the most underestimated component in cryptographic systems, and why rand() can kill your security.