Database Normalization
2026-01-02
What is Database Normalization?
Database normalization is a design process that organizes data into well-structured tables to:
- reduce redundancy
- prevent inconsistent data
- improve data integrity
- make updates easier and safer
Instead of duplicating the same information across multiple tables, normalization helps ensure that each fact is stored in one place — and only one place.
Most relational database schemas aim to follow Third Normal Form (3NF). Higher levels exist, but 1NF–3NF already solve the majority of real-world problems.
Let’s walk through the normal forms step-by-step.
First Normal Form (1NF)
A table is in 1NF when:
✔ Every column holds atomic (indivisible) values
✔ There are no repeating groups or arrays
✔ Each row is uniquely identifiable
Not in 1NF
| order_id | customer | items |
|---|---|---|
| 101 | Alice | Apple, Banana |
| 102 | Bob | Orange, Apple |
The items column stores multiple values — that breaks 1NF.
In 1NF
| order_id | customer | item |
|---|---|---|
| 101 | Alice | Apple |
| 101 | Alice | Banana |
| 102 | Bob | Orange |
| 102 | Bob | Apple |
Now every field is atomic.
Second Normal Form (2NF)
A table is in 2NF when:
✔ It is already in 1NF
✔ Every non-key column depends on the entire primary key
✔ No columns depend on just part of a composite key
This matters when your primary key has multiple columns.
Not in 2NF
| order_id | item | customer_email |
|---|---|---|
| 101 | Apple | [email protected] |
| 101 | Banana | [email protected] |
Primary key = (order_id, item)
But customer_email only depends on order_id, not on the item.
So if Alice orders 10 items… her email repeats 10 times.
Fix: Separate the data
Orders table
| order_id | customer_email |
|---|---|
| 101 | [email protected] |
OrderItems table
| order_id | item |
|---|---|
| 101 | Apple |
| 101 | Banana |
Now every non-key value depends on the whole key.
Third Normal Form (3NF)
A table is in 3NF when:
✔ It is already in 2NF
✔ No column depends on another non-key column
(This is called no transitive dependencies)
Not in 3NF
| customer_id | domain | |
|---|---|---|
| 1 | [email protected] | mail.com |
Here, domain depends on email, not the key.
That's redundant — you could always compute it.
In 3NF
| customer_id | |
|---|---|
| 1 | [email protected] |
If you need the domain, derive it in the app or a view.
Boyce-Codd Normal Form (BCNF)
BCNF is like 3NF with stricter rules.
A table is in BCNF when:
✔ Every determinant is a candidate key
You usually only hit BCNF issues in edge-case schemas where non-key attributes determine keys.
If you're designing standard business systems,
BCNF problems are rare — but knowing it exists helps.
Fourth Normal Form (4NF)
4NF eliminates multi-valued dependencies — cases where:
One key relates independently to multiple columns of repeated data.
Example of a 4NF violation
| student | language | sport |
|---|---|---|
| Alice | English | Tennis |
| Alice | French | Tennis |
| Alice | English | Soccer |
| Alice | French | Soccer |
Languages and sports are unrelated attributes — yet the combination explodes into multiple rows.
Normalize into separate relationships
StudentLanguages
| student | language |
|---|---|
| Alice | English |
| Alice | French |
StudentSports
| student | sport |
|---|---|
| Alice | Tennis |
| Alice | Soccer |
Now there's no meaningless cross-product.
Fifth Normal Form (5NF)
5NF deals with complex join dependencies, ensuring data can be reconstructed from smaller pieces without redundancy or invalid combinations.
You’ll almost never need 5NF in typical app design.
It mainly appears in highly-normalized analytical or research databases.
Do We Always Normalize This Far?
Not always.
Normalization improves integrity — but sometimes:
- denormalization improves read performance
- analytics workflows prefer wider tables
- caching duplicated data simplifies queries
Most production apps target:
3NF for OLTP systems
Go beyond only when you need to
It’s about balance — not dogma.
Summary Cheat Sheet
| Normal Form | Prevents |
|---|---|
| 1NF | Repeating groups & non-atomic values |
| 2NF | Partial key dependencies |
| 3NF | Transitive dependencies |
| BCNF | Non-key determinants |
| 4NF | Multi-valued dependencies |
| 5NF | Complex join redundancy |
Final Thoughts
Normalization is ultimately about respecting your data:
One fact → stored once → trusted everywhere.
Once you internalize the normal forms, you’ll find they naturally guide you toward cleaner and safer database designs.
Have thoughts or questions?
Drop a comment — always happy to talk data. 😊