1 Missing Foreign Keys

Data Integrity Critical

The Mistake

Using integer columns to reference other tables without adding FOREIGN KEY constraints. The database happily accepts orphaned rows and invalid IDs.

⚠️ What goes wrong
  • Orphaned rows pile up over time
  • Application code must validate every reference
  • Deleting a parent leaves dangling child records
  • Impossible for the database to enforce referential integrity

The Fix

CREATE TABLE comments (
  id SERIAL PRIMARY KEY,
  post_id INT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
  user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  body TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_comments_post ON comments(post_id);
CREATE INDEX idx_comments_user ON comments(user_id);
See the diff →

2 FLOAT for Money

Data Integrity Critical

The Mistake

Storing monetary values in FLOAT or DOUBLE columns. IEEE 754 floating-point numbers cannot represent most decimal fractions exactly.

⚠️ What goes wrong
  • $0.10 + $0.20 becomes 0.30000000000000004
  • Rounding errors compound across calculations
  • Financial audits fail
  • Tax calculations can be off by cents — or worse

The Fix

Use INTEGER (cents) or NUMERIC(19,4) (decimal). Never FLOAT for money.

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INT NOT NULL REFERENCES users(id),
  total_cents INT NOT NULL CHECK (total_cents >= 0),
  currency CHAR(3) DEFAULT 'USD',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Alternative with NUMERIC for sub-cent precision:
-- total NUMERIC(19,4) NOT NULL CHECK (total >= 0)
See the diff →

3 Comma-Separated Values in a Column

Normalization Performance

The Mistake

Storing multiple values in a single VARCHAR column as comma-separated strings. Breaks 1NF and makes every query a nightmare.

⚠️ What goes wrong
  • Cannot index individual tags/values
  • LIKE '%tag%' queries are table scans
  • No referential integrity on individual values
  • Parsing logic duplicated across the codebase

The Fix

Use a junction table for many-to-many relationships. Every value gets its own row.

CREATE TABLE tags (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE post_tags (
  post_id INT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
  tag_id INT NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
  PRIMARY KEY (post_id, tag_id)
);

CREATE INDEX idx_post_tags_tag ON post_tags(tag_id);
See the diff →

4 EAV (Entity-Attribute-Value)

Performance Queryability

The Mistake

Using a generic entity_attributes table with entity_id, key, value columns to store dynamic schema data. Looks flexible, destroys performance.

⚠️ What goes wrong
  • Every attribute requires a self-JOIN to query
  • No type safety — everything is a string
  • Cannot create indexes on specific attributes
  • Queries become unreadable nightmares

The Fix

Use proper columns. If you truly need dynamic fields, use JSONB (PostgreSQL) or a dedicated document store. Never EAV.

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  price_cents INT NOT NULL,
  weight_g INT,
  color VARCHAR(50),
  dimensions_cm JSONB,
  specs JSONB
);

-- Query specs with GIN index:
CREATE INDEX idx_products_specs ON products USING GIN(specs);
See the diff →

5 No created_at / updated_at

Debugging Observability

The Mistake

Creating tables without created_at and updated_at columns. When something breaks, you have no idea when the data changed.

⚠️ What goes wrong
  • Cannot debug "when did this row appear?"
  • No way to sort by recency for admin UIs
  • Cannot implement time-based data retention
  • Impossible to build analytics on table activity

The Fix

Add timestamps to every table. Use DEFAULT NOW() and update triggers for updated_at.

CREATE TABLE tasks (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  status VARCHAR(20) DEFAULT 'todo',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Auto-update trigger for updated_at:
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tasks_updated_at
  BEFORE UPDATE ON tasks
  FOR EACH ROW
  EXECUTE FUNCTION update_updated_at();
See the diff →

6 Unindexed Foreign Keys

Performance Scalability

The Mistake

Adding FOREIGN KEY constraints without indexes on the referencing column. FKs ensure integrity, but without indexes, JOINs and cascades are slow.

⚠️ What goes wrong
  • Every JOIN becomes a sequential scan
  • ON DELETE CASCADE locks the entire table
  • Query performance degrades linearly with table size
  • Foreign key checks during inserts block writes

The Fix

Index every foreign key column. Consider composite indexes if you filter by FK + another column.

CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
  product_id INT NOT NULL REFERENCES products(id),
  quantity INT NOT NULL CHECK (quantity > 0),
  price_cents INT NOT NULL
);

-- Always index FKs:
CREATE INDEX idx_order_items_order ON order_items(order_id);
CREATE INDEX idx_order_items_product ON order_items(product_id);

-- Composite index for common query pattern:
CREATE INDEX idx_order_items_order_product ON order_items(order_id, product_id);
See the diff →

7 VARCHAR(255) for Everything

Semantics Validation

The Mistake

Defaulting every string column to VARCHAR(255) regardless of what it stores. Emails, country codes, URLs, and usernames all get the same treatment.

⚠️ What goes wrong
  • Wasted storage on wide rows
  • No domain validation at the database level
  • Silent truncation of legitimate data
  • Harder to reason about data boundaries

The Fix

Use domain-appropriate lengths and CHECK constraints. Be intentional about every column.

CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(254) NOT NULL UNIQUE,
  username VARCHAR(30) NOT NULL UNIQUE,
  display_name VARCHAR(100),
  country_code CHAR(2) CHECK (country_code ~ '^[A-Z]{2}$'),
  bio VARCHAR(500),
  website_url VARCHAR(2048)
);
See the diff →

8 No CHECK Constraints on Enums

Validation Critical

The Mistake

Using VARCHAR for status/state columns without CHECK constraints. Application bugs can insert garbage values that propagate silently.

⚠️ What goes wrong
  • Typos in status values create invisible bugs
  • Application code must validate everywhere
  • Queries break when unexpected values appear
  • Data migrations become risky guesswork

The Fix

Use CHECK constraints or ENUM types. The database should be the last line of defense for invalid data.

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INT NOT NULL REFERENCES users(id),
  status VARCHAR(20) NOT NULL
    CHECK (status IN ('pending','paid','shipped','delivered','cancelled')),
  payment_method VARCHAR(20)
    CHECK (payment_method IN ('card','bank_transfer','crypto','cash')),
  total_cents INT NOT NULL CHECK (total_cents >= 0),
  created_at TIMESTAMPTZ DEFAULT NOW()
);
See the diff →

9 Missing Soft Deletes

Data Safety Recovery

The Mistake

Hard-deleting user-facing content with DELETE. Accidental deletes are permanent, and cascading foreign keys can wipe out related data.

⚠️ What goes wrong
  • User accidentally deletes content — it's gone forever
  • Cascading deletes remove comments, votes, history
  • Cannot audit "who deleted this and when?"
  • No way to recover from application bugs

The Fix

Add deleted_at and filter it in queries. Use partial indexes for active rows.

CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  user_id INT NOT NULL REFERENCES users(id),
  title VARCHAR(255) NOT NULL,
  body TEXT,
  deleted_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_posts_active ON posts(created_at DESC)
  WHERE deleted_at IS NULL;
See the diff →

10 Repeating Groups (1NF Violation)

Normalization Scalability

The Mistake

Storing multiple related values as numbered columns: phone_1, phone_2, phone_3. Violates First Normal Form and limits you to N values.

⚠️ What goes wrong
  • Hard limit on how many values you can store
  • Querying "find all phone numbers" requires ORing N columns
  • Wasted storage when most rows use only 1-2 values
  • Schema changes required to add more slots

The Fix

Use a separate table with one row per value. Add a type or label column if values are heterogeneous.

CREATE TABLE contacts (
  id SERIAL PRIMARY KEY,
  user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  type VARCHAR(20) NOT NULL
    CHECK (type IN ('phone','email','address')),
  label VARCHAR(50) DEFAULT 'primary',
  value TEXT NOT NULL,
  is_primary BOOLEAN DEFAULT FALSE,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_contacts_user ON contacts(user_id, type);
See the diff →

Diff Your Schema Against These Fixes

Paste your current schema and the fixed version. See every missing index, constraint, and column in seconds.

Open Schema Diff →

Learn the Right Way

Explore 10 production-ready schema design patterns with before/after diffs.

Design Patterns →