10 Database Schema Anti-Patterns (And How to Fix Them)

Anti-Patterns

01 Missing Foreign Keys
02 FLOAT for Money
03 Comma-Separated Values
04 EAV Tables
05 No Timestamps
06 Unindexed Foreign Keys
07 VARCHAR(255) for Everything
08 No CHECK Constraints
09 Missing Soft Deletes
10 Repeating Groups

1 Missing Foreign Keys

Data Integrity Critical

The Mistake

Using integer columns to reference other tables without adding FOREIGN KEY constraints. The database happily accepts orphaned rows and invalid IDs.

⚠️ What goes wrong

Orphaned rows pile up over time
Application code must validate every reference
Deleting a parent leaves dangling child records
Impossible for the database to enforce referential integrity

The Fix

CREATE TABLE comments (
  id SERIAL PRIMARY KEY,
  post_id INT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
  user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  body TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_comments_post ON comments(post_id);
CREATE INDEX idx_comments_user ON comments(user_id);

See the diff →

2 FLOAT for Money

Data Integrity Critical

The Mistake

Storing monetary values in FLOAT or DOUBLE columns. IEEE 754 floating-point numbers cannot represent most decimal fractions exactly.

⚠️ What goes wrong

$0.10 + $0.20 becomes 0.30000000000000004
Rounding errors compound across calculations
Financial audits fail
Tax calculations can be off by cents — or worse

The Fix

Use INTEGER (cents) or NUMERIC(19,4) (decimal). Never FLOAT for money.

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INT NOT NULL REFERENCES users(id),
  total_cents INT NOT NULL CHECK (total_cents >= 0),
  currency CHAR(3) DEFAULT 'USD',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Alternative with NUMERIC for sub-cent precision:
-- total NUMERIC(19,4) NOT NULL CHECK (total >= 0)

See the diff →

3 Comma-Separated Values in a Column

Normalization Performance

The Mistake

Storing multiple values in a single VARCHAR column as comma-separated strings. Breaks 1NF and makes every query a nightmare.

⚠️ What goes wrong

Cannot index individual tags/values
LIKE '%tag%' queries are table scans
No referential integrity on individual values
Parsing logic duplicated across the codebase

The Fix

Use a junction table for many-to-many relationships. Every value gets its own row.

CREATE TABLE tags (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE post_tags (
  post_id INT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
  tag_id INT NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
  PRIMARY KEY (post_id, tag_id)
);

CREATE INDEX idx_post_tags_tag ON post_tags(tag_id);

See the diff →

4 EAV (Entity-Attribute-Value)

Performance Queryability

The Mistake

Using a generic entity_attributes table with entity_id, key, value columns to store dynamic schema data. Looks flexible, destroys performance.

⚠️ What goes wrong

Every attribute requires a self-JOIN to query
No type safety — everything is a string
Cannot create indexes on specific attributes
Queries become unreadable nightmares

The Fix

Use proper columns. If you truly need dynamic fields, use JSONB (PostgreSQL) or a dedicated document store. Never EAV.

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  price_cents INT NOT NULL,
  weight_g INT,
  color VARCHAR(50),
  dimensions_cm JSONB,
  specs JSONB
);

-- Query specs with GIN index:
CREATE INDEX idx_products_specs ON products USING GIN(specs);

See the diff →

5 No created_at / updated_at

Debugging Observability

The Mistake

Creating tables without created_at and updated_at columns. When something breaks, you have no idea when the data changed.

⚠️ What goes wrong

Cannot debug "when did this row appear?"
No way to sort by recency for admin UIs
Cannot implement time-based data retention
Impossible to build analytics on table activity

The Fix

Add timestamps to every table. Use DEFAULT NOW() and update triggers for updated_at.

CREATE TABLE tasks (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  status VARCHAR(20) DEFAULT 'todo',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Auto-update trigger for updated_at:
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tasks_updated_at
  BEFORE UPDATE ON tasks
  FOR EACH ROW
  EXECUTE FUNCTION update_updated_at();

See the diff →

6 Unindexed Foreign Keys

Performance Scalability

The Mistake

Adding FOREIGN KEY constraints without indexes on the referencing column. FKs ensure integrity, but without indexes, JOINs and cascades are slow.

⚠️ What goes wrong

Every JOIN becomes a sequential scan
ON DELETE CASCADE locks the entire table
Query performance degrades linearly with table size
Foreign key checks during inserts block writes

The Fix

Index every foreign key column. Consider composite indexes if you filter by FK + another column.

CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
  product_id INT NOT NULL REFERENCES products(id),
  quantity INT NOT NULL CHECK (quantity > 0),
  price_cents INT NOT NULL
);

-- Always index FKs:
CREATE INDEX idx_order_items_order ON order_items(order_id);
CREATE INDEX idx_order_items_product ON order_items(product_id);

-- Composite index for common query pattern:
CREATE INDEX idx_order_items_order_product ON order_items(order_id, product_id);

See the diff →

7 VARCHAR(255) for Everything

Semantics Validation

The Mistake

Defaulting every string column to VARCHAR(255) regardless of what it stores. Emails, country codes, URLs, and usernames all get the same treatment.

⚠️ What goes wrong

Wasted storage on wide rows
No domain validation at the database level
Silent truncation of legitimate data
Harder to reason about data boundaries

The Fix

Use domain-appropriate lengths and CHECK constraints. Be intentional about every column.

CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(254) NOT NULL UNIQUE,
  username VARCHAR(30) NOT NULL UNIQUE,
  display_name VARCHAR(100),
  country_code CHAR(2) CHECK (country_code ~ '^[A-Z]{2}$'),
  bio VARCHAR(500),
  website_url VARCHAR(2048)
);

See the diff →

8 No CHECK Constraints on Enums

Validation Critical

The Mistake

Using VARCHAR for status/state columns without CHECK constraints. Application bugs can insert garbage values that propagate silently.

⚠️ What goes wrong

Typos in status values create invisible bugs
Application code must validate everywhere
Queries break when unexpected values appear
Data migrations become risky guesswork

The Fix

Use CHECK constraints or ENUM types. The database should be the last line of defense for invalid data.

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INT NOT NULL REFERENCES users(id),
  status VARCHAR(20) NOT NULL
    CHECK (status IN ('pending','paid','shipped','delivered','cancelled')),
  payment_method VARCHAR(20)
    CHECK (payment_method IN ('card','bank_transfer','crypto','cash')),
  total_cents INT NOT NULL CHECK (total_cents >= 0),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

See the diff →

9 Missing Soft Deletes

Data Safety Recovery

The Mistake

Hard-deleting user-facing content with DELETE. Accidental deletes are permanent, and cascading foreign keys can wipe out related data.

⚠️ What goes wrong

User accidentally deletes content — it's gone forever
Cascading deletes remove comments, votes, history
Cannot audit "who deleted this and when?"
No way to recover from application bugs

The Fix

Add deleted_at and filter it in queries. Use partial indexes for active rows.

CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  user_id INT NOT NULL REFERENCES users(id),
  title VARCHAR(255) NOT NULL,
  body TEXT,
  deleted_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_posts_active ON posts(created_at DESC)
  WHERE deleted_at IS NULL;

See the diff →

10 Repeating Groups (1NF Violation)

Normalization Scalability

The Mistake

Storing multiple related values as numbered columns: phone_1, phone_2, phone_3. Violates First Normal Form and limits you to N values.

⚠️ What goes wrong

Hard limit on how many values you can store
Querying "find all phone numbers" requires ORing N columns
Wasted storage when most rows use only 1-2 values
Schema changes required to add more slots

The Fix

Use a separate table with one row per value. Add a type or label column if values are heterogeneous.

CREATE TABLE contacts (
  id SERIAL PRIMARY KEY,
  user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  type VARCHAR(20) NOT NULL
    CHECK (type IN ('phone','email','address')),
  label VARCHAR(50) DEFAULT 'primary',
  value TEXT NOT NULL,
  is_primary BOOLEAN DEFAULT FALSE,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_contacts_user ON contacts(user_id, type);

See the diff →

Diff Your Schema Against These Fixes

Paste your current schema and the fixed version. See every missing index, constraint, and column in seconds.

Open Schema Diff →

Learn the Right Way

Explore 10 production-ready schema design patterns with before/after diffs.

Design Patterns →