Design the Database Schema for Twitter

Medium

You are designing the core relational schema for a simplified Twitter-like platform. Focus on the entities and relationships that power the main feed, profiles, and interactions.

Core Entities

  • Users with profiles
  • Tweets (text, media, timestamps)
  • Follow relationships
  • Likes and retweets

Functional Requirements

  • Users can post tweets
  • Users can follow/unfollow others
  • Users can like and retweet
  • Fetch a user's home timeline
  • Fetch a user's profile + tweets

Scale Hints

  • 10M+ daily active users
  • 500M+ tweets per day
  • Follows: average 200 per user
  • Reads >> Writes

Constraints

  • Tweet text max 280 chars
  • Username unique, 15 chars max
  • Prevent self-follows
  • Handle deleted tweets gracefully
Expert Solution — Twitter Schema
CREATE TABLE users (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  username VARCHAR(15) NOT NULL UNIQUE,
  display_name VARCHAR(50) NOT NULL,
  bio TEXT,
  avatar_url VARCHAR(500),
  email VARCHAR(255) NOT NULL UNIQUE,
  password_hash VARCHAR(255) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  is_verified BOOLEAN NOT NULL DEFAULT FALSE,
  follower_count INT NOT NULL DEFAULT 0,
  following_count INT NOT NULL DEFAULT 0,
  tweet_count INT NOT NULL DEFAULT 0
);

CREATE TABLE tweets (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  content VARCHAR(280) NOT NULL,
  reply_to_tweet_id BIGINT REFERENCES tweets(id) ON DELETE SET NULL,
  retweet_of_tweet_id BIGINT REFERENCES tweets(id) ON DELETE SET NULL,
  media_urls TEXT[],
  like_count INT NOT NULL DEFAULT 0,
  retweet_count INT NOT NULL DEFAULT 0,
  reply_count INT NOT NULL DEFAULT 0,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  is_deleted BOOLEAN NOT NULL DEFAULT FALSE,
  deleted_at TIMESTAMPTZ
);

CREATE TABLE follows (
  follower_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  following_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (follower_id, following_id),
  CONSTRAINT no_self_follow CHECK (follower_id != following_id)
);

CREATE TABLE likes (
  user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  tweet_id BIGINT NOT NULL REFERENCES tweets(id) ON DELETE CASCADE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (user_id, tweet_id)
);

CREATE TABLE bookmarks (
  user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  tweet_id BIGINT NOT NULL REFERENCES tweets(id) ON DELETE CASCADE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (user_id, tweet_id)
);

CREATE INDEX idx_tweets_user_created ON tweets(user_id, created_at DESC) WHERE is_deleted = FALSE;
CREATE INDEX idx_tweets_reply_to ON tweets(reply_to_tweet_id) WHERE reply_to_tweet_id IS NOT NULL;
CREATE INDEX idx_tweets_retweet_of ON tweets(retweet_of_tweet_id) WHERE retweet_of_tweet_id IS NOT NULL;
CREATE INDEX idx_follows_following ON follows(following_id, created_at DESC);
CREATE INDEX idx_likes_tweet ON likes(tweet_id, created_at DESC);

⚠️ Common Mistakes

  • Forgetting the no_self_follow CHECK constraint
  • Storing the full tweet text for retweets instead of referencing the original
  • Missing partial indexes for non-deleted tweets
  • Using ON DELETE CASCADE on the reply/retweet self-references (should be SET NULL)
  • Not indexing follows.following_id which is needed for timeline queries
  • Storing denormalized counts without a strategy to keep them in sync

✅ What Interviewers Look For

  • Understanding of many-to-many relationships (follows, likes)
  • Self-referencing tables (replies, retweets)
  • Soft deletes vs hard deletes and their trade-offs
  • Index design for read-heavy workloads
  • Data normalization vs denormalization decisions

Design the Database Schema for Uber

Hard

Design the core relational schema for a ride-hailing platform. This includes users (riders & drivers), trips, payments, ratings, and location tracking.

Core Entities

  • Riders and Drivers (both users)
  • Trips (from request to completion)
  • Payments and pricing
  • Vehicle types and driver documents
  • Ratings and reviews

Functional Requirements

  • Rider requests a ride, driver accepts
  • Real-time trip status tracking
  • Automatic fare calculation
  • Post-trip rating for both parties
  • Driver onboarding with document verification
  • Promo codes and discounts

Scale Hints

  • 100M+ monthly active riders
  • 5M+ active drivers
  • 20M+ trips per day
  • Payments in 70+ currencies

Constraints

  • Trip fare must be immutable after completion
  • Driver can only have one active vehicle at a time
  • Ratings are 1-5 stars, optional comment
  • Support split fares and refunds
Expert Solution — Uber Schema
CREATE TABLE users (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  phone VARCHAR(20) NOT NULL UNIQUE,
  email VARCHAR(255) UNIQUE,
  full_name VARCHAR(100) NOT NULL,
  password_hash VARCHAR(255),
  avatar_url VARCHAR(500),
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  is_active BOOLEAN NOT NULL DEFAULT TRUE,
  default_payment_method_id BIGINT
);

CREATE TABLE drivers (
  user_id BIGINT PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
  license_number VARCHAR(50) NOT NULL UNIQUE,
  license_expiry DATE NOT NULL,
  background_check_status VARCHAR(20) NOT NULL DEFAULT 'pending',
  onboarding_status VARCHAR(20) NOT NULL DEFAULT 'started',
  rating DECIMAL(2,1) NOT NULL DEFAULT 5.0,
  total_trips INT NOT NULL DEFAULT 0,
  lifetime_earnings DECIMAL(12,2) NOT NULL DEFAULT 0.00,
  is_online BOOLEAN NOT NULL DEFAULT FALSE,
  current_vehicle_id BIGINT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE vehicles (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  driver_id BIGINT NOT NULL REFERENCES drivers(user_id) ON DELETE CASCADE,
  make VARCHAR(50) NOT NULL,
  model VARCHAR(50) NOT NULL,
  year INT NOT NULL,
  color VARCHAR(30) NOT NULL,
  license_plate VARCHAR(20) NOT NULL UNIQUE,
  vehicle_type VARCHAR(20) NOT NULL DEFAULT 'economy',
  seat_capacity INT NOT NULL DEFAULT 4,
  is_active BOOLEAN NOT NULL DEFAULT TRUE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE vehicle_documents (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  vehicle_id BIGINT NOT NULL REFERENCES vehicles(id) ON DELETE CASCADE,
  document_type VARCHAR(30) NOT NULL,
  document_url VARCHAR(500) NOT NULL,
  verification_status VARCHAR(20) NOT NULL DEFAULT 'pending',
  uploaded_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  verified_at TIMESTAMPTZ,
  expires_at DATE
);

CREATE TABLE trips (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  rider_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  driver_id BIGINT REFERENCES drivers(user_id) ON DELETE SET NULL,
  vehicle_id BIGINT REFERENCES vehicles(id) ON DELETE SET NULL,
  status VARCHAR(20) NOT NULL DEFAULT 'requested',
  pickup_lat DECIMAL(10,8) NOT NULL,
  pickup_lng DECIMAL(11,8) NOT NULL,
  pickup_address VARCHAR(300) NOT NULL,
  dropoff_lat DECIMAL(10,8) NOT NULL,
  dropoff_lng DECIMAL(11,8) NOT NULL,
  dropoff_address VARCHAR(300) NOT NULL,
  requested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  accepted_at TIMESTAMPTZ,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ,
  cancelled_at TIMESTAMPTZ,
  cancelled_by VARCHAR(10),
  cancellation_reason TEXT,
  distance_km DECIMAL(8,2),
  duration_minutes INT,
  base_fare DECIMAL(8,2) NOT NULL,
  distance_fare DECIMAL(8,2) NOT NULL DEFAULT 0.00,
  time_fare DECIMAL(8,2) NOT NULL DEFAULT 0.00,
  surge_multiplier DECIMAL(3,2) NOT NULL DEFAULT 1.00,
  promo_discount DECIMAL(8,2) NOT NULL DEFAULT 0.00,
  total_fare DECIMAL(8,2) NOT NULL,
  currency VARCHAR(3) NOT NULL DEFAULT 'USD',
  payment_status VARCHAR(20) NOT NULL DEFAULT 'pending'
);

CREATE TABLE trip_events (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  trip_id BIGINT NOT NULL REFERENCES trips(id) ON DELETE CASCADE,
  event_type VARCHAR(30) NOT NULL,
  lat DECIMAL(10,8),
  lng DECIMAL(11,8),
  metadata JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE payments (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  trip_id BIGINT NOT NULL REFERENCES trips(id) ON DELETE CASCADE,
  user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  amount DECIMAL(10,2) NOT NULL,
  currency VARCHAR(3) NOT NULL DEFAULT 'USD',
  payment_method VARCHAR(20) NOT NULL,
  payment_status VARCHAR(20) NOT NULL DEFAULT 'pending',
  transaction_id VARCHAR(100) UNIQUE,
  processed_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE ratings (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  trip_id BIGINT NOT NULL REFERENCES trips(id) ON DELETE CASCADE,
  from_user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  to_user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  rating INT NOT NULL CHECK (rating BETWEEN 1 AND 5),
  comment TEXT,
  category VARCHAR(20),
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE(trip_id, from_user_id)
);

CREATE INDEX idx_trips_rider ON trips(rider_id, requested_at DESC);
CREATE INDEX idx_trips_driver ON trips(driver_id, requested_at DESC);
CREATE INDEX idx_trips_status ON trips(status, requested_at DESC) WHERE status IN ('requested','accepted','started');
CREATE INDEX idx_trip_events_trip ON trip_events(trip_id, created_at DESC);
CREATE INDEX idx_payments_trip ON payments(trip_id);
CREATE INDEX idx_ratings_to_user ON ratings(to_user_id, created_at DESC);

⚠️ Common Mistakes

  • Not separating users from drivers — a driver is also a user who can ride
  • Storing mutable fare fields without an immutable audit trail
  • Forgetting event sourcing for trip status changes (needed for debugging/disputes)
  • Using FLOAT for money — always use DECIMAL
  • Missing unique constraint on ratings(trip_id, from_user_id) to prevent duplicate ratings
  • Not handling driver document expiry and verification states
  • Storing location as a single string instead of separate lat/lng with spatial indexing potential

✅ What Interviewers Look For

  • Entity inheritance patterns (user → driver/rider)
  • Immutability for financial data (fare never changes after trip ends)
  • Audit trails and event logging
  • State machine design for trip lifecycle
  • Handling edge cases: cancellations, no-shows, refunds

Design the Database Schema for a URL Shortener

Medium

Design the database schema for a URL shortening service like bit.ly. Support custom aliases, analytics, expiration, and user accounts.

Core Entities

  • Users (optional anonymous)
  • Shortened URLs
  • Click analytics
  • Custom aliases and domains

Functional Requirements

  • Shorten a long URL to a short code
  • Custom aliases (e.g., bit.ly/my-brand)
  • Redirect with 301/302
  • Track clicks (timestamp, referrer, country, device)
  • URL expiration (optional)
  • QR code generation (metadata only)

Scale Hints

  • 100M+ new URLs per month
  • 10B+ redirects per month
  • Short code: 6-8 alphanumeric chars
  • Reads >> Writes by 100:1

Constraints

  • Short codes must be unique and unguessable
  • Custom aliases: 3-30 chars, alphanumeric + hyphen
  • Max URL length: 2048 chars
  • Soft-delete for analytics retention
Expert Solution — URL Shortener Schema
CREATE TABLE users (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  email VARCHAR(255) UNIQUE,
  password_hash VARCHAR(255),
  display_name VARCHAR(50),
  api_key VARCHAR(64) UNIQUE,
  tier VARCHAR(20) NOT NULL DEFAULT 'free',
  monthly_link_limit INT NOT NULL DEFAULT 100,
  custom_domain VARCHAR(255),
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  last_login_at TIMESTAMPTZ
);

CREATE TABLE short_links (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  user_id BIGINT REFERENCES users(id) ON DELETE SET NULL,
  short_code VARCHAR(30) NOT NULL UNIQUE,
  original_url TEXT NOT NULL,
  is_custom_alias BOOLEAN NOT NULL DEFAULT FALSE,
  title VARCHAR(200),
  description TEXT,
  status VARCHAR(20) NOT NULL DEFAULT 'active',
  expires_at TIMESTAMPTZ,
  password_hash VARCHAR(255),
  total_clicks INT NOT NULL DEFAULT 0,
  unique_clicks INT NOT NULL DEFAULT 0,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  deleted_at TIMESTAMPTZ,
  CONSTRAINT valid_short_code CHECK (short_code ~ '^[a-zA-Z0-9_-]+$'),
  CONSTRAINT valid_alias_length CHECK (
    is_custom_alias = FALSE OR LENGTH(short_code) BETWEEN 3 AND 30
  )
);

CREATE TABLE link_tags (
  link_id BIGINT NOT NULL REFERENCES short_links(id) ON DELETE CASCADE,
  tag VARCHAR(30) NOT NULL,
  PRIMARY KEY (link_id, tag)
);

CREATE TABLE click_events (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  link_id BIGINT NOT NULL REFERENCES short_links(id) ON DELETE CASCADE,
  clicked_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  ip_hash VARCHAR(64) NOT NULL,
  user_agent TEXT,
  referrer TEXT,
  country_code CHAR(2),
  city VARCHAR(50),
  device_type VARCHAR(20),
  browser VARCHAR(30),
  os VARCHAR(30),
  language VARCHAR(10)
);

CREATE TABLE click_aggregates (
  link_id BIGINT NOT NULL REFERENCES short_links(id) ON DELETE CASCADE,
  date DATE NOT NULL,
  total_clicks INT NOT NULL DEFAULT 0,
  unique_clicks INT NOT NULL DEFAULT 0,
  country_breakdown JSONB,
  device_breakdown JSONB,
  referrer_breakdown JSONB,
  PRIMARY KEY (link_id, date)
);

CREATE TABLE link_rules (
  id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  link_id BIGINT NOT NULL REFERENCES short_links(id) ON DELETE CASCADE,
  rule_type VARCHAR(30) NOT NULL,
  condition_value VARCHAR(255) NOT NULL,
  redirect_url TEXT,
  is_active BOOLEAN NOT NULL DEFAULT TRUE,
  priority INT NOT NULL DEFAULT 0,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_short_links_code ON short_links(short_code) WHERE deleted_at IS NULL;
CREATE INDEX idx_short_links_user ON short_links(user_id, created_at DESC);
CREATE INDEX idx_click_events_link ON click_events(link_id, clicked_at DESC);
CREATE INDEX idx_click_events_ip ON click_events(link_id, ip_hash, clicked_at);
CREATE INDEX idx_click_aggregates_date ON click_aggregates(date);

⚠️ Common Mistakes

  • Storing the original URL as VARCHAR instead of TEXT — URLs can exceed 2048 chars
  • Not using a separate click_events table and instead incrementing a counter — you lose analytics granularity
  • Missing unique constraint on short_code — race conditions will create duplicates
  • Forgetting soft-delete — hard-deleting loses all click history
  • Not hashing IPs for privacy compliance (GDPR)
  • Storing raw user agents instead of parsed device/browser/os fields
  • No expiration or status field — inactive links should still resolve but not count

✅ What Interviewers Look For

  • Understanding of write-heavy vs read-heavy workloads
  • Separating event data (clicks) from aggregate data for performance
  • Privacy considerations (IP hashing, retention policies)
  • Flexible routing rules (device-based, geo-based redirects)
  • Handling collisions in short code generation

Ready to Diff Real Schemas?

SchemaLens is the fastest way to compare database schemas, spot breaking changes, and generate migration scripts — all in your browser.

Try SchemaLens Free →

No signup required. Your data never leaves your browser.