How to Generate Realistic SQL Test Data for Any Database

April 30, 2026 · 8 min read · SchemaLens Team

Every developer has been there. You're building a new feature, writing tests, and suddenly you need realistic data. Not test1@example.com and User A. You need thousands of rows with real-looking names, plausible email addresses, dates that make sense, and foreign keys that actually reference valid records. Manually crafting INSERT statements is soul-crushing and error-prone.

Good test data matters more than most teams admit. Tests with unrealistic data pass when they should fail. Performance tests on 10 rows of data hide query problems that explode at 10,000 rows. Demo environments with placeholder text look unprofessional to stakeholders. And GDPR means you can't just copy production data anymore.

In this guide, we'll cover every practical approach to generating SQL test data, from quick one-liners to full synthetic data pipelines, and show you how to do it for PostgreSQL, MySQL, SQLite, SQL Server, and Oracle.

What makes test data "good"?

Before we dive into tools, let's define what we're aiming for. Good test data has four properties:

  1. Realistic: names, emails, and dates that look like production values
  2. Consistent: foreign keys that reference rows that actually exist
  3. Voluminous: enough rows to expose the query problems that tiny samples hide
  4. Safe: fully synthetic, so there's no PII to leak and no GDPR exposure

Method 1: Database-native generation

PostgreSQL

PostgreSQL has the richest built-in randomization functions. Use random(), md5(), and arrays to generate rows inline:

INSERT INTO users (name, email, created_at)
SELECT
  'User ' || i,
  'user' || i || '@example.com',
  NOW() - (random() * 365 || ' days')::interval
FROM generate_series(1, 1000) AS i;

For more realistic names, install the faker extension if available, or use arrays of real names:

WITH names(name) AS (
  SELECT unnest(ARRAY['Alice','Bob','Charlie','Diana','Evan'])
)
INSERT INTO users (name, email)
SELECT
  n.name || ' ' || (ARRAY['Smith','Jones','Lee','Patel'])[1 + floor(random()*4)::int],
  lower(regexp_replace(n.name, '[^a-zA-Z]', '', 'g')) || (1 + floor(random()*999))::int || '@gmail.com'
FROM generate_series(1, 1000) AS i
CROSS JOIN LATERAL (
  -- the "i = i" correlation forces per-row evaluation; without it,
  -- PostgreSQL may run the subquery once and give every row the same name
  SELECT name FROM names WHERE i = i ORDER BY random() LIMIT 1
) AS n;

MySQL

MySQL has no generate_series, so procedural generation usually means a stored procedure (on 8.0+, a recursive CTE also works):

DELIMITER $$
CREATE PROCEDURE generate_users(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 1;
  START TRANSACTION;  -- one commit instead of a thousand autocommits
  WHILE i <= n DO
    INSERT INTO users (name, email) VALUES (
      CONCAT('User ', i),
      CONCAT('user', i, '@example.com')
    );
    SET i = i + 1;
  END WHILE;
  COMMIT;
END$$
DELIMITER ;

CALL generate_users(1000);

SQLite

SQLite has no procedural language, but you can use recursive CTEs:

WITH RECURSIVE cnt(x) AS (
  SELECT 1
  UNION ALL
  SELECT x+1 FROM cnt WHERE x < 1000
)
INSERT INTO users (name, email)
SELECT
  'User ' || x,
  'user' || x || '@example.com'
FROM cnt;

Method 2: Programming language + ORM

For most teams, the cleanest approach is generating data in the same language as the application. This ensures data shape matches your validation logic.

Python with Faker

from faker import Faker
import psycopg2

fake = Faker()
conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

users = [(fake.name(), fake.email(), fake.date_between('-2y'))
         for _ in range(1000)]
cur.executemany(
    "INSERT INTO users (name, email, created_at) VALUES (%s, %s, %s)",
    users
)
conn.commit()

Node.js with @faker-js/faker

import { faker } from '@faker-js/faker';
import { db } from './db';

const users = Array.from({ length: 1000 }, () => ({
  name: faker.person.fullName(),
  email: faker.internet.email(),
  createdAt: faker.date.past({ years: 2 })
}));

await db.insertInto('users').values(users).execute();

The advantage here is full control. You can ensure referential integrity by generating users first, collecting their IDs, then generating orders that reference those IDs. You can match your application's exact validation rules. And you can commit the seed script to version control.
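To make the ID-collection pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module (the users/orders schema and column names are illustrative, not from any particular app):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "user_id INTEGER REFERENCES users(id), total REAL)"
)

# Step 1: insert users first and collect their real primary keys
cur.executemany("INSERT INTO users (name) VALUES (?)",
                [(f"User {i}",) for i in range(100)])
user_ids = [row[0] for row in cur.execute("SELECT id FROM users")]

# Step 2: every order references an ID that actually exists
orders = [(random.choice(user_ids), round(random.uniform(5, 500), 2))
          for _ in range(1000)]
cur.executemany("INSERT INTO orders (user_id, total) VALUES (?, ?)", orders)
conn.commit()

# Referential integrity holds by construction: no orphaned foreign keys
orphans = cur.execute(
    "SELECT COUNT(*) FROM orders o "
    "LEFT JOIN users u ON o.user_id = u.id WHERE u.id IS NULL"
).fetchone()[0]
print(orphans)  # 0
```

The same shape works against PostgreSQL or MySQL; only the driver and placeholder syntax change.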

Method 3: Online SQL test data generators

Sometimes you don't want to install anything. You just need a quick .sql file you can import. Online generators are perfect for this.

The best tools let you:

  1. Paste a CREATE TABLE statement instead of configuring each column by hand
  2. Pick your SQL dialect so quoting and date literals come out right
  3. Choose row counts and per-column data types
  4. Download a ready-to-import .sql file

🎲 Generate SQL test data instantly

Paste your CREATE TABLE statement and get realistic INSERT statements in seconds. Supports 5 SQL dialects and 20+ data types. No signup required.

Try the SQL Test Data Generator →

Method 4: Snapshot and sanitize production data

If you have production data and need maximum realism, you can anonymize a subset:

  1. Export a random 1% sample of production rows
  2. Scrub PII: replace names, emails, phone numbers, and addresses with fake equivalents
  3. Hash or truncate sensitive identifiers (SSNs, credit card numbers)
  4. Shuffle foreign key relationships to break real associations
  5. Add synthetic edge cases that may not exist in production (nulls, empty strings, Unicode)

This approach requires the most care: one mistake and you've leaked customer data into a test environment. Most teams should prefer fully synthetic generation.
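Step 2 above (scrubbing PII) can be as simple as mapping each real value to a stable fake one. A stdlib-only sketch, assuming rows arrive as dicts and the column names are hypothetical:

```python
import hashlib

FAKE_NAMES = ["Alex Doe", "Sam Roe", "Jan Poe", "Kim Loe"]

def scrub_row(row: dict) -> dict:
    """Replace PII deterministically: the same input always maps to the same fake."""
    digest = int(hashlib.sha256(row["email"].encode()).hexdigest(), 16)
    return {
        **row,
        "name": FAKE_NAMES[digest % len(FAKE_NAMES)],
        "email": f"user{digest % 100000}@example.com",
        "ssn": None,  # drop, don't transform, high-risk identifiers
    }

row = {"name": "Real Person", "email": "real@corp.com",
       "ssn": "123-45-6789", "plan": "pro"}
clean = scrub_row(row)
print(clean["plan"])  # non-PII columns survive: pro
```

Hashing the email (rather than picking randomly) keeps the mapping stable across export runs, so repeated snapshots stay diffable.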

Dialect-specific gotchas

PostgreSQL

Array subscripts must be integers: ceil(random()*4) returns a double, so cast it with ::int. And uncorrelated LATERAL subqueries may be evaluated once for the whole query, handing every row the same "random" value.

MySQL

There is no generate_series. Recursive CTEs require 8.0+ and are capped by cte_max_recursion_depth (default 1000). Wrap bulk inserts in a single transaction, or per-row autocommit will dominate the runtime.

SQL Server

RAND() is evaluated once per statement, so every row gets the same value; use a per-row expression like ABS(CHECKSUM(NEWID())) % 100 instead. GENERATE_SERIES only arrived in SQL Server 2022; earlier versions need a numbers table or a recursive CTE (mind the default MAXRECURSION of 100). Inserting explicit identity values requires SET IDENTITY_INSERT ... ON.

SQLite

random() returns a signed 64-bit integer, not a float in [0, 1), so use abs(random()) % n. There is no native date type; store ISO-8601 strings. Recursive CTEs need SQLite 3.8.3 or later.

Oracle

Generate rows with SELECT level FROM dual CONNECT BY level <= 1000, and use DBMS_RANDOM.VALUE and DBMS_RANDOM.STRING for values. Remember that (before 23ai) every SELECT needs a FROM clause, hence dual.

A practical workflow for your team

Here's a workflow that scales from solo developers to teams of fifty:

  1. Write a seed script in your application language and commit it to version control
  2. Seed the random generator so every run produces identical data
  3. Run the seed script in CI before the test suite on every commit
  4. Keep a larger profile (tens of thousands of rows) for performance tests
  5. Reach for sanitized production snapshots only when synthetic data can't reproduce a bug
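Repeatability comes from seeding whatever generator you use (Faker and friends all support this). A minimal stdlib sketch, where the name pools and the seed_users helper are illustrative:

```python
import random

FIRST = ["Alice", "Bob", "Charlie", "Diana", "Evan"]
LAST = ["Smith", "Jones", "Lee", "Patel"]

def seed_users(n: int, seed: int = 42) -> list[tuple[str, str]]:
    """Generate n (name, email) pairs; the same seed yields the same rows."""
    rng = random.Random(seed)  # isolated, seeded generator, not the global one
    rows = []
    for i in range(n):
        name = f"{rng.choice(FIRST)} {rng.choice(LAST)}"
        email = f"{name.split()[0].lower()}{i}@example.com"
        rows.append((name, email))
    return rows

# Deterministic across runs and machines, so CI failures are reproducible
assert seed_users(10) == seed_users(10)
```

Using random.Random(seed) instead of seeding the module-level generator keeps the seed script from perturbing any other randomness in the test suite.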

Summary

Generating SQL test data doesn't have to be painful. For quick tasks, database-native functions or online generators get you there in seconds. For production applications, seed scripts in your application language with a library like Faker give you control, repeatability, and referential integrity.

The teams that move fastest aren't the ones with the most data; they're the ones with the most realistic data, generated on demand, checked into version control, and run in CI on every commit.

🎲 Need test data right now?

Paste your CREATE TABLE statement into our free SQL Test Data Generator and get realistic INSERT statements in seconds. No signup, no install, no data leaves your browser.

Generate Test Data →

Related articles