How to Generate Realistic SQL Test Data for Any Database
Every developer has been there. You're building a new feature, writing tests, and suddenly you need realistic data. Not test1@example.com and User A. You need thousands of rows with real-looking names, plausible email addresses, dates that make sense, and foreign keys that actually reference valid records. Manually crafting INSERT statements is soul-crushing and error-prone.
Good test data matters more than most teams admit. Tests with unrealistic data pass when they should fail. Performance tests on 10 rows of data hide query problems that explode at 10,000 rows. Demo environments with placeholder text look unprofessional to stakeholders. And GDPR means you can't just copy production data anymore.
In this guide, we'll cover every practical approach to generating SQL test data, from quick one-liners to full synthetic data pipelines, and show you how to do it for PostgreSQL, MySQL, SQLite, SQL Server, and Oracle.
What makes test data "good"?
Before we dive into tools, let's define what we're aiming for. Good test data has four properties:
- Realistic shape: Names look like names, emails contain @ symbols and domains, dates fall within reasonable ranges. This catches validation bugs that garbage data misses.
- Referential integrity: Foreign keys point to real rows. If you have an `orders` table, every `user_id` should exist in the `users` table.
- Edge case coverage: Include nulls, empty strings, maximum-length values, Unicode characters, and boundary dates. Bugs love edge cases.
- Volume flexibility: Generate 10 rows for unit tests, 10,000 for load tests, and 1,000,000 for stress tests, without changing your approach.
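To make the edge-case point concrete, here is a minimal sketch of a seed that deliberately includes problem rows alongside normal ones. It uses Python's built-in sqlite3 as a stand-in database, and the table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT, note TEXT)")

# Deliberate edge cases next to a "normal" row: NULL, empty string,
# a maximum-length value, and non-ASCII Unicode.
edge_rows = [
    ("Alice Smith", "alice@example.com",  "normal row"),
    (None,          "noname@example.com", "NULL name"),
    ("",            "empty@example.com",  "empty-string name"),
    ("x" * 255,     "long@example.com",   "max-length name"),
    ("Zoë 日本語",  "zoe@example.com",    "Unicode name"),
]
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", edge_rows)

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 5
```

If any of these rows break a query or a UI, that is a bug you found in seconds rather than in production.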
Method 1: Database-native generation
PostgreSQL
PostgreSQL has the richest built-in randomization functions. Use random(), md5(), and arrays to generate rows inline:
INSERT INTO users (name, email, created_at)
SELECT
'User ' || i,
'user' || i || '@example.com',
NOW() - (random() * 365 || ' days')::interval
FROM generate_series(1, 1000) AS i;
For more realistic names, install the faker extension if available, or use arrays of real names:
WITH first_names(name) AS (
  SELECT unnest(ARRAY['Alice','Bob','Charlie','Diana','Evan'])
)
INSERT INTO users (name, email)
SELECT
  n.name || ' ' || (ARRAY['Smith','Jones','Lee','Patel'])[1 + floor(random()*4)::int],
  lower(n.name) || i || '@example.com'
FROM generate_series(1, 1000) AS i
-- Referencing i makes the subquery correlated, so it is re-run for every row;
-- an uncorrelated ORDER BY random() LIMIT 1 can be evaluated just once,
-- giving every user the same first name.
CROSS JOIN LATERAL (
  SELECT name FROM first_names ORDER BY random(), i LIMIT 1
) AS n;
MySQL
MySQL is more limited for procedural generation, but you can use variables and stored procedures:
DELIMITER $$
CREATE PROCEDURE generate_users(IN count INT)
BEGIN
DECLARE i INT DEFAULT 1;
WHILE i <= count DO
INSERT INTO users (name, email) VALUES (
CONCAT('User ', i),
CONCAT('user', i, '@example.com')
);
SET i = i + 1;
END WHILE;
END$$
DELIMITER ;
CALL generate_users(1000);
SQLite
SQLite has no procedural language, but you can use recursive CTEs:
WITH RECURSIVE cnt(x) AS (
SELECT 1
UNION ALL
SELECT x+1 FROM cnt WHERE x < 1000
)
INSERT INTO users (name, email)
SELECT
'User ' || x,
'user' || x || '@example.com'
FROM cnt;
Method 2: Programming language + ORM
For most teams, the cleanest approach is generating data in the same language as the application. This ensures data shape matches your validation logic.
Python with Faker
from faker import Faker
import psycopg2
fake = Faker()
conn = psycopg2.connect("dbname=test")
cur = conn.cursor()
users = [(fake.name(), fake.email(), fake.date_between('-2y'))
for _ in range(1000)]
cur.executemany(
"INSERT INTO users (name, email, created_at) VALUES (%s, %s, %s)",
users
)
conn.commit()
Node.js with @faker-js/faker
import { faker } from '@faker-js/faker';
import { db } from './db';
const users = Array.from({ length: 1000 }, () => ({
name: faker.person.fullName(),
email: faker.internet.email(),
createdAt: faker.date.past({ years: 2 })
}));
await db.insertInto('users').values(users).execute();
The advantage here is full control. You can ensure referential integrity by generating users first, collecting their IDs, then generating orders that reference those IDs. You can match your application's exact validation rules. And you can commit the seed script to version control.
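The parents-first pattern described above can be sketched end to end. This example uses Python's built-in sqlite3 as a stand-in database, with a hypothetical users/orders schema: insert the parent rows, collect their generated primary keys, then let every child row pick from that collected list, so referential integrity holds by construction.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         total REAL);
""")

# Step 1: insert parents and collect their generated primary keys.
user_ids = []
for i in range(50):
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (f"User {i}",))
    user_ids.append(cur.lastrowid)

# Step 2: children only ever reference a collected parent ID.
orders = [(random.choice(user_ids), round(random.uniform(5, 500), 2))
          for _ in range(200)]
conn.executemany("INSERT INTO orders (user_id, total) VALUES (?, ?)", orders)

# Sanity check: no orphaned orders.
orphans = conn.execute("""
    SELECT COUNT(*) FROM orders o
    LEFT JOIN users u ON u.id = o.user_id
    WHERE u.id IS NULL
""").fetchone()[0]
print(orphans)  # 0
```

With PostgreSQL the ID-collection step would typically use `RETURNING id` instead of `lastrowid`, but the shape of the script is the same.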
Method 3: Online SQL test data generators
Sometimes you don't want to install anything. You just need a quick .sql file you can import. Online generators are perfect for this.
The best tools let you:
- Paste a `CREATE TABLE` statement and auto-detect columns
- Choose data types per column (names, emails, dates, UUIDs, addresses, phone numbers)
- Select row count (from a few rows up to thousands)
- Export for your specific dialect (PostgreSQL, MySQL, SQLite, SQL Server, Oracle)
- Copy to clipboard or download as `.sql`
Generate SQL test data instantly
Paste your CREATE TABLE statement and get realistic INSERT statements in seconds. Supports 5 SQL dialects and 20+ data types. No signup required.
Try the SQL Test Data Generator →
Method 4: Snapshot and sanitize production data
If you have production data and need maximum realism, you can anonymize a subset:
- Export a random 1% sample of production rows
- Scrub PII: replace names, emails, phone numbers, and addresses with fake equivalents
- Hash or truncate sensitive identifiers (SSNs, credit card numbers)
- Shuffle foreign key relationships to break real associations
- Add synthetic edge cases that may not exist in production (nulls, empty strings, Unicode)
This approach requires the most care: one mistake and you've leaked customer data in a test environment. Most teams should prefer fully synthetic generation.
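The scrubbing step (step 2) is worth sketching, because the safe pattern is to overwrite every PII column with a value derived only from the row's primary key, so nothing from the original survives. A minimal illustration, again using sqlite3 with a hypothetical schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)"
)
# Pretend these two rows came from a production snapshot.
conn.executemany(
    "INSERT INTO users (name, email, phone) VALUES (?, ?, ?)",
    [("Real Person", "real.person@corp.example", "+1 555 0100"),
     ("Jane Doe", "jane.doe@corp.example", "+1 555 0101")],
)

# Overwrite every PII column in one pass; values depend only on the key.
conn.execute("""
    UPDATE users SET
        name  = 'User ' || id,
        email = 'user' || id || '@example.com',
        phone = NULL
""")

# Verify nothing from the original domain leaked through.
leaked = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email LIKE '%corp.example%'"
).fetchone()[0]
print(leaked)  # 0
```

A verification query like the last one belongs in the pipeline itself, not in a manual checklist: if it ever returns nonzero, the sanitized dump should be discarded.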
Dialect-specific gotchas
PostgreSQL
- Use `gen_random_uuid()` for UUID columns (built in since PostgreSQL 13; older versions need the `pgcrypto` extension)
- `RETURNING id` is your friend for collecting generated primary keys
- Array and JSONB columns need special formatting: generate valid JSON, not just strings
MySQL
- Disable foreign key checks during bulk inserts with `SET foreign_key_checks = 0;` (and remember to re-enable them afterwards)
- Use multi-row INSERT syntax for 10× speed: `INSERT INTO t (a,b) VALUES (1,2), (3,4), ...`
- Be careful with `ONLY_FULL_GROUP_BY` mode when testing aggregations
SQL Server
- Use `NEWID()` for UUIDs and `ABS(CHECKSUM(NEWID()))` for random integers
- Temporarily disable constraints with `ALTER TABLE ... NOCHECK CONSTRAINT ALL`
- Identity inserts require `SET IDENTITY_INSERT table_name ON`
SQLite
- Wrap large inserts in a transaction; SQLite is dramatically slower without one
- Use `randomblob(16)` for UUID-like values
- No stored procedures, so generate data externally or use recursive CTEs
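The transaction point above is the single biggest SQLite speedup, and it is easy to get right from application code. A minimal sketch using Python's built-in sqlite3 driver (table name hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(f"User {i}",) for i in range(10_000)]

# One explicit transaction around the whole batch: SQLite syncs to disk once
# at COMMIT instead of once per statement. On an on-disk database this is
# the difference between seconds and minutes.
with conn:  # sqlite3's connection context manager commits on success, rolls back on error
    conn.executemany("INSERT INTO users (name) VALUES (?)", rows)

print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 10000
```

If you are feeding a `.sql` file to the `sqlite3` CLI instead, the equivalent is wrapping the inserts in `BEGIN; ... COMMIT;`.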
Oracle
- Use `SYS_GUID()` for UUID columns
- The `DBMS_RANDOM` package provides string and numeric generation
- Tablespace quotas may block large inserts; check with your DBA
A practical workflow for your team
Here's a workflow that scales from solo developers to teams of fifty:
- Commit seed scripts: Store test data generation in `db/seeds/` or `tests/fixtures/`. Version control ensures everyone generates the same data.
- Separate unit and integration seeds: Unit tests need 10–50 rows. Integration tests need 1,000+. Use separate scripts or parameterized functions.
- Reset before each test suite: Truncate tables and re-seed for deterministic tests. Use transactions for speed if your framework supports it.
- Generate referential data in order: Users → Products → Orders → OrderItems. Never generate child rows before parents.
- Include deliberate edge cases: Every seed script should include at least one null, one empty string, one maximum-length value, one Unicode string, and one future date.
- Benchmark with realistic volume: Run your seed script to generate 100× your expected production volume. If queries slow down, you have a performance bug.
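Steps 1–3 of this workflow boil down to one idempotent, parameterized seed function that every test suite calls before it runs. A minimal sketch, using sqlite3 and a hypothetical schema (a real project would swap in its own driver and tables):

```python
import sqlite3

def reset_and_seed(conn: sqlite3.Connection, n_users: int) -> None:
    """Wipe and re-seed so every suite starts from the same known state."""
    with conn:  # one transaction: all-or-nothing, and fast
        conn.execute("DELETE FROM users")  # SQLite has no TRUNCATE
        conn.executemany(
            "INSERT INTO users (name, email) VALUES (?, ?)",
            [(f"User {i}", f"user{i}@example.com") for i in range(n_users)],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

reset_and_seed(conn, 50)     # small seed for unit tests
reset_and_seed(conn, 1_000)  # larger seed for integration tests, same function
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 1000
```

Because the row count is a parameter, unit, integration, and stress runs all share one code path, which is exactly the volume flexibility described earlier.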
Summary
Generating SQL test data doesn't have to be painful. For quick tasks, database-native functions or online generators get you there in seconds. For production applications, seed scripts in your application language with a library like Faker give you control, repeatability, and referential integrity.
The teams that move fastest aren't the ones with the most data; they're the ones with the most realistic data, generated on demand, checked into version control, and run in CI on every commit.
Need test data right now?
Paste your CREATE TABLE statement into our free SQL Test Data Generator and get realistic INSERT statements in seconds. No signup, no install, no data leaves your browser.
Generate Test Data โ