Normalization rules, naming conventions, soft deletes, audit trails, multi-tenancy patterns, versioning strategies, and the schema decisions I've regretted. PostgreSQL-focused.
A schema is a contract with your future self. Every column you add, every constraint you skip, every "we'll clean this up later" — it all compounds. I've worked on systems where a single bad schema decision made three years ago now costs the team a full sprint every quarter in workarounds.
The database outlives everything. Your frontend framework will change. Your API layer will get rewritten. Your deployment strategy will evolve. But the data? The data stays. And the shape you gave it on day one follows you forever, because migrating a table with 200 million rows is not the same as refactoring a React component.
This is what I've learned about making schema decisions that don't haunt you. PostgreSQL-focused, because that's what I use and trust, but most of these patterns apply to any relational database.
I've seen more arguments about naming conventions than about actual architecture decisions. Here's what I've settled on after years of context-switching between projects:
Snake_case for everything. Tables, columns, indexes, constraints. No camelCase, no PascalCase. PostgreSQL folds unquoted identifiers to lowercase anyway, so createdAt becomes createdat unless you double-quote it everywhere. Don't fight the database.
Plural table names. A table holds many rows. users not user. orders not order. order_items not order_item. This reads naturally in queries: SELECT * FROM users WHERE ... — you're selecting from a collection.
_id suffix for foreign keys. user_id, order_id, tenant_id. The primary key is just id. This is unambiguous. When you join tables, users.id = orders.user_id reads like English.
_at suffix for timestamps. created_at, updated_at, deleted_at, published_at, expires_at. You always know it's a point in time.
is_ prefix for booleans. is_active, is_verified, is_published. Some people use has_ for ownership booleans (has_mfa_enabled), but I keep it simple and use is_ for everything.
_count suffix for denormalized counters. comment_count, follower_count. Makes it clear this is a cached number, not a live calculation.
Here's the thing: consistency beats perfection. I've seen teams spend weeks debating whether it should be email_address or email or email_addr. Just pick one pattern and enforce it everywhere. The worst naming convention is the one that's inconsistently applied.
-- Good: consistent, readable, no surprises
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL,
display_name TEXT NOT NULL,
password_hash TEXT NOT NULL,
is_active BOOLEAN NOT NULL DEFAULT true,
is_verified BOOLEAN NOT NULL DEFAULT false,
login_count INTEGER NOT NULL DEFAULT 0,
last_login_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Bad: inconsistent casing, mixed conventions, ambiguous names
CREATE TABLE User (
UserID SERIAL PRIMARY KEY,
EmailAddress VARCHAR(255),
name VARCHAR(100),
passwd VARCHAR(255),
active BOOLEAN,
verified INT, -- is this 0/1? a verification level?
lastLogin TIMESTAMP, -- camelCase in a snake_case world
created TIMESTAMP -- created what? when?
);
One more thing: never use reserved words as column names. type, order, user, group, table — they all work if you quote them, but they'll burn you in ORMs, query builders, and every dynamic SQL generation tool. Use kind instead of type, sort_order instead of order. Your future self will thank you.
Every table in my schemas starts with the same skeleton. No exceptions.
CREATE TABLE things (
id BIGSERIAL PRIMARY KEY,
-- ... domain columns ...
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
This is one of those decisions that generates way more heat than light. Here's the actual trade-off:
BIGSERIAL (auto-incrementing integer): compact, fast to generate, index-friendly, and you can sort by id to get insertion order. The downside: the values are guessable and leak how many rows you have.
UUID v4 (random): safe to expose publicly and generatable from anywhere, but the randomness scatters inserts across the index, which hurts locality on large tables.
UUID v7 (time-sorted, RFC 9562): globally unique like v4, but time-ordered, so inserts stay index-friendly. PostgreSQL's built-in gen_random_uuid() generates v4, and you can use uuid_generate_v7() with extensions.
My current stance: BIGSERIAL for internal tables, UUID v7 for anything exposed to the outside world. If an ID ever appears in a URL, an API response, or a webhook payload, use UUIDs. If it's a pure join table that users never see, BIGSERIAL is fine.
-- For an API-facing resource
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
owner_id UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- For an internal mapping table
CREATE TABLE project_members (
id BIGSERIAL PRIMARY KEY,
project_id UUID NOT NULL REFERENCES projects(id),
user_id UUID NOT NULL REFERENCES users(id),
role TEXT NOT NULL DEFAULT 'member',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (project_id, user_id)
);
Use TIMESTAMPTZ, not TIMESTAMP. The "TZ" version stores the value in UTC and converts on read based on the session timezone. The non-TZ version stores whatever you give it with no timezone context — which means if two servers in different timezones write to the same table, you get silent data corruption.
-- Always this
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
-- Never this
created_at TIMESTAMP NOT NULL DEFAULT NOW()
PostgreSQL doesn't have MySQL's ON UPDATE CURRENT_TIMESTAMP. You need a trigger:
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Apply to every table that has updated_at
CREATE TRIGGER set_updated_at
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION update_updated_at_column();
CREATE TRIGGER set_updated_at
BEFORE UPDATE ON projects
FOR EACH ROW
EXECUTE FUNCTION update_updated_at_column();
Yes, you need one trigger per table. Yes, it's tedious. Write a migration helper that creates the trigger automatically when you add a table. It's worth it, because the alternative is remembering to set updated_at = NOW() in every UPDATE query across your entire application — and you will forget.
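Here's a minimal sketch of such a helper, assuming the conventions above — every table that needs the trigger has an updated_at column and lives in the public schema. The trigger and function names match the earlier examples; everything else is illustrative:
-- Hypothetical one-shot helper: create set_updated_at on every public table
-- that has an updated_at column and doesn't already have the trigger.
DO $$
DECLARE
  tbl TEXT;
BEGIN
  FOR tbl IN
    SELECT c.table_name
    FROM information_schema.columns c
    WHERE c.table_schema = 'public'
      AND c.column_name = 'updated_at'
      AND NOT EXISTS (
        SELECT 1
        FROM pg_trigger t
        JOIN pg_class r ON r.oid = t.tgrelid
        WHERE r.relname = c.table_name::text
          AND t.tgname = 'set_updated_at'
      )
  LOOP
    EXECUTE format(
      'CREATE TRIGGER set_updated_at BEFORE UPDATE ON %I
         FOR EACH ROW EXECUTE FUNCTION update_updated_at_column()',
      tbl
    );
  END LOOP;
END;
$$;
Run it at the end of every migration and new tables pick up the trigger automatically.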
Every CS course teaches normalization up to 3NF (Third Normal Form). The rules, roughly: first normal form means atomic column values (no lists packed into a string); second normal form means no column depends on only part of a composite key; third normal form means no column depends on another non-key column.
In practice, 3NF is the sweet spot for transactional tables. You should start there and deviate only when you have a specific, measurable reason.
Here's a properly normalized order system:
CREATE TABLE customers (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
display_name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
sku TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
price_cents INTEGER NOT NULL,
is_active BOOLEAN NOT NULL DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id),
status TEXT NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
total_cents INTEGER NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id BIGINT NOT NULL REFERENCES products(id),
quantity INTEGER NOT NULL CHECK (quantity > 0),
unit_price_cents INTEGER NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Notice unit_price_cents on order_items. This is intentional denormalization. We snapshot the price at the time of order, because the product's price might change later. This is one of the most common and correct reasons to denormalize.
Reporting tables. If your analytics dashboard needs to join 8 tables to render, create a denormalized reporting table and populate it with a background job. Your transactional schema stays clean, and your reporting queries stay fast.
-- Denormalized reporting table, populated by a cron job
CREATE TABLE order_reports (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL,
customer_email TEXT NOT NULL,
customer_name TEXT NOT NULL,
product_names TEXT[] NOT NULL, -- array is fine here
item_count INTEGER NOT NULL,
total_cents INTEGER NOT NULL,
order_status TEXT NOT NULL,
ordered_at TIMESTAMPTZ NOT NULL,
report_generated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Materialized views. PostgreSQL's MATERIALIZED VIEW is underrated. It's a denormalized snapshot that you refresh on demand. Perfect for dashboards.
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
DATE_TRUNC('month', o.created_at) AS month,
COUNT(DISTINCT o.id) AS order_count,
SUM(o.total_cents) AS revenue_cents,
COUNT(DISTINCT o.customer_id) AS unique_customers
FROM orders o
WHERE o.status != 'cancelled'
GROUP BY DATE_TRUNC('month', o.created_at)
ORDER BY month DESC;
-- CONCURRENTLY requires a unique index on the view, so add one first
CREATE UNIQUE INDEX uq_monthly_revenue_month ON monthly_revenue (month);
-- Refresh it nightly
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;
JSONB columns for dynamic attributes. When different rows need different fields — product variants, form submissions, user preferences — a JSONB column is often better than a nightmare EAV (Entity-Attribute-Value) table. More on this later.
Never denormalize your source-of-truth tables. Denormalize copies, snapshots, reports, and caches. The canonical data stays normalized. When the denormalized copy gets stale or corrupted (and it will), you rebuild it from the normalized source.
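For example, rebuilding the order_reports table above from the normalized tables is a single INSERT ... SELECT. A sketch of what the background job runs (column list assumed from the definition above):
-- Rebuild the denormalized copy from the normalized source of truth
TRUNCATE order_reports;
INSERT INTO order_reports
  (order_id, customer_email, customer_name, product_names,
   item_count, total_cents, order_status, ordered_at)
SELECT
  o.id,
  c.email,
  c.display_name,
  ARRAY_AGG(p.name ORDER BY p.name),
  SUM(oi.quantity)::INTEGER,
  o.total_cents,
  o.status,
  o.created_at
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
GROUP BY o.id, c.email, c.display_name, o.total_cents, o.status, o.created_at;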
I've heard every excuse for skipping foreign keys. "They slow down writes." "We enforce it in the application." "We need flexibility."
These are all wrong.
Foreign keys are the single most impactful thing you can add to a schema. They prevent entire categories of bugs that no amount of application code can catch — race conditions, partial failures, orphaned rows from failed transactions. A foreign key is a guarantee from the database engine itself, enforced at the storage level. Your ORM's beforeDelete hook is a suggestion.
-- Always do this
ALTER TABLE orders
ADD CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id) REFERENCES customers(id);
-- The "we'll handle it in code" alternative:
-- Hope. Hope is not a strategy.
This is where it gets nuanced. What happens when you delete a parent row?
RESTRICT (effectively the default — PostgreSQL's actual default is NO ACTION, which behaves the same unless the constraint check is deferred): The delete fails if child rows exist. Use this for most relationships. You can't delete a customer who has orders — that's business logic encoded in the schema.
-- Customer can't be deleted while they have orders
ALTER TABLE orders
ADD CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id) REFERENCES customers(id)
ON DELETE RESTRICT;
CASCADE: Child rows are automatically deleted. Use this sparingly and deliberately. Good for "part-of" relationships where the child has no meaning without the parent.
-- Deleting an order deletes its line items — they're meaningless alone
ALTER TABLE order_items
ADD CONSTRAINT fk_order_items_order
FOREIGN KEY (order_id) REFERENCES orders(id)
ON DELETE CASCADE;
-- Deleting a project deletes its memberships
ALTER TABLE project_members
ADD CONSTRAINT fk_project_members_project
FOREIGN KEY (project_id) REFERENCES projects(id)
ON DELETE CASCADE;
SET NULL: The foreign key column is set to NULL. Use this when the relationship is optional and the child row is still meaningful on its own.
-- If a manager leaves, their reports still exist — just unassigned
ALTER TABLE employees
ADD CONSTRAINT fk_employees_manager
FOREIGN KEY (manager_id) REFERENCES employees(id)
ON DELETE SET NULL;
My rule of thumb: default to RESTRICT, use CASCADE for composition relationships, use SET NULL for optional associations. If you're unsure, RESTRICT is always the safe choice — it's easier to relax a constraint than to recover deleted data.
Check constraints cost almost nothing at write time and prevent garbage data from entering your system forever:
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL CHECK (LENGTH(name) > 0),
price_cents INTEGER NOT NULL CHECK (price_cents >= 0),
discount_pct NUMERIC(5,2) CHECK (discount_pct BETWEEN 0 AND 100),
weight_kg NUMERIC(10,3) CHECK (weight_kg > 0),
status TEXT NOT NULL CHECK (status IN ('draft', 'active', 'archived')),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Every check constraint you add is one fewer validation bug you'll ever have to debug in production. The database is the last line of defense. Use it.
Unique constraints are straightforward for simple cases:
ALTER TABLE users ADD CONSTRAINT uq_users_email UNIQUE (email);
But they get interesting with partial indexes — unique constraints that only apply to certain rows:
-- Only one active subscription per user (but they can have many cancelled ones)
CREATE UNIQUE INDEX uq_active_subscription
ON subscriptions (user_id)
WHERE status = 'active';
-- Only one primary address per user
CREATE UNIQUE INDEX uq_primary_address
ON addresses (user_id)
WHERE is_primary = true;
This is one of PostgreSQL's killer features. Use it.
Soft deletes are simple in concept: instead of DELETE FROM users WHERE id = 42, you do UPDATE users SET deleted_at = NOW() WHERE id = 42. The row stays in the database but is filtered out of normal queries.
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL,
display_name TEXT NOT NULL,
is_active BOOLEAN NOT NULL DEFAULT true,
deleted_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- The partial index: filter out deleted rows efficiently
CREATE INDEX idx_users_active ON users (email) WHERE deleted_at IS NULL;
Every query in your application now needs WHERE deleted_at IS NULL. Every. Single. Query. ORMs help with this (Prisma has middleware, Drizzle has .where(isNull(deletedAt))), but it's a tax on every read operation. Miss it once and you're showing "deleted" data to users.
Here's where soft deletes get ugly. If you have UNIQUE (email) and a user soft-deletes their account, their email is still in the table. They can't re-register with the same email. A new user with that email can't sign up either.
The fix is a partial unique index:
-- Email must be unique, but only among non-deleted users
CREATE UNIQUE INDEX uq_users_email_active
ON users (email)
WHERE deleted_at IS NULL;
This works, but now you need to remember this pattern for every unique column on every soft-deletable table. It's manageable with discipline, but it's complexity you've chosen to take on.
Soft deletes interact badly with foreign keys. If orders.user_id references users.id with ON DELETE RESTRICT, and you soft-delete a user... nothing happens. The FK doesn't fire because you didn't actually delete the row. The user is "gone" from the application's perspective but still very much present in the database.
This means your application code has to handle the case where a referenced entity is soft-deleted. Every join, every lookup, every time you follow a foreign key — you need to check deleted_at IS NULL on the referenced table too. Or you don't, and your application shows "Order by [deleted user]" which is either a bug or a feature depending on who you ask.
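In practice that looks something like this — a sketch using the hypothetical orders.user_id relationship from the previous paragraph, with "[deleted user]" as one way to handle the missing owner:
-- Surface the order even when its owner is soft-deleted,
-- but don't pretend the account still exists
SELECT
  o.id,
  o.total_cents,
  CASE
    WHEN u.deleted_at IS NULL THEN u.display_name
    ELSE '[deleted user]'
  END AS placed_by
FROM orders o
JOIN users u ON u.id = o.user_id;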
I use soft deletes only for user-facing entities where recovery is a business requirement — user accounts, projects, documents. Things where a support agent might need to restore a deletion. For everything else, I use hard deletes with an audit trail (more on this next).
-- Soft delete: user-facing, recoverable
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
content TEXT NOT NULL,
owner_id UUID NOT NULL REFERENCES users(id),
deleted_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_documents_active ON documents (owner_id) WHERE deleted_at IS NULL;
-- Hard delete: internal, not recoverable from UI (but audited)
CREATE TABLE api_keys (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
key_hash TEXT NOT NULL,
last_used_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- When deleted, it's gone. The audit_log records that it existed.
Every non-trivial application needs an answer to "what happened to this record?" at some point. Audit trails are how you provide that answer without digging through application logs.
The simplest approach: a single audit_log table that records every change.
CREATE TABLE audit_log (
id BIGSERIAL PRIMARY KEY,
table_name TEXT NOT NULL,
record_id TEXT NOT NULL, -- text to handle both UUID and BIGINT PKs
operation TEXT NOT NULL CHECK (operation IN ('INSERT', 'UPDATE', 'DELETE')),
old_values JSONB, -- NULL for INSERT
new_values JSONB, -- NULL for DELETE
changed_fields TEXT[], -- which columns changed (for UPDATE)
changed_by UUID REFERENCES users(id),
changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
ip_address INET,
user_agent TEXT
);
CREATE INDEX idx_audit_log_table_record ON audit_log (table_name, record_id);
CREATE INDEX idx_audit_log_changed_by ON audit_log (changed_by);
CREATE INDEX idx_audit_log_changed_at ON audit_log (changed_at);
The challenge is populating it. You can do it in application code (explicit, but easy to forget) or with triggers (automatic, but harder to pass context like changed_by).
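The application-code version is nothing more than an extra INSERT in the same transaction as the change. A sketch with placeholder values — explicit about who did what, but only as reliable as the discipline of whoever writes the next UPDATE:
BEGIN;

UPDATE users
SET display_name = 'New Name'
WHERE id = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11';  -- placeholder id

-- A real implementation would also capture old_values and changed_fields
INSERT INTO audit_log (table_name, record_id, operation, new_values, changed_by, ip_address, user_agent)
VALUES ('users',
        'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11',
        'UPDATE',
        '{"display_name": "New Name"}'::JSONB,
        'b1ffcd99-9c0b-4ef8-bb6d-6bb9bd380a22',     -- the acting user, known to the app
        '203.0.113.7',
        'example-app/1.0');

COMMIT;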
Triggers capture every change automatically, even from raw SQL or database admin operations:
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
DECLARE
old_data JSONB;
new_data JSONB;
changed TEXT[];
col TEXT;
BEGIN
IF TG_OP = 'DELETE' THEN
old_data := to_jsonb(OLD);
INSERT INTO audit_log (table_name, record_id, operation, old_values, changed_at)
VALUES (TG_TABLE_NAME, OLD.id::TEXT, 'DELETE', old_data, NOW());
RETURN OLD;
ELSIF TG_OP = 'UPDATE' THEN
old_data := to_jsonb(OLD);
new_data := to_jsonb(NEW);
-- Find changed fields
FOR col IN SELECT key FROM jsonb_each(new_data)
LOOP
IF old_data->col IS DISTINCT FROM new_data->col THEN
changed := array_append(changed, col);
END IF;
END LOOP;
-- Only log if something actually changed
IF array_length(changed, 1) > 0 THEN
INSERT INTO audit_log (table_name, record_id, operation, old_values, new_values, changed_fields, changed_at)
VALUES (TG_TABLE_NAME, NEW.id::TEXT, 'UPDATE', old_data, new_data, changed, NOW());
END IF;
RETURN NEW;
ELSIF TG_OP = 'INSERT' THEN
new_data := to_jsonb(NEW);
INSERT INTO audit_log (table_name, record_id, operation, new_values, changed_at)
VALUES (TG_TABLE_NAME, NEW.id::TEXT, 'INSERT', new_data, NOW());
RETURN NEW;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
-- Apply to tables you want audited
CREATE TRIGGER audit_users
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger_function();
CREATE TRIGGER audit_orders
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION audit_trigger_function();
The downside: triggers don't know which application user made the change. You can work around this with session variables:
-- In your application, before the query:
SET LOCAL app.current_user_id = 'uuid-of-current-user';
-- In the trigger function:
changed_by_id := current_setting('app.current_user_id', true)::UUID;
This works but feels fragile. In practice, I use a hybrid approach: triggers for the data capture, and application code to set the session context.
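A sketch of that hybrid wiring, with placeholder UUIDs: the application sets the variable with SET LOCAL so it scopes to the transaction, and the trigger reads it with missing_ok = true so changes made without context are still audited, just with a NULL changed_by.
-- Application side: one transaction per request
BEGIN;
SET LOCAL app.current_user_id = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11';
UPDATE projects SET name = 'Renamed project'
WHERE id = 'b1ffcd99-9c0b-4ef8-bb6d-6bb9bd380a22';
COMMIT;

-- Trigger side: read the setting defensively (inside audit_trigger_function,
-- before each INSERT INTO audit_log):
--   v_changed_by := NULLIF(current_setting('app.current_user_id', true), '')::UUID;
--   ... and include v_changed_by in the changed_by column ...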
For tables where you need the full version history (not just "what changed" but "what was the state at time T"), a dedicated history table is cleaner:
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
content TEXT NOT NULL,
version INTEGER NOT NULL DEFAULT 1,
owner_id UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE document_history (
id BIGSERIAL PRIMARY KEY,
document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
title TEXT NOT NULL,
content TEXT NOT NULL,
version INTEGER NOT NULL,
changed_by UUID REFERENCES users(id),
changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (document_id, version)
);
Before every update to documents, copy the current state to document_history and increment the version. Now you can reconstruct the document at any point in time, show diffs between versions, and even restore old versions.
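The copy itself can live in a BEFORE UPDATE trigger so the application can't forget it. A sketch against the two tables above (the function and trigger names are mine):
CREATE OR REPLACE FUNCTION snapshot_document_version()
RETURNS TRIGGER AS $$
BEGIN
  -- Only snapshot when something meaningful changed
  IF OLD.title IS DISTINCT FROM NEW.title
     OR OLD.content IS DISTINCT FROM NEW.content THEN
    INSERT INTO document_history (document_id, title, content, version, changed_at)
    VALUES (OLD.id, OLD.title, OLD.content, OLD.version, NOW());
    NEW.version := OLD.version + 1;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER snapshot_document
BEFORE UPDATE ON documents
FOR EACH ROW EXECUTE FUNCTION snapshot_document_version();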
The trade-off is storage. If your content column is large and changes frequently, the history table can grow fast. For most applications, this is fine — storage is cheap and you can archive old versions to cold storage if needed.
Multi-tenancy is one of those things that's easy to add at the start and nearly impossible to add later. If there's any chance your application will serve multiple organizations, build it in from day one.
Row-level tenancy is the most common approach. Every table has a tenant_id column, and every query filters by it.
CREATE TABLE tenants (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
slug TEXT NOT NULL UNIQUE,
plan TEXT NOT NULL DEFAULT 'free' CHECK (plan IN ('free', 'pro', 'enterprise')),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
project_id UUID NOT NULL REFERENCES projects(id),
title TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'todo',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Every index should include tenant_id for query performance
CREATE INDEX idx_projects_tenant ON projects (tenant_id);
CREATE INDEX idx_tasks_tenant_project ON tasks (tenant_id, project_id);
The risk is data leakage — one missed WHERE tenant_id = ... and you're showing Tenant A's data to Tenant B. PostgreSQL's Row-Level Security (RLS) eliminates this class of bugs:
-- Enable RLS on the table
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
-- Create a policy based on a session variable
CREATE POLICY tenant_isolation ON projects
USING (tenant_id = current_setting('app.current_tenant_id')::UUID);
-- Force RLS even for table owners
ALTER TABLE projects FORCE ROW LEVEL SECURITY;
Now, even if your application code forgets the WHERE tenant_id = ... clause, PostgreSQL adds it automatically. This is defense in depth, and it's one of the strongest arguments for PostgreSQL in multi-tenant systems.
-- In your application's connection middleware:
SET LOCAL app.current_tenant_id = 'tenant-uuid-here';
-- Now all queries on RLS-enabled tables are automatically filtered
SELECT * FROM projects;
-- PostgreSQL internally adds: WHERE tenant_id = 'tenant-uuid-here'
Pros: Single database, simple ops, efficient resource usage, easy cross-tenant queries for admin. Cons: Requires discipline (or RLS), every query touches tenant_id, harder to give tenants their own backup/restore.
Each tenant gets their own PostgreSQL schema. All schemas share the same database, but the tables are isolated by namespace.
-- Create a schema for each tenant
CREATE SCHEMA tenant_acme;
CREATE SCHEMA tenant_globex;
-- Tables live in the tenant's schema
CREATE TABLE tenant_acme.projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Set the search_path for the current tenant
SET search_path TO tenant_acme, public;
-- Now unqualified queries hit the tenant's schema
SELECT * FROM projects; -- queries tenant_acme.projects
Pros: Strong isolation, no risk of cross-tenant data leakage, easy per-tenant backup/restore, tenants can have schema variations. Cons: Schema migration complexity (you need to migrate N schemas), connection pool management, PostgreSQL has practical limits around ~10,000 schemas.
Each tenant gets their own database. Maximum isolation.
Pros: Complete isolation, independent scaling, easy backup/restore, can place big tenants on dedicated hardware. Cons: Connection management nightmare, cross-tenant queries impossible, migration must run N times, significant operational overhead.
For most SaaS applications: start with row-level + RLS. It's the simplest to operate, and RLS gives you strong-enough isolation for the vast majority of use cases. Move to schema-per-tenant only if you have contractual isolation requirements (enterprise customers, regulated industries). Database-per-tenant is for when you absolutely must guarantee physical isolation — and even then, consider managed databases where the operational burden is handled for you.
PostgreSQL's JSONB is remarkable. It gives you document-database flexibility inside a relational system. But like any powerful tool, it's easy to abuse.
Dynamic user-defined attributes. An e-commerce platform where each product category has different attributes — shoes have size and color, electronics have voltage and wattage. Rather than an EAV table or a column for every possible attribute:
CREATE TABLE products (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
category_id UUID NOT NULL REFERENCES categories(id),
price_cents INTEGER NOT NULL CHECK (price_cents >= 0),
attributes JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Example data:
-- Shoe: {"size": "10", "color": "black", "material": "leather"}
-- Laptop: {"ram_gb": 16, "storage_gb": 512, "screen_inches": 14.0}
Application settings and configuration. User preferences, feature flags, notification settings — things that change shape frequently and don't need relational integrity.
CREATE TABLE user_settings (
user_id UUID PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
preferences JSONB NOT NULL DEFAULT '{
"theme": "system",
"notifications": {"email": true, "push": false},
"locale": "en",
"timezone": "UTC"
}',
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Webhook payloads, API responses, event data. Anything where the schema is controlled by an external system and might change without notice.
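A sketch of what that looks like for inbound webhooks — promote the couple of fields you always filter on into real columns, and keep the raw payload as JSONB (table and column names here are illustrative):
CREATE TABLE webhook_events (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  provider TEXT NOT NULL,            -- e.g. 'stripe', 'github'
  event_type TEXT NOT NULL,          -- promoted to a column: you filter on it constantly
  payload JSONB NOT NULL,            -- raw body; the provider owns this schema
  received_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  processed_at TIMESTAMPTZ
);

CREATE INDEX idx_webhook_events_provider_type ON webhook_events (provider, event_type);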
Without indexes, querying inside JSONB requires a full table scan. GIN indexes make it fast:
-- Index the entire JSONB column (good for @> containment queries)
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);
-- Query: find all products with color = "black"
SELECT * FROM products
WHERE attributes @> '{"color": "black"}';
-- Query: find products with any of these attributes
SELECT * FROM products
WHERE attributes ?| ARRAY['color', 'size'];
For specific paths you query often, a targeted index is more efficient:
-- Index a specific path
CREATE INDEX idx_products_color ON products ((attributes->>'color'));
-- Now this is a regular B-tree lookup
SELECT * FROM products
WHERE attributes->>'color' = 'black';
Here's where JSONB bites you: you can't easily add NOT NULL constraints or default values to fields inside JSON. If you add a new required field to your product attributes, you need to backfill every existing row. With a regular column, a migration handles this atomically. With JSONB, you're writing an UPDATE that touches every row and hoping your application code handles missing fields gracefully until the backfill completes.
-- Adding a new column: clean, atomic, one statement
ALTER TABLE products ADD COLUMN weight_kg NUMERIC(10,3);
-- Adding a new JSONB field: messy, requires a backfill
UPDATE products
SET attributes = attributes || '{"weight_kg": null}'
WHERE NOT (attributes ? 'weight_kg');
-- Plus update application code to handle missing weight_kg
-- Plus update validation logic
-- Plus update every API response that includes attributes
My rule: if you're querying a JSONB field in a WHERE clause more than twice a week, it should probably be a column. JSONB is a great escape hatch. It's a terrible default.
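When a field crosses that line, PostgreSQL 12+ can promote it without touching the write path: a stored generated column stays in sync with the JSONB automatically. A sketch — note that adding a stored generated column rewrites the table, so treat it like any other locking migration:
-- Promote attributes->>'color' to a real, indexable column
ALTER TABLE products
  ADD COLUMN color TEXT GENERATED ALWAYS AS (attributes->>'color') STORED;

CREATE INDEX idx_products_color_generated ON products (color);

-- Reads become ordinary column queries
SELECT * FROM products WHERE color = 'black';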
Schema migrations are where theory meets reality. Your schema looks great on paper, but now you need to alter a table with 50 million rows during business hours without any downtime.
I've used most of the popular migration tools. Brief takes:
Drizzle (TypeScript): My current favorite. Schema-as-code, type-safe queries generated from the schema, clean migration SQL. The push command for development is fast.
Prisma (TypeScript): Great DX for simple schemas. Struggles with advanced PostgreSQL features (partial indexes, custom types, RLS). The migration engine can make surprising decisions.
Flyway (Java/CLI): Rock-solid, battle-tested, SQL-first. If you're writing raw SQL migrations, Flyway tracks them reliably. No magic, no surprises.
golang-migrate (Go/CLI): Similar to Flyway but lighter. Great for Go projects or when you just want a simple up/down migration runner.
The most dangerous schema changes are the ones that lock the table. In PostgreSQL, ALTER TABLE ... ADD COLUMN with a default value used to lock the entire table for the duration of the rewrite. Since PostgreSQL 11, simple defaults (constants) are metadata-only and instant. But other operations still lock:
-- SAFE: metadata-only, instant (PostgreSQL 11+)
ALTER TABLE users ADD COLUMN bio TEXT;
ALTER TABLE users ADD COLUMN is_premium BOOLEAN NOT NULL DEFAULT false;
-- DANGEROUS: rewrites the entire table, full table lock
ALTER TABLE users ALTER COLUMN email TYPE VARCHAR(500);
ALTER TABLE users ADD COLUMN score INTEGER NOT NULL DEFAULT compute_score();
-- DANGEROUS: scans the entire table to validate
ALTER TABLE orders ADD CONSTRAINT check_total CHECK (total_cents >= 0);
Adding a NOT NULL column to a table that already has data is the classic case: without a default it fails outright, because existing rows have no value for it. The naive approach:
-- On PostgreSQL 10 and earlier, this locked and rewrote the whole table
ALTER TABLE users ADD COLUMN avatar_url TEXT NOT NULL DEFAULT 'https://example.com/default.png';
Since PostgreSQL 11, this is actually safe for constant defaults — it's metadata-only. But if your default is a function or you need to backfill with computed values, use the expand-contract pattern.
This is the gold standard for zero-downtime schema changes. Three phases:
Phase 1: Expand — Add the new column as nullable. Deploy application code that writes to both old and new columns.
-- Migration 1: Add the nullable column
ALTER TABLE users ADD COLUMN normalized_email TEXT;
Phase 2: Migrate — Backfill existing rows in batches. Your application is already writing to the new column for new data.
-- Migration 2: Backfill in batches (don't do this in one statement for large tables).
-- Note: UPDATE has no LIMIT in PostgreSQL, so batch via a subquery on the primary key.
UPDATE users SET normalized_email = LOWER(TRIM(email))
WHERE id IN (
  SELECT id FROM users
  WHERE normalized_email IS NULL
  LIMIT 10000
);
-- Repeat until all rows are backfilled
Phase 3: Contract — Once all rows are backfilled, add the NOT NULL constraint and remove the old column (if applicable).
-- Migration 3: Add constraint (use NOT VALID to avoid full table scan, then validate separately)
ALTER TABLE users ADD CONSTRAINT users_normalized_email_not_null
CHECK (normalized_email IS NOT NULL) NOT VALID;
-- Migration 4: Validate the constraint (takes a ShareUpdateExclusiveLock, not an AccessExclusiveLock)
ALTER TABLE users VALIDATE CONSTRAINT users_normalized_email_not_null;
The NOT VALID + VALIDATE CONSTRAINT trick is crucial. Adding a check constraint normally scans the entire table while holding a heavy lock. NOT VALID adds the constraint without scanning (it only applies to new writes), and VALIDATE CONSTRAINT scans with a lighter lock that doesn't block reads or writes.
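On PostgreSQL 12 and newer there's a final step worth knowing: once the check constraint is validated, SET NOT NULL can use it as proof and skip its own full-table scan, after which the helper constraint is redundant. A sketch:
-- PostgreSQL 12+: skips the scan because the validated CHECK already proves it
ALTER TABLE users ALTER COLUMN normalized_email SET NOT NULL;

-- The scaffolding constraint is no longer needed
ALTER TABLE users DROP CONSTRAINT users_normalized_email_not_null;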
Creating indexes on large tables blocks writes by default. Always use CONCURRENTLY:
-- BLOCKS WRITES: don't do this on a live table
CREATE INDEX idx_users_email ON users (email);
-- NON-BLOCKING: use this instead
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
CONCURRENTLY takes longer and can't run inside a transaction, but it doesn't block writes to the table. The trade-off is always worth it in production.
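One operational caveat: if a CONCURRENTLY build fails partway (a deadlock, a duplicate key during a unique build), it leaves behind an INVALID index that still adds write overhead. You have to drop it and retry — a sketch:
-- Find indexes left INVALID by a failed concurrent build
SELECT i.relname
FROM pg_index x
JOIN pg_class i ON i.oid = x.indexrelid
WHERE NOT x.indisvalid;

-- Drop the broken index (also without blocking writes), then retry
DROP INDEX CONCURRENTLY idx_users_email;
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);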
Every schema has regrets. Here are mine.
Early in a project, I used SERIAL primary keys and exposed them directly in URLs: /users/42, /orders/1337. This leaked information (competitors could guess our user count), made pagination guessable, and broke when we needed to merge databases from two regions. Switching to UUIDs for external-facing IDs required a multi-month migration.
Lesson: use UUIDs for anything exposed outside your database. Use SERIAL/BIGSERIAL for internal join tables if you want, but never let an auto-incrementing integer appear in a URL.
We launched a table without check constraints because "we're moving fast and we'll add them later." Within two weeks, the data had negative prices, empty names, and an order_status value of "oof" that someone typed during testing and never cleaned up.
Adding the constraints after the fact meant auditing every bad row, deciding what the correct values should have been, cleaning them up, and only then validating the constraints against production data. This took longer than adding the constraint on day one would have. Start with constraints. Loosen them if you must. Never the other way around.
I once stored tags as a comma-separated string: "javascript,react,nextjs". Querying was a nightmare:
-- This is how you query comma-separated values. Don't do this.
SELECT * FROM posts WHERE tags LIKE '%react%';
-- This also matches "react-native" and "preact"
-- What I should have done:
CREATE TABLE post_tags (
post_id UUID NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
tag_id UUID NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (post_id, tag_id)
);
Or at minimum, use PostgreSQL's native array type:
CREATE TABLE posts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
tags TEXT[] NOT NULL DEFAULT '{}',
-- ...
);
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);
-- Clean queries
SELECT * FROM posts WHERE 'react' = ANY(tags);
SELECT * FROM posts WHERE tags @> ARRAY['react', 'nextjs'];
Arrays are acceptable for simple lists that don't need their own attributes. The moment you need metadata on the relationship (like "who added this tag" or "when was it added"), you need a join table.
-- Seemed fine at the time
CREATE TABLE notifications (
id BIGSERIAL PRIMARY KEY,
type TEXT NOT NULL, -- "email", "sms", "push"
-- ...
);
-- Then you try to query it in literally any ORM:
-- notification.type ← conflicts with every language's type system
-- "type" is a reserved word in most SQL dialects
-- You end up quoting it everywhere: SELECT "type" FROM notifications
Use kind, category, or notification_type. Anything but type. Similarly, avoid order (use sort_order or position), user (use account or prefix it), and group (use team or group_name).
One table didn't get created_at because "we don't need it." Three months later, we needed to debug an issue and had no idea when records were created. Adding it retroactively meant all existing rows got the same timestamp (the migration timestamp), making the historical data useless.
Every table gets created_at. No exceptions. The cost is one column. The cost of not having it is unknowable until you need it.
I once created a chain of views — active_users filtered users, premium_active_users filtered active_users, and a reporting view joined all of them. It worked great until someone changed the users table and all three views broke silently. The query planner also struggled to optimize through multiple view layers.
Views are great for convenience and read-only access patterns. They're terrible as a place to encode business logic that changes. Keep business rules in application code where they're versioned, tested, and deployable. Use views for reporting shortcuts, not as architectural building blocks.
Here's what a well-designed schema looks like when you combine these patterns. A simplified project management system:
-- Extensions
CREATE EXTENSION IF NOT EXISTS "pgcrypto"; -- for gen_random_uuid()
-- Updated_at trigger function (reusable)
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Tenants
CREATE TABLE tenants (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL CHECK (LENGTH(name) > 0),
slug TEXT NOT NULL UNIQUE CHECK (slug ~ '^[a-z0-9-]+$'),
plan TEXT NOT NULL DEFAULT 'free'
CHECK (plan IN ('free', 'pro', 'enterprise')),
settings JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TRIGGER set_updated_at BEFORE UPDATE ON tenants
FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
-- Users
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
email TEXT NOT NULL,
display_name TEXT NOT NULL CHECK (LENGTH(display_name) > 0),
password_hash TEXT NOT NULL,
is_active BOOLEAN NOT NULL DEFAULT true,
is_verified BOOLEAN NOT NULL DEFAULT false,
last_login_at TIMESTAMPTZ,
deleted_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE UNIQUE INDEX uq_users_email_per_tenant
ON users (tenant_id, email)
WHERE deleted_at IS NULL;
CREATE INDEX idx_users_tenant ON users (tenant_id) WHERE deleted_at IS NULL;
CREATE TRIGGER set_updated_at BEFORE UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
-- Enable RLS
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON users
USING (tenant_id = current_setting('app.current_tenant_id')::UUID);
-- Projects
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
name TEXT NOT NULL CHECK (LENGTH(name) > 0),
description TEXT NOT NULL DEFAULT '',
status TEXT NOT NULL DEFAULT 'active'
CHECK (status IN ('active', 'archived', 'deleted')),
owner_id UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_projects_tenant ON projects (tenant_id);
CREATE INDEX idx_projects_owner ON projects (owner_id);
CREATE TRIGGER set_updated_at BEFORE UPDATE ON projects
FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON projects
USING (tenant_id = current_setting('app.current_tenant_id')::UUID);
-- Tasks
CREATE TABLE tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
title TEXT NOT NULL CHECK (LENGTH(title) > 0),
description TEXT NOT NULL DEFAULT '',
status TEXT NOT NULL DEFAULT 'todo'
CHECK (status IN ('todo', 'in_progress', 'review', 'done')),
priority INTEGER NOT NULL DEFAULT 0 CHECK (priority BETWEEN 0 AND 4),
assignee_id UUID REFERENCES users(id) ON DELETE SET NULL,
due_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
sort_order INTEGER NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_tasks_tenant_project ON tasks (tenant_id, project_id);
CREATE INDEX idx_tasks_assignee ON tasks (assignee_id) WHERE assignee_id IS NOT NULL;
CREATE INDEX idx_tasks_status ON tasks (tenant_id, status) WHERE status != 'done';
CREATE TRIGGER set_updated_at BEFORE UPDATE ON tasks
FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON tasks
USING (tenant_id = current_setting('app.current_tenant_id')::UUID);
-- Task comments
CREATE TABLE task_comments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
task_id UUID NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
author_id UUID NOT NULL REFERENCES users(id),
body TEXT NOT NULL CHECK (LENGTH(body) > 0),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_task_comments_task ON task_comments (task_id);
CREATE TRIGGER set_updated_at BEFORE UPDATE ON task_comments
FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
ALTER TABLE task_comments ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON task_comments
USING (tenant_id = current_setting('app.current_tenant_id')::UUID);
-- Audit log (no RLS — admin-only table)
CREATE TABLE audit_log (
id BIGSERIAL PRIMARY KEY,
tenant_id UUID NOT NULL REFERENCES tenants(id),
table_name TEXT NOT NULL,
record_id TEXT NOT NULL,
operation TEXT NOT NULL CHECK (operation IN ('INSERT', 'UPDATE', 'DELETE')),
old_values JSONB,
new_values JSONB,
changed_fields TEXT[],
changed_by UUID REFERENCES users(id),
changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_audit_log_tenant_table ON audit_log (tenant_id, table_name, record_id);
CREATE INDEX idx_audit_log_changed_at ON audit_log (changed_at);
Notice the patterns:
Every tenant-scoped table carries tenant_id and has RLS enabled (except tenants and audit_log).
Every table gets created_at and updated_at with triggers.
Deliberate ON DELETE behavior on the foreign keys that need it.
Soft deletes only on users (where account recovery is needed), status-based archival on projects.
Schema design isn't glamorous. Nobody has ever given a conference talk titled "I Added Check Constraints to Every Table and It Saved Us Six Months of Debugging." But that's exactly what good schema design does — it prevents problems so quietly that you never even know they would have existed.
The patterns in this post aren't novel. They're the result of years of writing migrations, debugging data corruption, and refactoring schemas under production load. Every one of them exists because I, or someone I worked with, did it the other way first and paid the price.
Start with constraints. Use foreign keys. Add created_at to everything. Pick a naming convention and enforce it ruthlessly. Use RLS for multi-tenancy. Be careful with JSONB. Test your migrations against production-sized data before deploying them.
The database is the foundation. Get it right, and everything built on top becomes simpler. Get it wrong, and no amount of clever application code can save you.