"The database crashed at 3 AM. The events table grew to 400GB. Indexes didn't help. Queries were timing out. Clients were furious."
This is a real story from 2024. We lost 12 hours of uptime, $8k in revenue, and almost lost a major client. We solved the problem in 3 hours with partitioning—a pattern we could have applied six months earlier.
Spoiler: PostgreSQL is not just "SQL with transactions". It's a Swiss Army knife with dozens of advanced features that 90% of developers never use. And that's a shame.
I've spent the last 8 years designing databases for fintech, e-commerce, SaaS, and government projects. I've tried MongoDB, Cassandra, DynamoDB—and always came back to PostgreSQL. Because it can do EVERYTHING, if you know how.
In this article—7 patterns I use in every production project. No textbook theory—just practice, code, and pitfalls I've already stepped on for you.
Why These Patterns
Over 8 years, I've worked with databases from 10GB to 3TB. From 100 queries per second to 50k RPS. From monolithic Django apps to microservice architectures.
These 7 patterns solve 90% of production problems:
- JSONB — flexible schema without migrations
- Partial Indexes — indexes 10x smaller
- Generated Columns — denormalization without hacks
- Row-Level Security — multi-tenancy at the database level
- Advisory Locks — distributed locks without Redis
- Materialized Views — analytics without slowdowns
- Partitioning — working with terabytes of data
All examples tested on PostgreSQL 15-17. Most patterns work from version 12+, but I recommend using at least PostgreSQL 15 for production.
Before Optimizing: How to Measure Performance
Golden rule of optimization: never apply patterns blindly. Measure first, optimize, then measure again.
EXPLAIN ANALYZE — Your Query X-Ray
EXPLAIN shows how PostgreSQL executes a query. ANALYZE — executes it and shows real numbers.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM orders
WHERE user_id = 12345
AND status = 'pending'
ORDER BY created_at DESC
LIMIT 10;
What to look for:
- Execution Time — total execution time (your main metric)
- Planning Time — time to build the plan (usually < 1ms)
- Seq Scan vs Index Scan — full table scan or index
- Buffers: shared hit — data from cache (fast)
- Buffers: shared read — data from disk (slow)
Example of a good plan:
Index Scan using idx_orders_user_status on orders
(cost=0.42..8.44 rows=1 width=120)
(actual time=0.023..0.025 rows=3 loops=1)
Buffers: shared hit=12
Planning Time: 0.156 ms
Execution Time: 0.052 ms
Example of a bad plan:
Seq Scan on orders
(cost=0.00..185432.00 rows=50 width=120)
(actual time=245.123..1840.456 rows=3 loops=1)
Filter: ((user_id = 12345) AND (status = 'pending'))
Rows Removed by Filter: 8945632
Buffers: shared hit=45000 shared read=98000
Planning Time: 0.234 ms
Execution Time: 1840.678 ms
See the difference? 0.052ms vs 1840ms — 36,000x slower! The reason — Seq Scan instead of Index Scan.
Common mistake: looking only at cost. Cost is the planner's estimate, not real time. Always use ANALYZE to get actual time.
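A sketch of the kind of index that turns the bad plan above into the good one. The name idx_orders_user_status comes from the good-plan output; the column order (equality filters first, then the ORDER BY column) is an assumption for this specific query shape.
-- Composite index for: WHERE user_id = ? AND status = ? ORDER BY created_at DESC
CREATE INDEX CONCURRENTLY idx_orders_user_status
ON orders (user_id, status, created_at DESC);
-- Re-run EXPLAIN (ANALYZE, BUFFERS) afterwards to confirm the planner picks it up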
pg_stat_statements — Finding the Most Expensive Queries
pg_stat_statements is a PostgreSQL extension that collects statistics on all executed queries.
Installation (one time):
-- In postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
-- Restart PostgreSQL, then:
CREATE EXTENSION pg_stat_statements;
Top 10 slowest queries:
SELECT
substring(query, 1, 100) as short_query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time,
stddev_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
Top 10 queries by total time (finding bottlenecks):
SELECT
substring(query, 1, 100) as short_query,
calls,
total_exec_time,
mean_exec_time,
(total_exec_time / sum(total_exec_time) OVER ()) * 100 AS percent_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
Real case from practice:
In 2024, we found through pg_stat_statements that one query consumed 47% of all database time. It was a SELECT with 15 JOINs without indexes. We added 3 indexes — database load dropped by 2x.
Metrics to Monitor
| Metric | What it shows | Good value |
|---|---|---|
| Query execution time (p95) | 95th percentile query time | < 50ms for OLTP |
| Cache hit ratio | shared_buffers_hit / (hit + read) | > 95% |
| Index usage | % of queries using indexes | > 90% |
| Sequential scans | Number of full table scans | Minimum |
| Temp files | Sorts/groups on disk | 0 (should be in memory) |
Query for cache hit ratio:
SELECT
sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) * 100 AS cache_hit_ratio
FROM pg_statio_user_tables;
If cache hit ratio < 90% — increase shared_buffers in configuration.
Pro tip: Set up automatic monitoring with pg_stat_statements + Grafana. This will let you see performance degradation BEFORE clients start complaining.
Pattern 1: JSONB — When Schema Can't Be Rigid
Problem: Flexible Schema Without Migrations
Imagine: you're building an e-commerce platform. You have products of different categories: clothing, electronics, furniture. Each category has its own attributes:
- Clothing: size, color, material, season
- Electronics: screen size, processor, memory, battery
- Furniture: material, dimensions, weight, style
Classic approach: create a product_attributes table with columns for ALL possible attributes. Result — 80% NULL values and a migration every time marketing comes up with a new attribute.
JSONB approach:
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
category TEXT NOT NULL,
price NUMERIC(10,2) NOT NULL,
-- Flexible attributes in JSONB
attributes JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- GIN index for fast JSONB search
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);
-- Index for specific field (even faster)
CREATE INDEX idx_products_color ON products ((attributes->>'color'));
Usage examples:
-- Insert product with attributes
INSERT INTO products (name, category, price, attributes)
VALUES (
'iPhone 15 Pro',
'electronics',
999.99,
'{
"brand": "Apple",
"storage": "256GB",
"color": "Natural Titanium",
"screen": "6.1 inch",
"processor": "A17 Pro"
}'::jsonb
);
-- Find all black M-sized t-shirts
SELECT id, name, price, attributes
FROM products
WHERE category = 'clothing'
AND attributes @> '{"color": "black", "size": "M"}';
-- Update specific attribute (without overwriting others)
UPDATE products
SET attributes = attributes || '{"discount": 15}'::jsonb
WHERE id = 42;
-- Remove attribute
UPDATE products
SET attributes = attributes - 'discount'
WHERE id = 42;
-- Search nested fields
SELECT name, attributes->'specs'->>'processor' as cpu
FROM products
WHERE attributes->'specs'->>'processor' LIKE '%A17%';
Real Case: Saved 2 Weeks of Development
Project: Real estate management SaaS (2024)
Problem: Each property type (apartment, house, commercial) has its own fields. Initially planned 3 separate tables + joins. That's 3 weeks of development + maintenance headache.
Solution: One properties table with a JSONB details column. Migration written in 2 hours. When client asked to add "apartments" and "parking" — added in 15 minutes without migrations.
Result:
- ⏱️ 2 weeks of saved time
- 🚀 New property types deployed in minutes
- 📊 JSONB indexes work as fast as regular columns
When NOT to Use JSONB
❌ Don't use JSONB if:
- You need foreign keys on attributes
- Attributes are critical for business logic (better as regular columns)
- You do frequent aggregations on these fields (SUM, AVG)
- Schema is stable and doesn't change
✅ Use JSONB if:
- Schema changes frequently
- Lots of NULL values in regular columns
- Different entities of the same type have different attributes
- You need flexibility without migrations
Common mistake: storing everything in JSONB "for the future". Don't do this. Business-critical fields (price, user_id, status) should be regular columns with types and constraints.
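A minimal sketch of that split, using a hypothetical orders table: critical fields as typed, constrained columns, JSONB only for the long tail of optional attributes.
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL,                        -- critical: typed, indexable, joinable
    status TEXT NOT NULL DEFAULT 'pending',         -- critical: used in business logic
    total NUMERIC(10,2) NOT NULL CHECK (total >= 0),
    metadata JSONB NOT NULL DEFAULT '{}',           -- only the optional extras live here
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);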
⚠️ Pitfalls and Gotchas
Pitfall #1: Forgot about the containment operator (@>)
-- ❌ SLOW: Search via ->> (doesn't use GIN index)
SELECT * FROM products
WHERE attributes->>'color' = 'red';
-- ✅ FAST: Search via @> (uses GIN index)
SELECT * FROM products
WHERE attributes @> '{"color": "red"}';
Why: The ->> operator returns text, index not used. The @> operator works with JSONB directly — GIN index works.
Pitfall #2: Wrong index type
-- ❌ SLOW: B-tree index on JSONB column (almost useless)
CREATE INDEX idx_bad ON products (attributes);
-- ✅ FAST: GIN index for searching inside content
CREATE INDEX idx_good ON products USING GIN (attributes);
-- ✅ EVEN FASTER: Index on specific field
CREATE INDEX idx_best ON products ((attributes->>'color'));
Rule: GIN for searching inside JSONB, B-tree for specific fields.
Pitfall #3: Aggregations on JSONB fields (slow)
-- ❌ VERY SLOW: SUM on JSONB field
SELECT SUM((attributes->>'price')::numeric) FROM products;
-- Doesn't use indexes, casts type for each row
-- ✅ SOLUTION: Move critical fields to separate columns
ALTER TABLE products ADD COLUMN price NUMERIC(10,2);
UPDATE products SET price = (attributes->>'price')::numeric;
-- Now SUM(price) works fast
When it happened: E-commerce project (2024), revenue report took 45 seconds. Moved price and quantity to columns — became 0.3 seconds.
Pitfall #4: JSONB bloat with frequent UPDATEs
-- ❌ BAD: Updating one field rewrites entire JSONB
UPDATE products
SET attributes = attributes || '{"views": 1001}'::jsonb
WHERE id = 123;
-- Each UPDATE creates new version of ENTIRE JSONB object
-- ✅ SOLUTION: Counters and frequently changing fields — separate columns
ALTER TABLE products ADD COLUMN views INTEGER DEFAULT 0;
UPDATE products SET views = views + 1 WHERE id = 123;
-- UPDATE of only one INT — 10x faster
Rule: If a field updates frequently (counters, timestamps) — DON'T store in JSONB.
Pitfall #5: NULL vs empty object vs missing key
-- These are THREE DIFFERENT things:
INSERT INTO products (attributes) VALUES (NULL); -- NULL
INSERT INTO products (attributes) VALUES ('{}'); -- Empty object
INSERT INTO products (attributes) VALUES ('{"color": null}'); -- Key with null
-- Search works differently:
WHERE attributes IS NULL -- Only first
WHERE attributes = '{}' -- Only second
WHERE attributes->>'color' IS NULL -- Second and third
WHERE attributes ? 'color' -- Only third
Solution: Define convention (e.g., always '{}', never NULL) and stick to it.
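One way to enforce such a convention at the schema level; a sketch against a products table where the column is still nullable (backfill existing NULLs to '{}' before adding the NOT NULL constraint):
-- Convention: attributes is always a JSON object, never NULL
ALTER TABLE products
    ALTER COLUMN attributes SET DEFAULT '{}'::jsonb,
    ALTER COLUMN attributes SET NOT NULL;

ALTER TABLE products
    ADD CONSTRAINT attributes_is_object CHECK (jsonb_typeof(attributes) = 'object');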
Pattern 2: Partial Indexes — Indexes That Don't Eat Space
Problem: Indexes Take More Space Than Data
Scenario: you have an orders table with millions of records. 95% of orders are completed (status = 'completed'). You need fast search only for active orders (status IN ('pending', 'processing')).
Bad solution:
-- Index on entire table (95% garbage)
CREATE INDEX idx_orders_status ON orders (status);
Problems:
- Index is huge (millions of rows)
- Slow inserts (each INSERT updates index)
- Cache filled with unnecessary data
Correct solution:
-- Partial index only for active orders (5% of table)
CREATE INDEX idx_orders_active
ON orders (created_at, user_id)
WHERE status IN ('pending', 'processing');
Result:
- Index is 20x smaller
- Faster inserts (fewer indexes to update)
- Cache used efficiently
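To verify the effect, compare index sizes and scan counts before and after; a sketch using the standard statistics views (orders is the example table from above):
-- Size and usage of every index on the table
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan                                   -- how many times the index was used
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY pg_relation_size(indexrelid) DESC;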
Real Examples
-- Index only for non-deleted records (soft delete)
CREATE INDEX idx_users_active
ON users (email)
WHERE deleted_at IS NULL;
-- Index only for events after a fixed cutoff
-- (NOW() cannot be used here: index predicates only allow IMMUTABLE functions)
CREATE INDEX idx_events_upcoming
ON events (start_time)
WHERE start_time > '2025-01-01';
-- Index only for unpaid invoices
CREATE INDEX idx_invoices_unpaid
ON invoices (due_date, customer_id)
WHERE paid_at IS NULL;
-- Composite partial index for complex conditions
CREATE INDEX idx_orders_priority
ON orders (created_at DESC)
WHERE status = 'processing'
AND priority = 'high'
AND assigned_to IS NULL;
Usage in Queries
-- PostgreSQL automatically uses partial index
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE status = 'pending'
AND user_id = 12345
ORDER BY created_at DESC
LIMIT 10;
-- Index Scan using idx_orders_active on orders
-- Buffers: shared hit=42
Pro tip: Partial indexes are perfect for "hot" data. You have 10 million orders but only 50k active? Partial index will be 200x smaller and much faster to search.
Real Case: Index from 40GB to 2GB
Project: Logistics platform (2023)
Problem: shipments table with 50 million records. Index on status took 40GB. 98% of shipments — delivered (status = 'delivered'). We only need active ones.
Solution:
-- Was
CREATE INDEX idx_shipments_status ON shipments (status);
-- Size: 40GB
-- Became
CREATE INDEX idx_shipments_active
ON shipments (status, estimated_delivery)
WHERE status IN ('pending', 'in_transit', 'out_for_delivery');
-- Size: 2GB
Result:
- 💾 Freed 38GB of disk space
- ⚡ Queries 3x faster
- 📈 Inserts 40% faster (fewer indexes to update)
Pattern 3: Generated Columns — Denormalization Without Hacks
Problem: Computed Fields Needed Constantly
Classic task: you have a users table with first_name and last_name. You often need to search by full name.
Bad solution #1: concatenation in every query
SELECT * FROM users
WHERE LOWER(first_name || ' ' || last_name) LIKE '%john doe%';
-- Slow, index not used
Bad solution #2: triggers to update field
-- Lots of code for trigger
-- Easy to forget to update
-- Another source of bugs
Correct solution: Generated Columns
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
email TEXT NOT NULL UNIQUE,
-- Generated column (automatically computed)
full_name TEXT GENERATED ALWAYS AS (
first_name || ' ' || last_name
) STORED,
-- Can be indexed!
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Index for fast full name search
CREATE INDEX idx_users_full_name ON users USING GIN (
to_tsvector('english', full_name)
);
Usage:
-- Full-text search works instantly
SELECT id, full_name, email
FROM users
WHERE to_tsvector('english', full_name) @@ to_tsquery('english', 'john & doe');
-- Generated column updates automatically
UPDATE users
SET first_name = 'Jane'
WHERE id = 42;
-- full_name automatically becomes "Jane Doe"
Real Examples
-- Extract birth year from birth date
-- (a generation expression must be IMMUTABLE, so a true "age" column that depends
--  on the current date is not possible as a generated column)
CREATE TABLE employees (
    id BIGSERIAL PRIMARY KEY,
    birth_date DATE NOT NULL,
    birth_year INTEGER GENERATED ALWAYS AS (
        EXTRACT(YEAR FROM birth_date)::int
    ) STORED
);
-- Automatic total calculation for order
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL,
quantity INTEGER NOT NULL CHECK (quantity > 0),
price NUMERIC(10,2) NOT NULL CHECK (price >= 0),
total NUMERIC(10,2) GENERATED ALWAYS AS (
quantity * price
) STORED
);
-- URL slug from title
CREATE TABLE articles (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
slug TEXT GENERATED ALWAYS AS (
LOWER(
REGEXP_REPLACE(
REGEXP_REPLACE(title, '[^a-zA-Z0-9\s-]', '', 'g'),
'\s+', '-', 'g'
)
)
) STORED UNIQUE
);
-- JSON field for search (from multiple fields)
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
category TEXT NOT NULL,
search_vector tsvector GENERATED ALWAYS AS (
to_tsvector('english',
COALESCE(name, '') || ' ' ||
COALESCE(description, '') || ' ' ||
COALESCE(category, '')
)
) STORED
);
CREATE INDEX idx_products_search ON products USING GIN (search_vector);
STORED vs VIRTUAL
PostgreSQL supports only STORED (computed on INSERT/UPDATE and stored on disk).
Pros of STORED:
- ✅ Fast reads (value already computed)
- ✅ Can be indexed
- ✅ Can be used in constraints
Cons of STORED:
- ❌ Takes disk space
- ❌ Slows INSERT/UPDATE (needs computation)
In MySQL and MariaDB there are VIRTUAL generated columns (computed on the fly during reads). PostgreSQL doesn't support this yet, but it's in the roadmap.
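If you only need the value at read time and don't want to store it, a plain view or an expression index is the usual workaround in current PostgreSQL; a sketch on a users table without the generated column (names are illustrative):
-- Computed on the fly at read time, nothing stored
CREATE VIEW users_with_full_name AS
SELECT id, first_name, last_name,
       first_name || ' ' || last_name AS full_name
FROM users;

-- Expression index, so filtering/sorting on the expression can still use an index
CREATE INDEX idx_users_full_name_expr ON users ((first_name || ' ' || last_name));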
Real Case: Full-Text Search in 5ms
Project: Knowledge base for tech support (2024)
Problem: Search across articles (title + body + tags) took 2-3 seconds. 50k articles, users complaining.
Solution:
ALTER TABLE articles
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('russian', COALESCE(title, '')), 'A') ||
setweight(to_tsvector('russian', COALESCE(body, '')), 'B') ||
setweight(to_tsvector('russian', COALESCE(array_to_string(tags, ' '), '')), 'C')
) STORED;
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
Result:
- ⚡ Search from 2-3 seconds to 5-8ms
- 🎯 Relevance ranking out of the box (title more important than body)
- 🔧 Zero maintenance (everything automatic)
Pattern 4: Row-Level Security — Multi-Tenancy at Database Level
Problem: Every Query Needs tenant_id Filtering
Typical SaaS scenario: you have companies (tenants), each with their own data. In code, EVERY query must contain WHERE company_id = current_user_company.
Problems:
- 🐛 Easy to forget filter → data leak
- 🔒 Security through code (not through DB)
- 📝 Code duplication in every query
Correct solution: Row-Level Security (RLS)
-- Enable RLS for table
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
company_id BIGINT NOT NULL,
title TEXT NOT NULL,
content TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Enable RLS
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
-- Policy: user sees only their company's documents
CREATE POLICY documents_isolation_policy ON documents
USING (company_id = current_setting('app.current_company_id')::BIGINT);
Usage in application:
# Python (psycopg2/psycopg3)
def get_db_connection(user):
    conn = psycopg2.connect(...)
    cursor = conn.cursor()
    # Set user's company_id in the session
    # (with server-side binding, e.g. psycopg3, SET can't take bind parameters,
    #  so set_config() is the safer way to pass the value)
    cursor.execute(
        "SELECT set_config('app.current_company_id', %s, false)",
        (str(user.company_id),)
    )
    return conn

# Now all queries are automatically filtered!
# Forgetting WHERE company_id = ... is IMPOSSIBLE
cursor.execute("SELECT * FROM documents WHERE title LIKE %s", ('%report%',))
# PostgreSQL automatically adds: AND company_id = current_company_id
Advanced Policies
-- Policy for SELECT (read)
CREATE POLICY documents_select_policy ON documents
FOR SELECT
USING (
company_id = current_setting('app.current_company_id')::BIGINT
);
-- Policy for INSERT (only to your company)
CREATE POLICY documents_insert_policy ON documents
FOR INSERT
WITH CHECK (
company_id = current_setting('app.current_company_id')::BIGINT
);
-- Policy for UPDATE (edit only your documents + only authors)
CREATE POLICY documents_update_policy ON documents
FOR UPDATE
USING (
company_id = current_setting('app.current_company_id')::BIGINT
AND created_by = current_setting('app.current_user_id')::BIGINT
);
-- Policy for DELETE (only company admins can delete)
CREATE POLICY documents_delete_policy ON documents
FOR DELETE
USING (
company_id = current_setting('app.current_company_id')::BIGINT
AND current_setting('app.user_role') = 'admin'
);
-- By default the table owner bypasses its own RLS policies
ALTER TABLE documents FORCE ROW LEVEL SECURITY;
-- Now even the table owner is subject to the policies
-- (superusers and roles with BYPASSRLS still bypass RLS entirely)
Real Case: Closing Critical Vulnerability
Project: B2B CRM system (2023)
Problem: Junior developer forgot to add WHERE company_id = ... in one endpoint. Result — company A client saw company B data. GDPR violation, €10k fine.
Solution: Implemented RLS at database level.
-- All tables with client data
ALTER TABLE contacts ENABLE ROW LEVEL SECURITY;
ALTER TABLE deals ENABLE ROW LEVEL SECURITY;
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
-- Single policy for all tables
CREATE POLICY tenant_isolation_policy ON contacts
USING (company_id = current_setting('app.current_company_id')::BIGINT);
-- Same for other tables
Result:
- 🔒 Impossible to forget filtering
- 🛡️ Defense in depth (security at DB level)
- ✅ Audit showed: 0 vulnerabilities in 2 years
Performance: RLS adds small overhead (1-3%). But security is more important. If critical — use prepared statements with company_id in parameters.
⚠️ Pitfalls and Gotchas
Pitfall #1: Forgot to set company_id in session
-- ❌ ERROR: current_setting will return NULL or ERROR
SELECT * FROM documents;
-- ERROR: unrecognized configuration parameter "app.current_company_id"
-- ✅ CORRECT: Always set parameter at session start
SET app.current_company_id = 42;
-- Now RLS works
Solution: Middleware in application should ALWAYS set app.current_company_id right after connection.
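With a connection pool it is safer to scope the setting to the transaction rather than the whole session, so a connection returned to the pool cannot leak another tenant's id; a sketch (parameter name as in the examples above):
BEGIN;
-- SET LOCAL lives only until COMMIT/ROLLBACK
SET LOCAL app.current_company_id = '42';
SELECT * FROM documents WHERE title LIKE '%report%';
COMMIT;
-- After COMMIT the parameter is cleared, so the pooled connection carries no tenant state

-- Equivalent form when the value comes from a bind parameter:
-- SELECT set_config('app.current_company_id', '42', true);   -- true = transaction-local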
Pitfall #2: Table owners and SUPERUSER bypass RLS (even if you didn't want it)
-- SUPERUSER (and any role with BYPASSRLS) sees EVERYTHING, ignoring RLS policies
-- Even if you connected as postgres "just for a quick migration"
-- ✅ SOLUTION for the table owner: FORCE ROW LEVEL SECURITY
ALTER TABLE documents FORCE ROW LEVEL SECURITY;
-- Superusers still bypass policies, so run the application and routine migrations
-- under a dedicated role without SUPERUSER/BYPASSRLS
When it happened: Migrations executed under SUPERUSER accidentally created documents in the wrong company. Lost 3 hours on rollback.
Pitfall #3: Policies block themselves on INSERT/UPDATE
-- You have policy:
CREATE POLICY select_policy ON documents FOR SELECT
USING (company_id = current_setting('app.current_company_id')::BIGINT);
-- ❌ INSERT doesn't work!
INSERT INTO documents (company_id, title) VALUES (42, 'Test');
-- ERROR: new row violates row-level security policy
-- ✅ SOLUTION: Need separate policy for INSERT
CREATE POLICY insert_policy ON documents FOR INSERT
WITH CHECK (company_id = current_setting('app.current_company_id')::BIGINT);
Rule: Create policies for EACH operation: SELECT, INSERT, UPDATE, DELETE.
Pitfall #4: Performance degradation without index on policy key
-- You have policy on company_id
CREATE POLICY isolation ON documents
USING (company_id = current_setting('app.current_company_id')::BIGINT);
-- ❌ SLOW: No index on company_id
-- PostgreSQL does Seq Scan + filtering
-- ✅ FAST: Index on company_id
CREATE INDEX idx_documents_company ON documents (company_id);
-- Now Index Scan
Metric: In production without index: 200ms per query. With index: 5ms.
Pitfall #5: RLS doesn't work on direct partition access
-- You have partitioned table with RLS
CREATE TABLE documents (...) PARTITION BY RANGE (created_at);
-- ❌ RLS DOESN'T WORK on direct partition queries!
SELECT * FROM documents_2024_01;
-- Returns ALL rows, ignoring policies
-- ✅ CORRECT: Query parent table
SELECT * FROM documents;
-- RLS applies, then partition pruning
Solution: Forbid direct partition access via GRANT/REVOKE.
Pattern 5: Advisory Locks — Distributed Locks Without Redis
Problem: Need Locks Between Processes
Scenario: you have 10 Celery workers processing tasks. Need to guarantee task executes by only one worker.
Bad solution #1: Redis with SETNX
- Another service to maintain
- What if Redis goes down?
- Problem with TTL (task is long → lock expired)
Bad solution #2: SELECT FOR UPDATE
- Locks row in table
- Deadlocks with complex logic
Correct solution: PostgreSQL Advisory Locks
-- Try to get exclusive lock
SELECT pg_try_advisory_lock(12345);
-- Returns true if lock acquired
-- Returns false if already taken
-- Lock with waiting
SELECT pg_advisory_lock(12345);
-- Waits until lock is released
-- Release lock
SELECT pg_advisory_unlock(12345);
-- Automatic release on connection close
-- (no need to manually unlock if process crashes)
Real Example: Payment Processing
import psycopg2
import hashlib
def process_payment_exclusively(payment_id):
    """
    Process payment only if nobody else is processing it
    """
    conn = psycopg2.connect(...)
    cursor = conn.cursor()
    # Generate numeric ID from payment_id (advisory lock takes bigint)
    lock_id = int(hashlib.md5(f"payment:{payment_id}".encode()).hexdigest()[:15], 16)
    # Try to acquire lock
    cursor.execute("SELECT pg_try_advisory_lock(%s)", (lock_id,))
    acquired = cursor.fetchone()[0]
    if not acquired:
        print(f"Payment {payment_id} already being processed by another worker")
        conn.close()  # don't leak the connection on the early return
        return False
    try:
        # Process payment
        process_payment_logic(payment_id)
        conn.commit()
        return True
    finally:
        # Release lock
        cursor.execute("SELECT pg_advisory_unlock(%s)", (lock_id,))
        conn.close()
Practical Examples
-- Lock by two numbers (instead of one bigint)
SELECT pg_try_advisory_lock(user_id, task_id);
-- Shared lock (multiple processes can read)
SELECT pg_try_advisory_lock_shared(12345);
-- Transaction-level lock (released on COMMIT/ROLLBACK)
SELECT pg_try_advisory_xact_lock(12345);
-- Check existing locks
SELECT
locktype,
objid,
mode,
granted,
pid,
pg_blocking_pids(pid) as blocking_pids
FROM pg_locks
WHERE locktype = 'advisory';
When to Use Advisory Locks
✅ Use when:
- Distributed cron tasks (only one instance runs)
- Queue processing (uniqueness guarantee)
- Data migrations (avoid concurrent updates)
- Rate limiting (one user → one operation)
❌ Don't use when:
- Short operations (< 100ms) — overhead bigger than benefit
- High contention (> 1000 workers for one lock)
- Need TTL (advisory locks don't expire automatically)
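For queue-style work the transaction-level variant is often the simplest: the lock vanishes on COMMIT/ROLLBACK even if the worker dies mid-task. A sketch, with 42 standing in for a hashed task id:
BEGIN;
-- Non-blocking attempt; returns false immediately if another worker holds the key
SELECT pg_try_advisory_xact_lock(42) AS got_lock;
-- ...if got_lock is true, do the work inside this same transaction...
COMMIT;   -- the lock is released here automatically, no explicit unlock needed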
Real Case: Task Deduplication in Queue
Project: Email campaigns for e-commerce (2024)
Problem: Celery sometimes ran same task twice (network glitch + retry). Result — clients got duplicate emails.
Solution:
from celery import Task
import hashlib
class ExclusiveTask(Task):
    def __call__(self, *args, **kwargs):
        # Generate lock_id from task name + arguments
        task_hash = hashlib.md5(
            f"{self.name}:{args}:{kwargs}".encode()
        ).hexdigest()[:15]
        lock_id = int(task_hash, 16)
        with self.app.pool.acquire() as conn:
            cursor = conn.cursor()
            cursor.execute("SELECT pg_try_advisory_lock(%s)", (lock_id,))
            if not cursor.fetchone()[0]:
                self.app.log.warning(f"Task {self.name} already running")
                return None
            try:
                return self.run(*args, **kwargs)
            finally:
                cursor.execute("SELECT pg_advisory_unlock(%s)", (lock_id,))

@app.task(base=ExclusiveTask)
def send_email(user_id, template):
    # Guaranteed to execute only once
    pass
Result:
- 🎯 100% task deduplication
- 📉 Duplicate email complaints dropped to zero
- 🔧 No Redis needed for locks
Pattern 6: Materialized Views — Analytics Without Slowdowns
Problem: Analytical Queries Kill Production Database
Scenario: you have a dashboard with metrics: revenue by month, top products, user activity. Query does JOIN on 5 tables and GROUP BY. Executes for 30 seconds. Users open dashboard → database dies.
Bad solution #1: Cache in Redis
- Complex invalidation logic
- What if Redis goes down?
- Need to write code for cache
Bad solution #2: Separate analytics database
- Replication, synchronization, ETL
- Another service
Correct solution: Materialized Views
-- Create materialized view with aggregated data
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
DATE_TRUNC('month', created_at) as month,
category,
COUNT(*) as orders_count,
SUM(total) as total_revenue,
AVG(total) as avg_order_value,
COUNT(DISTINCT user_id) as unique_customers
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at), category
ORDER BY month DESC, total_revenue DESC;
-- Unique index for fast access (also required for REFRESH ... CONCURRENTLY below)
CREATE UNIQUE INDEX idx_monthly_revenue_month ON monthly_revenue (month, category);
-- Refresh data (can be via cron)
REFRESH MATERIALIZED VIEW monthly_revenue;
-- Concurrently (without blocking reads)
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;
Usage:
-- Instant query instead of 30 seconds
SELECT * FROM monthly_revenue
WHERE month >= '2024-01-01'
ORDER BY total_revenue DESC;
-- Execution time: 2ms (was 30s)
Advanced Features
-- Materialized view with JOINs
CREATE MATERIALIZED VIEW user_activity_summary AS
SELECT
u.id,
u.email,
u.created_at as registered_at,
COUNT(DISTINCT o.id) as total_orders,
SUM(o.total) as lifetime_value,
MAX(o.created_at) as last_order_date,
COUNT(DISTINCT DATE(o.created_at)) as active_days
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id;
-- For CONCURRENTLY need UNIQUE index
CREATE UNIQUE INDEX idx_user_activity_id
ON user_activity_summary (id);
-- Now can refresh without blocking
REFRESH MATERIALIZED VIEW CONCURRENTLY user_activity_summary;
Automatic Refresh
-- Function for refresh
CREATE OR REPLACE FUNCTION refresh_all_matviews()
RETURNS void AS $$
BEGIN
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;
REFRESH MATERIALIZED VIEW CONCURRENTLY user_activity_summary;
RAISE NOTICE 'Materialized views refreshed at %', NOW();
END;
$$ LANGUAGE plpgsql;
-- Schedule via pg_cron (extension)
-- Every hour at :05 minutes
SELECT cron.schedule('refresh-matviews', '5 * * * *', 'SELECT refresh_all_matviews()');
-- Or via external cron
-- 5 * * * * psql -d mydb -c "REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue"
Real Case: CEO Dashboard from 45s to 50ms
Project: Fintech startup (2024)
Problem: CEO wanted real-time metrics. Dashboard made 12 heavy queries. Page load — 45 seconds. Database under load.
Solution:
-- 1. Materialized view for all metrics
CREATE MATERIALIZED VIEW dashboard_metrics AS
SELECT
'total_users'::text as metric,
COUNT(*)::bigint as value,
NULL::jsonb as details
FROM users
UNION ALL
SELECT
'active_users_30d',
COUNT(DISTINCT user_id),
NULL
FROM events
WHERE created_at > NOW() - INTERVAL '30 days'
UNION ALL
SELECT
'revenue_today',
SUM(amount)::bigint,
jsonb_build_object(
'count', COUNT(*),
'avg', AVG(amount)
)
FROM payments
WHERE DATE(created_at) = CURRENT_DATE
-- ... 9 more metrics
;
-- 2. Refresh every 5 minutes
SELECT cron.schedule(
'refresh-dashboard',
'*/5 * * * *',
'REFRESH MATERIALIZED VIEW CONCURRENTLY dashboard_metrics'
);
Result:
- ⚡ Dashboard load: 45s → 50ms
- 📊 Data "near real-time" (5 min delay)
- 🎯 Database load dropped 10x
CONCURRENTLY requires a UNIQUE index on the materialized view. Without one, REFRESH ... CONCURRENTLY simply fails, and a plain REFRESH blocks reads for the duration of the rebuild.
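For the dashboard_metrics view from the case above that means one extra statement; a sketch (the index name is illustrative, and it relies on each metric appearing exactly once, which the UNION ALL guarantees):
-- Required once before the scheduled REFRESH ... CONCURRENTLY can run
CREATE UNIQUE INDEX idx_dashboard_metrics_metric ON dashboard_metrics (metric);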
Pattern 7: Partitioning — Terabytes of Data Without Pain
Problem: Table Grew to Hundreds of Gigabytes
Classic story: events table grows 10GB per month. After year — 120GB. After 3 years — 400GB. Queries slow. VACUUM takes hours. Indexes huge.
Correct solution: Table Partitioning
PostgreSQL supports several partition types:
- RANGE — by ranges (dates, numbers)
- LIST — by value list (categories, statuses)
- HASH — by hash (even distribution)
Range Partitioning (most popular)
-- Parent table (partitioned)
CREATE TABLE events (
id BIGSERIAL,
user_id BIGINT NOT NULL,
event_type TEXT NOT NULL,
payload JSONB,
created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);
-- Partitions by month
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE TABLE events_2024_03 PARTITION OF events
FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
-- Indexes created for each partition
CREATE INDEX idx_events_user_id ON events (user_id);
CREATE INDEX idx_events_type ON events (event_type);
-- PostgreSQL automatically creates:
-- idx_events_user_id_2024_01, idx_events_user_id_2024_02, ...
Usage:
-- Queries work transparently
INSERT INTO events (user_id, event_type, payload, created_at)
VALUES (12345, 'page_view', '{"url": "/pricing"}', NOW());
-- PostgreSQL automatically puts in correct partition
-- Partition Pruning (automatic optimization)
EXPLAIN ANALYZE
SELECT * FROM events
WHERE created_at >= '2024-03-01'
AND created_at < '2024-04-01';
-- Seq Scan on events_2024_03
-- Reads ONLY the needed partition (not all 400GB!)
Automatic Partition Creation
-- Function to create future partitions
CREATE OR REPLACE FUNCTION create_monthly_partitions(
table_name TEXT,
start_date DATE,
end_date DATE
)
RETURNS void AS $$
DECLARE
partition_date DATE := start_date;
partition_name TEXT;
start_range TEXT;
end_range TEXT;
BEGIN
WHILE partition_date < end_date LOOP
partition_name := table_name || '_' || TO_CHAR(partition_date, 'YYYY_MM');
start_range := TO_CHAR(partition_date, 'YYYY-MM-DD');
end_range := TO_CHAR(partition_date + INTERVAL '1 month', 'YYYY-MM-DD');
EXECUTE format(
'CREATE TABLE IF NOT EXISTS %I PARTITION OF %I FOR VALUES FROM (%L) TO (%L)',
partition_name,
table_name,
start_range,
end_range
);
RAISE NOTICE 'Created partition: %', partition_name;
partition_date := partition_date + INTERVAL '1 month';
END LOOP;
END;
$$ LANGUAGE plpgsql;
-- Create partitions for year ahead
SELECT create_monthly_partitions('events', '2025-01-01', '2026-01-01');
Archiving Old Data
-- Detach old partition from parent table
ALTER TABLE events DETACH PARTITION events_2022_01;
-- Now it's regular table, can:
-- 1. pg_dump and delete
pg_dump -t events_2022_01 mydb > events_2022_01.sql
DROP TABLE events_2022_01;
-- 2. Move to separate schema (archive)
CREATE SCHEMA archive;
ALTER TABLE events_2022_01 SET SCHEMA archive;
-- 3. Move to another database
List Partitioning (by categories)
CREATE TABLE products (
id BIGSERIAL,
name TEXT NOT NULL,
category TEXT NOT NULL,
price NUMERIC(10,2)
) PARTITION BY LIST (category);
CREATE TABLE products_electronics PARTITION OF products
FOR VALUES IN ('laptop', 'phone', 'tablet');
CREATE TABLE products_clothing PARTITION OF products
FOR VALUES IN ('shirt', 'pants', 'shoes');
CREATE TABLE products_other PARTITION OF products DEFAULT;
-- DEFAULT partition for all other values
Hash Partitioning (even distribution)
-- Useful when no natural partition key
CREATE TABLE user_sessions (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    user_id BIGINT NOT NULL,
    data JSONB,
    -- a PK on a partitioned table must include the partition key
    PRIMARY KEY (id, user_id)
) PARTITION BY HASH (user_id);
-- 4 partitions (hash mod 4)
CREATE TABLE user_sessions_0 PARTITION OF user_sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE user_sessions_1 PARTITION OF user_sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE user_sessions_2 PARTITION OF user_sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE user_sessions_3 PARTITION OF user_sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
Real Case: Saving 3TB Database
Project: IoT platform for telemetry (2023)
Problem: sensor_readings table grew to 3TB in 2 years. Queries for last month took 20-30 seconds. VACUUM FULL took 18 hours.
Solution:
-- 1. Created new partitioned table
CREATE TABLE sensor_readings_new (
id BIGSERIAL,
sensor_id BIGINT NOT NULL,
value NUMERIC,
recorded_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (recorded_at);
-- 2. Created monthly partitions (for 3 years)
SELECT create_monthly_partitions('sensor_readings_new', '2021-01-01', '2024-01-01');
-- 3. Data migration (in chunks to avoid killing DB)
-- Script ran for 3 days
DO $$
DECLARE
start_date DATE := '2021-01-01';
end_date DATE;
BEGIN
WHILE start_date < '2024-01-01' LOOP
end_date := start_date + INTERVAL '1 month';
RAISE NOTICE 'Migrating data for %', start_date;
INSERT INTO sensor_readings_new
SELECT * FROM sensor_readings
WHERE recorded_at >= start_date AND recorded_at < end_date;
COMMIT;
PERFORM pg_sleep(60); -- 1 minute pause between batches
start_date := end_date;
END LOOP;
END $$;
-- 4. Switched tables
BEGIN;
ALTER TABLE sensor_readings RENAME TO sensor_readings_old;
ALTER TABLE sensor_readings_new RENAME TO sensor_readings;
COMMIT;
-- 5. Deleted old data (older than 1 year) by dropping whole partitions
--    (partitions keep their original names after the parent rename)
DROP TABLE sensor_readings_new_2021_01;
DROP TABLE sensor_readings_new_2021_02;
-- ...
Result:
- ⚡ Last month queries: 30s → 200ms
- 💾 Freed 2TB (deleted old partitions)
- 🔧 VACUUM takes 15 minutes instead of 18 hours
- 📈 Archiving old data — one SQL command
Pro tip: Start partitioning IMMEDIATELY if you know table will be large. Migrating existing 500GB+ table is pain and downtime.
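To keep an eye on how partitions grow, PostgreSQL 12+ can list them with sizes; a sketch using pg_partition_tree on the events table from above:
-- All leaf partitions of events, biggest first
SELECT relid::regclass AS partition,
       pg_size_pretty(pg_total_relation_size(relid)) AS size
FROM pg_partition_tree('events')
WHERE isleaf
ORDER BY pg_total_relation_size(relid) DESC;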
⚠️ Pitfalls and Gotchas
Partitioning is powerful but easy to step on rakes. Here are mistakes that cost me hours of debugging:
Pitfall #1: Forgot to create indexes on partitions
-- ❌ WRONG: Index only on parent table
CREATE INDEX idx_events_user_id ON ONLY events (user_id);
-- Creates index only on parent, not on partitions!
-- ✅ CORRECT: Index propagates to all partitions
CREATE INDEX idx_events_user_id ON events (user_id);
-- PostgreSQL automatically creates indexes on all existing and future partitions
Why important: Without indexes on partitions, queries will do Seq Scan even with correct partition pruning.
Pitfall #2: INSERT without DEFAULT partition
-- You have partitions for 2024-2025
-- Someone tries to insert record for 2026:
INSERT INTO events (user_id, event_type, created_at)
VALUES (123, 'click', '2026-01-01');
-- ERROR: no partition of relation "events" found for row
-- ✅ Solution 1: DEFAULT partition
CREATE TABLE events_default PARTITION OF events DEFAULT;
-- All records not fitting explicit partitions go here
-- ✅ Solution 2: Trigger for automatic partition creation
-- Use pg_partman (extension)
Why important: Production crashes at 3 AM when someone tries to insert data "from future".
Pitfall #3: Foreign Keys on/from partitioned tables
-- ✅ FK FROM partitioned table — works
CREATE TABLE events (
id BIGSERIAL,
user_id BIGINT REFERENCES users(id), -- OK
created_at TIMESTAMPTZ
) PARTITION BY RANGE (created_at);
-- ❌ FK TO a partitioned table is trickier: referencing partitioned tables is
-- supported since PG 12, but the referenced columns need a unique constraint,
-- and on a partitioned table that constraint must include the partition key —
-- so referencing events(id) alone fails
CREATE TABLE notifications (
    id BIGSERIAL PRIMARY KEY,
    event_id BIGINT REFERENCES events(id) -- ERROR: no unique constraint on events(id)
);
Solution: Reference a key that includes the partition key, enforce the link in application code, or use FKs on specific partitions (pain).
Pitfall #4: VACUUM and ANALYZE on partitioned tables
-- VACUUM on the parent does recurse into all leaf partitions:
VACUUM events;
-- But autovacuum never collects statistics for the partitioned parent itself,
-- so run ANALYZE on the parent periodically:
ANALYZE events;
-- And tune autovacuum per partition if the defaults can't keep up on hot partitions
ALTER TABLE events_2024_01 SET (autovacuum_vacuum_scale_factor = 0.05);
Why important: Without up-to-date statistics and timely VACUUM, partitions bloat and performance drops.
Pitfall #5: Wrong partition key in WHERE
-- You have partitions by created_at
-- ❌ Query WITHOUT partition key
SELECT * FROM events WHERE user_id = 123;
-- Scans ALL partitions (slow)
-- ✅ Query WITH partition key
SELECT * FROM events
WHERE user_id = 123
AND created_at >= '2024-01-01'
AND created_at < '2024-02-01';
-- Scans only one partition (fast)
Solution: Always include partition key in WHERE when possible.
Pitfall #6: Deleting large partition (table lock)
-- ❌ SLOW and locks table
DELETE FROM events WHERE created_at < '2022-01-01';
-- If this is 100GB data — database will freeze for hours
-- ✅ FAST: detach and drop partition
ALTER TABLE events DETACH PARTITION events_2021_12; -- instant
DROP TABLE events_2021_12; -- fast, doesn't block parent table
Why important: DELETE from partitioned table = nightmare for production.
When to Apply Each Pattern
| Pattern | When to use | Don't use if |
|---|---|---|
| JSONB | Flexible schema, frequent attr changes | Critical fields, need foreign keys |
| Partial Indexes | 95%+ data not needed in index | Data distributed evenly |
| Generated Columns | Frequently computed fields | Rare computations, saving space |
| RLS | Multi-tenancy, sensitive data | High performance critical |
| Advisory Locks | Distributed tasks, deduplication | Short operations, need TTL |
| Materialized Views | Heavy analytics, dashboards | Data changes every second |
| Partitioning | Tables > 100GB, archiving | Small tables, no natural key |
Checklist: What to Check in Your Project
Go through this list and you'll find at least 3-5 optimization opportunities:
JSONB:
- Are there tables with 10+ columns where 50%+ are NULL?
- Do you add migrations every time schema changes?
- Are you using EAV (Entity-Attribute-Value) pattern?
Partial Indexes:
- Are there indexes where 90%+ rows have one value?
- Do you use soft delete (deleted_at IS NULL)?
- Do you constantly filter by status (active, pending)?
Generated Columns:
- Are there fields computed in every SELECT?
- Do you use triggers to update computed fields?
- Do you need full-text search?
RLS:
- Do you have multi-tenancy (company_id, organization_id)?
- Do you duplicate WHERE tenant_id = ... in every query?
- Have there been data leak incidents between clients?
Advisory Locks:
- Do you use Redis only for locks?
- Is there task duplication problem in queue?
- Do you need execution exclusivity (cron, migrations)?
Materialized Views:
- Are there queries executing > 5 seconds?
- Do you cache aggregates in Redis/Memcached?
- Are there dashboards with heavy analytics?
Partitioning:
- Are there tables > 50GB?
- Do you need to archive old data?
- Are queries filtered by date/category?
Conclusion: PostgreSQL is More Than Just SQL
These 7 patterns I use in 90% of projects. They've saved production, saved weeks of development, and prevented incidents.
Main lesson: PostgreSQL is not just a relational database. It's a Swiss Army knife that can do:
- NoSQL (JSONB)
- Full-text search (tsvector)
- Distributed locks (advisory locks)
- Multi-tenancy (RLS)
- Time-series data (partitioning)
- And a dozen more things I didn't write about
PostgreSQL Mastery Roadmap: Practical One-Month Plan
Don't try to apply all patterns at once. Here's step-by-step plan for implementation in your project.
Week 1: Foundation — Learn to Measure
Goal: Stop guessing, start measuring.
Tasks:
- Install pg_stat_statements (if not installed)
  - Add shared_preload_libraries = 'pg_stat_statements' to postgresql.conf
  - Restart PostgreSQL + CREATE EXTENSION pg_stat_statements
- Find 10 slowest queries in your application
  - Use query from "How to Measure Performance" section
  - Record their execution time (this is your baseline)
- Learn to read EXPLAIN ANALYZE
  - Take slowest query
  - Run EXPLAIN (ANALYZE, BUFFERS) ...
  - Find Seq Scan — these are index candidates
- Measure cache hit ratio
  - If < 90% — increase shared_buffers
Result: You know WHERE your database hurts. Now can treat it.
Week 2: Quick Wins — Partial Indexes and Generated Columns
Goal: Get first results with minimal effort.
Tasks:
- Index audit
  - Find tables with soft delete (deleted_at IS NULL)
  - Find tables with status filtering (status = 'active')
  - Create partial indexes
- Computed fields
  - Find queries with LOWER(email), full_name = first || ' ' || last, DATE(created_at)
  - Create generated columns + indexes on them
- Measure result
  - Repeat EXPLAIN ANALYZE for optimized queries
  - Record new numbers (should be 5-10x faster)
Expected result: 30-50% queries became faster. Index sizes decreased.
Week 3: Architectural Patterns — JSONB or Partitioning
Goal: Solve one systemic problem.
Option A: If you have flexible schema
- NULL values audit
  - Find tables where > 50% of columns contain NULL
  - Candidates for JSONB migration
- Pilot on dev
  - Choose one table
  - Migrate optional fields to JSONB
  - Create GIN index
- Measure
  - Table size (should decrease)
  - Migration speed (should increase)
Option B: If you have large tables (> 100GB)
- Table size audit
  SELECT schemaname, tablename,
         pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
  FROM pg_tables
  ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
  LIMIT 10;
- Plan partitioning
  - Choose candidate table
  - Determine partition key (usually date)
  - Make migration plan
- Automation via pg_partman
  - Install pg_partman extension
  - Set up automatic partition creation
Expected result: One big problem solved. Experience for scaling to other tables.
Week 4: Security and Production Readiness
Goal: Lock in results, increase reliability.
Tasks:
- RLS for multi-tenant tables (if applicable)
  - Audit: find tables with company_id, organization_id, tenant_id
  - Create RLS policies
  - Test on dev
- Advisory locks for critical operations
  - Find places needing exclusivity (payment processing, cron tasks)
  - Wrap in pg_try_advisory_lock
- Set up monitoring
  - Connect pg_stat_statements to Grafana/pganalyze/pghero
  - Set up alerts for slow queries (> 1s)
  - Set up alerts for cache hit ratio (< 90%)
- Document changes
  - Record "before/after" metrics
  - Create runbook for team
Expected result: Production more stable, you see problems BEFORE clients.
Month 2 and Beyond: Ecosystem and Depth
PostgreSQL Extensions:
- pg_cron — cron tasks inside PostgreSQL (refresh materialized views, cleanup)
- pg_partman — automatic partition management
- pgvector — vector search for AI/ML (if working with embeddings)
- PostGIS — geo data (if working with maps/locations)
- TimescaleDB — extension for time-series data (if your load is heavily temporal)
Monitoring Tools:
- pganalyze — SaaS for PostgreSQL monitoring (paid but powerful)
- pghero — open-source dashboard for PostgreSQL
- pgBadger — PostgreSQL log analyzer
Advanced Topics:
- MVCC and transactions — understanding isolation levels, SERIALIZABLE, SELECT FOR UPDATE
- Replication — streaming replication, logical replication for microservices
- Connection pooling — PgBouncer, Odyssey (when you have > 100 connections)
- Vacuum tuning — when standard autovacuum can't keep up
Useful Links:
- PostgreSQL Documentation — official documentation
- Use The Index, Luke — index bible
- PostgreSQL Wiki — real cases
Learning PostgreSQL:
If you want to dive deeper into PostgreSQL, I recommend free courses from Postgres Professional:
For application developers:
- DEV1 — basic server-side development course (functions, triggers, procedures)
- DEV2 — advanced course (performance, extensions, security)
- QPT — query optimization (EXPLAIN, indexes, statistics)
For administrators:
- DBA1 — basic administration
- DBA2 — configuration and monitoring
- DBA3 — backup and replication
Each course contains theory (up to 1 hour) and practical tasks. Materials provided free — great way to level up knowledge without costs.
Bonus Section: Monitoring and Alerting — Seeing Problems Before Clients
"The database crashed Saturday morning. Nobody knew. Clients found out first."
This happened in 2022. Disk filled up due to uncontrolled WAL log growth. Database stopped. We had no alerts on disk space. Lost 4 hours of uptime and client trust.
Lesson: Query optimization is 50% of the work. The other 50% is monitoring that will warn you BEFORE everything breaks.
In this section — practical PostgreSQL monitoring setup. No theory, only what I use in production.
Three Levels of PostgreSQL Monitoring
Effective monitoring works on three levels:
1. Infrastructure level (system metrics)
- Disk I/O, CPU, RAM, network
- Disk space (data + WAL)
- Connection count
- Replication lag (if there are replicas)
2. Database level (database metrics)
- Query performance (slow queries, p95/p99 latency)
- Cache hit ratio
- Table/index bloat
- Transaction rate, commits/rollbacks
3. Application level (application metrics)
- Connection errors
- Query timeouts
- Queue depth (if any)
Rule: Monitor all three levels. Problem can be anywhere.
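A lot of the database-level picture is visible directly in pg_stat_activity; a sketch for spotting connection pressure and long-running transactions:
-- Connections by state (compare the total against max_connections)
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;

-- Transactions open longer than 10 minutes (they block VACUUM and hold locks)
SELECT pid, usename, state,
       now() - xact_start AS tx_age,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '10 minutes'
ORDER BY tx_age DESC;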
pg_stat_statements: Deep Query Analysis
Installation and Configuration
pg_stat_statements is a mandatory extension for production. Without it you're flying blind.
Step 1: Enable in postgresql.conf
# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
# pg_stat_statements settings
pg_stat_statements.max = 10000 # Maximum records (default 5000)
pg_stat_statements.track = all # Track all queries (top/all)
pg_stat_statements.track_utility = on # Including DDL (CREATE, DROP, etc.)
pg_stat_statements.track_planning = on # PostgreSQL 13+: planning time
pg_stat_statements.save = on             # Save between restarts
Step 2: Restart PostgreSQL
sudo systemctl restart postgresql
# Or via pg_ctl
pg_ctl restart -D /var/lib/postgresql/data
Step 3: Create extension
-- Connect to database
\c mydb
-- Create extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Check
SELECT * FROM pg_stat_statements LIMIT 5;
Top 10 Slowest Queries (by average time)
SELECT
substring(query, 1, 80) as short_query,
calls,
ROUND(mean_exec_time::numeric, 2) as avg_ms,
ROUND(total_exec_time::numeric, 2) as total_ms,
    ROUND(((total_exec_time / sum(total_exec_time) OVER ()) * 100)::numeric, 2) as percent_total,
ROUND(stddev_exec_time::numeric, 2) as stddev_ms,
ROUND(max_exec_time::numeric, 2) as max_ms
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
What to look for:
- avg_ms — average time (main metric)
- percent_total — percent of all database time (find bottleneck)
- stddev_ms — variance (if large — query is unstable)
- calls — call frequency
Queries Consuming Most Time (total impact)
SELECT
substring(query, 1, 80) as short_query,
calls,
ROUND(total_exec_time::numeric, 2) as total_ms,
ROUND(mean_exec_time::numeric, 2) as avg_ms,
    ROUND(((total_exec_time / sum(total_exec_time) OVER ()) * 100)::numeric, 2) as percent_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
Real case (2024):
Found query with percent_total = 62%. It was SELECT with LIKE '%search%' without index. Added full-text search (tsvector + GIN) — database load dropped 3x.
Queries with High Variance (unstable)
SELECT
substring(query, 1, 80) as short_query,
calls,
ROUND(mean_exec_time::numeric, 2) as avg_ms,
ROUND(stddev_exec_time::numeric, 2) as stddev_ms,
    ROUND(((stddev_exec_time / NULLIF(mean_exec_time, 0)) * 100)::numeric, 2) as cv_percent
FROM pg_stat_statements
WHERE calls > 100 -- Enough calls for statistics
ORDER BY stddev_exec_time DESC
LIMIT 10;
Why important: If cv_percent > 50% — query is unstable. Reasons:
- Cold vs hot cache
- Different data sizes (WHERE user_id = ...)
- Query plan change (statistics outdated)
Solution: Check statistics (ANALYZE), check indexes.
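To check whether stale statistics are the culprit before reaching for bigger changes, a sketch against pg_stat_user_tables (the ANALYZE target is just an example):
-- How stale are the statistics for the busiest tables?
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC
LIMIT 10;

-- If a hot table has piled up modifications since the last analyze:
ANALYZE orders;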
Reset Statistics (for clean measurement)
-- Reset all pg_stat_statements data
SELECT pg_stat_statements_reset();
-- Reset for specific query (PostgreSQL 12+)
SELECT pg_stat_statements_reset(userid, dbid, queryid);
Common mistake: Looking only at mean_exec_time and ignoring total_exec_time. A slow query called once per day is not a problem. A fast query called 100k times per hour — that's the bottleneck.
auto_explain: Automatic Slow Query Logging
auto_explain is an extension that automatically writes EXPLAIN ANALYZE for slow queries to the log.
Installation and Configuration
Step 1: postgresql.conf
# Enable auto_explain
shared_preload_libraries = 'pg_stat_statements,auto_explain'
# auto_explain settings
auto_explain.log_min_duration = 1000 # Log queries > 1 second (in ms)
auto_explain.log_analyze = on # Run ANALYZE (real numbers)
auto_explain.log_buffers = on # Show shared buffers
auto_explain.log_timing = on # Show timing for each node
auto_explain.log_triggers = on # Include triggers
auto_explain.log_verbose = off # VERBOSE mode (lots of output)
auto_explain.log_nested_statements = on # Log nested statements
auto_explain.log_format = text           # Format: text/json/xml/yaml
Step 2: Restart PostgreSQL
sudo systemctl restart postgresql
Step 3: Check logs
# Logs usually in /var/log/postgresql/
tail -f /var/log/postgresql/postgresql-15-main.log
Example Output in Log
2024-12-21 15:30:45 UTC LOG: duration: 1234.567 ms plan:
Query Text: SELECT * FROM orders WHERE user_id = 12345 ORDER BY created_at DESC LIMIT 10;
Limit (cost=0.42..8.44 rows=10 width=120) (actual time=1205.123..1234.456 rows=10 loops=1)
Buffers: shared hit=12000 read=45000
-> Index Scan Backward using orders_created_at_idx on orders (cost=0.42..80000.00 rows=100000 width=120) (actual time=1205.120..1234.450 rows=10 loops=1)
Filter: (user_id = 12345)
Rows Removed by Filter: 8945620
Buffers: shared hit=12000 read=45000
Planning Time: 0.234 ms
Execution Time: 1234.678 ms
What we see:
- Query ran 1234ms (> threshold 1000ms)
- Index Scan, but filter removed 8.9M rows (index inefficient!)
- shared read=45000 — many disk reads (bad cache hit)
Solution: Create composite index (user_id, created_at) instead of separate on created_at.
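A sketch of that fix, assuming the orders table from the log excerpt (the index name is illustrative; DESC mirrors the ORDER BY in the logged query):
CREATE INDEX CONCURRENTLY idx_orders_user_id_created_at
ON orders (user_id, created_at DESC);
-- Then re-check with EXPLAIN (ANALYZE, BUFFERS) that the filter no longer removes millions of rows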
Production Configuration (best practices)
# Production: log only very slow queries
auto_explain.log_min_duration = 5000 # 5 seconds
# Enable sample_rate to reduce load
auto_explain.sample_rate = 0.1 # Log 10% queries (PostgreSQL 13+)
# Disable timing to reduce overhead
auto_explain.log_timing = off            # Less overhead on production
Why sample_rate: On high-load systems auto_explain can create 5-10% overhead. Sample rate reduces load while keeping statistical significance.
Pro tip: For development set auto_explain.log_min_duration = 100 (100ms). This helps find potential problems before production.
Slow Query Log: Classic Approach
PostgreSQL can log slow queries directly (without auto_explain).
Configuration in postgresql.conf
# Slow query logging
log_min_duration_statement = 1000 # Log queries > 1 second
# Log details
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_statement = 'none' # Don't log all queries
log_duration = off # Don't log duration of all queries
log_checkpoints = on # Log checkpoints
log_connections = on # Log connections
log_disconnections = on # Log disconnections
log_lock_waits = on                    # Log locks > deadlock_timeout
Reload configuration (without restart):
SELECT pg_reload_conf();
Log Analysis with pgBadger
pgBadger is the best tool for PostgreSQL log analysis.
Installation:
# Ubuntu/Debian
sudo apt install pgbadger
# macOS
brew install pgbadger
# Or via CPAN
cpan install pgBadger
Usage:
# Analyze today's log
pgbadger /var/log/postgresql/postgresql-15-main.log
# Analyze multiple files
pgbadger /var/log/postgresql/postgresql-*.log -o report.html
# Incremental analysis (for large logs)
pgbadger --incremental --outdir /var/www/reports/ /var/log/postgresql/*.log
What pgBadger shows:
- Top 10 slowest queries
- Top 10 queries by frequency
- Query time distribution (histogram)
- Queries by categories (SELECT/INSERT/UPDATE/DELETE)
- Lock waits
- Temporary files
- Checkpoints
- Connections timeline
Real case (2023):
pgBadger showed that 78% temp files created by one report with ORDER BY on 5M rows. Increased work_mem from 4MB to 64MB — temp files disappeared, query 10x faster.
pg_stat_user_tables and pg_stat_user_indexes: Usage Statistics
Unused Indexes (candidates for deletion)
SELECT
schemaname,
    relname AS tablename,
    indexrelname AS indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0 -- Index never used
AND indexrelname NOT LIKE '%pkey' -- Except primary keys
ORDER BY pg_relation_size(indexrelid) DESC;
Result: List of indexes taking space but not used.
Action: Delete them (but first check via EXPLAIN on production-like data).
-- Delete unused index
DROP INDEX CONCURRENTLY idx_unused; -- CONCURRENTLY = without table lock
Real case (2024):
Found 12 unused indexes total size 4GB. Deleted — freed space, INSERT/UPDATE 15% faster.
Tables with Bad Cache Hit Ratio
SELECT
schemaname,
    relname AS tablename,
heap_blks_read,
heap_blks_hit,
CASE
WHEN heap_blks_hit + heap_blks_read = 0 THEN 0
ELSE ROUND((heap_blks_hit::numeric / (heap_blks_hit + heap_blks_read)) * 100, 2)
END as cache_hit_ratio
FROM pg_statio_user_tables
WHERE heap_blks_read > 0
ORDER BY cache_hit_ratio ASC, heap_blks_read DESC
LIMIT 20;
What to look for:
- cache_hit_ratio < 90% — table often read from disk
- heap_blks_read large — many disk operations
Solutions:
- Increase shared_buffers (if RAM allows)
- Partition table (if it's huge)
Tables with Large Bloat (need VACUUM)
SELECT
schemaname,
    relname AS tablename,
n_live_tup,
n_dead_tup,
ROUND((n_dead_tup::numeric / NULLIF(n_live_tup, 0)) * 100, 2) as dead_ratio,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000 -- More than 1000 dead rows
ORDER BY dead_ratio DESC, n_dead_tup DESC
LIMIT 20;
What to look for:
- dead_ratio > 20% — table bloated, needs VACUUM
- last_autovacuum IS NULL — autovacuum not working (bad sign!)
Solution:
-- Manual VACUUM for specific table
VACUUM ANALYZE table_name;
-- Or aggressive VACUUM FULL (requires exclusive lock)
VACUUM FULL table_name; -- CAUTION: locks table!
Alerting: What Metrics to Monitor
Critical Alerts (urgent response)
| Metric | Threshold | Why critical |
|---|---|---|
| Disk space | > 85% full | Database will crash at 100% |
| Connection count | > 80% of max | New connections will be rejected |
| Replication lag | > 60 seconds | Replica lagging, data inconsistent |
| Long transactions | > 10 minutes | Blocks VACUUM, creates bloat |
| Lock wait time | > 30 seconds | Deadlock or contention |
| Cache hit ratio | < 85% | Too many disk operations |
| Query p95 latency | > 1 second | Users see slowdowns |
| Checkpoint frequency | > 5 per minute | Too frequent checkpoints — bad configuration |
Warning Alerts (plan actions)
| Metric | Threshold | Action |
|---|---|---|
| Table bloat | > 30% | Schedule VACUUM FULL |
| Index bloat | > 50% | Recreate index (REINDEX) |
| Unused indexes | idx_scan = 0 | Consider deletion |
| Temp files | > 100MB/hour | Increase work_mem |
| Sequential scans | Often on big tables | Add indexes |
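Two of the warning metrics above can be pulled straight from the statistics views; a sketch (thresholds are illustrative):
-- Temp files per database (sorts/hashes spilling to disk -> consider raising work_mem)
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
ORDER BY temp_bytes DESC;

-- Big tables read mostly by sequential scans (index candidates)
SELECT relname, seq_scan, idx_scan, n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > COALESCE(idx_scan, 0) AND n_live_tup > 100000
ORDER BY seq_scan DESC
LIMIT 10;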
Example Alerts for Prometheus + Alertmanager
groups:
  - name: postgresql
    interval: 30s
    rules:
      # Critical: Disk space
      - alert: PostgreSQLDiskSpaceHigh
        expr: (node_filesystem_avail_bytes{mountpoint="/var/lib/postgresql"} / node_filesystem_size_bytes) < 0.15
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL disk space < 15% ({{ $value }}%)"
      # Critical: Connection count
      - alert: PostgreSQLTooManyConnections
        expr: (pg_stat_database_numbackends / pg_settings_max_connections) > 0.8
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL connections > 80% of max"
      # Critical: Replication lag
      - alert: PostgreSQLReplicationLag
        expr: pg_replication_lag > 60
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL replication lag > 60s ({{ $value }}s)"
      # Warning: Cache hit ratio
      - alert: PostgreSQLLowCacheHitRatio
        expr: pg_stat_database_blks_hit / (pg_stat_database_blks_hit + pg_stat_database_blks_read) < 0.85
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL cache hit ratio < 85% ({{ $value }}%)"
      # Warning: Long running queries
      - alert: PostgreSQLLongRunningQueries
        expr: pg_stat_activity_max_tx_duration > 600
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL query running > 10 minutes"
Monitoring Tools
1. pganalyze (SaaS, paid)
Features:
- Automatic metrics collection (pg_stat_statements, logs)
- Index Advisor (index recommendations)
- Query Performance Insights
- EXPLAIN visualization
- Alerting out of the box
Price: From $99/month per server
Pros:
- ✅ Zero setup (connect and it works)
- ✅ Beautiful UI
- ✅ Historical data
Cons:
- ❌ Paid
- ❌ Your data is stored with a third party
When to use: if you have no time to set up monitoring yourself and the budget allows it.
2. pghero (Open Source)
Installation (Docker):
```bash
docker run -p 8080:8080 -e DATABASE_URL=postgres://user:pass@host:5432/dbname ankane/pghero
```

Features:
- Dashboard with key metrics
- Slow queries
- Index usage
- Table bloat
- Query suggestions
Pros:
- ✅ Open source
- ✅ Simple installation
- ✅ Beautiful UI
Cons:
- ❌ No alerting (need to integrate with Prometheus/Grafana)
- ❌ No historical data by default
When to use: for a quick database audit or as an addition to Prometheus.
3. Prometheus + postgres_exporter + Grafana (Open Source)
Install postgres_exporter:
```bash
# Docker
docker run -d \
  -p 9187:9187 \
  -e DATA_SOURCE_NAME="postgresql://user:password@localhost:5432/mydb?sslmode=disable" \
  prometheuscommunity/postgres-exporter

# Or via systemd
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.15.0/postgres_exporter-0.15.0.linux-amd64.tar.gz
tar xvfz postgres_exporter-0.15.0.linux-amd64.tar.gz
sudo mv postgres_exporter-0.15.0.linux-amd64/postgres_exporter /usr/local/bin/
```

Prometheus configuration (prometheus.yml):
```yaml
scrape_configs:
  - job_name: "postgresql"
    static_configs:
      - targets: ["localhost:9187"]
```

Grafana Dashboard:
Import a ready-made dashboard: PostgreSQL Database Dashboard
Pros:
- ✅ Open source
- ✅ Full control
- ✅ Flexible alerting
- ✅ Integration with other metrics (system, application)
Cons:
- ❌ Complex setup
- ❌ Requires infrastructure maintenance
When to use: For serious production systems that already have Prometheus/Grafana.
Real Case: How Monitoring Saved Production
Project: Marketplace for property rentals (2023)
Problem:
Monday morning (9:00 AM), users started complaining about slowdowns. Pages were loading in 10-15 seconds instead of the usual 0.5s.
Diagnosis via monitoring:

1. Grafana: a spike in query p95 latency (200ms → 8000ms).

2. pg_stat_statements:

   ```sql
   SELECT substring(query, 1, 80), mean_exec_time
   FROM pg_stat_statements
   ORDER BY mean_exec_time DESC
   LIMIT 5;
   ```

   Result: `SELECT * FROM listings WHERE ...` grew from 50ms to 7000ms.

3. auto_explain logs:

   ```
   Seq Scan on listings
     (cost=0.00..185432.00 rows=50 width=1200)
     (actual time=245.123..6840.456 rows=3 loops=1)
     Filter: ((status = 'active') AND (city = 'Moscow'))
     Rows Removed by Filter: 8945632
   ```

   Diagnosis: Seq Scan instead of Index Scan. Why?

4. Check the indexes:

   ```sql
   SELECT indexrelname, idx_scan
   FROM pg_stat_user_indexes
   WHERE relname = 'listings'
     AND indexrelname LIKE '%status%';
   ```

   Result: `idx_listings_status` has `idx_scan = 0` (it wasn't being used!).

5. EXPLAIN on the problem query:

   ```sql
   EXPLAIN ANALYZE
   SELECT * FROM listings
   WHERE status = 'active' AND city = 'Moscow';
   ```

   The plan showed that the index was ignored because of outdated statistics (wrong `n_distinct`).
Solution:

```sql
-- Updated the statistics
ANALYZE listings;

-- Recreated the index just in case
REINDEX INDEX CONCURRENTLY idx_listings_status;
```

Result:
- ⏱️ Query latency: 7000ms → 35ms (200x faster)
- 🎯 Problem found and solved in 20 minutes
- 📊 Added an alert on `query p95 > 1s` (see the sketch below)
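The alert itself, as a minimal sketch. PostgreSQL does not expose p95 latency natively; `pg_query_p95` here is assumed to be a latency metric in milliseconds exported by the application or a statement-level exporter, not by postgres_exporter:

```yaml
- alert: PostgreSQLQueryP95High
  expr: pg_query_p95 > 1000   # assumed app-side metric, p95 latency in ms
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Query p95 latency above 1s for 10 minutes"
```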
Without monitoring: we'd spend hours debugging. With monitoring — 20 minutes.
⚠️ Pitfalls and Gotchas of Monitoring
Pitfall #1: pg_stat_statements not installed on production
```sql
-- ❌ Learned this during an incident
SELECT * FROM pg_stat_statements;
-- ERROR: relation "pg_stat_statements" does not exist

-- ✅ Check BEFORE a problem happens
SELECT * FROM pg_extension WHERE extname = 'pg_stat_statements';
-- If this returns no rows, install it NOW
```

When it happened: a production problem in 2022. We couldn't find the slow query because pg_stat_statements wasn't installed.
Solution: Always install pg_stat_statements right after creating the database.
Pitfall #2: auto_explain with too low a threshold killed production
```ini
# ❌ DANGEROUS: logs ALL queries slower than 10ms
auto_explain.log_min_duration = 10

# On a high-load system this created:
# - 50GB of logs in 2 hours
# - A full disk
# - A database crash
```

Solution:
```ini
# ✅ SAFE: log only the really slow queries
auto_explain.log_min_duration = 5000   # 5 seconds for production
auto_explain.sample_rate = 0.1         # explain a 10% sample of queries
```

Rule: On production, start with a high threshold (5-10 seconds), then lower it as needed.
Pitfall #3: Monitoring didn't show the problem because queries went through a connection pooler
Problem:
- We had PgBouncer in transaction pooling mode
- pg_stat_statements showed `user=pgbouncer` for all queries
- It was impossible to tell which application was generating the slow queries
Solution:
```ini
# PgBouncer config (pgbouncer.ini)
application_name_add_host = 1   # adds the client host to application_name
```

And in the application:
```python
# Python (psycopg2)
import psycopg2

conn = psycopg2.connect(
    "...",
    application_name="my-app-worker-1",  # explicitly set the application name
)
```

Now each application is identifiable in pg_stat_activity:
```sql
SELECT usename, application_name, state, query
FROM pg_stat_activity;
```

Pitfall #4: Alerts fired too often (alert fatigue)
Problem:
We set up an alert on query p95 > 100ms. The result: 50 alerts per day, and the team started ignoring them.
Solution:

1. Increased the threshold: `query p95 > 500ms` (5x slower than normal).

2. Added a `for` clause in Prometheus:

   ```yaml
   - alert: PostgreSQLSlowQueries
     expr: pg_query_p95 > 500
     for: 10m   # fires only if the problem lasts > 10 minutes
   ```

3. Split alerts by severity:
   - `critical`: requires immediate action (disk full, replication lag)
   - `warning`: needs a look within the day (cache hit ratio, bloat)
Rule: Better 1 accurate alert than 100 false ones.
Pitfall #5: Didn't monitor WAL disk space
Problem:
PostgreSQL stores WAL files separately from the data. Our WAL was on a separate disk (an SSD, for speed). The disk filled up → the database stopped.
Solution:
```bash
# Monitor the WAL directory
df -h /var/lib/postgresql/15/main/pg_wal

# Alert expression in Prometheus (pg_wal is mounted on its own disk here)
node_filesystem_avail_bytes{mountpoint="/var/lib/postgresql/15/main/pg_wal"}
```

Best practice: Monitor disk space on ALL PostgreSQL directories:
- `/var/lib/postgresql/` (data)
- `.../pg_wal/` (WAL files)
- Tablespaces (if any)
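To watch WAL size from inside the database as well, a minimal sketch; pg_ls_waldir() needs PostgreSQL 10+ and superuser or pg_monitor membership:

```sql
-- Total size and number of files in the WAL directory
SELECT
    count(*) AS wal_files,
    pg_size_pretty(sum(size)) AS wal_size
FROM pg_ls_waldir();
```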
Checklist: Minimum Production Monitoring
Use this checklist to audit your monitoring:
Basic metrics (MUST HAVE):
- pg_stat_statements installed and configured
- Disk space monitored (data + WAL)
- Connection count monitored
- Cache hit ratio > 90%
- Query p95 latency < 500ms
- Replication lag < 10s (if have replicas)
Logging (recommended):
- auto_explain enabled (threshold 5-10 seconds)
- log_min_duration_statement = 1000
- log_lock_waits = on
- log_checkpoints = on
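A minimal postgresql.conf sketch covering the logging items above; the thresholds are the ones from this checklist, and auto_explain assumes the library is preloaded:

```ini
# postgresql.conf: diagnostics logging for production
shared_preload_libraries = 'pg_stat_statements,auto_explain'

auto_explain.log_min_duration = 5000   # log plans of queries slower than 5s
log_min_duration_statement = 1000      # log statements slower than 1s
log_lock_waits = on                    # log waits longer than deadlock_timeout
log_checkpoints = on                   # log checkpoint activity
```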
Alerting (critical):
- Disk space > 85% → critical alert
- Connection count > 80% max → critical alert
- Replication lag > 60s → critical alert
- Query p95 > 1s → warning
Tools (choose at least one):
- pghero (quick start)
- pganalyze (SaaS, paid but convenient)
- Prometheus + Grafana (for serious systems)
- pgBadger (log analysis)
Processes:
- Weekly: review slow queries via pg_stat_statements (a review query sketch follows below)
- Monthly: review unused indexes
- Quarterly: review table/index bloat
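A minimal sketch for the weekly review, assuming pg_stat_statements is installed; it looks at the most disk-read-heavy queries and then resets the counters so next week's numbers are comparable:

```sql
-- Queries doing the most disk reads (often the best optimization targets)
SELECT
    substring(query, 1, 100) AS short_query,
    calls,
    shared_blks_read,
    shared_blks_hit
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;

-- Start the next review period from a clean slate
SELECT pg_stat_statements_reset();
```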
Conclusion: Monitoring is Insurance
PostgreSQL monitoring is not a "nice to have", it's a must-have for any production project.
Three maturity levels of monitoring:
Level 1: Reactive (😰)
- Learn about problems from clients
- No metrics, no logs
- Debugging takes hours
Level 2: Basic monitoring (😌)
- pg_stat_statements installed
- Disk space and connection count monitored
- Find problems in minutes
Level 3: Proactive (😎)
- Automatic alerts on critical metrics
- Regular review of slow queries
- See problems BEFORE clients
Aim for level 3.
Good luck with your optimization! If you have questions, write in the comments or on Telegram.
P.S. This article is part of a database series. The next article will cover replication and high availability for PostgreSQL (streaming replication, logical replication, failover). Subscribe so you don't miss it!