MOFU Engineering Guide · 2026

Scaling a SaaS Database: The Right Order of Operations

Indexing, connection pooling, read replicas, partitioning, caching, and sharding — in the order you should actually reach for them. Most teams scale in the wrong sequence and pay for it. Here is the playbook we run.

By Bill Beltz, founder of QUANT LAB USA INC · Published June 3, 2026

Quick answer

Scale a SaaS database in this order: fix indexes and query plans, add connection pooling, scale the instance up, add read replicas, partition large tables, cache hot reads — and only shard as a last resort. The early wins are cheap and the late ones are expensive, so the sequence matters enormously. Most SaaS companies never need to shard; a single well-indexed primary with pooling and read replicas carries them for years. Doing the hard things before the easy things is the most common and costly mistake in database scaling.

When a SaaS gets slow, the instinct is to reach for the dramatic fix: "we need to shard" or "we need to move to a new database." Nine times out of ten the real problem is a missing index on a hot query, and the dramatic fix would have cost six months to solve a problem a six-minute CREATE INDEX would have closed. Scaling is an order-of-operations discipline before it is anything else.

This guide is the sequence we run on every SaaS platform engagement, from the first slow-query alert to the rare day sharding is genuinely warranted. It pairs with our engine comparison, Postgres vs MySQL for SaaS, and our multi-tenant Postgres RLS guide. For the basics, see what SaaS and multi-tenant SaaS mean.

The scaling ladder

Step	Solves	Cost / complexity
1. Indexing & query tuning	Slow queries, high CPU	Low — biggest ROI
2. Connection pooling	Connection exhaustion	Low
3. Vertical scaling	General headroom	Low effort, rising cost
4. Read replicas	Read-heavy load	Medium
5. Partitioning	Huge tables, retention	Medium
6. Caching	Repeated hot reads	Medium (invalidation)
7. Sharding	Write & storage ceiling	High — last resort

Step 1: indexing and query tuning

This is where 80% of early scaling problems live and where the cheapest wins are. Start by finding the queries that hurt. In Postgres, pg_stat_statements ranks queries by total time; in MySQL, the slow query log plus the performance schema does the same.

-- Postgres: top queries by total execution time
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Then explain the worst offender to see the plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM invoices WHERE tenant_id = $1 AND status = 'open';

A Seq Scan on a large table in that plan is the smoking gun. The fix is usually a composite index whose column order matches the query's filter and sort. Lead with the highest-selectivity equality column, then range or sort columns:

-- Composite index that supports the filter above
CREATE INDEX CONCURRENTLY idx_invoices_tenant_status
  ON invoices (tenant_id, status, created_at DESC);

Use CREATE INDEX CONCURRENTLY in production so you do not lock writes. A handful of well-chosen composite indexes routinely cut P95 latency by an order of magnitude — far more than any hardware upgrade would. This is always step one.

Step 2: connection pooling

Postgres opens a full OS process per connection. A few hundred direct connections — easy to hit with multiple app instances or serverless functions — consumes gigabytes of RAM and degrades the whole server. A pooler sits between your app and the database and multiplexes many client connections onto a small set of real ones.

PgBouncer in transaction-pooling mode is the standard. It hands a real connection to a transaction, then returns it to the pool when the transaction ends.
PgCat and Supavisor add load balancing and replica routing on top of pooling.
Serverless app tiers need pooling badly, because each function invocation can open a connection. Without a pooler you exhaust the limit under modest traffic.

One caveat that bites multi-tenant teams: transaction-pooling mode does not preserve session state across transactions, which interacts with the SET LOCAL tenant variable used for row-level security. The fix is to set the variable inside each transaction — we detail it in the multi-tenant RLS guide. MySQL's threaded model tolerates more connections, but pooling still pays off at scale.

Step 3: scale up before you scale out

Vertical scaling is unfashionable and underrated. Moving from a 4-core instance to a 32-core instance with more RAM is a config change and a short maintenance window — no application changes, no distributed-systems complexity. Modern managed Postgres and MySQL run very large primaries, and RAM that fits your working set in the buffer cache is the single biggest lever after indexing.

The reason to scale up before scaling out is that every scale-out technique (replicas, partitioning, sharding) adds permanent complexity to every future feature. A bigger box adds none. Buy headroom with hardware until the cost curve or a specific bottleneck forces a smarter approach. We size instances against the working set during cloud infrastructure engagements rather than guessing.

Mid-post: get your slow query log read

Database creaking under load? We start every performance engagement by reading your slow query log and EXPLAIN plans. Free 30-minute diagnostic call.

Step 4: read replicas

Most SaaS workloads are read-heavy — dashboards, lists, reports, search. A read replica is a synchronized full copy of the primary that serves read-only queries, so you can move that traffic off the primary without changing your data model.

The one thing to design for is replication lag. A replica is usually milliseconds behind, but a user who just wrote a record and immediately reads it from a lagging replica sees stale data. The standard patterns:

Read-your-writes routing: send a user's reads to the primary for a short window after they write.
Analytics to replicas: route heavy reporting and export queries to replicas where slight staleness is fine.
Replica-aware ORM config: a routing layer that splits reads and writes, with explicit overrides for consistency-critical paths.

Replicas scale reads cleanly and are far simpler than sharding. They are the right answer to "our reads are overwhelming the primary," and they buy enormous headroom before any harder technique is needed.

Step 5: partitioning large tables

When a single table reaches tens or hundreds of millions of rows — typically event logs, audit trails, or time-series — index maintenance and vacuum get expensive and full scans get slow. Partitioning splits one logical table into physical chunks by a key, while the application still queries one table name.

-- Postgres: range-partition an events table by month
CREATE TABLE events (
  id          bigint,
  tenant_id   uuid,
  created_at  timestamptz NOT NULL,
  payload     jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_06 PARTITION OF events
  FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');

Two big wins. Queries that filter on the partition key only touch relevant partitions — partition pruning — so a query for last week never scans last year. And data retention becomes DROP TABLE events_2024_01, an instant operation, instead of a slow, bloat-inducing bulk DELETE. Partitioning is a single-database technique and dramatically simpler than sharding; reach for it well before you consider splitting across servers.

Step 6: caching the hot path

Caching reduces the load that ever reaches the database. The cheapest, highest-value caching targets read-heavy, change-rarely data: reference tables, feature configuration, per-tenant settings, and expensive aggregations that do not need to be real-time.

Application cache (Redis): store computed results keyed by query inputs, with a TTL and an explicit invalidation on the write that changes them.
HTTP / CDN cache: for public or semi-public read endpoints, cache upstream of the app entirely.
Materialized views: precompute heavy aggregations and refresh on a schedule — but mind tenant isolation, since a materialized view can snapshot across tenants.

The hard part of caching is never the cache; it is invalidation. A cache that serves stale data is worse than no cache because the bug is invisible until a customer notices. Tie every cached entry to a clear invalidation trigger. Our application layer is usually Next.js, so we lean on its caching primitives too — covered in our Next.js 16 App Router guide.

Step 7: when to actually shard

Sharding splits data horizontally across multiple databases — tenants A to M here, N to Z there — so writes and storage distribute across machines. It is the only thing on this list that scales writes past a single primary, and it is also the most expensive thing you can do to a codebase.

Shard only when all of these are true:

You have already indexed, pooled, scaled up, added replicas, and partitioned.
The write volume or storage on a single primary is the proven, measured bottleneck — not a guess.
You have a natural shard key (usually tenant_id) that keeps almost all queries within one shard.
The team is ready to live with cross-shard queries, distributed transactions, and rebalancing forever.

For multi-tenant SaaS, the cleanest sharding is by tenant, which keeps each tenant's data on one shard and makes most queries single-shard. Tools like Citus (Postgres) and Vitess (MySQL) manage the mechanics. But be honest: the vast majority of SaaS companies never reach this point. A vertically scaled primary with read replicas and partitioning carries most products for their entire life. Sharding is a real tool for a real ceiling — not a default, and not a flex.

Real-world example: a SaaS hitting its first wall

A representative engagement: a B2B SaaS whose dashboard had crept to multi-second loads as data grew. The reflexive ask was "do we need to shard?" The slow query log said otherwise. Three missing composite indexes were driving sequential scans on the two largest tables; adding them with CONCURRENTLY cut P95 from roughly 3 seconds to under 250 milliseconds. We then put PgBouncer in front to stop connection exhaustion from the serverless tier, and routed reporting queries to a read replica. No partitioning, no sharding, no new database — just the first three rungs of the ladder, in order.

That is the typical shape: the dramatic fix is almost never the right first move. For the engine-selection decision underneath all of this, see Postgres vs MySQL for SaaS, and for the broader build picture, the 2026 state of custom software development.

Frequently asked questions

What is the right order of operations for scaling a database?

Fix the query plans and indexes first, because most early performance problems are missing or wrong indexes, not a lack of hardware. Then add connection pooling so you stop exhausting database connections. Then scale up the instance (vertical) before you scale out. Then add read replicas to offload reads. Then partition large tables. Caching can come in anywhere it helps. Sharding is the last resort, only when a single primary genuinely cannot hold the write volume. Doing these out of order — sharding before indexing, say — wastes months and money.

When do I need connection pooling?

Almost immediately on Postgres, and especially with serverless or many app instances. Postgres uses a process per connection, so a few hundred direct connections can consume gigabytes of RAM and stall the server. A pooler like PgBouncer or PgCat multiplexes thousands of client connections onto a small pool of real database connections. Serverless functions make this critical because each invocation can open a connection — without a pooler you exhaust the limit under modest load. MySQL's threaded model is more forgiving but still benefits from pooling at scale.

What is the difference between a read replica and a shard?

A read replica is a full copy of the database that stays in sync with the primary and serves read-only queries, which offloads read traffic without changing your data model. Every replica holds all the data. A shard is a horizontal split: each shard holds a different subset of the data (for example, tenants A to M on shard 1, N to Z on shard 2), and writes are distributed across shards. Replicas scale reads and are easy; shards scale writes and storage but add major application complexity. Reach for replicas long before shards.

When should I partition a table?

Partition when a single table grows large enough that index maintenance, vacuum, and full scans become painful — often in the tens to hundreds of millions of rows for time-series or event data. Partitioning splits one logical table into physical chunks by a key, usually a date range or a hash. Queries that filter on the partition key only touch relevant partitions (partition pruning), and you can drop old partitions instantly instead of running slow bulk deletes. It is a within-one-database technique and far simpler than sharding across databases.

Do I need to shard my SaaS database?

Probably not, and not for a long time. A single modern Postgres or MySQL primary on capable hardware comfortably handles tens of thousands of transactions per second and terabytes of data once indexing, pooling, replicas, and partitioning are in place. Most SaaS companies never outgrow a vertically scaled primary with read replicas. Sharding adds cross-shard queries, distributed transactions, rebalancing, and operational complexity that slows every future feature. Shard only when you have exhausted vertical scaling and the write volume on a single primary is the proven bottleneck.

How does caching fit into database scaling?

Caching reduces the load that ever reaches the database. The cheapest win is application-level query caching for hot, rarely-changing data — reference tables, configuration, expensive aggregations — usually in Redis or an in-memory layer with explicit invalidation. HTTP and CDN caching cut requests further upstream. The discipline is invalidation: a cache that serves stale data is worse than no cache. Cache read-heavy, change-rarely data first, and always have a clear invalidation trigger tied to the write that changes it.

Can QUANT LAB USA help scale our database?

Yes. We run database performance engagements that start with query and index analysis, add pooling and replicas, and partition or cache where the data demands it — and we will tell you honestly if you do not need to shard. We work primarily in Postgres and MySQL with Next.js and TypeScript application layers. A focused performance review is fixed-scope; ongoing scaling work fits our retainer model. Book a call below and we will look at your slow query log.

Your database is slowing down.

Free 30-minute diagnostic. We will read your slow query log, look at your EXPLAIN plans, and map the cheapest path back to fast — usually without sharding.

Or call Bill directly at (770) 652-1282