How is a data warehouse different from a database?

An application database (OLTP) is optimized for many small reads and writes — login, checkout, click. A data warehouse (OLAP) is optimized for fewer, much larger analytical queries across years of history, using columnar storage and parallel execution.

What is the difference between a data warehouse and a data lake?

A warehouse stores structured, schema-enforced data ready for SQL. A lake stores raw files of any shape — JSON, Parquet, video, logs — without enforcing a schema upfront. Modern "lakehouses" like Databricks blur the line.

What are the main data warehouses today?

Snowflake, Google BigQuery, Amazon Redshift, Databricks, and ClickHouse are among the most common choices for new deployments. Each has tradeoffs in cost model, ecosystem, and operational style.

When does a startup need a data warehouse?

Usually when (a) analytics queries start slowing the production app, (b) the team wants to join data across Stripe, the app, marketing, and support, or (c) finance or investors are asking for reports the app database cannot produce in a reasonable time.

Can Postgres be a data warehouse?

For small data — tens of millions of rows — Postgres with proper indexes is often fine. Past that, columnar warehouses become dramatically faster and cheaper to run, and the migration becomes worth the effort.


What is a Data Warehouse?

A data warehouse is a database engineered for analytical workloads — long, wide queries across years of history — so the business can ask hard questions of its data without bringing the production application to its knees.

Where the idea came from

Bill Inmon popularized the term "data warehouse" in the early 1990s and defined the basic shape: a subject-oriented, integrated, time-variant, and non-volatile collection of data. Translated out of textbook English, that means a single place to put everything your business has ever recorded, organized by what the business cares about (customers, orders, sessions) rather than by which application emitted the data. Ralph Kimball added the dimensional modeling approach that made warehouses queryable by non-engineers. The cloud era (BigQuery in 2010, Redshift in 2012, Snowflake in 2014) made warehouses cheap enough that a Series A company could run one.

Why a separate database

The application database — the Postgres or MySQL instance running your product — is optimized for the kind of work an app does: write a row when a user signs up, read a row when they log in, update a balance when they pay. That workload is small, fast, and constant. The analytical workload is the opposite: read forty million rows, group by month and country, join four tables. Mixing the two on the same instance is how product engineers learn the phrase "the database is on fire" — a finance analyst running a quarterly report can lock tables, blow the page cache, and slow every customer login.
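The two workload shapes can be sketched in a few lines. This is only an illustration: in-memory SQLite stands in for the application database, and the `orders` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the application database (an assumption for
# illustration; the article's examples are Postgres or MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "country TEXT, month TEXT, amount_cents INTEGER)"
)

# Hypothetical sample data: six orders across two countries and two months.
rows = [
    (1, 10, "US", "2024-01", 1000),
    (2, 11, "US", "2024-01", 2500),
    (3, 12, "DE", "2024-01", 700),
    (4, 10, "US", "2024-02", 1200),
    (5, 13, "DE", "2024-02", 900),
    (6, 12, "DE", "2024-02", 300),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", rows)

# Transactional shape: touch a single row by primary key.
point_lookup = conn.execute(
    "SELECT amount_cents FROM orders WHERE id = ?", (4,)
).fetchone()

# Analytical shape: read every row, then group and aggregate.
monthly_by_country = conn.execute(
    "SELECT month, country, SUM(amount_cents) FROM orders "
    "GROUP BY month, country ORDER BY month, country"
).fetchall()

print(point_lookup)
print(monthly_by_country)
```

With six rows both queries are instant. The difference only matters at scale, where the second query has to scan the whole table while the first touches one indexed row.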

The warehouse is a separate copy of the data, refreshed on a schedule (often hourly or nightly), stored in columnar format so that analytical scans are often 10 to 100 times faster, and sized for the analytical workload independently. The price is some staleness, since the warehouse is rarely up to the second, and a pipeline to keep it fresh.
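Why columnar storage helps analytical scans can be shown with a toy model. The sketch below is not how any particular warehouse stores data; the record fields and the "values touched" count are hypothetical stand-ins for on-disk reads.

```python
# Row-oriented layout: each record is stored together, as an application
# database would store it.
row_store = [
    {"id": 1, "country": "US", "amount_cents": 1000, "email": "a@example.com"},
    {"id": 2, "country": "DE", "amount_cents": 700, "email": "b@example.com"},
    {"id": 3, "country": "US", "amount_cents": 2500, "email": "c@example.com"},
]


def to_column_store(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


column_store = to_column_store(row_store)

# Both layouts give the same answer for SUM(amount_cents).
total_from_rows = sum(row["amount_cents"] for row in row_store)
total_from_columns = sum(column_store["amount_cents"])

# Crude proxy for I/O: a row store reads whole records to reach one field,
# while a column store reads only the single column the query needs.
values_touched_rows = sum(len(row) for row in row_store)
values_touched_columns = len(column_store["amount_cents"])

print(total_from_rows, total_from_columns)
print(values_touched_rows, values_touched_columns)
```

In this toy case the column layout touches 3 values instead of 12. Real tables are far wider and longer, and columnar files also compress well, which is where the large speedups come from.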

The modern data stack around it

A warehouse never lives alone. A modern data stack typically has: extraction tools (Fivetran, Airbyte, custom scripts) that pull data out of Postgres, Stripe, HubSpot, Google Ads, and dozens of other sources; the warehouse itself (Snowflake, BigQuery, Databricks, or ClickHouse); a transformation layer (dbt is the standard) that turns raw landed tables into clean, modeled tables; and a visualization tool on top (Metabase, Looker, Hex, Tableau) where business users actually answer questions. The warehouse is the middle of that stack, not the whole of it.
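The extract, load, and transform steps described above can be sketched end to end. This is a minimal illustration under stated assumptions: in-memory SQLite plays the warehouse, plain Python lists play the app and Stripe sources, and the names `extract_and_load`, `transform`, `raw_app_users`, `raw_stripe_charges`, and `revenue_by_plan` are hypothetical. It does not use Fivetran, Airbyte, or dbt themselves.

```python
import sqlite3


def extract_and_load(warehouse, table, columns, records):
    """Land source records unchanged in a raw_ table (the extract/load step)."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    warehouse.execute(f"CREATE TABLE raw_{table} ({col_list})")
    warehouse.executemany(
        f"INSERT INTO raw_{table} VALUES ({placeholders})", records
    )


def transform(warehouse):
    """Build a clean, modeled table from the raw landed tables."""
    warehouse.execute(
        """
        CREATE TABLE revenue_by_plan AS
        SELECT u.plan AS plan, SUM(c.amount_cents) AS revenue_cents
        FROM raw_app_users AS u
        JOIN raw_stripe_charges AS c ON c.customer_email = u.email
        WHERE c.status = 'succeeded'
        GROUP BY u.plan
        """
    )


# Hypothetical stand-in for the warehouse.
warehouse = sqlite3.connect(":memory:")

# Hypothetical source data from two systems that do not share a database.
app_users = [
    ("a@example.com", "pro"),
    ("b@example.com", "free"),
    ("c@example.com", "pro"),
]
stripe_charges = [
    ("a@example.com", 5000, "succeeded"),
    ("c@example.com", 5000, "succeeded"),
    ("c@example.com", 5000, "failed"),
    ("b@example.com", 0, "succeeded"),
]

extract_and_load(warehouse, "app_users", ["email", "plan"], app_users)
extract_and_load(
    warehouse,
    "stripe_charges",
    ["customer_email", "amount_cents", "status"],
    stripe_charges,
)
transform(warehouse)

# The visualization layer would query the modeled table, not the raw ones.
report = warehouse.execute(
    "SELECT plan, revenue_cents FROM revenue_by_plan ORDER BY plan"
).fetchall()
print(report)
```

Landing the data raw and transforming it inside the warehouse keeps the extraction step simple and lets the modeling logic live in SQL, which is the division of labor the stack above relies on.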

When you actually need one

Three fairly reliable triggers stand out. First, an analyst query takes more than a minute on the production database, or has caused a customer-facing slowdown. Second, the team starts asking questions that require joining the app, Stripe, the marketing platform, and the support tool, none of which share a database. Third, a board deck or investor update needs a report you cannot produce in twenty minutes because the data lives in seven places. Any one of those is enough; two of them are a sign you have already waited too long.

At QUANT LAB

Our cloud infrastructure practice builds warehouses for clients whose product database is no longer enough — usually BigQuery or Snowflake, with Fivetran or custom Python jobs for extraction and dbt for transformation. We also help trading and quant clients build research warehouses that store tick-level history and make backtests reproducible.

For early-stage SaaS companies we usually recommend starting with a single dbt project pointed at a managed warehouse rather than over-engineering a Kafka-based pipeline. Read our piece on multi-tenant SaaS with Postgres for how the application database is structured before the warehouse stage, and book a call if you want a one-hour review of your current analytics setup.

