SaaS Architecture · 2026

Event-Driven Architecture for SaaS: A 2026 Guide

Events decouple your services and absorb spiky load — and they introduce a new class of bugs around duplication, ordering, and the dual-write problem. This guide covers when event-driven design earns its keep, and the patterns that keep it honest in production.

By Bill Beltz, Founder & Principal Engineer · Published June 3, 2026 · 13 min read

Quick answer

Use event-driven architecture when independent reactions to a state change — email, indexing, billing, analytics — should not block or couple to the main request. Solve the dual-write problem with the transactional outbox pattern, make every consumer idempotent because delivery is at-least-once, and treat exactly-once as a myth you engineer around with deduplication. Choose orchestration for complex workflows you must see and test, choreography for simple decoupled reactions. For a single small app, a background job is usually clearer than a broker.

Event-driven architecture is powerful and frequently over-applied. The decoupling is real, but so is the cost: an asynchronous flow is harder to trace, test, and debug than a function call. We design SaaS platforms for a living, and our platform engineering practice reaches for events deliberately, not by default. If your async work is really just deferred tasks, you may want a queue instead — see background jobs and queues in production.

1. Events, commands, and when to use which

An event is an immutable fact — OrderPlaced — that the producer emits without knowing or caring who reacts. A command is a directed request to do something — CapturePayment — aimed at a specific handler. Confusing the two couples services that should be independent.

Emit events for facts that have already happened; multiple consumers may react in their own way.
Send commands when you need a specific thing done and care about the outcome.
Name events in the past tense and version their schema — consumers you do not control depend on the shape.
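One way to keep the distinction visible in code is to give events and commands different shapes. The sketch below is illustrative only: the field names (`kind`, `version`, `target`, `occurredAt`) and the `recipientsOf` helper are assumptions made for this example, not part of any particular library.

```typescript
// Hypothetical message shapes; every name here is illustrative.

// An event: a past-tense fact with an explicit schema version.
interface OrderPlacedV1 {
  kind: "event";
  type: "OrderPlaced";
  version: 1;
  id: string; // unique event id, useful later for deduplication
  occurredAt: string; // ISO 8601 timestamp
  data: { orderId: string; totalCents: number };
}

// A command: an imperative request aimed at one specific handler.
interface CapturePayment {
  kind: "command";
  type: "CapturePayment";
  target: "payments"; // the single service expected to act on it
  data: { orderId: string; amountCents: number };
}

type BusMessage = OrderPlacedV1 | CapturePayment;

// Events fan out to every subscriber; commands go to exactly one target.
function recipientsOf(msg: BusMessage, subscribers: string[]): string[] {
  return msg.kind === "event" ? subscribers : [msg.target];
}
```

Carrying the version in the type (`OrderPlacedV1`) means a breaking change becomes a new type rather than a silent change to an existing one.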

2. The dual-write problem and the outbox

The single most common event-driven bug: you commit a row to your database and then publish an event to a broker as two separate steps. A crash in between either loses the event (data committed, no event published) or, if you publish first, emits a phantom event (event sent, transaction rolled back). The transactional outbox closes the gap.

// Write the business change and the event in ONE transaction
await db.transaction(async (tx) => {
  await tx.orders.insert(order);
  await tx.outbox.insert({
    id: crypto.randomUUID(),
    type: "OrderPlaced",
    payload: JSON.stringify(order),
    published: false,
  });
});

// A separate relay polls the outbox and publishes, then marks sent.
// The event ships only if the order actually committed.

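A relay process (polling or change-data-capture) reads unpublished outbox rows, pushes them to the broker, and marks them sent. The event is published if and only if the business data committed: no dual write, no lost or phantom events. The relay can still publish a row twice if it crashes after publishing but before marking it sent, so the outbox gives at-least-once publication, not exactly-once.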
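A single polling pass of such a relay might look like the following. This is a minimal sketch under stated assumptions: `OutboxStore`, `Publish`, `relayOnce`, and the in-memory `memoryStore` are hypothetical names standing in for your database access layer and broker client.

```typescript
// Hypothetical interfaces; swap in your own database and broker clients.
interface OutboxRow {
  id: string;
  type: string;
  payload: string;
  published: boolean;
}

interface OutboxStore {
  fetchUnpublished(limit: number): Promise<OutboxRow[]>;
  markPublished(id: string): Promise<void>;
}

type Publish = (row: OutboxRow) => Promise<void>;

// One polling pass: publish each pending row, then mark it sent.
// Publishing happens BEFORE marking, so a crash in between causes a
// re-publish on the next pass (a duplicate), never a lost event.
async function relayOnce(
  store: OutboxStore,
  publish: Publish,
  batchSize = 100,
): Promise<number> {
  const rows = await store.fetchUnpublished(batchSize);
  let sent = 0;
  for (const row of rows) {
    try {
      await publish(row);
    } catch {
      break; // stop here to preserve order; the next pass retries this row
    }
    await store.markPublished(row.id);
    sent += 1;
  }
  return sent;
}

// In-memory store, for demonstration only.
function memoryStore(rows: OutboxRow[]): OutboxStore {
  return {
    async fetchUnpublished(limit) {
      return rows.filter((r) => !r.published).slice(0, limit);
    },
    async markPublished(id) {
      const row = rows.find((r) => r.id === id);
      if (row) row.published = true;
    },
  };
}
```

In production you would call `relayOnce` on a timer or replace it with change-data-capture; stopping at the first failure is one possible choice that keeps rows in order at the cost of head-of-line blocking.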

3. Idempotent consumers and at-least-once delivery

Real brokers deliver at least once. Retries and redeliveries mean every consumer will eventually see a duplicate. Idempotency is not optional — it is the price of admission.

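// Dedupe on event id; the unique constraint makes the check atomic.
// The marker and the side effect share one transaction, so a failed
// handler rolls the marker back and the redelivery is processed.
async function onEvent(evt) {
  await db.transaction(async (tx) => {
    try {
      await tx.processedEvents.insert({ id: evt.id }); // unique PK
    } catch (e) {
      if (isUniqueViolation(e)) return; // already handled: skip
      throw e;
    }
    await handle(evt, tx); // writes made through tx apply once per event id
  });
}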

Record processed event IDs and skip duplicates, or make the side effect naturally repeatable (an upsert, a set-to-value).
Stop treating "exactly-once delivery" as achievable — aim for at-least-once delivery plus exactly-once processing.
Route poison messages that keep failing to a dead-letter queue so one bad event does not block the stream — the same pattern covered in our queues guide.
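The dead-letter routing in the last point can be sketched as a retry counter with a cutoff. The `Delivery` shape, the `deliver` function, and the default of three attempts are assumptions for illustration; real brokers expose redelivery counts and dead-letter queues through their own configuration.

```typescript
// Hypothetical types; not a real broker API.
interface Delivery {
  id: string;
  body: string;
  attempts: number; // how many times this message has failed so far
}

type Outcome = "done" | "retry" | "dead-lettered";

// Try the handler; after maxAttempts failures, park the message in the DLQ
// so the rest of the stream can keep flowing.
function deliver(
  msg: Delivery,
  handler: (body: string) => void,
  deadLetters: Delivery[],
  maxAttempts = 3,
): Outcome {
  try {
    handler(msg.body);
    return "done";
  } catch {
    msg.attempts += 1;
    if (msg.attempts >= maxAttempts) {
      deadLetters.push(msg); // kept for inspection and manual replay
      return "dead-lettered";
    }
    return "retry"; // caller re-enqueues, ideally with backoff
  }
}
```

A message that always fails is retried twice and dead-lettered on the third attempt, while healthy messages are unaffected.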

4. Ordering, choreography, and orchestration

Ordering is the next trap. Most brokers guarantee order only within a partition, so events for the same aggregate must share a partition key (e.g. the order ID) if their sequence matters. Across partitions, assume no global order.
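To make the partition-key idea concrete, here is a minimal sketch of key-based partitioning. It uses an FNV-1a hash over UTF-16 code units purely as an example of a stable hash; `fnv1a` and `partitionFor` are hypothetical helpers, and real brokers ship their own partitioners.

```typescript
// FNV-1a, 32-bit, over UTF-16 code units. Any stable hash would do;
// this is an illustration, not the partitioner of any specific broker.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Same key always maps to the same partition, so events for one
// aggregate stay in order relative to each other.
function partitionFor(key: string, partitionCount: number): number {
  return fnv1a(key) % partitionCount;
}
```

Note that changing `partitionCount` remaps keys, so ordering guarantees only hold while the partition count is stable.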

For multi-step business processes, choose your coordination style deliberately:

Choreography. Each service reacts and emits; the workflow is emergent. Loosely coupled but the end-to-end flow is implicit and hard to observe.
Orchestration (an orchestrated saga). A central coordinator drives each step and runs compensating actions on failure. Complex flows become explicit, testable, and recoverable.
Use a saga for anything money-touching or multi-service where a half-finished process is unacceptable — the compensation logic is the point.
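The orchestrated style can be sketched as a coordinator that runs steps in order and, on failure, compensates the completed ones in reverse. `SagaStep` and `runSaga` are hypothetical names, and this version assumes compensations themselves succeed; a production saga would also persist its progress and retry failed compensations.

```typescript
// Hypothetical saga types; not taken from any workflow engine.
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>; // undoes the effect of action
}

interface SagaResult {
  ok: boolean;
  failedAt?: string; // name of the step that failed, if any
}

async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      completed.push(step);
    } catch {
      // Undo what already happened, most recent step first.
      for (const done of completed.reverse()) {
        await done.compensate();
      }
      return { ok: false, failedAt: step.name };
    }
  }
  return { ok: true };
}
```

If a payment step fails after inventory was reserved, only the reservation is released; steps that never ran are never compensated.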

Most teams need a queue before a broker

Event-driven architecture is the right tool for genuinely decoupled reactions — and overkill for deferred tasks in a single app. Want help deciding which you actually need? Book a free scoping call.

Event-driven tradeoffs at a glance

Concern	What it buys / costs
Decoupling	Independent scaling and deploys; harder end-to-end tracing
Delivery	At-least-once; you must build idempotency
Consistency	Eventual, not immediate; outbox prevents lost events
Ordering	Per-partition only; key by aggregate when sequence matters
Coordination	Choreography decouples; orchestration makes flows visible

Operational practices that hold over time

Asynchronous systems fail quietly; instrumentation is what makes them operable:

Trace across the boundary. Propagate a correlation ID through every event so you can reconstruct a flow that spans services — see observability for startups.
Version events from day one. Add fields, never repurpose them; consumers you do not control will break otherwise.
Watch consumer lag. A consumer falling behind is the early warning that a capacity shortfall or a poison message is about to become an incident.
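Correlation-ID propagation from the first point can be sketched with a small event envelope. The `Envelope` shape, the `causationId` field, and the `startFlow` and `follow` helpers are assumptions for illustration; the counter-based `newId` stands in for a real UUID generator.

```typescript
// Hypothetical envelope; field names are illustrative.
interface Envelope<T> {
  id: string;
  type: string;
  correlationId: string; // identical for every event in one flow
  causationId: string | null; // id of the event that directly triggered this one
  data: T;
}

let nextId = 0;
const newId = (): string => `evt-${++nextId}`; // stand-in for a UUID generator

// The first event in a flow defines the correlation id.
function startFlow<T>(type: string, data: T): Envelope<T> {
  const id = newId();
  return { id, type, correlationId: id, causationId: null, data };
}

// Every downstream event copies the correlation id and records its parent.
function follow<T, U>(parent: Envelope<T>, type: string, data: U): Envelope<U> {
  return {
    id: newId(),
    type,
    correlationId: parent.correlationId,
    causationId: parent.id,
    data,
  };
}
```

Filtering logs or traces by `correlationId` then returns the whole flow, and `causationId` lets you rebuild the order in which events triggered one another.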

If your events drive notifications or third-party callbacks, the signature-verification and idempotency patterns in our Stripe webhook security guide apply directly.

Frequently asked questions

What is event-driven architecture?

Event-driven architecture is a style where services communicate by publishing and reacting to events — immutable records that something happened, such as OrderPlaced or PaymentCaptured — rather than calling each other directly. A producer emits an event without knowing who consumes it, and any number of consumers react independently. This decouples services in time and in dependency, letting them scale, fail, and deploy on their own schedules. The tradeoff is that the overall flow becomes harder to trace and reason about.

When should a SaaS use event-driven architecture?

Reach for events when you have genuinely independent reactions to a state change — sending email, updating search indexes, billing, analytics — that should not block or couple to the main request. It also fits when teams need to evolve services independently or when you must absorb spiky load by buffering work. For a small app with one team and a single database, a direct function call or a simple background job is usually clearer and cheaper than the operational weight of a broker.

What is the transactional outbox pattern?

The outbox pattern solves the dual-write problem: you cannot atomically write to your database and publish to a message broker in one transaction, so a crash between the two loses or duplicates events. Instead, you write the event into an outbox table in the same database transaction as the business change. A separate relay process reads unpublished rows and pushes them to the broker, marking them sent. The event is published if and only if the business data committed.

Why must event consumers be idempotent?

Because real message systems deliver at least once, not exactly once. Network retries, redeliveries after a crash, and broker semantics all mean a consumer will occasionally see the same event twice. An idempotent consumer produces the same result whether it processes an event once or five times — typically by recording processed event IDs and skipping duplicates, or by making the side effect naturally repeatable. Without idempotency, retries double-charge customers and double-send emails.
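The difference between a repeatable and a non-repeatable side effect is easy to see in miniature. In this sketch, `Balances`, `applyIncrement`, and `applySetTo` are hypothetical names, and a `Map` stands in for a database table.

```typescript
type Balances = Map<string, number>;

// NOT idempotent: every redelivery adds the delta again.
function applyIncrement(b: Balances, account: string, delta: number): void {
  b.set(account, (b.get(account) ?? 0) + delta);
}

// Idempotent: the event carries the resulting value, so replays converge.
function applySetTo(b: Balances, account: string, newBalance: number): void {
  b.set(account, newBalance);
}
```

Applying the set-to-value form twice leaves the same balance as applying it once, whereas the increment form double-counts. Set-to-value is still sensitive to ordering, so it pairs naturally with per-key partitioning.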

Is exactly-once delivery possible?

Exactly-once delivery over a network is effectively a myth; what systems offer is at-least-once delivery plus exactly-once processing through idempotency and deduplication. Some brokers advertise exactly-once semantics, but they achieve it with idempotent producers and transactions that hold only within their own boundary. The moment your consumer has an external side effect, you are responsible for making that effect idempotent. Design as if every event can arrive more than once, because it can.

What is the difference between choreography and orchestration?

In choreography, each service reacts to events and emits its own, with no central coordinator — the workflow emerges from local rules. It is loosely coupled but the end-to-end process is implicit and hard to see. In orchestration, a central coordinator (often a saga or workflow engine) explicitly directs each step and handles compensation when something fails. Orchestration makes complex, long-running business processes visible and testable; choreography keeps simple reactions decoupled. Many systems use both.

Decouple deliberately. Build it to be debuggable.

We design event-driven systems with the outbox, idempotency, and tracing built in — and we'll tell you honestly when a queue is the better answer. Book a free scoping call.