What is a service mesh in one sentence?

A service mesh is an infrastructure layer that manages communication between services using sidecar proxies, handling retries, encryption, load balancing, and observability without changes to application code.

What is a sidecar proxy?

A sidecar is a small proxy deployed alongside each service instance. All of that service's inbound and outbound traffic flows through its sidecar, which is where the mesh applies its policies.

How is a service mesh different from an API gateway?

A gateway governs north-south traffic entering the system from outside. A service mesh governs east-west traffic — calls services make to each other internally. Many systems run both.

Do I need a service mesh?

Most teams do not, at least not until they run many services and feel real pain around security, retries, or visibility between them. With only a handful of services, a mesh's operational overhead usually outweighs the benefit.

What are common service mesh implementations?

Istio and Linkerd are the best-known. Istio is built on the Envoy proxy, while Linkerd uses its own lightweight proxy written in Rust. Consul Connect and AWS App Mesh are other options. Sidecar-less designs, such as Istio's ambient mode, offer a lighter alternative as the technology matures.


What is a Service Mesh?

A service mesh is a dedicated infrastructure layer that takes over the messy job of service-to-service communication — encryption, retries, timeouts, load balancing, and observability — by routing every call through a small sidecar proxy next to each service, so that capability lives in the platform instead of being re-coded into every application.

The problem it addresses

Once a system splits into many microservices, the network between them becomes the hard part. Every service needs retries with backoff, timeouts, circuit breaking, mutual authentication, and tracing. Building all of that into each service, in each language, is duplicative and inconsistent. A service mesh moves those concerns out of the app and into a uniform layer that every service shares — without anyone editing business logic.
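To make the duplication concrete, here is a minimal Python sketch of the retry-with-backoff logic each service would otherwise carry on its own. The name `call_with_retries`, its defaults, and the choice of `ConnectionError` and `TimeoutError` as retryable errors are illustrative assumptions, not any library's API. A production version would also want jitter, an overall deadline, and a circuit breaker, and every service in every language would need its own copy.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff.

    The wait before retry n (counting from 0) is base_delay * 2**n,
    so the defaults wait 0.1 s and then 0.2 s.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter is there so a test can record the delays instead of actually waiting.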

The sidecar pattern

The mesh works by attaching a lightweight proxy, the sidecar, to each service instance. All traffic in and out of the service passes through its sidecar, and the sidecars together form the data plane. Because the proxy intercepts every call, it can transparently retry a failed request, enforce a timeout, or encrypt the connection, while the application believes it is making an ordinary network call directly to the other service.
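The interception idea can be modeled in a few lines of Python. This is a hypothetical, in-process illustration rather than a real proxy: `Sidecar`, `checkout`, and the `send(service, payload)` signature are assumptions made up for the example. `checkout` contains no resilience logic at all; retries and a simple circuit breaker live in the object handed to it, which is the role the sidecar plays on the network.

```python
from __future__ import annotations

from typing import Callable

# The application's entire view of the network: send(service, payload) -> response.
Send = Callable[[str, str], str]


class CircuitOpenError(ConnectionError):
    """Raised when the proxy fails fast instead of calling an upstream it considers broken."""


class Sidecar:
    """Hypothetical in-process stand-in for a sidecar proxy.

    It has the same signature as the raw transport, so the application cannot
    tell it is there. Retry and circuit-breaking policy live here, not in the app.
    Simplification: an open circuit never resets (real proxies retry after a cooldown).
    """

    def __init__(self, transport: Send, retries: int = 2, failure_threshold: int = 3) -> None:
        self.transport = transport
        self.retries = retries  # extra attempts after the first one
        self.failure_threshold = failure_threshold  # consecutive failures that open the circuit
        self.consecutive_failures: dict[str, int] = {}

    def __call__(self, service: str, payload: str) -> str:
        if self.consecutive_failures.get(service, 0) >= self.failure_threshold:
            raise CircuitOpenError(f"circuit open for {service}")
        last_error: ConnectionError | None = None
        for _ in range(self.retries + 1):
            try:
                response = self.transport(service, payload)
            except ConnectionError as exc:
                last_error = exc
                failures = self.consecutive_failures.get(service, 0) + 1
                self.consecutive_failures[service] = failures
                if failures >= self.failure_threshold:
                    break  # breaker trips: stop hammering the upstream
            else:
                self.consecutive_failures[service] = 0  # any success resets the count
                return response
        assert last_error is not None  # every attempt in the loop failed
        raise last_error


def checkout(send: Send) -> str:
    """Application code: an ordinary call with no retry or breaker logic of its own."""
    return send("payments", "charge order 42")
```

Because `Sidecar` has the same call signature as the raw transport, swapping it in requires no change to `checkout`.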

Data plane and control plane

A mesh has two halves. The data plane is the fleet of sidecar proxies actually moving traffic. The control plane is the brain that configures them — distributing policy, certificates, and routing rules. You change a rule once in the control plane (say, "encrypt all traffic between these services" or "send 5% of requests to the new version") and it propagates to every sidecar. This separation is what makes the mesh manageable at scale.
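The split between the two planes can be sketched as follows. All class names are hypothetical, and the push is an in-memory method call; real meshes stream configuration to proxies over a network API (Istio, for example, serves Envoy's xDS APIs). The sketch applies the "send 5% of requests to the new version" rule once and lets every sidecar pick destinations by weight.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteRule:
    """Which versions of a service receive what share of its traffic."""
    service: str
    weights: dict[str, int]  # e.g. {"v1": 95, "v2": 5}


class SidecarProxy:
    """Data-plane proxy: holds whatever config the control plane last pushed."""

    def __init__(self, name: str, seed: int = 0) -> None:
        self.name = name
        self.version = 0
        self.routes: dict[str, RouteRule] = {}
        self._rng = random.Random(seed)  # seeded so the sketch is reproducible

    def receive(self, version: int, rules: dict[str, RouteRule]) -> None:
        self.version = version
        self.routes = dict(rules)

    def pick_destination(self, service: str) -> str:
        rule = self.routes[service]
        versions = list(rule.weights)
        chosen = self._rng.choices(versions, weights=[rule.weights[v] for v in versions], k=1)[0]
        return f"{service}-{chosen}"


class ControlPlane:
    """Holds the desired state and pushes it to every registered sidecar."""

    def __init__(self) -> None:
        self.version = 0
        self.rules: dict[str, RouteRule] = {}
        self.sidecars: list[SidecarProxy] = []

    def register(self, sidecar: SidecarProxy) -> None:
        self.sidecars.append(sidecar)
        sidecar.receive(self.version, self.rules)  # new proxies start with current config

    def apply(self, rule: RouteRule) -> None:
        """Change a rule once; propagate it to the whole data plane."""
        self.rules[rule.service] = rule
        self.version += 1
        for sidecar in self.sidecars:
            sidecar.receive(self.version, self.rules)
```

After `apply(RouteRule("reviews", {"v1": 95, "v2": 5}))`, every registered sidecar holds the same versioned rule and sends roughly one request in twenty to `reviews-v2`.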

Security: mTLS by default

One of the strongest reasons to adopt a mesh is mutual TLS between services. The mesh can issue and rotate certificates and encrypt every internal call automatically, so traffic inside the cluster is authenticated and confidential — a concrete step toward zero trust on the network. Doing this by hand across dozens of services is the kind of toil that quietly never gets finished; the mesh makes it the default rather than the exception.
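Here is roughly what each side of a mutual-TLS connection configures, sketched with Python's standard `ssl` module. In a mesh the sidecars do this rather than the application, and the mesh's certificate authority supplies and rotates the files; the optional path parameters are placeholders for those files. One simplification: meshes usually verify a service identity embedded in the certificate (Istio uses SPIFFE IDs), whereas this sketch keeps Python's default hostname check on the client.

```python
import ssl
from typing import Optional


def server_context(cert: Optional[str] = None, key: Optional[str] = None,
                   mesh_ca: Optional[str] = None) -> ssl.SSLContext:
    """Receiving side: presents its own certificate AND demands one from the caller."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # the "mutual" part: reject callers without a valid cert
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    if mesh_ca:
        ctx.load_verify_locations(cafile=mesh_ca)  # trust only the mesh's own CA
    return ctx


def client_context(cert: Optional[str] = None, key: Optional[str] = None,
                   mesh_ca: Optional[str] = None) -> ssl.SSLContext:
    """Calling side: verifies the server and presents its own certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verifies server cert and hostname by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # our identity, shown to the server
    if mesh_ca:
        ctx.load_verify_locations(cafile=mesh_ca)
    return ctx
```

Writing and rotating this for every service by hand is exactly the toil the mesh absorbs.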

Mesh vs gateway

A mesh handles east-west traffic — services talking to each other inside the cluster. An API gateway handles north-south traffic — requests arriving from the outside world. They solve different problems and frequently run together: the gateway at the edge, the mesh internally. Confusing the two leads teams to reach for a mesh when a gateway would do, which is a costly mistake.
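The distinction comes down to whether a call crosses the cluster edge. A minimal sketch, assuming a single hypothetical cluster address range of `10.0.0.0/16` (real clusters configure their own pod and service ranges):

```python
import ipaddress

# Hypothetical cluster range, chosen only for illustration.
CLUSTER_CIDR = ipaddress.ip_network("10.0.0.0/16")


def traffic_direction(source_ip: str, dest_ip: str,
                      cluster: ipaddress.IPv4Network = CLUSTER_CIDR) -> str:
    """Classify a connection by whether both ends sit inside the cluster."""
    src_inside = ipaddress.ip_address(source_ip) in cluster
    dst_inside = ipaddress.ip_address(dest_ip) in cluster
    if src_inside and dst_inside:
        return "east-west"  # service to service: the mesh's territory
    return "north-south"  # crosses the cluster edge: the gateway's territory
```

A request from a browser at `203.0.113.7` to a pod at `10.0.2.9` is north-south; a call from `10.0.1.5` to `10.0.2.9` is east-west.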

When it is overkill

A service mesh is real operational weight: extra proxies, a control plane to run and upgrade, more moving parts to debug, and a learning curve. For a handful of services it is almost always overkill — you will spend more time operating the mesh than you save. It earns its place when you have many services, a strong need for uniform security and retries, and a platform team that can own it. Adopting one too early is a classic way to drown a small team in infrastructure.

At QUANT LAB

We are conservative about meshes. Most of the products we build do not have enough services to justify one, and we will say so rather than sell complexity. When a client genuinely runs a large Kubernetes estate with real east-west security needs, our API development and platform work covers the rollout deliberately — start with mTLS and observability, prove value, then expand routing and policy. The goal is reliability you can reason about, not a maximalist diagram.
