What is MLOps in one sentence?

MLOps is the discipline of deploying, monitoring, and maintaining machine-learning models in production reliably — DevOps practices extended to cover data and models, not just code.

How is MLOps different from DevOps?

DevOps ships code. MLOps also has to version data and models, handle retraining, and monitor for the model silently getting worse as the real world drifts away from its training data.

What is model drift?

Drift is when a model's accuracy decays over time because the live data no longer matches what it was trained on. Unlike a code bug, it produces no error, just quietly worse predictions.

Do I need MLOps for an LLM feature?

Yes, in spirit. Even when you call a hosted model, you still need versioned prompts, evaluation sets, monitoring, and a rollback path — the same operational discipline under a different name.

What tools are used for MLOps?

Experiment trackers like MLflow or Weights & Biases, feature stores, model registries, pipeline orchestrators, and monitoring tools. But tools follow the practice; they do not replace it.

What is MLOps?

MLOps is what it takes to run machine learning in production without it quietly falling apart. It extends the automation and discipline of DevOps to cover the messy extras that models bring — versioned data, trained artifacts, evaluation, monitoring for silent decay, and a repeatable path to retrain and redeploy. It is the difference between a demo and a system you can trust.

Why models need their own operations

A traditional application is deterministic: the same input yields the same output until someone changes the code. A machine-learning system has three moving parts (code, data, and the trained model), and a change to any one of them can change its behavior. The training data shifts, the model is retrained, a hyperparameter is tuned, and suddenly the output is different with no code change at all. MLOps exists because you cannot manage that with the same assumptions you bring to ordinary software.

MLOps vs. DevOps

MLOps inherits everything from DevOps — automated builds, continuous deployment, infrastructure as code — and adds the parts unique to learning systems. You version not just code but datasets and model artifacts, so any prediction can be traced back to the exact data and weights that produced it. You treat retraining as a first-class pipeline, not a manual chore. And you monitor a failure mode that has no equivalent in normal software: a model that throws no errors yet steadily gets worse.
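To make the traceability idea concrete, here is a minimal sketch of attaching lineage to a prediction. It assumes content hashes are an acceptable way to identify a dataset and a set of weights; the names `fingerprint`, `Lineage`, and `tag_prediction`, the sample bytes, and the commit string are all hypothetical and stand in for whatever registry or tracker a real team uses.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def fingerprint(payload: bytes) -> str:
    """Return a short, stable content hash for a dataset or model artifact."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class Lineage:
    code_version: str  # e.g. a git commit hash (hypothetical value below)
    data_hash: str     # identifies the exact training data
    model_hash: str    # identifies the exact trained weights


def tag_prediction(value: float, lineage: Lineage) -> str:
    """Serialize a prediction together with the lineage that produced it."""
    return json.dumps({"prediction": value, "lineage": asdict(lineage)}, sort_keys=True)


# Stand-ins for a real dataset file and a real serialized model.
training_data = b"age,income,label\n34,52000,1\n51,48000,0\n"
model_weights = b"\x00\x01\x02\x03"

lineage = Lineage(
    code_version="abc1234",
    data_hash=fingerprint(training_data),
    model_hash=fingerprint(model_weights),
)
record = tag_prediction(0.87, lineage)
print(record)
```

Because the hashes are derived from content, changing a single row of data or a single weight yields a different identifier, which is what lets a logged prediction be tied back to one specific combination of code, data, and model.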

The lifecycle

A mature MLOps loop runs continuously: ingest and validate data; engineer features (often through a feature store so training and serving stay consistent); train and track experiments so results are reproducible; evaluate against a held-out set and gate deployment on the metrics; deploy behind the same safeguards as any service; then monitor inputs, outputs, and accuracy in production. When monitoring detects drift, the loop kicks off again. The whole thing rests on a reliable data engineering foundation.
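The "gate deployment on the metrics" step can be sketched as a small function that compares a candidate model's held-out metrics with those of the model already in production. This is an illustration under assumptions: the function name `deployment_gate`, the single `accuracy` metric, and both thresholds are hypothetical, and a real gate would usually check several metrics.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    passed: bool
    reasons: list = field(default_factory=list)


def deployment_gate(candidate, baseline, min_accuracy=0.90, max_regression=0.01):
    """Decide whether a candidate model may replace the deployed baseline.

    `candidate` and `baseline` are dicts of held-out metrics,
    e.g. {"accuracy": 0.93}. Thresholds are illustrative, not recommendations.
    """
    reasons = []
    # Absolute floor: the candidate must be good enough on its own.
    if candidate["accuracy"] < min_accuracy:
        reasons.append("accuracy is below the minimum floor")
    # Relative check: the candidate must not be meaningfully worse than production.
    if baseline["accuracy"] - candidate["accuracy"] > max_regression:
        reasons.append("accuracy regressed against the deployed baseline")
    return GateResult(passed=not reasons, reasons=reasons)


baseline = {"accuracy": 0.93}
print(deployment_gate({"accuracy": 0.94}, baseline))
print(deployment_gate({"accuracy": 0.91}, baseline))
```

Returning the reasons alongside the pass/fail flag is a deliberate choice: a pipeline that blocks a release should say why, so the failure can be acted on rather than retried blindly.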

The drift problem

The most insidious MLOps failure is model drift. The world moves — customer behavior shifts, fraud patterns evolve, a product launches — and the live data drifts away from what the model learned. The model keeps returning confident answers; they are just increasingly wrong. There is no stack trace, no 500 error, nothing to page on unless you are explicitly watching prediction quality. Catching drift requires monitoring the distribution of inputs and outputs and comparing live performance to a baseline, which is why observability is core to MLOps rather than an afterthought.
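One common way to compare a live input distribution with a training baseline is the Population Stability Index (PSI). The sketch below is one possible drift check for a single numeric feature, not the only one; the function name `psi`, the equal-width binning, and the epsilon floor are assumptions made for illustration.

```python
import math


def psi(baseline, live, bins=10):
    """Population Stability Index between two samples of one numeric feature.

    Bins are equal-width over the baseline's range; live values that fall
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    expected = proportions(baseline)
    actual = proportions(live)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


training_sample = [i / 100 for i in range(100)]    # roughly uniform on [0, 1)
shifted_sample = [0.5 + i / 200 for i in range(100)]  # mass moved to the upper half

print(psi(training_sample, training_sample))
print(psi(training_sample, shifted_sample))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but that threshold is a convention rather than a guarantee, and input drift of this kind is only a proxy: it signals that the data has moved, not by itself that accuracy has dropped.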

LLMOps is the same discipline

The rise of hosted large language models does not let you skip operations; it just relabels them. You no longer train the model, but you still version prompts, maintain evaluation sets, monitor latency and per-token cost, guard against prompt injection, and keep a rollback path for when a provider changes a model under you. Whether you serve a model you trained yourself or call a hosted one, the operational discipline is the same.
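Versioned prompts with a rollback path can be as simple as an append-only history plus a pointer to the active entry. The following is a minimal in-memory sketch; the class `PromptRegistry` and its methods are hypothetical, and a production setup would persist versions and tie each one to evaluation results.

```python
class PromptRegistry:
    """In-memory registry of prompt versions with a pointer to the active one."""

    def __init__(self):
        self._versions = []  # append-only history of prompt templates
        self._active = -1    # index of the version currently served

    def publish(self, template: str) -> int:
        """Store a new template, make it active, and return its version index."""
        self._versions.append(template)
        self._active = len(self._versions) - 1
        return self._active

    def rollback(self) -> int:
        """Re-activate the previous version without deleting anything."""
        if self._active <= 0:
            raise RuntimeError("no earlier version to roll back to")
        self._active -= 1
        return self._active

    def render(self, **variables) -> str:
        """Fill the active template with the given variables."""
        return self._versions[self._active].format(**variables)


registry = PromptRegistry()
registry.publish("Summarize: {text}")
registry.publish("Summarize in one sentence: {text}")

registry.rollback()  # the newer prompt misbehaved, so return to the previous one
print(registry.render(text="quarterly report"))
```

Keeping the history append-only means a rollback never destroys the version that failed, so it remains available for debugging against the evaluation set.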

At QUANT LAB

We treat an AI or ML feature as a system to be operated, not a model to be demoed. On AI integration work that means an evaluation harness before launch, versioned prompts or models, monitoring for drift and cost, and a clean rollback path — the same engineering rigor our DevOps practice brings to any production service. The flashy part is the model; the part that keeps it useful six months later is MLOps.

Putting a model into production?

We build the evaluation, monitoring, and retraining pipeline that keeps an ML or AI feature trustworthy long after launch. Book a 30-minute call.
