What is an LLM in one sentence?

A large language model is a neural network trained on huge amounts of text to predict the next token, which lets it generate, summarize, translate, and reason over natural language.

What is a token?

A token is a chunk of text, roughly a word or part of a word, that the model reads and produces one at a time. Pricing, context limits, and speed are all measured in tokens.

What is a context window?

The context window is the maximum number of tokens the model can consider at once, covering both your input and its output. Exceed it and the request is either rejected or trimmed, typically by dropping the earliest content.

Why do LLMs hallucinate?

An LLM predicts plausible text, not verified facts. When it lacks the right information it still produces fluent output, which can be confidently wrong. Grounding the model with retrieval reduces the problem but does not eliminate it.

Are all LLMs the same?

No. They differ in size, training data, context window, speed, cost, and how they were tuned. Closed models such as GPT and Claude and open-weight models such as Llama suit different use cases.

What is an LLM (Large Language Model)?

A large language model is a neural network with billions of parameters trained on enormous quantities of text to do one deceptively simple thing: predict the next token. From that single objective emerges the ability to write, summarize, translate, answer questions, and follow instructions — the capability behind nearly every AI product shipping in 2026.

Next-token prediction

At its core an LLM is a very good autocomplete. Given a sequence of tokens, it outputs a probability distribution over what comes next, samples one token from it, appends that token, and repeats. When the model is trained on a large fraction of the public internet, books, and code, that loop turns into something far richer than autocomplete: to predict the next word in a math proof, a legal clause, or a Python function, the model has to internalize structure, grammar, facts, and patterns of reasoning. Scale (more data, more parameters, more compute) is what made the behavior leap from toy to useful.
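The predict, pick, append, repeat loop described above can be sketched in a few lines. This is only a toy under stated assumptions: `TOY_LOGITS`, `softmax`, and `generate` are hypothetical names, and the hand-written score table stands in for the neural network that a real LLM uses to score every token in its vocabulary.

```python
import math
import random

# Hypothetical toy "model": a hand-written table of next-token scores (logits).
# A real LLM computes these scores with a neural network over the whole context.
TOY_LOGITS = {
    "the": {"cat": 2.0, "dog": 1.5, "<end>": -1.0},
    "cat": {"sat": 2.5, "<end>": 0.5},
    "dog": {"sat": 1.0, "<end>": 1.0},
    "sat": {"<end>": 3.0},
}


def softmax(logits):
    """Turn raw scores into a probability distribution that sums to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def generate(prompt, max_new_tokens=10, seed=0):
    """Predict a distribution, sample one token, append it, and repeat."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # The toy table only looks at the last token; real models use all of them.
        probs = softmax(TOY_LOGITS[tokens[-1]])
        choices, weights = zip(*probs.items())
        next_token = rng.choices(choices, weights=weights, k=1)[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens


print(generate(["the"]))
```

Sampling from the distribution, rather than always taking the most likely token, is why the same prompt can yield different completions; fixing the seed makes this sketch repeatable.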

Tokens and context windows

LLMs do not read characters or whole words; they read tokens, chunks roughly the size of a word or word fragment. Two numbers govern almost every practical decision. The context window is how many tokens the model can hold at once, spanning both your prompt and its reply; once a conversation outgrows it, the request is rejected or the application has to trim content, usually the oldest first. Token count also drives cost and latency, since providers bill per token and longer prompts run slower. Designing an AI feature is partly the art of fitting the right information into that window.
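A rough sketch of that budgeting exercise follows. Everything here is an assumption made for illustration: the helper names are hypothetical, the four-characters-per-token estimate is only a rule of thumb for English text (a real application would count with the provider's own tokenizer), and the prices in the usage line are made-up numbers, not any provider's rates.

```python
def estimate_tokens(text):
    """Crude estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def fit_to_window(messages, context_window, reserved_for_reply):
    """Keep the newest messages whose estimated tokens fit the prompt budget.

    The window covers prompt and reply together, so part of it is set
    aside for the reply before any messages are admitted.
    """
    budget = context_window - reserved_for_reply
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept, used


def estimate_cost(prompt_tokens, reply_tokens, price_in_per_million, price_out_per_million):
    """Per-token billing: input and output tokens are priced separately."""
    total = prompt_tokens * price_in_per_million + reply_tokens * price_out_per_million
    return total / 1_000_000


history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 estimated tokens each
kept, used = fit_to_window(history, context_window=30, reserved_for_reply=10)
print(len(kept), used)
print(estimate_cost(1000, 500, 2.0, 10.0))  # hypothetical prices per million tokens
```

Dropping the oldest messages first is only one policy; summarizing older turns or retrieving just the relevant passages are common alternatives.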

From base model to assistant

A raw pretrained model just continues text; it will not reliably follow instructions. The chat assistants people use are the result of additional training: instruction tuning teaches the model to follow requests, and reinforcement learning from human feedback (RLHF) shapes it toward helpful, honest, harmless responses. You can also adapt a model to your own domain with fine-tuning, which is better suited to adjusting behavior and tone than to teaching new facts.

Why they hallucinate

Because the objective is plausibility, not truth, an LLM will happily produce a fluent, confident, fabricated answer when it lacks the facts. It has no built-in sense of "I do not know." The standard defense is to ground the model in real data using retrieval-augmented generation, which retrieves relevant documents and inserts them into the prompt so the model summarizes real sources instead of guessing. Citations and evaluation close the loop.
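The retrieve-then-insert pattern can be sketched as below. This is a minimal illustration under stated assumptions, not a production retriever: `words`, `retrieve`, and `build_prompt` are hypothetical helpers, the sample documents are invented, and plain word overlap stands in for the embedding-based search that retrieval systems typically use.

```python
import re


def words(text):
    """Lowercase a text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Rank documents by shared words with the question (a stand-in for vector search)."""
    question_words = words(question)
    scored = [(len(question_words & words(doc)), index) for index, doc in enumerate(documents)]
    scored = [(score, index) for score, index in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, stable on ties
    return [documents[index] for _, index in scored[:top_k]]


def build_prompt(question, documents):
    """Insert the retrieved sources into the prompt and ask for numbered citations."""
    sources = retrieve(question, documents)
    numbered = "\n".join(f"[{n}] {doc}" for n, doc in enumerate(sources, start=1))
    return (
        "Answer using only the sources below and cite them by number. "
        "If the sources do not contain the answer, say you do not know.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )


docs = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
print(build_prompt("How many days do refunds take?", docs))
```

The resulting string is what would be sent to the model. The explicit instruction to admit when the sources are silent is one way to compensate for the missing "I do not know" reflex, though it reduces fabrication rather than guaranteeing against it.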

A new class of security risk

LLMs blur the line between data and instructions: text the model reads can change what the model does. That opens the door to prompt injection, in which a malicious document or web page hijacks the model's behavior. It resembles injection attacks in traditional software, but there is no established equivalent of input escaping that reliably neutralizes it. Any application that lets an LLM read untrusted content or take actions needs to treat the model's output as untrusted and bound its permissions accordingly.
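One way to bound permissions is to validate whatever the model asks for against an explicit allowlist before acting on it. The sketch below assumes a hypothetical design in which the model requests actions as JSON; the action names, the `ALLOWED_ACTIONS` table, and `parse_model_action` are illustrative, not part of any particular framework.

```python
import json

# Hypothetical allowlist: the only actions this application lets the model
# request, mapped to the exact argument names each one accepts.
ALLOWED_ACTIONS = {
    "search_docs": {"query"},
    "summarize": {"doc_id"},
}


def parse_model_action(raw_output):
    """Validate model output as untrusted input; return (action, args) or raise ValueError."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")

    action = data.get("action")
    args = data.get("args", {})
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not permitted: {action!r}")
    if not isinstance(args, dict) or set(args) != ALLOWED_ACTIONS[action]:
        raise ValueError(f"unexpected arguments for {action!r}")
    if not all(isinstance(value, str) for value in args.values()):
        raise ValueError("argument values must be strings")
    return action, args


print(parse_model_action('{"action": "search_docs", "args": {"query": "refund policy"}}'))

try:
    # What a hijacked model might emit after reading a malicious page.
    parse_model_action('{"action": "delete_all_files", "args": {}}')
except ValueError as error:
    print("rejected:", error)
```

An allowlist does not stop the injection itself; it limits what a hijacked model can do. Actions with real side effects would still call for further checks, such as human confirmation.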

At QUANT LAB

We help teams put LLMs to work without the hype tax. Our AI integration engagements start by separating what an LLM is genuinely good at — summarization, extraction, drafting, classification — from what it is not, and choosing the right model for the cost, latency, and privacy the use case demands. We pair the model with retrieval for grounding, an evaluation harness for quality, and clear permission boundaries for safety. The model is the easy part; the system around it is the work.
