Vulnerability Management · 2026

How to Build a Vulnerability Management Program

A scanner finds thousands of issues. A program decides which few hundred actually matter, fixes them on a clock, and proves it to an auditor. Here is the full lifecycle — inventory, scanning, prioritization, remediation, SLAs, and metrics — built the way it survives a SOC 2, PCI, or ISO 27001 review.

By Bill Beltz, Founder & Principal Engineer · 13 min read

Quick answer

A vulnerability management program is a continuous loop: inventory every asset, scan it on a defined cadence, prioritize findings by real-world risk, remediate against severity-based SLAs, and measure the result. The single most important decision is how you prioritize. Patching by raw CVSS score is a losing game — most CVEs score 7+ and you cannot fix everything. Prioritize by real-world exploitability instead: anything in the CISA KEV catalog or with a high EPSS probability, on an internet-facing asset, goes to the front of the line. A documented, operating loop like this is exactly the evidence SOC 2, PCI DSS, and ISO 27001 auditors ask to see.

Most teams think they have a vulnerability management program because they run a scanner. They have a scanner. The program is everything around it — and that is where the gaps live. The scanner returns four thousand findings; nobody has the time to fix four thousand things, so the list rots, the criticals age out, and the next breach turns out to have been finding number 312 the whole time.

A real program does three jobs the scanner cannot: it knows what to scan (inventory), it knows what to fix first (prioritization), and it proves the work got done on time (SLAs and metrics). Get those right and the compliance evidence falls out for free. For the difference between a scan and a deeper assessment, see pen test vs vulnerability scan.

1. Asset inventory and discovery

You cannot manage what you cannot see. Post-incident reviews keep landing on the same root cause: the vulnerable thing was not in the inventory, so it was never scanned, so nobody knew it was there. Inventory is not glamorous, but it is the floor the entire program stands on. If coverage is 70 percent, then 30 percent of your attack surface is invisible no matter how good the rest of the pipeline is.

A modern inventory is not a spreadsheet someone updates quarterly — it is assembled continuously from authoritative sources:

  • Cloud APIs: pull every EC2 instance, container, Lambda, S3 bucket, and managed database directly from AWS, Azure, or GCP so the inventory tracks reality as infrastructure scales up and down.
  • Network discovery: active and passive scans that catch the unmanaged device, the shadow-IT box, the forgotten staging server someone exposed to the internet.
  • Agents and endpoint tooling: EDR and configuration-management data to cover laptops and ephemeral workloads that come and go.
  • Source of business context: a CMDB or tag taxonomy that records who owns each asset, what data it touches, and whether it is internet-facing — the context that makes prioritization possible later.

Tag every asset with criticality and exposure at discovery time. That metadata is what turns a flat list of 4,000 findings into a ranked queue in step three.
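As a rough sketch of what "assembled continuously from authoritative sources" can look like, the snippet below merges records from several feeds into one inventory and lists the assets still missing owner or criticality tags. The feed names, field names, and sample records are hypothetical placeholders, not the output format of any particular cloud API or CMDB.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    asset_id: str
    owner: str = "unassigned"
    criticality: str = "unknown"       # e.g. "crown_jewel" or "standard"
    internet_facing: bool = False
    sources: set = field(default_factory=set)


def merge_inventory(feeds):
    """Merge per-source records into one inventory keyed by asset_id.

    `feeds` maps a source name to a list of dicts (hypothetical shape).
    """
    inventory = {}
    for source, records in feeds.items():
        for rec in records:
            asset = inventory.setdefault(rec["asset_id"], Asset(rec["asset_id"]))
            asset.sources.add(source)
            # Business context from any source overrides the defaults.
            asset.owner = rec.get("owner", asset.owner)
            asset.criticality = rec.get("criticality", asset.criticality)
            # Treat the asset as exposed if any source reports it exposed.
            asset.internet_facing = asset.internet_facing or rec.get("internet_facing", False)
    return inventory


def untagged(inventory):
    """Assets missing owner or criticality, which cannot be prioritized later."""
    return sorted(
        a.asset_id
        for a in inventory.values()
        if a.owner == "unassigned" or a.criticality == "unknown"
    )


feeds = {
    "cloud_api": [{"asset_id": "i-0abc", "internet_facing": True}],
    "cmdb": [{"asset_id": "i-0abc", "owner": "payments", "criticality": "crown_jewel"}],
    "network_scan": [{"asset_id": "10.0.4.17"}],  # seen on the wire, absent from the CMDB
}
inventory = merge_inventory(feeds)
print(untagged(inventory))  # ['10.0.4.17']
```

The untagged list is the useful output here: anything discovered on the network but missing business context is exactly the kind of asset that otherwise falls out of the program.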

2. Scanning: authenticated, agent-based, and on a cadence

Not all scans are equal, and the defaults are usually the weak ones. The choices that matter:

  • Authenticated vs unauthenticated: an unauthenticated scan sees what an outside attacker sees from the network. An authenticated scan logs in and reads installed package versions, registry keys, and config — it finds dramatically more, with far fewer false positives. Run authenticated scans wherever you can; treat unauthenticated as the external-attacker view, not the primary signal.
  • Agent-based vs network: network scanners reach across the wire and need connectivity and credentials. Agents live on the host and report continuously, which is the only way to keep up with autoscaling cloud fleets and laptops that are rarely on the corporate network. Most mature programs run both.
  • Cadence: continuous (agent and cloud-config), weekly for internet-facing assets, monthly for internal infrastructure — plus on-change. A scan triggered by every deploy or infrastructure change catches the regression the same day it ships, not three weeks later.
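The cadence rule above can be sketched as a small "is this asset due for a scan?" check. This is a minimal illustration: the 7-day and 30-day limits are one reading of "weekly" and "monthly", the function and parameter names are made up for the example, and continuously reporting agents are assumed to be handled outside this check.

```python
from datetime import datetime, timedelta

# Maximum allowed age of the last scan, per exposure class (assumed values).
MAX_SCAN_AGE = {
    "internet_facing": timedelta(days=7),
    "internal": timedelta(days=30),
}


def scan_due(internet_facing, last_scan, last_change, now):
    """Return True if the asset should be scanned now."""
    if last_scan is None:
        return True  # never scanned
    if last_change is not None and last_change > last_scan:
        return True  # on-change trigger: deployed or modified since the last scan
    key = "internet_facing" if internet_facing else "internal"
    return now - last_scan > MAX_SCAN_AGE[key]


now = datetime(2026, 6, 19)
last_scan = datetime(2026, 6, 10)  # nine days ago

print(scan_due(True, last_scan, None, now))                     # True: older than 7 days
print(scan_due(False, last_scan, None, now))                    # False: within 30 days
print(scan_due(False, last_scan, datetime(2026, 6, 15), now))   # True: changed since last scan
```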

One thing scanning is not: a penetration test. A scanner compares an asset against a database of known CVEs. It does not chain findings, exploit business logic, or prove that a vulnerability is actually reachable and damaging. Scanning is breadth; a pentest is depth. You need both, and they answer different questions — see pen test vs vulnerability scan for the full breakdown, and penetration test cost in 2026 for budgeting.

3. Prioritization and triage: why CVSS alone fails

This is the step that separates a program from a backlog. The Common Vulnerability Scoring System (CVSS) gives every flaw a 0–10 severity score built from metric groups. In v3.1, which is still everywhere in scanner output, those groups are Base (the intrinsic characteristics that do not change), Temporal (exploit maturity over time), and Environmental (how the flaw applies in your specific deployment). CVSS v4.0 renames Temporal to Threat and adds a Supplemental group that carries extra context without changing the score.

Here is the trap: almost everyone uses the base score and stops. Base scores cluster high — the majority of published CVEs land at 7.0 or above — so “fix all the criticals and highs” is a queue of thousands you can never clear. Raw CVSS measures theoretical severity, not the likelihood anyone will exploit the flaw against you. Layer two exploitability signals on top:

  • CISA KEV catalog: a binary, authoritative list of CVEs with confirmed exploitation in the wild. If a finding is in KEV, attackers have already used it against real targets, so it jumps the queue regardless of its CVSS number.
  • EPSS: a daily-updated probability (0–1) that a CVE will be exploited in the next 30 days. A CVSS 9.8 with an EPSS of 0.02 is far less urgent than a CVSS 7.5 with an EPSS of 0.7.
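To make the reordering concrete, here is a small illustration that sorts the same three findings two ways: by CVSS alone, and by KEV membership first, then EPSS, with CVSS only as a tiebreaker. The CVE identifiers and scores are invented for the example, and the sort key is just one plausible way to combine the signals.

```python
# Hypothetical findings; "kev" marks membership in the CISA KEV catalog.
findings = [
    {"cve": "CVE-A", "cvss": 9.8, "epss": 0.02, "kev": False},
    {"cve": "CVE-B", "cvss": 7.5, "epss": 0.70, "kev": False},
    {"cve": "CVE-C", "cvss": 6.5, "epss": 0.10, "kev": True},
]

# Severity-only ordering: what a raw CVSS queue looks like.
by_cvss = sorted(findings, key=lambda f: f["cvss"], reverse=True)

# Exploitability ordering: KEV first, then EPSS, then CVSS as a tiebreaker.
by_exploitability = sorted(
    findings,
    key=lambda f: (f["kev"], f["epss"], f["cvss"]),
    reverse=True,
)

print([f["cve"] for f in by_cvss])            # ['CVE-A', 'CVE-B', 'CVE-C']
print([f["cve"] for f in by_exploitability])  # ['CVE-C', 'CVE-B', 'CVE-A']
```

The order inverts completely: the CVSS 9.8 finding with a 2 percent exploitation probability drops to last, and the medium-severity KEV entry moves to the top.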

Then weight by asset criticality and exposure. The same CVE on an internet-facing production API is a different problem than on an isolated internal box behind three firewalls. Combine all five inputs (CVSS, KEV, EPSS, asset criticality, and exposure) into a single decision rule:

CISA_KEV = set()  # CVE IDs loaded from the CISA KEV catalog

def priority(finding, asset):
    """Map one finding on one asset to a tier, P1 (most urgent) to P4."""
    if finding.cve in CISA_KEV:
        return "P1"  # confirmed in-the-wild exploitation
    if finding.epss > 0.5 and asset.internet_facing:
        return "P1"  # likely to be exploited, externally reachable
    if finding.cvss >= 9.0 and asset.criticality == "crown_jewel":
        return "P1"  # critical severity on a critical asset
    if finding.epss > 0.2 or (finding.cvss >= 7.0 and asset.internet_facing):
        return "P2"
    if finding.cvss >= 7.0:
        return "P3"
    return "P4"  # track, batch into routine patch cycles

That rule typically collapses a 4,000-finding scan into a few dozen P1s and a few hundred P2s — a queue a team can actually clear. For application-layer flaws, cross-reference the OWASP Top 10 and API security best practices.

4. Remediation: patch, mitigate, or accept

Once a finding is prioritized, there are only three valid outcomes — and “ignore it” is not one of them:

  • Patch: apply the vendor fix or upgrade. The default and the goal. The hard part is rarely the patch itself — it is the test-and-deploy pipeline that lets you ship it safely and fast, which is why a clean CI/CD path is a security control, not just a developer convenience.
  • Mitigate with a compensating control: when you cannot patch immediately — a vendor has no fix, or the change is high-risk — reduce exposure another way. Put the asset behind a WAF rule, restrict it with a firewall or network segmentation, disable the vulnerable feature, or add detection. The vulnerability still exists; the path to exploiting it is narrowed.
  • Risk acceptance with sign-off: sometimes the right business call is to accept the risk — but it must be explicit, time-boxed, and signed by a named owner with the authority to own that risk. An accepted risk gets a documented justification, an expiry date, and a review. An informal “we’ll get to it” is not risk acceptance; it is the thing auditors and incident reviews punish.

The discipline is that every P1 and P2 finding lands in exactly one of these buckets with a record attached. That record is your audit trail.
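One way to picture that record is a small data structure that refuses to count as evidence unless the required fields are present. This is a sketch under assumptions: the class name, field names, and example values are hypothetical, and a real program would usually keep these records in a ticketing or GRC system rather than in code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

VALID_OUTCOMES = {"patch", "mitigate", "accept"}


@dataclass
class RemediationRecord:
    finding_id: str
    outcome: str                      # one of VALID_OUTCOMES
    owner: str                        # named person accountable for the decision
    justification: str = ""
    expires: Optional[date] = None    # required when the outcome is "accept"

    def problems(self, today):
        """Return the reasons this record would not hold up as audit evidence."""
        issues = []
        if self.outcome not in VALID_OUTCOMES:
            issues.append("outcome must be patch, mitigate, or accept")
        if not self.owner:
            issues.append("no named owner")
        if self.outcome == "accept":
            if not self.justification:
                issues.append("no documented justification")
            if self.expires is None:
                issues.append("risk acceptance has no expiry date")
            elif self.expires <= today:
                issues.append("risk acceptance has expired and needs review")
        return issues


today = date(2026, 6, 19)
ok = RemediationRecord(
    "F-101", "accept", "cto@example.com",
    "vendor has no fix; system retires in Q3", date(2026, 9, 30),
)
informal = RemediationRecord("F-102", "accept", "")

print(ok.problems(today))        # []
print(informal.problems(today))
# ['no named owner', 'no documented justification', 'risk acceptance has no expiry date']
```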

5. Remediation SLAs by severity and exposure

An SLA is the clock that keeps remediation honest. Without one, “we’ll patch it” has no deadline and never happens. The key insight: the clock should depend on both severity and exposure. A critical on an internet-facing asset is an emergency; the same critical on an air-gapped internal system is merely urgent. A defensible baseline:

| Severity | Internet-facing SLA | Internal SLA |
|---|---|---|
| Critical (KEV / P1) | 7 days | 14 days |
| High | 15 days | 30 days |
| Medium | 30 days | 60 days |
| Low | 90 days | 180 days |
| Emergency (active mass exploitation) | 24–72 hours | 72 hours |

The exact numbers are negotiable; the structure is not. What an auditor — and a board — actually wants to see is that the SLA is written down, tied to exposure, and measured. Pick numbers your team can realistically hit, then enforce them. A 7-day critical SLA you miss 40 percent of the time is worse than a 14-day SLA you hit.
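The baseline table translates directly into a lookup keyed by severity and exposure. The sketch below computes a due date and a breach check from it; the function names are illustrative, and the emergency tier is left out because it is measured in hours rather than days.

```python
from datetime import date, timedelta

# Days allowed per (severity, internet_facing), taken from the baseline table.
SLA_DAYS = {
    ("critical", True): 7,  ("critical", False): 14,
    ("high", True): 15,     ("high", False): 30,
    ("medium", True): 30,   ("medium", False): 60,
    ("low", True): 90,      ("low", False): 180,
}


def due_date(severity, internet_facing, detected):
    """Date by which the finding must be closed."""
    return detected + timedelta(days=SLA_DAYS[(severity, internet_facing)])


def is_breached(severity, internet_facing, detected, today, fixed=None):
    """True if the fix landed after the deadline, or the finding is still open past it."""
    deadline = due_date(severity, internet_facing, detected)
    return (fixed or today) > deadline


detected = date(2026, 6, 1)
print(due_date("critical", True, detected))    # 2026-06-08
print(due_date("critical", False, detected))   # 2026-06-15
print(is_breached("high", True, detected, today=date(2026, 6, 19)))  # True: due 2026-06-16, still open
```

Keeping the numbers in one table rather than scattered through ticket templates makes the policy easy to change and easy to show to an auditor.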

6. Reporting, metrics, and the continuous loop

A program you cannot measure is a program you cannot defend — not to leadership, not to an auditor. Four metrics carry most of the weight:

  • Mean time to remediate (MTTR): the average time from detection to fix, broken out by severity. Trending MTTR down for criticals is the single clearest sign the program is working.
  • SLA adherence: the percentage of findings closed inside their window. This is the discipline metric — the gap between your policy and your reality.
  • Scan coverage: the share of the asset inventory actually being scanned. Coverage below 100 percent is a direct measure of your blind spots, and it ties step six back to step one.
  • Recurring-finding rate: vulnerabilities that reappear after being marked fixed. A high rate means a broken patch process, a stale golden image, or a deploy pipeline that reintroduces the flaw — a root-cause signal a raw open-count never shows.
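All four metrics reduce to simple arithmetic over finding records and asset counts. The sketch below uses three invented findings and made-up asset counts; the field names are hypothetical. One simplification to note: SLA adherence here is computed over closed findings only, whereas a stricter version would also count findings that are still open past their due date.

```python
from datetime import date

# Hypothetical findings: "fixed" is None while open; "reopened" marks a recurrence.
findings = [
    {"detected": date(2026, 5, 1),  "fixed": date(2026, 5, 6),  "due": date(2026, 5, 8),  "reopened": False},
    {"detected": date(2026, 5, 1),  "fixed": date(2026, 5, 20), "due": date(2026, 5, 16), "reopened": True},
    {"detected": date(2026, 5, 10), "fixed": None,              "due": date(2026, 6, 9),  "reopened": False},
]
assets_in_inventory, assets_scanned = 200, 170  # made-up counts

closed = [f for f in findings if f["fixed"] is not None]

# Mean time to remediate: average days from detection to fix.
mttr_days = sum((f["fixed"] - f["detected"]).days for f in closed) / len(closed)
# SLA adherence: share of closed findings fixed on or before their due date.
sla_adherence = sum(f["fixed"] <= f["due"] for f in closed) / len(closed)
# Scan coverage: share of inventoried assets actually scanned.
scan_coverage = assets_scanned / assets_in_inventory
# Recurring-finding rate: share of closed findings that later reappeared.
recurring_rate = sum(f["reopened"] for f in closed) / len(closed)

print(f"MTTR: {mttr_days:.1f} days")            # MTTR: 12.0 days
print(f"SLA adherence: {sla_adherence:.0%}")    # SLA adherence: 50%
print(f"Scan coverage: {scan_coverage:.0%}")    # Scan coverage: 85%
print(f"Recurring rate: {recurring_rate:.0%}")  # Recurring rate: 50%
```

In practice each metric would be broken out by severity, as the MTTR bullet describes, by filtering the findings before computing it.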

Reporting up to leadership is a translation job. A board does not want a list of 4,000 CVEs; it wants a one-page trend: are critical findings aging down, is coverage approaching 100 percent, are we inside SLA, and where is the residual risk we have formally accepted. Frame it as risk reduction over time, not raw counts.

All of this closes the loop. New findings feed prioritization, remediation updates the metrics, the metrics expose gaps in coverage, and coverage gaps send you back to inventory. That continuous cycle — not any single scan — is what SOC 2 (CC7.1/CC7.2), PCI DSS (Requirements 6 and 11), and ISO 27001 (Annex A 8.8) are actually auditing. NIST SP 800-40r4 frames the same lifecycle for patch and vulnerability management. Build the loop and the compliance evidence is a byproduct. For the audit angle, see how to prepare for a SOC 2 audit and cybersecurity services for SaaS startups.

Frequently asked questions

What is the difference between vulnerability scanning and vulnerability management?

Scanning is one step; vulnerability management is the full lifecycle around it. A scan produces a list of findings, but management is what you do with that list: maintaining an asset inventory so you know what to scan, prioritizing findings by real-world risk, driving remediation against time-bound SLAs, and measuring whether the program is actually reducing exposure. A team that runs weekly scans but never closes the loop does not have a vulnerability management program — it has a backlog. The program is the governance, the SLAs, and the metrics that turn raw scanner output into reduced risk.

Why is raw CVSS not enough to prioritize vulnerabilities?

CVSS measures the theoretical severity of a flaw, not the likelihood that anyone will exploit it in your environment. Most published CVEs score 7.0 or higher, so a CVSS-only program tells you to drop everything for thousands of findings, which is impossible. Worse, CVSS ignores whether a working exploit exists in the wild and whether the affected asset is internet-facing or buried on an internal segment. Layering CISA KEV (is it being exploited right now?) and EPSS (what is the probability it will be?) on top of CVSS lets you patch the few hundred vulnerabilities that actually put you at risk before the thousands that probably never will.

What are CISA KEV and EPSS, and how do they fit together?

The CISA Known Exploited Vulnerabilities (KEV) catalog is an authoritative list of CVEs with confirmed, active exploitation in the wild — a binary, high-confidence signal that something is a present danger. EPSS (the Exploit Prediction Scoring System from FIRST) is a daily-updated probability, from 0 to 1, that a given CVE will be exploited in the next 30 days. KEV tells you what is on fire today; EPSS tells you what is most likely to catch fire next. Used together with asset criticality, they let you rank a flood of CVSS-7-and-above findings into a short, defensible work queue.

What remediation SLAs should I set by severity?

A common, defensible baseline ties the clock to both severity and exposure: critical internet-facing in 7 days, high in 15, medium in 30, low in 90, with internal-only assets given roughly double the window. Critical findings that appear in the CISA KEV catalog often get an accelerated SLA regardless of where they live. The exact numbers matter less than three things: that they are written down, that they are tied to asset exposure, and that you measure adherence. Auditors care that you have a documented, risk-based SLA and evidence you meet it — not that you copied a specific number from a blog post.

What metrics should a vulnerability management program report?

The core four are mean time to remediate (MTTR) by severity, SLA adherence (the percentage of findings closed within their window), scan coverage (the share of the asset inventory actually being scanned), and recurring-finding rate (vulnerabilities that come back after being marked fixed). MTTR shows speed, SLA adherence shows discipline, coverage shows blind spots, and recurring-finding rate exposes broken patch processes or golden images. For leadership, roll these into a one-page trend: are critical findings aging down over time, and is coverage approaching 100 percent? That narrative is far more useful than a raw count of open vulnerabilities.

Does a vulnerability management program satisfy SOC 2, PCI DSS, and ISO 27001?

Yes. A documented, operating program is exactly what these frameworks ask for. SOC 2 maps to common criteria like CC7.1 (detecting and monitoring for vulnerabilities) and CC7.2; PCI DSS Requirements 6 and 11 demand ranked vulnerability identification, internal and external scanning, and timely remediation; ISO 27001 Annex A control 8.8 requires technical vulnerability management end to end. Auditors look for the same evidence in every case: a current asset inventory, a defined scan cadence, a risk-based prioritization method, time-bound SLAs, and records proving you actually remediate and track exceptions. The program is not a compliance side-quest; it is the control those frameworks are auditing.

Stop managing a backlog. Run a program.

QUANT LAB USA builds vulnerability management programs that prioritize by real-world exploitability, fix the right things on a clock, and produce the evidence auditors expect. Book a free 30-minute scoping call.

Or call Bill directly at (770) 652-1282
Updated June 19, 2026