Vulnerability management procedures

Playbook for responding to vulnerability disclosures that affect container images and application dependencies. When a vulnerability is disclosed, two questions follow: which production images contain the affected component, and how quickly can they be remediated? The SBOM pipeline and Trivy scanning exist specifically to answer the first question. This runbook covers the full lifecycle from disclosure to resolution and documents the escalation thresholds that determine response urgency.

Severity tiers and response times

Not all vulnerabilities require the same urgency. The following response times are required from the moment a CVE is confirmed to affect a production image:

Critical (CVSS 9.0+): patched image in production within 48 hours. If no fix is available upstream, implement a mitigating control (network policy, runtime restriction, or temporary service withdrawal) within 24 hours and document the interim state.

High (CVSS 7.0-8.9): patched image in production within 7 days. Cheery reviews the exposure weekly; unresolved highs are escalated to Ludmilla at the 7-day mark.

Medium (CVSS 4.0-6.9): addressed in the next scheduled dependency update cycle, no later than 30 days. Tracked in the vulnerability register.

Low (CVSS below 4.0) and Negligible: recorded in the quarterly vulnerability review. No mandatory remediation timeline unless the vulnerability is in a component with elevated exposure.

These thresholds were agreed with the Royal Bank of Ankh-Morpork as part of the supply chain security commitments.
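
The tier boundaries above can be expressed as a small lookup, for example to annotate a scan finding with its deadline. This is a minimal sketch only: the function name response_tier and its output strings are illustrative and not part of any existing tooling.

```shell
#!/usr/bin/env bash
# Sketch: map a CVSS base score to the response tier described above.
# response_tier is a hypothetical helper name, not an existing tool.
response_tier() {
  local score="$1"
  # awk does the floating-point comparison; shell arithmetic is integer-only.
  awk -v s="$score" 'BEGIN {
    if (s !~ /^[0-9]+(\.[0-9]+)?$/ || s > 10) { print "invalid"; exit 1 }
    if      (s >= 9.0) print "Critical: patched within 48 hours"
    else if (s >= 7.0) print "High: patched within 7 days"
    else if (s >= 4.0) print "Medium: next update cycle, within 30 days"
    else               print "Low: quarterly review"
  }'
}
```

For example, response_tier 7.5 reports the High tier. Input that is not a score between 0 and 10 prints "invalid" and returns a non-zero status.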

Identifying affected images

When a CVE is announced, use Grype to identify which images contain the affected package. Install Grype alongside Syft:

apt install -y grype

Query all current production images against the CVE:

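for IMAGE in $(harbor-cli list-images golemtrust-prod); do
  echo "=== $IMAGE ==="
  # No severity filter here: the CVE must be found whatever its rating.
  # A failed scan is reported as such, never as "Not affected".
  if REPORT=$(grype "registry.golemtrust.am/golemtrust-prod/${IMAGE}" --output table); then
    echo "$REPORT" | grep -i "CVE-YYYY-NNNNN" || echo "Not affected"
  else
    echo "SCAN FAILED: $IMAGE" >&2
  fi
done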

Replace CVE-YYYY-NNNNN with the actual CVE identifier.

Alternatively, query the stored SBOM attestations directly. This is faster than pulling and scanning each image:

for IMAGE in $(harbor-cli list-images golemtrust-prod); do
  PAYLOAD=$(cosign download attestation \
    --predicate-type https://spdx.dev/Document \
    "registry.golemtrust.am/golemtrust-prod/${IMAGE}" \
    2>/dev/null | jq -r '.payload' | base64 -d)

  # An image with no readable SBOM attestation cannot be cleared by this query.
  if [ -z "$PAYLOAD" ]; then
    echo "NO SBOM ATTESTATION: $IMAGE" >&2
    continue
  fi

  if echo "$PAYLOAD" | jq -e \
    '.predicate.packages[] | select(.name == "PACKAGE_NAME")' \
    > /dev/null 2>&1; then
    echo "AFFECTED: $IMAGE"
    echo "$PAYLOAD" | jq '.predicate.packages[] | select(.name == "PACKAGE_NAME") | {name, versionInfo}'
  fi
done

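Replace PACKAGE_NAME with the affected library name (for example, libssl3, log4j-core, requests). A match shows only that the image contains the package; compare the reported versionInfo with the CVE’s affected version range before treating the image as affected.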

Remediation workflow

Once affected images are identified, the remediation sequence is:

  1. For OS package vulnerabilities: update the base image. Pull the latest version from the upstream base-images project, which is rescanned weekly. If the base image itself lacks the fix, request an upstream patch or switch to a different base. Ludmilla approves base image changes.

  2. For application dependency vulnerabilities: update the dependency in the application code. Open a pull request in the affected repository. The CI/CD pipeline will rebuild, scan, sign, and push automatically if the scan passes.

  3. For vulnerabilities in third-party images (Redis, PostgreSQL): check the upstream project for a patched version. Pull the patched image into the third-party project in Harbor, submit it for Ludmilla’s SBOM review, and replace the deployed version.

  4. For vulnerabilities with no upstream fix: document the interim mitigation in the vulnerability register (see below). Apply a Kubernetes NetworkPolicy to restrict the exposed service’s network access. Set a review date of 14 days; if no fix has appeared, escalate to Ludmilla and consider service withdrawal.
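
The NetworkPolicy mentioned in step 4 could look like the following. This is a sketch under assumptions: the app label key, the policy name prefix isolate-, and the example namespace are hypothetical placeholders, and the policy shown blocks all ingress and egress for the selected pods, which is the most restrictive option. A real mitigation may need to allow specific traffic. NetworkPolicies take effect only where the cluster's network plugin enforces them.

```shell
#!/usr/bin/env bash
# Sketch: print a NetworkPolicy that blocks all ingress and egress for the
# selected pods. The "app" label key and "isolate-" name prefix are
# hypothetical; adjust them to the labels the workload actually carries.
render_isolation_policy() {
  local namespace="$1" app="$2"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-${app}
  namespace: ${namespace}
spec:
  podSelector:
    matchLabels:
      app: ${app}
  policyTypes:
    - Ingress
    - Egress
EOF
}

# Usage (requires cluster access):
#   render_isolation_policy payments payments-svc | kubectl apply -f -
```

Listing both policy types with no allow rules denies all traffic in both directions. Rendering to stdout first lets the policy be reviewed and recorded in the vulnerability register before it is applied.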

Vulnerability register

Maintain a vulnerability register in the internal repository at security/vulnerability-register.md. Each entry records:

| CVE           | Package       | Affected images   | Severity | Discovered | Remediated | Notes                                             |
|---------------|---------------|-------------------|----------|------------|------------|---------------------------------------------------|
| CVE-YYYY-NNNN | libssl3 3.0.2 | keycloak-app:abc1 | High     | 2026-03-15 | 2026-03-19 | Base image updated to 3.0.11                      |
| CVE-YYYY-MMMM | requests 2.28 | payments-svc:def2 | Critical | 2026-03-20 | open       | Fix in upstream v2.32, pending; review 2026-03-22 |

Unresolved entries require a review date in the Notes field. Cheery reviews the register weekly. Entries that have exceeded their remediation deadline without resolution are flagged in the Monday standup.
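
The weekly check for overdue entries can be partly automated. The sketch below assumes GNU date, ISO dates, and the seven-column table layout shown above. The function name overdue_entries is hypothetical, and the Critical deadline of 48 hours is approximated as two whole days.

```shell
#!/usr/bin/env bash
# Sketch: list open register entries that are past their remediation deadline.
# Assumes GNU date, ISO dates, and the seven-column table layout shown above.
overdue_entries() {
  local register="$1" today="${2:-$(date -u +%F)}"
  local cve severity discovered remediated days deadline
  # Split rows on "|" and trim padding. Header and separator rows are skipped
  # because their first column does not start with "CVE-".
  awk -F'|' '$2 ~ /^ *CVE-/ {
      for (i = 2; i <= 7; i++) gsub(/^ +| +$/, "", $i)
      print $2, $5, $6, $7
    }' "$register" |
  while read -r cve severity discovered remediated; do
    [ "$remediated" = "open" ] || continue
    case "$severity" in
      Critical) days=2 ;;   # 48 hours, approximated to whole days
      High)     days=7 ;;
      Medium)   days=30 ;;
      *)        continue ;; # Low and Negligible have no mandatory deadline
    esac
    deadline="$(date -u -d "$discovered +$days days" +%F)"
    if [ "$(date -u -d "$today" +%s)" -gt "$(date -u -d "$deadline" +%s)" ]; then
      echo "OVERDUE: $cve ($severity) deadline $deadline"
    fi
  done
}
```

Called as overdue_entries security/vulnerability-register.md, it prints one line per overdue entry. The optional second argument overrides today's date, which is useful when preparing the Monday list in advance.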

.trivyignore entries

When a vulnerability cannot be immediately remediated and has been formally accepted, add it to the affected repository’s .trivyignore file. Acceptance requires:

  • Written review documenting why the vulnerability is not exploitable in the deployed context (for example, a Windows-only vulnerability in a Linux container image)

  • Approval by Ludmilla for Critical or High findings

  • An expiry date no more than 90 days from acceptance

Format:

# CVE-YYYY-NNNNN: <package name>
# Reason: <brief explanation of why this is not exploitable in context>
# Accepted: <date>, Reviewed by: <username>, Expires: <date>
CVE-YYYY-NNNNN

The quarterly vulnerability review (see below) checks all .trivyignore entries for expired acceptances. Expired entries that have not been renewed are removed, causing the next scan to fail until the vulnerability is remediated.
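
Finding expired acceptances can be scripted against the comment format above. The following is a sketch, assuming that dates are written as YYYY-MM-DD and that each vulnerability ID is preceded by its own "Expires:" comment. The function name expired_ignores is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report .trivyignore entries whose acceptance has expired.
# Assumes the comment format above with ISO 8601 (YYYY-MM-DD) dates.
expired_ignores() {
  local file="$1" today="${2:-$(date -u +%F)}"
  awk -v today="$today" '
    # Remember the expiry date from the most recent "Expires:" comment.
    /^#.*Expires:/ {
      expires = $0
      sub(/.*Expires:[ \t]*/, "", expires)
      sub(/[ \t]+$/, "", expires)
      next
    }
    # A line that is neither a comment nor blank is a vulnerability ID.
    /^[^# \t]/ {
      # ISO dates sort lexicographically, so string comparison is enough.
      if (expires == "")        print $1 " has no expiry date"
      else if (expires < today) print $1 " expired " expires
      expires = ""
    }
  ' "$file"
}
```

An entry is reported only once its expiry date has passed; an entry expiring today is still treated as valid. Entries with no "Expires:" comment are reported separately, since they do not meet the acceptance requirements.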

Quarterly vulnerability review

Cheery runs a quarterly review of the security posture across all production images. The review covers:

All current production images rescanned with Grype for any vulnerabilities not caught during the original build scan (vulnerability databases update continuously; an image clean at build time may have known vulnerabilities three months later):

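for IMAGE in $(harbor-cli list-images golemtrust-prod); do
  echo "=== $IMAGE ==="
  # Filter by severity on the JSON report with jq rather than a CLI flag.
  grype "registry.golemtrust.am/golemtrust-prod/${IMAGE}" --output json \
    | jq -r '.matches[] | select(.vulnerability.severity | test("^(Critical|High|Medium)$"))
        | [.vulnerability.id, .vulnerability.severity, .artifact.name, .artifact.version] | @tsv'
done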

All .trivyignore entries across all repositories reviewed for:

  • Entries past their expiry date

  • Entries where the original justification is no longer valid (for example, a vulnerability accepted as “Windows only” that has since been shown to affect Linux as well)

  • Entries where a fix has become available upstream

The SBOM drift report (see the SBOM generation runbook) reviewed for unexplained package additions.

The vulnerability register reviewed for any entries open beyond their remediation deadline.

The quarterly review report is shared with Ludmilla and, for Critical and High findings affecting the Royal Bank integration services, with the Royal Bank liaison.

Emergency response: zero-day disclosure

When a zero-day is disclosed affecting a component that is likely present in production images (for example, a widely used TLS library or container runtime), follow the accelerated response procedure:

  1. Run the SBOM query immediately to identify affected images. Do not wait for Grype’s database to update; query by package name and version range manually.

  2. If any Critical-severity affected images are confirmed: alert Ludmilla and begin remediation immediately. Do not wait for the standard triage cycle.

  3. If a fix is not yet available: implement network isolation for affected services. Apply restrictive NetworkPolicies. Consider whether the service can be taken offline until a fix is available; coordinate with the service owner.

  4. Post a status update in the #security-incidents channel within one hour of confirmation.

  5. When a fix becomes available, prioritise it above all other security work. For a zero-day, the 48-hour response window for Critical findings starts when the fix becomes available, not when the CVE was first disclosed. The interim mitigation must remain in place until remediation is complete.
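
For the manual version-range check in step 1, a helper along these lines can be applied to the versionInfo values returned by the SBOM query. This is a sketch, assuming GNU sort with the -V (version sort) option. The function name version_in_range is hypothetical, and the range is treated as inclusive of the first affected version and exclusive of the first fixed version.

```shell
#!/usr/bin/env bash
# Sketch: succeed if VERSION lies in the range [MIN, MAX), ordered by
# GNU "sort -V". version_in_range is a hypothetical helper name.
version_in_range() {
  local version="$1" min="$2" max="$3"
  # version >= min: min sorts first, or the two are equal.
  [ "$(printf '%s\n%s\n' "$min" "$version" | sort -V | head -n1)" = "$min" ] || return 1
  # version < max: version differs from max and sorts first.
  [ "$version" != "$max" ] &&
    [ "$(printf '%s\n%s\n' "$version" "$max" | sort -V | head -n1)" = "$version" ]
}

# Usage: version_in_range 3.0.2 3.0.0 3.0.11 && echo "in affected range"
```

Version sort is used because plain string comparison would place 3.0.11 before 3.0.2. It is only an approximation of distribution versioning rules (epochs, tilde suffixes); for Debian packages, dpkg --compare-versions is the authoritative comparison.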

The zero-day procedure does not require the standard pull request process for the .trivyignore exception; Ludmilla can approve exceptions verbally, with documentation to follow within 24 hours.