OPA Gatekeeper policies

Before Gatekeeper, developers at Golem Trust could deploy containers running as root, with no resource limits, pulling images from anywhere on the internet. Ludmilla described this as “giving everyone a Burleigh Crum key and hoping they only open sensible doors.” OPA Gatekeeper is the admission webhook that enforces security invariants at deploy time, before workloads ever start. It runs in audit mode first so that violations become visible without breaking existing deployments, then shifts to enforcement once the namespace has been remediated. This runbook covers the Gatekeeper installation, the four core policies in use at Golem Trust, and the procedure for testing a new policy before enabling enforcement.

Install Gatekeeper via Helm

helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo update

helm install gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system \
  --create-namespace \
  --set replicas=3 \
  --set auditInterval=60 \
  --set constraintViolationsLimit=100

Verify that the webhook configuration was created:

kubectl get validatingwebhookconfiguration | grep gatekeeper
kubectl get pods -n gatekeeper-system

How ConstraintTemplates and Constraints work

A ConstraintTemplate defines the policy logic in Rego and declares what parameters the policy accepts. A Constraint is an instance of a template that specifies the enforcement mode, which namespaces it applies to, and the parameter values for that specific instance. You always create the template before the constraint.

Policy: require-non-root-containers

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequirenonroot
spec:
  crd:
    spec:
      names:
        kind: K8sRequireNonRoot
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequirenonroot
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.securityContext.runAsNonRoot
          msg := sprintf("Container %v must set runAsNonRoot: true", [container.name])
        }
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          container.securityContext.runAsUser == 0
          msg := sprintf("Container %v must not run as UID 0", [container.name])
        }
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequireNonRoot
metadata:
  name: require-non-root-containers
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system

Policy: require-resource-limits

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequireresourcelimits
spec:
  crd:
    spec:
      names:
        kind: K8sRequireResourceLimits
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequireresourcelimits
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.limits.cpu
          msg := sprintf("Container %v must set resources.limits.cpu", [container.name])
        }
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.limits.memory
          msg := sprintf("Container %v must set resources.limits.memory", [container.name])
        }

Policy: require-approved-registry

Only images from registry.golemtrust.am and gcr.io/distroless are permitted. This stops developers from accidentally pulling from Docker Hub in production, which caused the incident where a deprecated image with a critical CVE was deployed to the merchants-guild namespace.

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sapprovedregistry
spec:
  crd:
    spec:
      names:
        kind: K8sApprovedRegistry
      validation:
        openAPIV3Schema:
          properties:
            approvedRegistries:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sapprovedregistry
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          image := container.image
          not any_approved(image, input.parameters.approvedRegistries)
          msg := sprintf("Container image %v is not from an approved registry", [image])
        }
        any_approved(image, registries) {
          registry := registries[_]
          startswith(image, registry)
        }
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sApprovedRegistry
metadata:
  name: require-approved-registry
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
  parameters:
    approvedRegistries:
      - registry.golemtrust.am
      - gcr.io/distroless

Policy: require-security-context

All pods must declare a securityContext at the pod level with seccompProfile set to RuntimeDefault or Localhost.

Audit mode versus enforcement mode

Set enforcementAction: warn or enforcementAction: dryrun when testing a new policy in a production cluster. Gatekeeper will log violations and expose them in the constraint status but will not block the admission request.

# Check current violations in audit mode
kubectl get k8srequirenonroot require-non-root-containers -o jsonpath='{.status.violations}' | jq .

Once all existing violations are remediated, switch to enforcementAction: deny by patching the constraint:

kubectl patch k8srequirenonroot require-non-root-containers \
  --type merge \
  -p '{"spec":{"enforcementAction":"deny"}}'

Dr. Crucible’s standing rule: no new policy goes to deny mode without at least one full audit cycle (minimum 60 seconds after the policy is applied) showing zero violations in the target namespaces.