Response automation¶
Detecting a malicious container is useful; killing it automatically within seconds is better. Golem Trust operates a response controller that receives Falco alert webhooks from Falcosidekick and acts on them without human intervention for a defined set of high-confidence rules. Dr. Crucible designed the safeguards carefully: the controller never kills pods in kube-system, never kills DaemonSet pods, and operates only in the production and staging namespaces. Ludmilla reviewed the design and added the requirement that every automated action be logged to the Graylog audit stream, so that Mr. Bent can account for any container termination in the compliance records.
Architecture¶
The response controller is a small Go service deployed as a Deployment (two replicas) in the falco namespace. Falcosidekick sends a webhook to the controller for rules tagged autoresponse: true. The controller inspects the alert, identifies the pod, applies the configured action, and writes a structured audit log entry.
Falcosidekick webhook output configuration in falcosidekick-values.yaml:
config:
webhook:
address: http://falco-response-controller.falco.svc.cluster.local:8080/alert
minimumpriority: critical
customheaders: "X-Falco-Token: <token-from-secret>"
Response controller configuration¶
The controller reads its configuration from a ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
name: response-controller-config
namespace: falco
data:
config.yaml: |
dryrun: false
allowed_namespaces:
- production
- staging
denied_namespaces:
- kube-system
- falco
- monitoring
- cert-manager
protected_owners:
- DaemonSet
- StatefulSet
rules:
- name: "Crypto mining process detected"
action: kill_pod
audit_reason: "Crypto miner process detected by Falco; automated termination"
- name: "Shell spawned in production database container"
action: kill_pod
audit_reason: "Shell in production DB container; automated termination per Ludmilla policy GTR-014"
- name: "Network connection to known mining pool"
action: kill_pod
audit_reason: "Mining pool network connection; automated termination"
Pod kill logic¶
When the controller receives a webhook for a rule with action: kill_pod, it executes the following sequence:
Extract
k8s.ns.nameandk8s.pod.namefrom the alert output fields.Check that the namespace is in
allowed_namespacesand not indenied_namespaces.Look up the pod’s owner references. If the top-level owner is a DaemonSet or is in
protected_owners, abort and log a warning.Delete the pod:
kubectl delete pod <name> -n <namespace> --grace-period=0.Write a structured audit log entry to Graylog.
The controller uses the Kubernetes API directly via in-cluster credentials bound to a ServiceAccount with the minimum required RBAC:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: falco-response-controller
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "delete"]
- apiGroups: ["apps"]
resources: ["daemonsets", "replicasets", "statefulsets"]
verbs: ["get"]
NetworkPolicy auto-update¶
For rules that indicate malicious outbound connections, the controller can add a NetworkPolicy to isolate the pod’s namespace rather than simply killing the pod. This is configured per rule:
- name: "Network connection to known mining pool"
action: isolate_namespace
audit_reason: "Mining pool connection; namespace isolated pending investigation"
The isolation action creates a deny-all egress NetworkPolicy in the affected namespace:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: falco-emergency-isolation
namespace: <affected-namespace>
spec:
podSelector: {}
policyTypes:
- Egress
egress: []
Cheery is automatically paged when an isolation policy is applied, as this affects all workloads in the namespace, not just the compromised pod.
Dry-run mode¶
When a new rule is tagged for automated response, set dryrun: true in the ConfigMap and redeploy. In dry-run mode, the controller logs what it would have done without taking action:
kubectl logs -n falco -l app=falco-response-controller | grep DRY_RUN
Leave new rules in dry-run mode for at least one week of production operation, reviewing the logs to confirm there are no false positives before enabling live termination.
Audit log format¶
Every automated action writes a JSON entry to the Graylog audit stream falco-automated-response:
{
"timestamp": "2025-11-14T03:22:17Z",
"action": "kill_pod",
"namespace": "production",
"pod": "payments-api-7d9f4b-xkp2m",
"rule": "Crypto mining process detected",
"dry_run": false,
"reason": "Crypto miner process detected by Falco; automated termination",
"operator": "falco-response-controller",
"falco_priority": "CRITICAL",
"falco_output": "Crypto miner process started (ns=production pod=payments-api-7d9f4b-xkp2m ...)"
}
Mr. Bent reviews these entries weekly as part of the change management reconciliation process. Any automated pod kill that affected a customer-facing service must be annotated in the incident register within 24 hours.