Calico deployment¶
Calico handles all pod networking across Golem Trust’s three clusters. Ludmilla chose Calico specifically for its network policy support, which is considerably richer than the basic Kubernetes NetworkPolicy API. Every namespace runs under a default-deny policy; workloads must explicitly declare what traffic they will accept and originate. This is not optional: the Royal Bank’s tenancy agreement requires network-level isolation between the royal-bank namespace and every other customer namespace, and Calico is the mechanism that enforces this. This runbook covers the Calico operator installation, CIDR allocation, BGP versus VXLAN mode selection, and the procedure for verifying that pod isolation is actually working.
CIDR allocation per cluster¶
Pod CIDRs must not overlap between clusters. The current allocation is:
finland-cluster: 10.244.0.0/16
germany-cluster: 10.245.0.0/16
helsinki-cluster: 10.246.0.0/16
Service CIDRs follow the same pattern:
finland-cluster: 10.96.0.0/16
germany-cluster: 10.97.0.0/16
helsinki-cluster: 10.98.0.0/16
These ranges are recorded in the network allocation register. Do not deviate from them; overlapping CIDRs cause routing failures in the VPN tunnels that connect the clusters.
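The no-overlap rule can be sanity-checked before a new range is added to the register. The helper below is an illustrative sketch, not part of the existing golem-trust/infra tooling:

```shell
#!/usr/bin/env bash
# Illustrative helper: check two CIDR ranges for overlap before
# adding a new allocation to the network allocation register.

ip_to_int() {
  # Convert a dotted-quad address (decimal octets) to a 32-bit integer.
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

cidrs_overlap() {
  # Returns 0 (true) if the two CIDRs overlap, 1 (false) otherwise.
  local net1=${1%/*} len1=${1#*/} net2=${2%/*} len2=${2#*/}
  local start1 start2 size1 size2
  start1=$(ip_to_int "$net1"); size1=$(( 1 << (32 - len1) ))
  start2=$(ip_to_int "$net2"); size2=$(( 1 << (32 - len2) ))
  # The ranges are disjoint only if one ends before the other begins.
  if (( start1 + size1 <= start2 || start2 + size2 <= start1 )); then
    return 1
  fi
  return 0
}

cidrs_overlap 10.244.0.0/16 10.245.0.0/16 && echo overlap || echo "no overlap"
```

Running the last line against the finland and germany pod CIDRs prints "no overlap", confirming the current allocation is disjoint.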
Install Calico via the Tigera operator¶
Install the Tigera operator first, then apply an Installation resource that configures Calico for the cluster:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/tigera-operator.yaml
Create the Installation resource. Replace <POD_CIDR> with this cluster’s allocated range from the table above:
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - blockSize: 26
      cidr: <POD_CIDR>
      encapsulation: VXLAN
      natOutgoing: Enabled
      nodeSelector: all()
Apply it:
kubectl create -f calico-installation.yaml
Wait for all Calico pods to reach Running state:
kubectl get pods -n calico-system --watch
BGP mode versus VXLAN mode¶
Golem Trust runs VXLAN encapsulation by default on all three clusters. BGP mode would offer lower overhead but requires the Hetzner network to carry BGP routes, which is not supported without additional configuration at the hypervisor level. VXLAN adds approximately 50 bytes of overhead per packet and a small CPU cost for encapsulation and decapsulation. For Golem Trust’s traffic volumes this is not a concern; Dr. Crucible benchmarked it at under 0.5% CPU on worker nodes at peak load.
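Calico auto-detects the interface MTU and subtracts the encapsulation overhead itself, so no tuning is normally needed. If auto-detection ever picks a wrong value, the MTU can be pinned in the Installation resource; a sketch, assuming a standard 1500-byte link MTU on the Hetzner interfaces:

```yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    # 1450 = 1500 (assumed link MTU) - 50 (VXLAN encapsulation overhead)
    mtu: 1450
```

Leaving mtu unset and relying on auto-detection is the usual choice; pin it only if a mismatch is actually observed.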
If a future cluster is deployed in an environment where BGP is available (for example, a co-location facility with a proper network fabric), change the encapsulation field in the Installation resource to None and configure the BGP peer resources accordingly.
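For reference, the BGP variant would look roughly like the following. All values here are placeholders, not Golem Trust allocations; the AS numbers and peer address would come from whatever the co-location fabric provides:

```yaml
# Sketch only: ASNs and peer IP are placeholders.
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false   # peer with the fabric instead of a full node mesh
  asNumber: 64512                # placeholder private ASN for the cluster
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: tor-switch
spec:
  peerIP: 192.0.2.1              # placeholder: top-of-rack switch address
  asNumber: 64513                # placeholder: fabric ASN
```

With encapsulation set to None, the pod CIDR from the allocation register is advertised directly to the fabric, so the no-overlap rule above becomes even more important.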
Apply default-deny NetworkPolicy¶
Apply a default-deny policy to every namespace immediately after creating it. This is enforced by a Gatekeeper policy (see the gatekeeper-policies runbook), but the NetworkPolicy itself must still be present:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: <NAMESPACE>
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
A Helm chart in the golem-trust/infra repository applies this policy automatically when a new namespace is provisioned. Do not rely on manual application.
Calico network policy for cross-namespace communication¶
Where services in different namespaces must communicate (for example, the royal-bank namespace consuming a shared logging service in the platform namespace), use a Calico NetworkPolicy with namespace selectors rather than a plain Kubernetes NetworkPolicy. Calico’s own CRDs support richer matching:
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-platform-logging
  namespace: royal-bank
spec:
  selector: all()
  egress:
  - action: Allow
    protocol: TCP
    destination:
      namespaceSelector: kubernetes.io/metadata.name == "platform"
      selector: app == "log-collector"
      ports:
      - 5044
Verify pod isolation¶
After applying default-deny policies, verify that isolation is working before handing a namespace to a customer. Carrot’s standard procedure is to deploy two test pods in separate namespaces and confirm that traffic is blocked:
# Deploy a netcat listener in namespace-a and expose it as a Service
kubectl run test-server -n namespace-a --image=nicolaka/netshoot -- nc -lk 8080
kubectl expose pod test-server -n namespace-a --port=8080

# Attempt connection from namespace-b (should time out)
kubectl run test-client -n namespace-b --image=nicolaka/netshoot --rm -it -- \
  nc -zv test-server.namespace-a.svc.cluster.local 8080
The connection attempt from namespace-b must time out. If it succeeds, the NetworkPolicy is not correctly applied. Check that the default-deny policy exists in both namespaces and that there is no erroneous allow rule.
Debugging Calico policy drops¶
When a developer reports that their service cannot reach another service and they believe there should be a policy permitting it, inspect the applied policies and workload endpoints with calicoctl:
calicoctl get networkpolicy -n <NAMESPACE> -o yaml
# Check effective policy on a specific pod
calicoctl get workloadendpoint -n <NAMESPACE>
# View Felix logs for policy enforcement decisions
kubectl logs -n calico-system -l k8s-app=calico-node -c calico-node | grep -i deny
Ponder has a standing request that all new NetworkPolicy changes go through a pull request review before being applied to production namespaces. This caught three misconfigured policies in the first month after the migration.