Master Machine Usage Too High
What to do once we are paged for ‘workload cluster master machine CPU usage is too high’.
Identify the culprit
Check the master node conditions to see whether the kubelet has reported any problems:
```
» kubectl describe node master-XXXX
...
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 11 Apr 2023 11:30:18 +0200   Tue, 11 Apr 2023 11:30:18 +0200   CiliumIsUp                   Cilium is running on this node
  MemoryPressure       False   Thu, 04 May 2023 12:28:22 +0200   Fri, 28 Apr 2023 16:08:33 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Thu, 04 May 2023 12:28:22 +0200   Fri, 28 Apr 2023 16:08:33 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Thu, 04 May 2023 12:28:22 +0200   Fri, 28 Apr 2023 16:08:33 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Thu, 04 May 2023 12:28:22 +0200   Fri, 28 Apr 2023 16:08:33 +0200   KubeletReady                 kubelet is posting ready status
```
There you can see whether network, memory, or disk pressure could be behind the problem.
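If you only want to see the conditions that deviate from their healthy state, the filter below is a minimal sketch; it assumes `jq` is installed locally, and `master-XXXX` is a placeholder for the real node name.

```
# Sketch: print only the node conditions that are not in their healthy state.
# Healthy means Ready=True and every other condition=False.
kubectl get node master-XXXX -o json | jq '
  .status.conditions[]
  | select((.type == "Ready" and .status != "True")
        or (.type != "Ready" and .status != "False"))'
```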
Check whether any pod is consuming a lot of CPU or memory:
```
kubectl resource-capacity -pu --sort mem.usage --node-labels node-role.kubernetes.io/control-plane=
NODE  NAMESPACE  POD  CPU REQUESTS  CPU LIMITS  CPU UTIL  MEMORY REQUESTS  MEMORY LIMITS  MEMORY UTIL
*  *  *  10401m (86%)  13042m (108%)  2830m (23%)  31137Mi (67%)  43484Mi (93%)  32059Mi (69%)
ip-10-0-5-112.eu-central-1.compute.internal  *  *  3585m (89%)  3807m (95%)  1349m (33%)  10609Mi (69%)  14293Mi (93%)  11348Mi (73%)
ip-10-0-5-112.eu-central-1.compute.internal  giantswarm  aws-admission-controller-6bb6dfb7df-z8hkg  15m (0%)  75m (1%)  1m (0%)  100Mi (0%)  167Mi (1%)  28Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  kube-system  aws-cloud-controller-manager-w89jc  15m (0%)  0m (0%)  2m (0%)  100Mi (0%)  0Mi (0%)  29Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  giantswarm  aws-operator-14-15-0-54559bc49b-26xcg  15m (0%)  37m (0%)  2m (0%)  100Mi (0%)  100Mi (0%)  61Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  monitoring  cert-exporter-daemonset-wqwnr  50m (1%)  150m (3%)  1m (0%)  50Mi (0%)  50Mi (0%)  15Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  kube-system  cert-manager-controller-5697745d4d-zjzzh  50m (1%)  0m (0%)  4m (0%)  100Mi (0%)  0Mi (0%)  266Mi (1%)
ip-10-0-5-112.eu-central-1.compute.internal  kube-system  cilium-operator-7744d4689c-hpknn  0m (0%)  0m (0%)  1m (0%)  0Mi (0%)  0Mi (0%)  42Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  kube-system  cilium-zhcbh  100m (2%)  0m (0%)  17m (0%)  100Mi (0%)  0Mi (0%)  298Mi (1%)
ip-10-0-5-112.eu-central-1.compute.internal  kube-system  coredns-workers-cff679f78-blwf4  250m (6%)  0m (0%)  3m (0%)  192Mi (1%)  192Mi (1%)  42Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  kube-system  ebs-csi-controller-678bcd7994-zl8pj  55m (1%)  0m (0%)  2m (0%)  211Mi (1%)  0Mi (0%)  138Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  giantswarm  etcd-backup-operator-b54fdc876-8dz92  11m (0%)  11m (0%)  5m (0%)  105Mi (0%)  269Mi (1%)  31Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  giantswarm  falco-4tsnk  93m (2%)  111m (2%)  16m (0%)  121Mi (0%)  242Mi (1%)  93Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  giantswarm  falco-falco-exporter-76758  100m (2%)  100m (2%)  1m (0%)  128Mi (0%)  128Mi (0%)  20Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  flux-giantswarm  image-reflector-controller-b8f54fcb6-4zl79  100m (2%)  0m (0%)  5m (0%)  100Mi (0%)  0Mi (0%)  53Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  kube-system  k8s-api-healthz-ip-10-0-5-112.eu-central-1.compute.internal  50m (1%)  0m (0%)  13m (0%)  20Mi (0%)  0Mi (0%)  13Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  kube-system  k8s-api-server-ip-10-0-5-112.eu-central-1.compute.internal  1133m (28%)  2550m (63%)  160m (4%)  7168Mi (46%)  11264Mi (73%)  4548Mi (29%)
ip-10-0-5-112.eu-central-1.compute.internal  giantswarm  k8s-audit-metrics-fb84d  23m (0%)  23m (0%)  11m (0%)  105Mi (0%)  105Mi (0%)  34Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  kube-system  k8s-controller-manager-ip-10-0-5-112.eu-central-1.compute.internal  200m (5%)  0m (0%)  23m (0%)  200Mi (1%)  0Mi (0%)  588Mi (3%)
ip-10-0-5-112.eu-central-1.compute.internal  kube-system  k8s-scheduler-ip-10-0-5-112.eu-central-1.compute.internal  200m (5%)  0m (0%)  2m (0%)  200Mi (1%)  0Mi (0%)  76Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  kyverno  kyverno-56b45576df-ddx8q  100m (2%)  100m (2%)  726m (18%)  256Mi (1%)  1024Mi (6%)  558Mi (3%)
ip-10-0-5-112.eu-central-1.compute.internal  giantswarm  management-cluster-admission-controller-manager-f4df9d976-rbp4n  100m (2%)  250m (6%)  1m (0%)  250Mi (1%)  250Mi (1%)  25Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  monitoring  net-exporter-z6ng4  50m (1%)  0m (0%)  1m (0%)  75Mi (0%)  150Mi (0%)  16Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  kube-system  ingress-nginx-controller-app-546758487d-jqgrc  500m (12%)  0m (0%)  2m (0%)  600Mi (3%)  0Mi (0%)  211Mi (1%)
ip-10-0-5-112.eu-central-1.compute.internal  monitoring  node-exporter-z2qgt  75m (1%)  0m (0%)  1m (0%)  50Mi (0%)  75Mi (0%)  26Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  monitoring  oauth2-proxy-55568fc5cc-h85d6  100m (2%)  100m (2%)  1m (0%)  100Mi (0%)  100Mi (0%)  23Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  promtail  promtail-454w6  100m (2%)  200m (5%)  13m (0%)  128Mi (0%)  128Mi (0%)  87Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal  giantswarm  vault-exporter-6c64bbc55d-zd7km  100m (2%)  100m (2%)  1m (0%)  50Mi (0%)  50Mi (0%)  23Mi (0%)
```
You can install the above tool with krew, or check the plugin's upstream documentation for other installation options:

```
kubectl krew install resource-capacity
```
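If the plugin is not available where you are debugging, a rough first impression is also possible with plain `kubectl top`. This is an alternative sketch, not part of the standard tooling above, and it requires metrics-server to be running in the cluster.

```
# Fallback: rank pods by CPU usage across all namespaces (needs metrics-server).
kubectl top pods -A --sort-by=cpu | head -n 20

# Overall usage of the control-plane nodes themselves.
kubectl top nodes -l node-role.kubernetes.io/control-plane=
```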
Check the "K8s API performance" Grafana dashboard on the installation to verify whether etcd and node pressure look correct:

```
» export INSTALLATION=XXXXX
» opsctl open -i $INSTALLATION -a grafana
```
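As a quick complementary check from the command line, the API server's standard health endpoints also report the state of their individual checks, including etcd; this is an optional sketch using upstream Kubernetes endpoints, not a replacement for the dashboard.

```
# Verbose health output of the API server; the listed checks include etcd.
kubectl get --raw='/readyz?verbose'
kubectl get --raw='/livez?verbose'
```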
Note: If CPU and memory have been overloading the machine for the last few days, consider growing the master machine to the next instance type.
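Before planning a resize, you can confirm the current size of the masters from the node labels; the resize procedure itself depends on the provider and release and is not covered here.

```
# Show the cloud instance type of the control-plane nodes
# via the standard node.kubernetes.io/instance-type label.
kubectl get nodes -l node-role.kubernetes.io/control-plane= -L node.kubernetes.io/instance-type
```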