Master Machine Usage Too High
What to do when we are paged because a Workload Cluster master machine's CPU usage is too high.
Identify the culprit
- Check the master conditions to see if kubelet has reported any problems:
» kubectl describe node master-XXXX
...
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 11 Apr 2023 11:30:18 +0200 Tue, 11 Apr 2023 11:30:18 +0200 CiliumIsUp Cilium is running on this node
MemoryPressure False Thu, 04 May 2023 12:28:22 +0200 Fri, 28 Apr 2023 16:08:33 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 04 May 2023 12:28:22 +0200 Fri, 28 Apr 2023 16:08:33 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Thu, 04 May 2023 12:28:22 +0200 Fri, 28 Apr 2023 16:08:33 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Thu, 04 May 2023 12:28:22 +0200 Fri, 28 Apr 2023 16:08:33 +0200 KubeletReady kubelet is posting ready status
There you can see whether network, memory, or disk pressure could be behind the problem.
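To get a quick overview of the conditions across all control plane nodes at once, a custom-columns query can help (a sketch, assuming the control-plane node label used elsewhere in this page):
» kubectl get nodes -l node-role.kubernetes.io/control-plane= \
    -o custom-columns='NODE:.metadata.name,TYPE:.status.conditions[*].type,STATUS:.status.conditions[*].status'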
- Check if there is any pod consuming a lot of CPU or memory:
kubectl resource-capacity -pu --sort mem.usage --node-labels node-role.kubernetes.io/control-plane=
NODE NAMESPACE POD CPU REQUESTS CPU LIMITS CPU UTIL MEMORY REQUESTS MEMORY LIMITS MEMORY UTIL
* * * 10401m (86%) 13042m (108%) 2830m (23%) 31137Mi (67%) 43484Mi (93%) 32059Mi (69%)
ip-10-0-5-112.eu-central-1.compute.internal * * 3585m (89%) 3807m (95%) 1349m (33%) 10609Mi (69%) 14293Mi (93%) 11348Mi (73%)
ip-10-0-5-112.eu-central-1.compute.internal giantswarm aws-admission-controller-6bb6dfb7df-z8hkg 15m (0%) 75m (1%) 1m (0%) 100Mi (0%) 167Mi (1%) 28Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal kube-system aws-cloud-controller-manager-w89jc 15m (0%) 0m (0%) 2m (0%) 100Mi (0%) 0Mi (0%) 29Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal giantswarm aws-operator-14-15-0-54559bc49b-26xcg 15m (0%) 37m (0%) 2m (0%) 100Mi (0%) 100Mi (0%) 61Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal monitoring cert-exporter-daemonset-wqwnr 50m (1%) 150m (3%) 1m (0%) 50Mi (0%) 50Mi (0%) 15Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal kube-system cert-manager-controller-5697745d4d-zjzzh 50m (1%) 0m (0%) 4m (0%) 100Mi (0%) 0Mi (0%) 266Mi (1%)
ip-10-0-5-112.eu-central-1.compute.internal kube-system cilium-operator-7744d4689c-hpknn 0m (0%) 0m (0%) 1m (0%) 0Mi (0%) 0Mi (0%) 42Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal kube-system cilium-zhcbh 100m (2%) 0m (0%) 17m (0%) 100Mi (0%) 0Mi (0%) 298Mi (1%)
ip-10-0-5-112.eu-central-1.compute.internal kube-system coredns-workers-cff679f78-blwf4 250m (6%) 0m (0%) 3m (0%) 192Mi (1%) 192Mi (1%) 42Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal kube-system ebs-csi-controller-678bcd7994-zl8pj 55m (1%) 0m (0%) 2m (0%) 211Mi (1%) 0Mi (0%) 138Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal giantswarm etcd-backup-operator-b54fdc876-8dz92 11m (0%) 11m (0%) 5m (0%) 105Mi (0%) 269Mi (1%) 31Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal giantswarm falco-4tsnk 93m (2%) 111m (2%) 16m (0%) 121Mi (0%) 242Mi (1%) 93Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal giantswarm falco-falco-exporter-76758 100m (2%) 100m (2%) 1m (0%) 128Mi (0%) 128Mi (0%) 20Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal flux-giantswarm image-reflector-controller-b8f54fcb6-4zl79 100m (2%) 0m (0%) 5m (0%) 100Mi (0%) 0Mi (0%) 53Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal kube-system k8s-api-healthz-ip-10-0-5-112.eu-central-1.compute.internal 50m (1%) 0m (0%) 13m (0%) 20Mi (0%) 0Mi (0%) 13Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal kube-system k8s-api-server-ip-10-0-5-112.eu-central-1.compute.internal 1133m (28%) 2550m (63%) 160m (4%) 7168Mi (46%) 11264Mi (73%) 4548Mi (29%)
ip-10-0-5-112.eu-central-1.compute.internal giantswarm k8s-audit-metrics-fb84d 23m (0%) 23m (0%) 11m (0%) 105Mi (0%) 105Mi (0%) 34Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal kube-system k8s-controller-manager-ip-10-0-5-112.eu-central-1.compute.internal 200m (5%) 0m (0%) 23m (0%) 200Mi (1%) 0Mi (0%) 588Mi (3%)
ip-10-0-5-112.eu-central-1.compute.internal kube-system k8s-scheduler-ip-10-0-5-112.eu-central-1.compute.internal 200m (5%) 0m (0%) 2m (0%) 200Mi (1%) 0Mi (0%) 76Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal kyverno kyverno-56b45576df-ddx8q 100m (2%) 100m (2%) 726m (18%) 256Mi (1%) 1024Mi (6%) 558Mi (3%)
ip-10-0-5-112.eu-central-1.compute.internal giantswarm management-cluster-admission-controller-manager-f4df9d976-rbp4n 100m (2%) 250m (6%) 1m (0%) 250Mi (1%) 250Mi (1%) 25Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal monitoring net-exporter-z6ng4 50m (1%) 0m (0%) 1m (0%) 75Mi (0%) 150Mi (0%) 16Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal kube-system ingress-nginx-controller-app-546758487d-jqgrc 500m (12%) 0m (0%) 2m (0%) 600Mi (3%) 0Mi (0%) 211Mi (1%)
ip-10-0-5-112.eu-central-1.compute.internal monitoring node-exporter-z2qgt 75m (1%) 0m (0%) 1m (0%) 50Mi (0%) 75Mi (0%) 26Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal monitoring oauth2-proxy-55568fc5cc-h85d6 100m (2%) 100m (2%) 1m (0%) 100Mi (0%) 100Mi (0%) 23Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal promtail promtail-454w6 100m (2%) 200m (5%) 13m (0%) 128Mi (0%) 128Mi (0%) 87Mi (0%)
ip-10-0-5-112.eu-central-1.compute.internal giantswarm vault-exporter-6c64bbc55d-zd7km 100m (2%) 100m (2%) 1m (0%) 50Mi (0%) 50Mi (0%) 23Mi (0%)
You can install the above tool with krew:
kubectl krew install resource-capacity
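If the plugin is not available where you are working, plain kubectl top (backed by metrics-server) gives a rougher but still useful view of the heaviest consumers; this is a fallback sketch, not the same output format as above:
» kubectl top pods -A --sort-by=cpu | head -n 20
» kubectl top node -l node-role.kubernetes.io/control-plane=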
- Check the "K8s API performance" Grafana dashboard on the installation to verify whether the etcd and node pressure metrics look correct:
» export INSTALLATION=XXXXX
» opsctl open -i $INSTALLATION -a grafana
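If Grafana is not reachable, a rough sanity check of the API server and etcd health can be done directly against the API (a sketch; assumes the verbose readyz endpoint and the etcd request duration metrics are exposed on this Kubernetes version):
» kubectl get --raw='/readyz?verbose' | grep etcd
» kubectl get --raw='/metrics' | grep -E '^etcd_request_duration_seconds_count' | head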
Note: If CPU and memory have been overloading the machine for the last few days, consider growing the master machine to the next larger instance type.