Ops-recipes

Our collection of recipes for handling our alerts and common problems.

Teleport Ops-recipes

Our collection of recipes for handling teleport alerts and problems.

Audit Logs Troubleshooting

How to check the actions of a particular user in a cluster

Balance nodes with descheduler

How to balance workloads across nodes with descheduler

Cert Manager Troubleshooting

Troubleshooting steps to resolve issues with cert-manager.

Checking for Deprecated Kubernetes APIs with Pluto

How to use Pluto to identify deprecated Kubernetes APIs in your manifests and Helm charts.

Cilium Troubleshooting

What to do if we suspect the CNI is misbehaving

How to trigger a CloudFormation stack reconciliation

Master Machine Usage Too High

What to do once we are paged for ‘workload cluster master machine CPU usage is too high’.

Network error rate is too high

How to troubleshoot network errors

Scheduling a cronjob to patch resources

Help to debug common problems

Switch from AWS-CNI, Calico and Kube-Proxy to Cilium

This document explains how the upgrade from v18 to v19 legacy releases works, and how it can break or otherwise affect customer workloads. The upgrade is currently implemented on AWS only.

Troubleshooting

Help to debug common problems

Troubleshooting GitHub

Troubleshooting GitHub-related issues.

Troubleshooting GitOps

Help to debug GitOps problems.

Last modified March 13, 2023: Create upgrade-to-cilium.md (#17) (163ea2e)