Karpenter in Vintage AWS Workload Clusters
Classification:
Installing Karpenter
Karpenter can be installed as a Managed App, either via Happa or via an App CR (a sketch of an App CR follows the prerequisites below). Before installing Karpenter, make sure that the following prerequisites are met:
- a nodepool to be used for nodes managed by Karpenter must be available. It is perfectly fine to reuse an existing nodepool that is currently managed via Cluster Autoscaler. However, regardless of whether the nodepool is newly created or already existing, it must have at least one "standard" (managed via Cluster Autoscaler) node running.
- some AWS resources must be created as described in the Karpenter App README. It is enough to click on the link there, ensure that the AWS account and the region are correct, and then click on the "Create" button. This creates the required resources in the AWS account.

In the future, the creation of the AWS resources will be automated. For now, it is manual.
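For reference, an App CR for the Karpenter Managed App could look roughly like the sketch below. This is a minimal, non-authoritative example: the app name, catalog, target namespace and version are assumptions, so check the Karpenter App README (or Happa) for the values that are valid on your installation.

```yaml
# Minimal sketch of an App CR installing Karpenter as a Managed App.
# App name, catalog, target namespace and version are assumptions;
# consult the Karpenter App README for the actual values.
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  name: karpenter
  # App CRs for a workload cluster live in the namespace named after the cluster ID
  namespace: testcluster123
spec:
  catalog: giantswarm
  name: karpenter           # assumed name of the app in the catalog
  namespace: kube-system    # namespace inside the workload cluster
  version: 0.1.0            # placeholder version, pick the latest released one
  kubeConfig:
    # false so the app is deployed into the workload cluster; on some
    # installations the context/secret fields may need to be set explicitly
    inCluster: false
```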
Karpenter configuration
Fundamentals
The configuration of Karpenter happens in Workload Clusters directly via `Provisioner` CRs. If you want to centrally manage `Provisioner` CRs in a Git repository, you might consider using Flux in the Management Cluster and relying on its out-of-band delivery functionality (a sketch follows below).
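As an illustration of the out-of-band delivery approach, a Flux `Kustomization` on the Management Cluster can apply `Provisioner` manifests from a Git repository directly into the workload cluster by referencing its kubeconfig secret. The sketch below assumes a `GitRepository` named `workload-cluster-config`, a path inside that repository, and a kubeconfig secret named `testcluster123-kubeconfig`; adjust all of these to whatever exists on your installation.

```yaml
# Sketch only: a Flux Kustomization delivering Provisioner CRs from Git
# into the workload cluster. The GitRepository name, path and kubeconfig
# secret name are assumptions.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: testcluster123-karpenter-provisioners
  namespace: testcluster123
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: workload-cluster-config
  path: ./testcluster123/karpenter
  prune: true
  kubeConfig:
    # apply the manifests in the workload cluster instead of the Management Cluster
    secretRef:
      name: testcluster123-kubeconfig
```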
The `Provisioner` CR is described in the Karpenter docs. It essentially defines how Karpenter should create nodes. Beware that Karpenter only creates "single" nodes - it does NOT rely on the concept of Auto Scaling Groups. This differs from the way we usually manage nodes in our Giant Swarm nodepools.
How to configure provisioners
Our suggestions are the following:
- each nodepool should have at least one "normal" node running. In other words, `min` needs to be set to 1. `max` can be set to 1 as well in case you want Karpenter to manage all the remaining nodes in the nodepool. If you want to have some nodes managed by Karpenter and some nodes potentially managed by Cluster Autoscaler, you can set `max` to a higher value (see the nodepool scaling sketch after this list).
- in case your nodepool needs to be made of 100% spot instances, create one `Provisioner` with `capacity-type: ["spot"]`.
- in case your nodepool needs to be made of spot instances, but you want to fall back to on-demand in case spot instances of the required types are not available, create two provisioners: one with `capacity-type: ["spot"]` and a high `weight`, and another one with `capacity-type: ["on-demand"]` and a lower `weight`. The two provisioners can also differ in other settings, as per the example below.
- each `Provisioner` has a `limits` section, which states when Karpenter should stop spinning up EC2 instances. For instance, if `limits` is set to 1000 vCPUs and 1000Gi of RAM, Karpenter stops launching new EC2 instances as soon as the total compute it manages reaches one of those values. We hence suggest setting very high limits in each provisioner in case the cluster needs to scale up to a very high number of nodes.
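On vintage AWS clusters the nodepool scaling limits live on the `AWSMachineDeployment` CR in the Management Cluster. The fragment below is only a sketch of the min=1 setup described above, reusing the nodepool ID `abc12` from the example in the next section; the exact field layout and the namespace the CR lives in may differ between releases, so double-check against the CRs in your Management Cluster.

```yaml
# Sketch: keeping one "standard" node in the nodepool (min: 1) and letting
# Cluster Autoscaler act as a fallback up to max: 3. Field layout is an
# assumption; verify against the AWSMachineDeployment CRs on your
# Management Cluster. Namespace omitted on purpose (use your org namespace).
apiVersion: infrastructure.giantswarm.io/v1alpha3
kind: AWSMachineDeployment
metadata:
  name: abc12
spec:
  nodePool:
    scaling:
      min: 1
      max: 3
```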
An example
In the following example, we will create two provisioners working on the same nodepool:
- the higher priority Provisioner (`weight: 10`):
  - will create spot instances only
  - these instances need to be of size `4xlarge`, `8xlarge`, `9xlarge`, `12xlarge` or `16xlarge`
  - these instances must not be small (no `t2`, `t3` or `t3a` instances)
- the lower priority Provisioner (`weight: 5`) will be used whenever the higher priority one can't spin up a required EC2 instance, for instance because there is no spot capacity of the required types available in the AWS region:
  - it will create on-demand instances only
  - these instances can only be `m6a`, `m6i`, `m5` or `m5a` instances of size `4xlarge`. Whereas with spot instances we accept "whatever" as long as it is cheap, here we enforce stricter conditions since we are paying the on-demand price.

The example below shows provisioners that work on a single-AZ nodepool. Nothing changes if the nodepool is multi-AZ, except for the fact that the provisioners will create nodes in all the AZs of the nodepool (specified in `topology.kubernetes.io/zone`).
```yaml
apiVersion: v1
items:
- apiVersion: karpenter.sh/v1alpha5
  kind: Provisioner
  metadata:
    name: spot-provisioner-abc12
  spec:
    consolidation:
      enabled: true
    labels:
      cluster: testcluster123
      managed-by: karpenter
      node.kubernetes.io/worker: ""
      nodepool: abc12
      role: worker
    limits:
      resources:
        # 4000 vCPUs and 4000Gi of RAM
        cpu: 4k
        memory: 4000Gi
    provider:
      launchTemplate: testcluster123-abc12-LaunchTemplate
      subnetSelector:
        giantswarm.io/machine-deployment: abc12
      tags:
        Name: testcluster123-karpenter-spot-worker
        cluster: testcluster123
        giantswarm.io/cluster: testcluster123
        managed-by: karpenter
        nodepool: abc12
    requirements:
    - key: karpenter.k8s.aws/instance-family
      # avoid small instance families
      operator: NotIn
      values:
      - t3
      - t3a
      - t2
    - key: karpenter.k8s.aws/instance-size
      # avoid instances that are too small or too big (e.g. `large` or `48xlarge`)
      operator: In
      values:
      - 4xlarge
      - 8xlarge
      - 9xlarge
      - 12xlarge
      - 16xlarge
    - key: karpenter.k8s.aws/instance-hypervisor
      operator: In
      values:
      - nitro
    - key: topology.kubernetes.io/zone
      operator: In
      values:
      - eu-central-1a
    - key: kubernetes.io/arch
      operator: In
      values:
      - amd64
    - key: karpenter.sh/capacity-type
      operator: In
      values:
      - spot
    - key: kubernetes.io/os
      operator: In
      values:
      - linux
    startupTaints:
    - effect: NoExecute
      key: node.cilium.io/agent-not-ready
      value: "true"
    ttlSecondsUntilExpired: 86400
    weight: 10 # higher priority
- apiVersion: karpenter.sh/v1alpha5
  kind: Provisioner
  metadata:
    name: ondemand-provisioner-abc12
  spec:
    consolidation:
      enabled: true
    labels:
      cluster: testcluster123
      managed-by: karpenter
      node.kubernetes.io/worker: ""
      nodepool: abc12
      role: worker
    limits:
      resources:
        cpu: 4k
        memory: 4000Gi
    provider:
      launchTemplate: testcluster123-abc12-LaunchTemplate
      subnetSelector:
        giantswarm.io/machine-deployment: abc12
      tags:
        Name: testcluster123-karpenter-ondemand-worker
        cluster: testcluster123
        giantswarm.io/cluster: testcluster123
        managed-by: karpenter
        nodepool: abc12
    requirements:
    - key: karpenter.k8s.aws/instance-family
      # only use "standard" instance types
      operator: In
      values:
      - m6a
      - m6i
      - m5a
      - m5
    - key: karpenter.k8s.aws/instance-size
      operator: In
      values:
      # only use 4xlarge
      - 4xlarge
    - key: karpenter.k8s.aws/instance-hypervisor
      operator: In
      values:
      - nitro
    - key: topology.kubernetes.io/zone
      operator: In
      values:
      - eu-central-1a
    - key: kubernetes.io/arch
      operator: In
      values:
      - amd64
    - key: karpenter.sh/capacity-type
      operator: In
      values:
      - on-demand
    - key: kubernetes.io/os
      operator: In
      values:
      - linux
    startupTaints:
    - effect: NoExecute
      key: node.cilium.io/agent-not-ready
      value: "true"
    ttlSecondsUntilExpired: 86400
    weight: 5 # lower priority
kind: List
metadata:
  resourceVersion: ""
```
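Once the provisioners are applied in the workload cluster, a quick way to check that they are picked up and that Karpenter is actually creating nodes is to look at the `Provisioner` CRs and at the nodes carrying the `managed-by: karpenter` label set in the example above (only that label is assumed here; adjust if you use different labels):

```sh
# list the provisioners in the workload cluster
kubectl get provisioners

# list only the nodes created by Karpenter, relying on the
# managed-by: karpenter label defined in the provisioners above
kubectl get nodes -l managed-by=karpenter \
  -L node.kubernetes.io/instance-type,karpenter.sh/capacity-type
```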
Final notes
- As you can see, we are not disabling Cluster Autoscaler. In order to have Karpenter spin up nodes instead of Cluster Autoscaler, you can use this configuration - in particular, the `newPodScaleUpDelay: 300s` option is the relevant one (a sketch of where this could be set follows this list).
- Karpenter support is still in alpha and not yet battle-tested. We are working on it and are confident that it will be ready for production use soon. In the meantime, we suggest using it in production only if you are comfortable with it still being in an alpha state.
- We know some things are still being done manually. We are working on making Karpenter a first-class citizen.
- In case you want to use a nodepool that has many nodes in it, we suggest slowly scaling it down to min=max=1 after installing Karpenter. This will allow Karpenter to spin up new nodes as needed. You can then set `max` to a higher value to have Cluster Autoscaler act as a fallback in case Karpenter can't spin up new nodes.
- Karpenter won't roll your nodes during upgrades. We hence suggest installing it on clusters already on Giant Swarm AWS release 19. Upgrades will require some manual intervention for now. Sync with your Account Engineer for more information.
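For reference, on vintage clusters the Cluster Autoscaler app is usually tuned through a user values ConfigMap on the Management Cluster. The sketch below only illustrates where the `newPodScaleUpDelay: 300s` option mentioned in the first note could live; the ConfigMap name and the exact key structure are assumptions, so compare with the configuration linked from that note before applying anything.

```yaml
# Sketch only: user values for the Cluster Autoscaler app, delaying its
# scale-up decisions so that Karpenter gets to react to pending pods first.
# ConfigMap name and key layout are assumptions; follow the configuration
# referenced in the notes above for the authoritative version.
apiVersion: v1
kind: ConfigMap
metadata:
  name: testcluster123-cluster-autoscaler-user-values
  namespace: testcluster123
data:
  values: |
    configmap:
      newPodScaleUpDelay: 300s
```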