Releases as gitops managed configmaps
Problem Statement
We need to …
- be able to define Giant Swarm releases as sets of configuration.
- be able to create Giant Swarm clusters in a repeatable and reliable way.
- be able to upgrade Giant Swarm clusters in a repeatable way with cluster-api.
- be able to default Giant Swarm clusters through all our user interfaces consistently.
We want to …
- decouple managed applications from strict release flows.
Gitops approach
We have already taken steps to manage configuration for our components in management clusters through open source gitops tools. This has set a precedent for gitops management of configuration in other areas as well.
Our customers generally want to work with an Infrastructure as Code approach, which currently available open source gitops tooling supports and extends.
The following sections deal exclusively with the impact of using a gitops-driven approach.
Configuration structure
There are many possible ways to structure the configuration used to create clusters. The goal here is to propose one way that we think can work and then iterate on it - we do not want to box ourselves in.
Cluster-api fundamental structure
There are some links between, and some fields in, cluster-api custom resources which are essential. These fields can not be inferred by a webhook without context knowledge. It is therefore necessary to supply these values immediately when the custom resource is created (e.g. kubectl-gs or happa needs to fill them).
We therefore define fundamental templates as the minimal templates which have to be submitted to create a working cluster.
We can look at a valid MachinePool custom resource to determine its fundamental structure:
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachinePool
metadata:
  name: capa-mp-0 # This is taken from upstream, Giant Swarm would use an ID like `eump1`.
  namespace: my-org
spec:
  clusterName: capa
  replicas: 2
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfig
          name: capa-mp-0
      clusterName: capa
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AWSMachinePool
        name: capa-mp-0
      version: v1.16.8
- We can not infer apiVersion as it's an essential field.
- We can not infer kind as it's an essential field.
- We can not infer metadata.name as it's required to create the custom resource.
- We can not infer spec.clusterName as it's unique to the created cluster.
- We can not infer spec.template.spec.bootstrap.configRef.name as it's unique to the cluster.
- We can not infer spec.template.spec.infrastructureRef.name as it's unique to the cluster.
All other values can either be inferred from other information in the custom resource or from default values.
Therefore, the minimal custom resource which a client would have to submit is:
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachinePool
metadata:
  name: capa-mp-0
  namespace: my-org
spec:
  clusterName: capa
  template:
    spec:
      bootstrap:
        configRef:
          name: capa-mp-0
      infrastructureRef:
        name: capa-mp-0
This minimal custom resource can also be represented as a template which will remain valid throughout the lifecycle of the API version (v1alpha3 in this case):
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachinePool
metadata:
  name: {{machinePoolID}}
  namespace: {{orgNamespace}}
spec:
  clusterName: {{clusterID}}
  template:
    spec:
      bootstrap:
        configRef:
          name: {{machinePoolID}}
      infrastructureRef:
        name: {{machinePoolID}}
This template is going to be stable for any v1alpha3 cluster and would only need minimal changes to be stable for v1alpha4.
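For illustration, filling the placeholders with the identifiers used elsewhere in this document (machinePoolID eump1, orgNamespace my-org, clusterID capa), the rendered minimal custom resource a client such as kubectl-gs or happa would submit looks like this:
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachinePool
metadata:
  name: eump1
  namespace: my-org
spec:
  clusterName: capa
  template:
    spec:
      bootstrap:
        configRef:
          name: eump1
      infrastructureRef:
        name: eump1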
Cluster-api base files
We need to default a big set of custom resources with cluster-api. These custom resources need to be created and defaulted in a consistent way. The approach we follow here is to have a single source of truth which is as complete as possible.
Base files are our static defaulting values - any webhook or tool which does defaulting should use them as the source of truth.
Let’s look at an example to clarify the technical implications.
A valid MachinePool CR in v1alpha3 can look like this:
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachinePool
metadata:
  name: eump1
  namespace: my-org
spec:
  clusterName: capa
  replicas: 2
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfig
          name: eump1
      clusterName: capa
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AWSMachinePool
        name: eump1
      version: v1.16.8
From this custom resource we can now extract the values which are both static and configurable: values whose default a customer might want to change, and which we can default statically, independent of any other content in the custom resource.
In this particular case there are two values which are both static and configurable:
- spec.replicas
- spec.template.spec.version
We can now construct a ConfigMap with these values, which are then going to be used by any defaulting tool/webhook:
apiVersion: v1
kind: ConfigMap
metadata:
  name: machinepool-21-0-0
  namespace: giantswarm
data:
  values: | # arbitrary key; ConfigMap data values must be strings, so the defaults are kept as a nested YAML document
    spec:
      replicas: 3
      template:
        spec:
          version: v1.21.4
It is important to note the assumption that no values in the base file are immutable!
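As a sketch of how the pieces fit together (not a prescribed merge algorithm): a defaulting webhook that applies the machinepool-21-0-0 base file on top of the minimal custom resource shown earlier would produce something like this:
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachinePool
metadata:
  name: capa-mp-0
  namespace: my-org
spec:
  clusterName: capa
  replicas: 3 # defaulted from the machinepool-21-0-0 base file
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3 # inferred
          kind: KubeadmConfig # inferred
          name: capa-mp-0
      clusterName: capa # inferred from spec.clusterName
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3 # inferred
        kind: AWSMachinePool # inferred
        name: capa-mp-0
      version: v1.21.4 # defaulted from the machinepool-21-0-0 base file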
Cluster-api context dependent values
We have some cases in cluster-api where a value in a custom resource is neither part of the fundamental structure nor easy to statically default. These values are usually highly dependent on the context in which the cluster is created - for example the CIDR in which the cluster is created. We currently assume that these values will be defaulted by custom webhooks or operators with significant provider-specific domain knowledge.
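As a purely illustrative sketch (the field path and value below are assumptions, not defined by this RFC), such an operator might fill in the VPC CIDR on an AWSCluster based on knowledge about the target account:
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSCluster
metadata:
  name: capa
  namespace: my-org
spec:
  networkSpec:
    vpc:
      cidrBlock: 10.0.0.0/16 # hypothetical value, chosen by a provider-specific operator for this context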
Putting together a Giant Swarm release
A Giant Swarm release is just a set of cluster-api base files as defined previously, with added apps.
The following folder structure would be contained in giantswarm/releases:
aws:
  v21.0.0:
    base:
      cluster.yaml # configmap containing the base file (defined above)
      aws-cluster.yaml
      machinepool.yaml
      awsmachinepool.yaml
      ...
      net-exporter.yaml # configmap containing version information for defaulting!
    overlay:
      ... # For easier templating of shared values!
  v22.0.0:
    base:
      ...
    overlay:
      ...
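As a hypothetical sketch, assuming kustomize-style layering (the RFC does not prescribe a specific templating tool, and the file names are illustrative), an overlay could look like this:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base
patchesStrategicMerge:
  - machinepool.yaml # overrides shared defaults, e.g. a different Kubernetes version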
In the management cluster a release is then simply a set of configmaps.
The configmaps for e.g. release v21.0.0 will be applied through argoCD by taking the base configmaps. As shown in previous examples these configmaps would contain the release in their name (e.g. machinepool-21-0-0). Any in-cluster webhook or operator can then easily navigate to these configmaps to look up defaults for a cluster.
A cluster creation workflow can now look as follows:
- Client (e.g. happa) submits minimal custom resources to the management cluster with --dry-run.
- Mutating webhooks take information from base file configmaps to default them (a hypothetical webhook registration is sketched below).
- Client receives fully defaulted custom resources and allows users to make manual changes.
- Client submits the custom resources.
- Cluster is created.
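The following is a hypothetical sketch of how such a defaulting webhook could be registered for MachinePool custom resources; the names, service reference, and path are assumptions made for illustration:
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: machinepool-defaulting # hypothetical name
webhooks:
  - name: default.machinepool.giantswarm.io # hypothetical name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
    clientConfig:
      service:
        name: defaulting-webhook # hypothetical service implementing the defaulting logic
        namespace: giantswarm
        path: /default/machinepool
    rules:
      - apiGroups: ["cluster.x-k8s.io"]
        apiVersions: ["v1alpha3"]
        operations: ["CREATE"]
        resources: ["machinepools"]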
A cluster upgrade workflow can then look like this:
- Client changes the cluster version label (see the sketch below).
- upgrade-operator reads base file configmaps according to the new version.
- upgrade-operator starts replacing fields with new configuration from the configmaps.
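As a hypothetical sketch (the exact label key is an assumption, not defined in this RFC), the version change on the Cluster custom resource could look like this:
apiVersion: cluster.x-k8s.io/v1alpha3
kind: Cluster
metadata:
  name: capa
  namespace: my-org
  labels:
    release.giantswarm.io/version: 22.0.0 # changed from 21.0.0; upgrade-operator reacts to this change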