Importing EKS/AKS/GKE clusters to CAPI using crossplane

Created: 2023-10-26
State: approved
Summary: In order for Giant Swarm to import/adopt customer clusters on bring-your-own infrastructure, use Crossplane ObserveOnly functionality for resources to discover existing infrastructure of customers without managing it. Use the cluster.x-k8s.io/managed-by: crossplane annotation to prevent CAPI from reconciling clusters. Do not rely on "paused" objects.

Introduction

Team Honeybadger has proposed a solution for importing existing clusters to CAPI via crossplane, in place of writing our own custom importer.

The purpose of this is to provide a mechanism whereby customers may come to Giant Swarm on a Bring Your Own Cluster (BYOC) basis, such that we can support and manage apps on the customer's cluster without compromising the integrity of their existing infrastructure deployment mechanisms. That is to say, customers operate and manage their own infrastructure, whilst we provide support and deployment strategies for our own applications running inside the customer's cluster.

Throughout this RFC, where the word import is used, this should be read as “unmanaged”, whereas the word adopt refers specifically to the full management and control of clusters.

Problem Statement

In order to onboard a customer's cluster to a Giant Swarm management cluster, we require visibility into the customer's cloud account, from which we can execute a discovery of the cluster resources and reflect these back as CAPI resources inside the management cluster.

For this to be effective, there are three primary components to the import process.

  1. Security of the cloud account.

    We work on the basis that we have read-only access, limited to the components necessary for the import of the cluster.

  2. The capability of reading existing customer resources

  3. Reflection of customer resources in CAPI

For the purposes of this RFC, point 1, security of the cloud account, is considered out of scope and will be managed on a per-cloud, per-customer basis as part of the implementation.

The caveat to this is that we assume only read access will be granted, and that the customer may place strict limitations on our visibility into components inside their accounts.

Should any form of write permission be granted to the service account being used for importing the clusters, there is an increased risk of accidental adoption by CAPI operators, which would violate the import contract and result in the cluster being brought fully under their control.

This is discussed in more detail in the section on Preventing CAPI takeover below.

Options

  1. We create our own custom importer, based on CAPI, that can create the kubeconfig and any CRs required for the import to be successful.

    Whilst it is feasible to use CAPI directly to read and reflect cluster resources, this would require the development of a custom operator that can interact with the cloud architecture and directly import any required resources, as well as generate the kubeconfig to be used by other components requiring access to the cluster, such as those required for the delivery of apps and app-bundles.

  2. We leverage an existing technology outside of CAPI to create the resources required.

Solution

In order to facilitate ease of operation, reduce overhead on teams and reduce time to market, Honeybadger proposed facilitating the creation of resources using crossplane as an intermediate technology.

This was made possible by crossplane implementing the capability of ObserveOnly resources that allow for the discovery of infrastructure inside cloud accounts without requiring full ownership.

It is recognised by the team that ObserveOnly resource functionality is still considered alpha on a per-provider basis, but testing has shown this to be stable enough to be used given our current requirements for the product.
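As an illustration (not taken from the PoC repository itself), a minimal sketch of an observe-only managed resource for an existing EKS cluster might look like the following; the resource kind and API group match the tables further below, while the cluster name and region are hypothetical, and the exact management-policy field depends on the Crossplane version in use.

apiVersion: eks.aws.upbound.io/v1beta1
kind: Cluster
metadata:
  name: customer-cluster                       # hypothetical
  annotations:
    # Identifies the pre-existing EKS cluster to observe rather than create.
    crossplane.io/external-name: customer-cluster
spec:
  # Newer Crossplane releases use managementPolicies: ["Observe"];
  # earlier alpha releases used managementPolicy: ObserveOnly.
  managementPolicies: ["Observe"]
  forProvider:
    region: eu-west-1                          # hypothetical
  providerConfigRef:
    name: default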

Architecture

In order to maintain consistency between providers, we propose the creation of Crossplane Compositions (and their Composite Resource Definitions) that read information from the cloud account and place it into custom resources generated by crossplane.

Each cloud provider will work in similar ways and in order to achieve this, we require additional components to be installed into the management cluster.

  • crossplane core
  • crossplane-contrib/provider-kubernetes for the creation of resources inside the cluster

To enable management of resources inside each cloud, the importer will require the following additional providers depending on the cloud being managed through that management cluster.

AWS                   Azure                              GCP
provider-aws-ec2      provider-azure-azure               provider-gcp-compute
provider-aws-eks      provider-azure-containerservice    provider-gcp-container
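Providers are installed as Crossplane packages. A minimal sketch for the two AWS providers is shown below; the package paths assume the Upbound provider family and the versions are illustrative only.

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-ec2
spec:
  package: xpkg.upbound.io/upbound/provider-aws-ec2:v0.40.0   # version illustrative
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-eks
spec:
  package: xpkg.upbound.io/upbound/provider-aws-eks:v0.40.0   # version illustrative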

The full PoC of how this may work for EKS clusters can be found in the repository crossplane-eks-capi-import.

Overall, the architecture is expected to be similar across cloud providers, but for the remainder of this RFC the EKS architecture will be used.

[Figure: eks-architecture — diagram of the EKS import architecture]

Composition and Definition

The crossplane composition for this importer follows a definition (XRD) which requires, at minimum, the following to be defined:

  • The region the cluster is built in
  • The cluster name
  • The nodegroup name

This information is then used to look up the cluster details inside the cloud provider to be fed to the CAPI resources being generated.
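For illustration, a claim built from these three inputs might look like the following; the API group, kind and field names are assumptions based on the PoC rather than a fixed contract.

apiVersion: crossplane.giantswarm.io/v1alpha1    # assumed group/version
kind: EksImport                                  # assumed claim kind
metadata:
  name: customer-cluster
  namespace: org-customer                        # hypothetical namespace
spec:
  region: eu-west-1                              # the region the cluster is built in
  clusterName: customer-cluster                  # the cluster name
  nodeGroupName: workers                         # the nodegroup name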

To achieve this lookup, we use crossplane resources for the cloud being searched

  • provider-aws-eks: Cluster, NodeGroup and ClusterAuth (eks.aws.upbound.io)
  • provider-azure-containerservice: KubernetesCluster and KubernetesClusterNodePool (containerservice.azure.upbound.io)
  • provider-gcp-[container,compute]: Cluster (container.gcp.upbound.io) and NodeGroup (compute.gcp.upbound.io)

During the generation process, there are certain details that may (unless composition functions are used) require editing specific blocks of the composition.

The reason for this is that there is no looping inside the composition, and details returned from the cloud do not directly correlate to the information required by CAPI.

An example of this for EKS is the set of subnets required by the AWSManagedControlPlane resource, which must be specified via hard-coded patches:

- fromFieldPath: status.subnetIds[0]
  toFieldPath: spec.forProvider.manifest.spec.network.subnets[0].id
- fromFieldPath: status.subnetIds[1]
  toFieldPath: spec.forProvider.manifest.spec.network.subnets[1].id
- fromFieldPath: status.subnetIds[2]
  toFieldPath: spec.forProvider.manifest.spec.network.subnets[2].id
- fromFieldPath: status.subnetIds[3]
  toFieldPath: spec.forProvider.manifest.spec.network.subnets[3].id
- fromFieldPath: status.subnetIds[4]
  toFieldPath: spec.forProvider.manifest.spec.network.subnets[4].id
- fromFieldPath: status.subnetIds[5]
  toFieldPath: spec.forProvider.manifest.spec.network.subnets[5].id

This issue can, to a large degree, be solved through the use of composition functions; to support this, there is a proof of concept of how this can work at https://github.com/giantswarm/crossplane-fn-generate-subnets/.
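Assuming Crossplane's composition-function pipeline API, wiring such a function into the composition might look roughly like the following; the composite type and function names are illustrative (the first step refers to the PoC repository above, the second to the generic patch-and-transform function).

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: eks-import
spec:
  compositeTypeRef:
    apiVersion: crossplane.giantswarm.io/v1alpha1   # assumed group/version
    kind: EksImport                                 # assumed composite kind
  mode: Pipeline
  pipeline:
    # Derive the subnet list (and related patches) from the observed cluster.
    - step: generate-subnets
      functionRef:
        name: crossplane-fn-generate-subnets
    # Apply the remaining declarative patches of the composition.
    - step: patch-and-transform
      functionRef:
        name: function-patch-and-transform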

Not all fields required to provide 100% accurate results for CAPA can be recovered directly through composition functions; in particular, the capability of identifying public vs private subnets.

This should become solvable in future iterations once the upstream issue on Querying and filtering for import and observe is resolved, as that would enable the discovery of route tables in the same manner CAPA tracks this today.

Within the PoC, I “fake” this by proposing either a custom tag be added to the subnet, or by looking to see if mapPublicIpOnLaunch is set to true on the subnet, although I dislike the second option as not every public subnet would have this flag set. For the first option, the PoC offers that a tag of giantswarm.io/public be set although this is open to refinement.
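To illustrate the first option, the marker could be as simple as a tag on the subnet (applied by the customer through their own tooling), which the composition or function then treats as "this subnet is public"; the tag key follows the PoC's suggestion and remains open to refinement.

# Option 1: customer-applied tag on the subnet marking it as public.
tags:
  giantswarm.io/public: "true"
# Option 2 would instead test the subnet's mapPublicIpOnLaunch attribute,
# with the caveat noted above that not every public subnet sets it.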

Clusters with multiple nodepools

Crossplane requires a direct one-to-one mapping between custom resource and cloud infrastructure types. This makes it more difficult to directly reconcile multiple nodepools without building custom compositions for each variant of customer infrastructure we may encounter.

There are two potential solutions to this problem:

  1. We develop a library of common structures that we encounter. This will be built as part of discovery with a customer, using boilerplate code to enhance the compositions for each customer.

    The drawback to this method is that it isn't flexible and relies on a degree of human interaction in the initial stages as we start to develop the library.

  2. We attempt to hook into crossplane's composition functions: allow the nodeGroupName parameter to accept a list of existing names, then iterate a new resource into the composition for each value specified in the list (sketched below).
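A hypothetical claim shape for the second option, with field names that are illustrative rather than part of the PoC:

spec:
  region: eu-west-1
  clusterName: customer-cluster
  # A composition function would emit one observe-only NodeGroup (and the
  # matching CAPI machine pool resource) per entry in this list.
  nodeGroupNames:
    - workers-general
    - workers-gpu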

One drawback applicable to either solution is recognising when nodegroups are removed from the cluster. As we're not managing the infrastructure, such a removal would have an impact on the cluster resources, potentially causing failures both in crossplane and in CAPI as the values in the claim drift away from the nodegroups that actually exist as real cloud infrastructure.

This can be mitigated by maintaining that we only need to know about the primary nodegroup for the cluster, and by working with the customer to understand the level of reflection that can be maintained for imported clusters.

The reflection of nodegroups is not strictly required for cluster import to be successful but is considered a “nice to have” for the customer as it presents an opportunity towards a “Single pane of glass” view of their cluster either as resources tracked inside the cluster, or through Happa where the resources are considered read only.

CAPI resources

Compositions implementing this RFC will create the following CAPI resources:

  • Cluster
  • MachinePool

Additionally, the following cloud-specific resources should be created:

AWS                       Azure                        GCP
AWSManagedCluster         AzureManagedCluster          GCPManagedCluster
AWSManagedControlPlane    AzureManagedControlPlane     GCPManagedControlPlane
AWSManagedMachinePool     AzureManagedMachinePool      GCPManagedMachinePool
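For AWS, the core Cluster resource created by the composition might look roughly like the following (in practice wrapped in a provider-kubernetes Object); names are hypothetical and the apiVersions depend on the CAPI/CAPA releases in use. The managed-by annotation is covered in detail further below.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: customer-cluster
  annotations:
    # Tells CAPI this cluster's infrastructure is externally managed.
    cluster.x-k8s.io/managed-by: crossplane
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta2
    kind: AWSManagedControlPlane
    name: customer-cluster
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSManagedCluster
    name: customer-cluster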

kubeconfig

The kubeconfig secret is normally generated by CAPI controllers and stored in the cluster namespace as <cluster_name>-kubeconfig.

As CAPI should be running in a non-reconciling mode for imported clusters, it is not known if this kubeconfig will be generated automatically by CAPI or whether it needs to be imported separately.

To work around potential limitations here, crossplane leverages the cluster-auth capabilities and stores this secret as <cluster_name>-kubeconfig-cluster-auth.

This secret contains a slightly different structure from CAPI and places the kubeconfig at the secret data key data.kubeconfig.
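For comparison, the two secret shapes for a hypothetical cluster named customer-cluster would be roughly as follows (the value key in the CAPI-generated secret follows the CAPI convention rather than anything defined in this RFC):

# CAPI-generated kubeconfig secret
apiVersion: v1
kind: Secret
metadata:
  name: customer-cluster-kubeconfig
data:
  value: <base64-encoded kubeconfig>
---
# Crossplane cluster-auth connection secret
apiVersion: v1
kind: Secret
metadata:
  name: customer-cluster-kubeconfig-cluster-auth
data:
  kubeconfig: <base64-encoded kubeconfig>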

This cannot be controlled via crossplane, and in order to make this secret compatible with our own deployments via App-Platform, we may need to enhance our App CR to accept a secret data key, or include this location in the list of key locations checked for a kubeconfig value.

Discovery

With the introduction of EKS/AKS/GKE clusters to CAPI management clusters, we need to be able to differentiate between the different types and usage.

This can be achieved through the type of cluster referenced by infrastructureRef and the presence or absence of the cluster.x-k8s.io/managed-by annotation:

  • Normal CAPI: infrastructureRef is an AWSCluster, AzureCluster or GCPCluster; no annotation.
  • CAPI Managed: infrastructureRef is an AWSManagedCluster, AzureManagedCluster or GCPManagedCluster; no annotation.
  • CAPI Adopted: infrastructureRef is an AWSManagedCluster, AzureManagedCluster or GCPManagedCluster; annotated with cluster.x-k8s.io/managed-by: crossplane.

Preventing CAPI takeover

To prevent CAPI from importing and controlling the resources, each resource should be annotated with cluster.x-k8s.io/managed-by: crossplane.

This annotation comes from the proposal on Externally Managed cluster infrastructure, which defines:

An InfraCluster CR with the cluster.x-k8s.io/managed-by: "<name-of-system>" annotation.

The provider InfraCluster controller must:

  • Skip any reconciliation of the resource.
  • Not update the resource or its status in any way

The external management system must:

  • Populate all required fields within the InfraCluster spec to allow other CAPI components to continue as normal.
  • Adhere to all Cluster API contracts for infrastructure providers.
  • When the infrastructure is ready, set the appropriate status as is done by the provider controller today.
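A hedged sketch of what this contract means for an imported EKS cluster: crossplane (via provider-kubernetes) populates the InfraCluster's endpoint and readiness so the rest of CAPI can proceed. Field placement follows the generic CAPI infrastructure contract and may differ per provider; the endpoint value is hypothetical.

apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedCluster
metadata:
  name: customer-cluster
  annotations:
    cluster.x-k8s.io/managed-by: crossplane
spec:
  controlPlaneEndpoint:
    host: ABCDEF1234.gr7.eu-west-1.eks.amazonaws.com   # hypothetical
    port: 443
status:
  ready: true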

It is known that this annotation is not fully implemented for EKS clusters, and this presents an immediate problem for its use, as the CAPA controller attempts to import the cluster, managing statuses and attempting to reconcile the cluster.

For the existing PoC, this is blocked via cloud IAM and can be additionally mitigated using spec.paused: true, but not without a certain risk; the preference would be for the annotation handling to be implemented upstream.

Why not just use spec.paused or the cluster.x-k8s.io/paused annotation?

Whilst it is valid to include either of these on the resources, and they have the effect of ensuring CAPI does not reconcile the resource, there is a trade-off with the level of volatility brought to the platform as a result.

It is perfectly valid to pause a number of resources, for any reason, for an arbitrary amount of time, and the property or annotation may be removed accidentally, allowing the resources to immediately fall under CAPI control.

Once under control of CAPI, they cannot be removed from its control without considerable additional effort. (see What happens when a user converts an externally managed InfraCluster to a managed InfraCluster?)

It is better to be explicit via the use of the cluster.x-k8s.io/managed-by annotation, which at least offers baked-in clarity about its purpose.
