Promtail config management

As a Giant Swarm engineer, I want to be able to deploy configuration for ingesting logs to all our clusters in a unified way.

Context

Our logging infrastructure will rely on Promtail to ingest logs. Promtail acts as an agent that collects logs locally from within the cluster and ships them to an external component for storage and processing.

After deploying Promtail, we need to control three elements:

The scrape_config, which describes how Promtail will discover targets to collect logs from.
The credentials, for Promtail to be able to ship logs to Loki via the remote write API.
The toggle flag, to enable/disable Promtail.
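
As a rough illustration of how these three elements fit together, the sketch below assembles a Promtail-style configuration from them. The `PromtailSettings` class, the `render_promtail_config` function and the example URL are hypothetical names introduced for this example, not part of the logging operator; only the `clients`, `basic_auth`, `scrape_configs` and `kubernetes_sd_configs` keys follow Promtail's documented configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromtailSettings:
    """Hypothetical container for the three controlled elements."""
    enabled: bool          # the toggle flag
    scrape_configs: list   # the shared scrape_config
    loki_url: str          # push endpoint (assumed value, not from the RFC)
    username: str          # credentials used to ship logs to Loki
    password: str


def render_promtail_config(settings: PromtailSettings) -> Optional[dict]:
    """Return a Promtail-style config dict, or None when Promtail is disabled."""
    if not settings.enabled:
        return None
    return {
        "clients": [
            {
                "url": settings.loki_url,
                "basic_auth": {
                    "username": settings.username,
                    "password": settings.password,
                },
            }
        ],
        "scrape_configs": settings.scrape_configs,
    }


# Example scrape_config: discover pods through the Kubernetes API.
POD_SCRAPE_CONFIG = [
    {"job_name": "kubernetes-pods", "kubernetes_sd_configs": [{"role": "pod"}]}
]
```

With the toggle flag set to false, the function returns nothing to deploy, which mirrors the idea that the flag enables or disables Promtail as a whole.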

Architecture

Here is a schema describing how the Promtail configuration is managed:

[Figure: promtail-config-management]

We have three distinct sources of truth:

scrape_config
The Promtail scrape config is defined in the logging operator repository and wired to each cluster at runtime by the operator.
credentials
The credentials to access Loki are generated by the logging operator at runtime and configured where needed:
- In the Promtail remote_write configuration: to push logs to Loki
- In the Loki remote_write configuration: to receive logs from Promtail
- In the Grafana Loki datasource: to read logs from Loki
toggle flag
The toggle flag is defined in the giantswarm/config repository, which gives us the ability to enable and disable log ingestion per installation.
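
The credentials flow described above can be sketched as follows: one generated pair is placed in each of the three consumers. This is a minimal sketch under stated assumptions, not the operator's actual code. The function names, the `loki_allowed_users` structure and the use of a single shared pair are all hypothetical; the Grafana keys (`basicAuth`, `basicAuthUser`, `secureJsonData.basicAuthPassword`) follow Grafana's datasource provisioning format, and `/loki/api/v1/push` is Loki's push endpoint.

```python
import secrets


def generate_credentials(username: str) -> dict:
    """Create a username/password pair with a random password."""
    return {"username": username, "password": secrets.token_urlsafe(32)}


def wire_credentials(creds: dict, loki_url: str) -> dict:
    """Place the generated credentials in each of the three consumers."""
    return {
        # Promtail uses the credentials to push logs.
        "promtail_client": {
            "url": f"{loki_url}/loki/api/v1/push",
            "basic_auth": dict(creds),
        },
        # The Loki side must accept the same credentials
        # (hypothetical representation of that configuration).
        "loki_allowed_users": [dict(creds)],
        # Grafana uses the credentials to read logs from Loki.
        "grafana_datasource": {
            "name": "Loki",
            "type": "loki",
            "url": loki_url,
            "basicAuth": True,
            "basicAuthUser": creds["username"],
            "secureJsonData": {"basicAuthPassword": creds["password"]},
        },
    }
```

Generating the pair in one place and copying it into every consumer keeps the three configurations consistent, which is the point of having the operator own this source of truth.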

Limitations

The same scrape_config is used for all installations and clusters, regardless of provider and version. This makes it easy for us to deploy updates. It should also not be a problem for target discovery: based on what we learned from the Prometheus scrape config, we should be able to define a clear interface.
The toggle flag does not allow toggling individual clusters: we can only enable or disable Promtail for all clusters of an installation. This can be improved later on.
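
The toggle limitation can be illustrated with a small sketch in which the flag is looked up by installation only. The installation names, the `INSTALLATION_FLAGS` mapping and the default of "disabled" for unknown installations are assumptions made for this example.

```python
# Hypothetical per-installation flags, as they might be read from the config repository.
INSTALLATION_FLAGS = {"alpha": True, "beta": False}


def promtail_enabled(installation: str, cluster: str) -> bool:
    """Resolve the toggle for a cluster.

    The cluster name is accepted but has no influence on the result:
    every cluster of an installation gets the same answer.
    """
    return INSTALLATION_FLAGS.get(installation, False)
```

Supporting per-cluster toggling would mean making the lookup depend on the cluster argument as well, which the current design does not do.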

Last modified November 21, 2023: Update rendered RFCs (#176) (7f6b6e4)