
024: Monitoring Kubernetes with Prometheus

Provisioning, operating and maintaining distributed applications requires fine-grained capacity planning, objective definition and consumption monitoring.

The daily mood

We had an interesting meeting with the team about API security, then I had my 1-1 with my manager. He suggested that I move from tutorial applications to real-world applications when evaluating Helm chart composition tools like Helmfile, Kustomize and others, which I will probably start on tomorrow.

In the meantime I am also ramping up on observability, or the practice of designing systems and applications with the assumption that someone will need to watch them. As a growing provider of Platform services, our R&D organization must obviously care about monitoring resource metrics and component logs, but that is easier said than done. We define 3 levels of observability:
  1. Foundation supports better operations
  2. Core supports better usability
  3. Intelligence supports better automation
This post talks about monitoring in general, then transitions to technical monitoring in Kubernetes and finally takes a look at Prometheus, a general-purpose monitoring system originally inspired by Google's Borgmon and developed at SoundCloud. It has been maintained independently under the Cloud Native Computing Foundation (CNCF) since 2016.

What is monitoring?
  • Why monitor:
    • Prevent issues, and troubleshoot and react faster when they do occur
    • Improve service performance and reliability
    • Optimize capacity and reduce costs
  • Types of monitoring
    • White-box = with knowledge of the internals of the monitored system/component
    • Black-box = without such knowledge, observing external behaviour only
  • Hierarchy of monitoring, by level of maturity:
    • Logging: Execution information (trace) in semi-structured text format 
    • Observability: Ability for a component to expose its internal state (cf. Observer Pattern in Event-Driven Architecture) in the form of timestamped measures (metrics) 
    • Monitoring: Collection, storage and analysis of logs and/or metrics
    • Alerting: Rule-based behavioural detection, ticketing, notification, call-to-action
    • AI: ML-based prediction and prescription
In this article we will focus on System and service monitoring (SSM) with alerting.

What to monitor

The USE method is one of the most academic ways of monitoring infrastructure components. This is what traditional monitoring solutions like Nagios and Ganglia focus on.

 Resources: all physical resources (CPU, memory, disk, network)
 Utilization: average time the resource is busy (%)
 Saturation: degree of load the resource cannot serve (%)
 Errors: count of failed operations (events/sec)
Source: Brendan D. Gregg (currently working at Netflix)

The Four Golden Signals is a white-box monitoring approach defining the minimum metrics that a DevOps team should collect from a system in order to maintain its reliability. Here is a comprehensive article on how to collect those metrics per application type (load balancer, web server, database, etc.).

 Latency: time to serve an event (sec/event)
 Traffic: handled load (events/sec)
 Errors: failed events (events/sec)
 Saturation: resource load (% of system capacity)
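
To make these signals concrete, here is a minimal sketch of how they could be expressed as Prometheus recording rules. It assumes a typical HTTP service instrumented with an http_request_duration_seconds histogram and an http_requests_total counter carrying a status label, plus node-exporter for the saturation signal; metric and rule names are illustrative, not prescriptive.

groups:
  - name: golden-signals
    rules:
      # Latency: 99th percentile request duration per job (assumes a histogram metric)
      - record: job:http_request_duration_seconds:p99
        expr: histogram_quantile(0.99, sum by (le, job) (rate(http_request_duration_seconds_bucket[5m])))
      # Traffic: handled requests per second per job
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      # Errors: failed (HTTP 5xx) requests per second per job
      - record: job:http_request_errors:rate5m
        expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
      # Saturation: 1-minute load average per CPU, from node-exporter metrics
      - record: instance:node_load1_per_cpu:ratio
        expr: node_load1 / count without (cpu, mode) (node_cpu_seconds_total{mode="idle"})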

The RED method is a black-box monitoring approach and essentially a subset of the Four Golden Signals. It is service-consumer centric and therefore convenient for microservice architectures. In a future post, we might talk about how the Spring Boot Actuator module exposes JMX and HTTP endpoints to enable microservice monitoring and auditing; a first glimpse follows below.

 Rate: handled requests (requests/sec)
 Errors: failed requests (requests/sec)
 Duration: time per request (sec/request)
Source: Tom Wilkie (currently working at Weaveworks)
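
As a small teaser for that future post, here is a minimal application.yml sketch exposing metrics from a Spring Boot 2.x service, assuming the micrometer-registry-prometheus dependency is on the classpath (property names are those of Spring Boot 2.x and may differ in later versions):

management:
  endpoints:
    web:
      exposure:
        # expose the Prometheus scrape endpoint at /actuator/prometheus
        include: health,info,prometheus
  metrics:
    export:
      prometheus:
        enabled: true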

Monitoring in Kubernetes

Originally, Kubernetes nodes were monitored via a project called Heapster, which pushed metrics to a time-series database (TSDB) and is now end-of-life. In 2017, the Kubernetes SIG Instrumentation group defined two new APIs: the resource metrics API and the custom metrics API. Standard implementations of the corresponding APIs are listed here. Of course it is also possible to process external metrics besides these APIs.

The difference between resource metrics and custom metrics is that resource metrics are pre-existing (e.g. CPU) and pre-aggregated (e.g. averaged), whereas custom metrics need to be created, e.g. from resource objects, by the logical layer. A typical use case for custom metrics is driving the Kubernetes Horizontal Pod Autoscaler (HPA), as sketched below.
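
For illustration, a hedged sketch of an HPA scaling a hypothetical Deployment my-app on a custom http_requests_per_second metric served by a custom metrics API implementation (the autoscaling/v2beta2 API version was current at the time of writing):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                           # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # custom metric exposed via the custom metrics API
        target:
          type: AverageValue
          averageValue: "100"              # scale out above 100 req/s per pod on average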

Metric collection

The canonical implementation of the resource metrics API is the metrics-server, an observer collecting metrics from the kubelet stats endpoints via a pull approach, aggregating them and keeping them in memory on the cluster. The metrics-server is accessible via a REST API, which allows kubectl top, the Kubernetes Dashboard and third-party client tools such as k9s and kube-capacity to easily display compute (CPU) and memory (MEM) utilisation at both node and resource (i.e. pod) level.

$ kubectl top node
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
prec-5520       3094m        38%    24481Mi         76%

K9s header with metrics-server activated

$ ./kube-capacity --util --sort cpu.util
NODE       CPU REQUESTS  CPU LIMITS    CPU UTIL    MEMORY REQUESTS  MEMORY LIMITS   MEMORY UTIL
prec-5520  5405m (67%)   18800m (235%) 2773m (34%) 15669Mi (49%)    23937Mi (74%)   24557Mi (76%)

As stated by the metrics-server documentation, the current version is limited to two "advanced" use cases:
  • Horizontal Pod Autoscaler (HPA)
  • Scheduler
There are plans to extend the metrics-server capabilities in the future, for example with support for time series in the kubectl top view and the Kubernetes Dashboard, as well as for custom application metrics. In the meantime, Heapster is no longer developed and the metrics-server is not yet mature for the enterprise, so go for something else.

Beyond the metrics-server, Prometheus had already adopted the efficient practice of scraping HTTP endpoints instead of relying on agents back when Heapster was the default, and therefore gained wide adoption across the Kubernetes ecosystem.

Custom metrics

Because of its popularity, Prometheus also provided the first implementation of the Kubernetes custom metrics API, the k8s-prometheus-adapter, making it the solution of choice for most Kubernetes monitoring requirements.
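
To give an idea, here is a hedged sketch of a k8s-prometheus-adapter rule that would turn a raw http_requests_total counter scraped by Prometheus into the http_requests_per_second custom metric consumed by the HPA example above (the metric names are assumptions carried over from that example):

rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'   # which Prometheus series to expose
    resources:
      overrides:
        namespace: {resource: "namespace"}   # map Prometheus labels to Kubernetes resources
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"                  # rename ..._total to ..._per_second
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'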

Prometheus setup

Prometheus is a popular monitoring tool for pulling, storing and accessing metrics as time-series data. It can be set up in different ways, for example locally, but of course it makes most sense to set it up directly inside the monitored cluster. There is an official Prometheus Helm chart available, and a dedicated microk8s add-on which basically deploys the Prometheus Operator (owned by Red Hat since they acquired CoreOS in 2018) along with a bunch of example rules, dashboards and alerts which I believe to be forked from the Mixin project. With this, Prometheus follows the default configuration and recommendation to watch all namespaces and applications. You can obviously change that as per the documentation; a ServiceMonitor sketch follows the pod listing below.
$ microk8s.enable prometheus
$ kubectl get pods -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          6m37s
grafana-7789c44cc7-j8t8z               1/1     Running   0          6m42s
kube-state-metrics-78c549dd89-khgcp    4/4     Running   0          6m28s
node-exporter-svzc2                    2/2     Running   0          6m43s
prometheus-adapter-644b448b48-zkfvl    1/1     Running   0          6m43s
prometheus-k8s-0                       3/3     Running   1          6m29s
prometheus-operator-7695b59fb8-v9qds   1/1     Running   0          6m43s
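The Prometheus Operator turns scrape configuration into Kubernetes objects: targets are declared via ServiceMonitor custom resources. A minimal sketch for a hypothetical my-app Service exposing a named http-metrics port (names and namespaces are assumptions):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: my-app            # Services labelled app=my-app are scraped
  namespaceSelector:
    matchNames:
      - default              # look for the Service in the default namespace
  endpoints:
    - port: http-metrics     # must match a named port on the Service
      path: /metrics
      interval: 30s
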
Storage

Prometheus comes with its own single-node storage by default, an on-disk time series database (TSDB). Storage scalability and/or durability can be achieved via the numerous third-party integrations with long-term storage backends, such as external databases and object stores, through the remote read/write protocol.
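
As an illustration, a hedged excerpt of a prometheus.yml wiring up such a backend; the endpoint URL is a placeholder, and with the Prometheus Operator the same settings would instead go into the remoteWrite/remoteRead fields of the Prometheus custom resource:

# Hypothetical long-term storage endpoints, assuming a remote-write compatible backend
remote_write:
  - url: "http://long-term-store.example.com/api/v1/write"
remote_read:
  - url: "http://long-term-store.example.com/api/v1/read"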

Access

Prometheus basically offers 3 points of access to monitoring data:
  • Browser-friendly endpoint on the Prometheus server (simplistic web app for querying metrics, rules and alerts using its own query language, PromQL)
  • Console templates (scripting library supporting Go templates)
  • Grafana (standard datasource)
The web app is exposed insecurely on port 9090.
sensible-browser http://$(kubectl -n monitoring describe pod prometheus-k8s-0 | grep IP: | awk '{print $2}'):9090/graph

Prometheus dashboard

In microk8s, Grafana runs on port 3000. It automatically provisions a default admin/admin user as well as a number of standard Prometheus dashboards.
sensible-browser http://`kubectl -n monitoring describe pod $(kubectl -n monitoring get pods | grep grafana | cut -d' ' -f1) | grep ^IP: | awk '{print $2}'`:3000
Dashboard K8s / USE Method / Node
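
The add-on also pre-provisions Prometheus as a Grafana datasource. Done by hand, the provisioning file would look roughly like this (a sketch; the in-cluster service DNS name assumes the kube-prometheus naming used above):

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                                     # Grafana proxies queries server-side
    url: http://prometheus-k8s.monitoring.svc:9090    # in-cluster Prometheus service
    isDefault: true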

Alerting

Prometheus comes with a companion notification service called Alertmanager. Alerting rules, described in YAML format, are evaluated by the Prometheus server, which forwards firing alerts to Alertmanager for grouping, deduplication and routing to receivers. For example, you could define an alerting rule that triggers a notification via e-mail or PagerDuty when a disk is getting 90% full, as sketched below.
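
Here is a hedged sketch of such a rule as a PrometheusRule custom resource, based on standard node-exporter filesystem metrics; the metadata labels must match the ruleSelector of your Prometheus custom resource, and the actual e-mail or PagerDuty receiver is configured separately in the Alertmanager configuration:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: disk-alerts
  namespace: monitoring
  labels:
    prometheus: k8s        # must match the Prometheus CR ruleSelector (deployment-specific)
    role: alert-rules
spec:
  groups:
    - name: node-disk
      rules:
        - alert: NodeDiskAlmostFull
          # less than 10% of the filesystem left, ignoring ephemeral mounts
          expr: node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} < 0.10
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Filesystem on {{ $labels.instance }} is more than 90% full"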

amtool is a command-line utility for interacting with Alertmanager. You may install it locally from a release or directly use the one shipped on your cluster:
$ kubectl exec alertmanager-main-0 -- sh -c "/bin/amtool alert --alertmanager.url=http://localhost:9093"
Defaulting container name to alertmanager.
Use 'kubectl describe pod/alertmanager-main-0 -n monitoring' to see all of the containers in this pod.
Alertname                        Starts At                Summary  
DeadMansSwitch                   2020-06-08 18:10:45 UTC           
KubeClientCertificateExpiration  2020-06-08 18:11:47 UTC           
KubeClientCertificateExpiration  2020-06-08 18:11:47 UTC           
KubeCPUOvercommit                2020-06-08 18:16:47 UTC           
KubeMemOvercommit                2020-06-08 18:16:47 UTC           
TargetDown                       2020-06-08 18:21:15 UTC           
KubeletDown                      2020-06-08 18:25:47 UTC           
KubeControllerManagerDown        2020-06-08 18:25:47 UTC           
KubeSchedulerDown                2020-06-08 18:25:47 UTC
Since alerting rules might have to change frequently, the Promgen web UI is potentially a good solution for handling the corresponding configuration.

Prometheus metrics architecture overview and custom pipelines


Alternatives to Prometheus

The most established alternative provider is certainly InfluxData. They maintain several open-source projects such as Telegraf for collecting and forwarding metrics, which are then made available both at rest in their time series database InfluxDB and in motion through their streaming engine Kapacitor.

Source: InfluxData

See also
