Metrics with Prometheus

Jun. 2020

Introduction

Prometheus is the de facto standard in the area of metrics monitoring. As an open source project, Prometheus has a large community and is actively maintained. I recently introduced Prometheus to my cluster setup, and I will describe here how this was achieved on a DigitalOcean (DO) Kubernetes pool. In addition, I will introduce Alert Manager, which goes hand in hand with Prometheus and provides a way to create alerts for issues.

Preparation and Installation

Prometheus
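
All of the manifests in this article target a monitoring namespace. If it does not exist in your cluster yet, create it first:

kubectl create namespace monitoring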

I recommend that a PersistentVolumeClaim be created for data persistence. The following would create a 50Gi block storage volume in the monitoring namespace.

# PrometheusPersistentVolumeClaim.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: do-block-storage

kubectl apply -f PrometheusPersistentVolumeClaim.yaml will create/apply the PersistentVolumeClaim. Next we need to create a ClusterRole and ClusterRoleBinding:

# PrometheusClusterRole.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - extensions
      - networking.k8s.io
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics", "/healthz"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: default
    namespace: monitoring

We need to create these permissions because Prometheus can/will scrape pods, nodes, services, endpoints, and so on for metrics. Run kubectl apply -f PrometheusClusterRole.yaml to create/apply the roles.
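
An optional sanity check of the binding uses kubectl's impersonation support (your own user needs impersonation rights, which a cluster admin typically has):

# should both answer "yes" once the ClusterRoleBinding is applied
kubectl auth can-i list nodes --as=system:serviceaccount:monitoring:default
kubectl auth can-i get pods --as=system:serviceaccount:monitoring:default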

We will also use a ConfigMap to store both prometheus.yml and prometheus.rules. What is provided below is just an example, however, and will certainly require changes for your situation.

# PrometheusConfigMap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus
  labels:
    name: prometheus
  namespace: monitoring
data:
  prometheus.rules: |-
    # https://github.com/alerta/prometheus-config/blob/master/config/prometheus.rules.yml
    groups:
    - name: ./rules.conf
      rules:

      # heartbeat alert
      - alert: Heartbeat
        expr: vector(1)
        labels:
          severity: informational

      # service availability alert
      - alert: service_down
        expr: up == 0
        labels:
          service: Platform
          severity: major
          correlate: service_up,service_down
        annotations:
          description: Service {{ $labels.instance }} is unavailable.
          value: DOWN ({{ $value }})
          runbook: http://wiki.alerta.io/RunBook/{app}/Event/{alertname}

      - alert: service_up
        expr: up == 1
        labels:
          service: Platform
          severity: normal
          correlate: service_up,service_down
        annotations:
          description: Service {{ $labels.instance }} is available.
          value: UP ({{ $value }})

      # system load alert
      - alert: high_load
        expr: node_load1 > 0.5
        annotations:
          description: '{{ $labels.instance }} of job {{ $labels.job }} is under high load.'
          summary: Instance {{ $labels.instance }} under high load
          value: '{{ $value }}'

      # disk space alert (with resource=<instance>:<mountpoint> event=disk_space
      - alert: disk_space
        expr: (node_filesystem_size_bytes - node_filesystem_free_bytes) * 100 / node_filesystem_size_bytes > 5
        labels:
          instance: '{{ $labels.instance }}:{{ $labels.mountpoint }}'
        annotations:
          value: '{{ humanize $value }}%'

      # disk space alert (with resource=<instance> event=disk_util:<mountpoint>
      - alert: disk_util
        expr: (node_filesystem_size_bytes - node_filesystem_free_bytes) * 100 / node_filesystem_size_bytes > 5
        labels:
          instance: '{{ $labels.instance }}'
          event: '{alertname}:{{ $labels.mountpoint }}'  # python templating rendered by Alerta
        annotations:
          value: '{{ humanize $value }}%'

      # API request rate alert
      - alert: api_requests_high
        expr: rate(alerta_alerts_queries_count{instance="alerta:8080",job="alerta"}[5m]) > 5
        labels:
          service: Alerta,Platform
          severity: major
        annotations:
          description: API request rate of {{ $value | printf "%.1f" }} req/s is high (threshold 5 req/s)
          summary: API request rate high
          value: '{{ humanize $value }} req/s'
  prometheus.yml: |-
    # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus.yml
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
        - targets: ['localhost:9090']

kubectl apply -f PrometheusConfigMap.yaml will apply/create the ConfigMap. prometheus.rules contains your alert rules, which will come into play later with Alert Manager. prometheus.yml contains, among other things, the scrape configuration, and so it plays an obvious role in which metrics are actually collected. Here you can also modify labels. As a more complex example:

...
  - job_name: 'kubernetes-nodes-cadvisor'
    scrape_interval: 10s
    scrape_timeout: 10s
    scheme: https  # remove if you want to scrape metrics on insecure port
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      # Only for Kubernetes ^1.7.3.
      # See: https://github.com/prometheus/prometheus/issues/2916
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    metric_relabel_configs:
      - action: replace
        source_labels: [id]
        regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
        target_label: rkt_container_name
        replacement: '${2}-${1}'
      - action: replace
        source_labels: [id]
        regex: '^/system\.slice/(.+)\.service$'
        target_label: systemd_service_name
        replacement: '${1}'

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
...
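
Note that in the example prometheus.yml above the rule_files section is commented out and no Alert Manager target is configured, so the alerts defined in prometheus.rules would never be evaluated or delivered. A minimal sketch of the two sections to adjust, assuming the ConfigMap is mounted at /etc/prometheus (as in the Deployment below) and the Alert Manager Service created later in this article lives in the same monitoring namespace:

# excerpt for prometheus.yml: load the mounted rules file and point at Alert Manager
rule_files:
  - /etc/prometheus/prometheus.rules

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093  # the Alert Manager Service defined further below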

Now we can deploy Prometheus with kubectl apply -f PrometheusDeployment.yaml. As of this writing, the latest release is 2.19.0. In this Deployment, retention is set to 60 days; feel free to change it. The data path is set to /prometheus, which aligns with the PVC. We also need an init container to set the correct owner and group on the data directory.

# PrometheusDeployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.19.0
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
            - "--storage.tsdb.retention.time=60d"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: prometheus-config
              mountPath: /etc/prometheus
            - name: prometheus-storage
              mountPath: /prometheus
      volumes:
        - name: prometheus-config
          configMap:
            defaultMode: 420
            name: prometheus
        - name: prometheus-storage
          persistentVolumeClaim:
            claimName: prometheus-pvc
      initContainers:
        - name: fix-permissions
          image: busybox
          command: ["sh", "-c", "chown -R nobody:nogroup /prometheus"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: prometheus-storage
              mountPath: /prometheus
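
Before creating the Service, it is worth checking that the pod comes up cleanly. A quick, optional check from your workstation:

# wait for the rollout, then tunnel to the Prometheus UI at http://localhost:9090
kubectl -n monitoring rollout status deployment/prometheus
kubectl -n monitoring port-forward deployment/prometheus 9090:9090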

Finally, we create our Service with kubectl apply -f PrometheusService.yaml.

# PrometheusService.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port:   '9090'
spec:
  selector:
    app: prometheus
  ports:
    - name: "http"
      port: 9090

Here, we have the annotations prometheus.io/scrape: 'true' and prometheus.io/port: '9090'. These enable scraping for this specific Service. If we look at the more complex prometheus.yml example:

- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
  action: keep
  regex: true

If the scrape annotation is not set to true, the target is dropped.
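
The same convention applies to your own workloads: annotate the pod template and the kubernetes-pods job above will pick the pod up automatically. A sketch, using a hypothetical application that exposes metrics on port 8080:

# excerpt from a hypothetical application Deployment's pod template
template:
  metadata:
    labels:
      app: my-app
    annotations:
      prometheus.io/scrape: 'true'
      prometheus.io/path: '/metrics'  # optional; /metrics is the default
      prometheus.io/port: '8080'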

Alert Manager

We can look at the example provided by Prometheus:

# AlertManagerConfigmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: monitoring
data:
  config.yml: |-
    global:
      # The smarthost and SMTP sender used for mail notifications.
      smtp_smarthost: 'localhost:25'
      smtp_from: 'alertmanager@example.org'
      smtp_auth_username: 'alertmanager'
      smtp_auth_password: 'password'

    # The directory from which notification templates are read.
    templates: 
    - '/etc/alertmanager/template/*.tmpl'

    # The root route on which each incoming alert enters.
    route:
      # The labels by which incoming alerts are grouped together. For example,
      # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
      # be batched into a single group.
      #
      # To aggregate by all possible labels use '...' as the sole label name.
      # This effectively disables aggregation entirely, passing through all
      # alerts as-is. This is unlikely to be what you want, unless you have
      # a very low alert volume or your upstream notification system performs
      # its own grouping. Example: group_by: [...]
      group_by: ['alertname', 'cluster', 'service']

      # When a new group of alerts is created by an incoming alert, wait at
      # least 'group_wait' to send the initial notification.
      # This way ensures that you get multiple alerts for the same group that start
      # firing shortly after another are batched together on the first 
      # notification.
      group_wait: 30s

      # When the first notification was sent, wait 'group_interval' to send a batch
      # of new alerts that started firing for that group.
      group_interval: 5m

      # If an alert has successfully been sent, wait 'repeat_interval' to
      # resend them.
      repeat_interval: 3h 

      # A default receiver
      receiver: team-X-mails

      # All the above attributes are inherited by all child routes and can 
      # overwritten on each.

      # The child route trees.
      routes:
      # This routes performs a regular expression match on alert labels to
      # catch alerts that are related to a list of services.
      - match_re:
          service: ^(foo1|foo2|baz)$
        receiver: team-X-mails
        # The service has a sub-route for critical alerts, any alerts
        # that do not match, i.e. severity != critical, fall-back to the
        # parent node and are sent to 'team-X-mails'
        routes:
        - match:
            severity: critical
          receiver: team-X-pager
      - match:
          service: files
        receiver: team-Y-mails

        routes:
        - match:
            severity: critical
          receiver: team-Y-pager

      # This route handles all alerts coming from a database service. If there's
      # no team to handle it, it defaults to the DB team.
      - match:
          service: database
        receiver: team-DB-pager
        # Also group alerts by affected database.
        group_by: [alertname, cluster, database]
        routes:
        - match:
            owner: team-X
          receiver: team-X-pager
          continue: true
        - match:
            owner: team-Y
          receiver: team-Y-pager


    # Inhibition rules allow to mute a set of alerts given that another alert is
    # firing.
    # We use this to mute any warning-level notifications if the same alert is 
    # already critical.
    inhibit_rules:
    - source_match:
        severity: 'critical'
      target_match:
        severity: 'warning'
      # Apply inhibition if the alertname is the same.
      # CAUTION: 
      #   If all label names listed in `equal` are missing 
      #   from both the source and target alerts,
      #   the inhibition rule will apply!
      equal: ['alertname', 'cluster', 'service']


    receivers:
    - name: 'team-X-mails'
      email_configs:
      - to: 'team-X+alerts@example.org'

    - name: 'team-X-pager'
      email_configs:
      - to: 'team-X+alerts-critical@example.org'
      pagerduty_configs:
      - service_key: <team-X-key>

    - name: 'team-Y-mails'
      email_configs:
      - to: 'team-Y+alerts@example.org'

    - name: 'team-Y-pager'
      pagerduty_configs:
      - service_key: <team-Y-key>

    - name: 'team-DB-pager'
      pagerduty_configs:
      - service_key: <team-DB-key>

After configuring Alert Manager we can simply run kubectl apply -f AlertManagerConfigmap.yaml to create the ConfigMap. It may not be obvious from the example above, but we can also define reusable notification templates, and those too can be provided through a ConfigMap.

# AlertTemplateConfigMap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-templates
  namespace: monitoring
data:
  default.tmpl: |
    {{ define "slack.myorg.text" }}https://internal.myorg.net/wiki/alerts/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}{{ end}}
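
A template like this only takes effect once a receiver references it. A sketch of what that could look like in config.yml, using a hypothetical Slack receiver (the webhook URL and channel below are placeholders):

# hypothetical receiver referencing the template defined above
receivers:
- name: 'team-X-slack'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
    channel: '#alerts'
    text: '{{ template "slack.myorg.text" . }}'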

We deploy Alert Manager with kubectl apply -f AlertManagerDeployment.yaml; note, however, that in this case we have not configured a PersistentVolumeClaim. It may be a good idea to add block storage here (a sketch follows the Deployment below); refer to the Prometheus Deployment manifest and PersistentVolumeClaim above for how to wire it in. Also note that config.yml reads templates from /etc/alertmanager/template/*.tmpl while the Deployment below mounts the templates ConfigMap at /etc/alertmanager-templates; adjust one of the two paths so the templates are actually found.

# AlertManagerDeployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      name: alertmanager
      labels:
        app: alertmanager
    spec:
      containers:
        - name: alertmanager
          image: prom/alertmanager:v0.20.0
          args:
            - "--config.file=/etc/alertmanager/config.yml"
            - "--storage.path=/alertmanager"
          ports:
            - name: alertmanager
              containerPort: 9093
          volumeMounts:
            - name: config-volume
              mountPath: /etc/alertmanager
            - name: templates-volume
              mountPath: /etc/alertmanager-templates
            - name: alertmanager
              mountPath: /alertmanager
      volumes:
        - name: config-volume
          configMap:
            name: alertmanager-config
        - name: templates-volume
          configMap:
            name: alertmanager-templates
        - name: alertmanager
          emptyDir: {}
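
If you do want persistence for Alert Manager, so that silences and notification state survive restarts, a sketch of a PersistentVolumeClaim mirroring the Prometheus one above is shown below; you would then replace the emptyDir: {} volume with a persistentVolumeClaim referencing it. The 2Gi size is an arbitrary assumption:

# AlertManagerPersistentVolumeClaim.yaml (optional sketch)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alertmanager-pvc
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: do-block-storage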

Finally, we create the Service with kubectl apply -f AlertManagerService.yaml. We add the Prometheus annotations here as well, but they are not mandatory.

# AlertManagerService.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: monitoring
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port:   '9093'
spec:
  selector:
    app: alertmanager
  ports:
    - name: "http"
      port: 9093
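
As a quick check, you can tunnel to the Alert Manager UI and, once Prometheus is pointed at it (see the prometheus.yml excerpt earlier), confirm that the Heartbeat alert from prometheus.rules shows up:

# tunnel to the Alert Manager UI at http://localhost:9093
kubectl -n monitoring port-forward svc/alertmanager 9093:9093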

Final Words

Prometheus and Alert Manager require some reading to fully grasp how to configure them properly. However, Prometheus is a powerful metrics monitor, and combined with Grafana it becomes an amazing set of tools for managing a system.
