Kubernetes Monitoring

Visualize and alert on your Kubernetes cluster in minutes, not days.

Why use Kubernetes Monitoring in Grafana Cloud?

Accelerate time to value

Reduce deployment, setup, and troubleshooting time with a ready-to-use monitoring tool that requires only a few CLI commands or small changes to your Helm chart.

Identify root causes faster

Drill down through your infrastructure with the cluster navigation view to identify and resolve issues, without the hassle of switching between different windows and monitoring tools.

Reduce costs

Efficiency and cost visualizations give you detailed insight into your spending, so you can make data-driven decisions about resource allocation, scaling strategies, and tech investments.

Opinionated metrics and alerts

Get the kube-state-metrics and alerting rules you need to monitor Kubernetes clusters effectively.

A curated set of metrics to avoid cardinality explosion
Community-built alerting standards
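
One common way to keep a metric set curated is an allowlist: any series whose metric name is not on the list is dropped before it is sent. The Python sketch below illustrates that idea only; it is not how Grafana Cloud implements curation, and the `ALLOWLIST`, the `filter_series` helper, and the sample series are hypothetical (the allowlisted names are taken from the key metrics listed on this page).

```python
from typing import Dict, Iterable, List, Tuple

# A series is modeled as (metric_name, labels); this shape is an assumption for illustration.
Series = Tuple[str, Dict[str, str]]

# Hypothetical allowlist: a small subset of the curated metric names.
ALLOWLIST = {
    "container_cpu_usage_seconds_total",
    "container_memory_working_set_bytes",
    "kube_pod_status_phase",
}


def filter_series(series: Iterable[Series], allowlist: set) -> List[Series]:
    """Keep only series whose metric name is on the allowlist."""
    return [s for s in series if s[0] in allowlist]


scraped = [
    ("container_cpu_usage_seconds_total", {"pod": "api-1"}),
    ("container_blkio_device_usage_total", {"pod": "api-1", "device": "sda"}),
    ("kube_pod_status_phase", {"pod": "api-1", "phase": "Running"}),
    ("apiserver_request_duration_seconds_bucket", {"le": "0.1"}),
]

kept = filter_series(scraped, ALLOWLIST)
print(f"kept {len(kept)} of {len(scraped)} series")  # kept 2 of 4 series
```

Every series dropped at this stage is one fewer active series to store and query, which is why a fixed list keeps cardinality predictable.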

Cost management

Gain better insight into your Kubernetes costs, spending trends, and potential savings with the cost monitoring feature, which is based on the open source project OpenCost.

Usage and cost attribution at every component level
Break down costs and resource allocation across cloud providers
Visualize cost trends and projected savings
Organize Kubernetes costs by resource type
Get savings suggestions based on your resource usage
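
To make per-component cost attribution concrete, here is a hedged Python sketch of one allocation rule. As we understand OpenCost's specification, each workload is charged for the larger of its requested and used resources. The `ContainerCPU` type, the price, and the workloads below are hypothetical, and real cost models also cover memory, storage, GPUs, and network.

```python
from dataclasses import dataclass


@dataclass
class ContainerCPU:
    name: str
    request_cores: float  # e.g. from kube_pod_container_resource_requests
    usage_cores: float    # average usage over the window, e.g. a rate of container_cpu_usage_seconds_total


def allocated_cores(c: ContainerCPU) -> float:
    """Allocate the larger of requested and used CPU (OpenCost-style rule, as we understand it)."""
    return max(c.request_cores, c.usage_cores)


def idle_cores(c: ContainerCPU) -> float:
    """Requested but unused CPU: a simple signal for a rightsizing suggestion."""
    return max(0.0, c.request_cores - c.usage_cores)


def cpu_cost(c: ContainerCPU, price_per_core_hour: float, hours: float) -> float:
    """CPU cost attributed to one container over the window."""
    return allocated_cores(c) * price_per_core_hour * hours


# Hypothetical price and workloads, for illustration only.
PRICE_PER_CORE_HOUR = 0.04
containers = [
    ContainerCPU("api", request_cores=0.5, usage_cores=0.2),       # requests more than it uses
    ContainerCPU("worker", request_cores=0.25, usage_cores=0.75),  # bursts above its request
]

for c in containers:
    daily = cpu_cost(c, PRICE_PER_CORE_HOUR, hours=24)
    print(f"{c.name}: ${daily:.2f}/day, idle {idle_cores(c):.2f} cores")
```

Using the maximum means an over-requesting workload pays for capacity it reserved, while a bursting workload pays for what it consumed; the idle figure is what a savings suggestion would target.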

High-priority issues at a glance

Identify fleet issues instantly with a snapshot of every infrastructure component that has breached a preset threshold for:

Node CPU and memory usage
Node disk and persistent volume capacity
Pods in a non-running state, and the cause of that state
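
A threshold check on persistent volume capacity comes down to comparing used space with total space. The Python sketch below uses the two kubelet volume metrics from the key metrics list on this page; the 85% threshold, the volume names, and the byte counts are hypothetical and are not the preset thresholds Kubernetes Monitoring uses.

```python
def pv_usage_fraction(available_bytes: float, capacity_bytes: float) -> float:
    """Fraction of a volume in use, from kubelet_volume_stats_available_bytes and
    kubelet_volume_stats_capacity_bytes."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    return 1.0 - available_bytes / capacity_bytes


# Hypothetical threshold; the presets used by Kubernetes Monitoring may differ.
USAGE_THRESHOLD = 0.85

# Hypothetical volumes: name -> (available bytes, capacity bytes).
volumes = {
    "data-postgres-0": (2 * 1024**3, 20 * 1024**3),  # 2 GiB free of 20 GiB
    "cache-redis-0": (9 * 1024**3, 10 * 1024**3),    # 9 GiB free of 10 GiB
}

breaching = sorted(
    name
    for name, (avail, cap) in volumes.items()
    if pv_usage_fraction(avail, cap) >= USAGE_THRESHOLD
)
print(breaching)  # ['data-postgres-0']
```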

Full visibility, from Kubernetes clusters to containers

Get a full view of Kubernetes clusters, then drill down to see specific container-level information.

Cost and resource usage attribution for every infrastructure level
Color-coded resource usage visualizations and icons help you identify and resolve issues faster
Side-by-side peak vs. average resource efficiency comparisons

Optimize, analyze, and anticipate your resource usage

Analyze CPU and memory usage trends, compare actual usage against requests and limits, and catch issues early to keep resource usage efficient.

Detailed insights at every infrastructure level with historical trends
Resource forecasting powered by machine learning
Automated pod CPU outlier detection
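
Comparing usage with requests and limits starts with turning a CPU counter into cores. This Python sketch assumes two samples of `container_cpu_usage_seconds_total` taken five minutes apart; the sample values, request, and limit are hypothetical, and the helper names are ours.

```python
def avg_cores_used(counter_start: float, counter_end: float, window_seconds: float) -> float:
    """Average CPU cores used over a window.

    container_cpu_usage_seconds_total counts CPU-seconds, so its increase divided
    by the window length gives cores. Assumes no counter reset inside the window.
    """
    if window_seconds <= 0:
        raise ValueError("window must be positive")
    return (counter_end - counter_start) / window_seconds


def utilization(used_cores: float, reference_cores: float) -> float:
    """Usage as a fraction of a request or limit."""
    return used_cores / reference_cores


# Hypothetical samples taken 5 minutes apart.
used = avg_cores_used(counter_start=1200.0, counter_end=1290.0, window_seconds=300)
request, limit = 0.5, 1.0  # cores, as kube_pod_container_resource_requests / _limits would report

print(f"using {used:.2f} cores: {utilization(used, request):.0%} of request, "
      f"{utilization(used, limit):.0%} of limit")
# using 0.30 cores: 60% of request, 30% of limit
```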

Kubernetes container insights

Use the cluster-to-container navigation to drill down to individual containers in a few clicks.

Sizing recommendations
Access to historical data to pinpoint CPU throttling and restarts
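
CPU throttling is commonly measured as the share of CFS scheduling periods in which a container ran into its CPU quota. Below is a minimal Python sketch using two samples each of `container_cpu_cfs_periods_total` and `container_cpu_cfs_throttled_periods_total` (both in the key metrics list). The sample values are hypothetical, and the 25% threshold is an assumption based on our understanding of the community `CPUThrottlingHigh` rule.

```python
# Assumed threshold: 25% matches the community kubernetes-mixin default for
# CPUThrottlingHigh as far as we know; your configured value may differ.
THROTTLING_THRESHOLD = 0.25


def throttling_ratio(periods_start: float, periods_end: float,
                     throttled_start: float, throttled_end: float) -> float:
    """Fraction of CFS periods in which the container hit its CPU quota.

    Inputs are two samples each of container_cpu_cfs_periods_total and
    container_cpu_cfs_throttled_periods_total; counter resets are ignored.
    """
    periods = periods_end - periods_start
    if periods <= 0:
        return 0.0
    return (throttled_end - throttled_start) / periods


ratio = throttling_ratio(10_000, 13_000, 400, 1_300)
print(f"throttled in {ratio:.0%} of periods; alert: {ratio > THROTTLING_THRESHOLD}")
# throttled in 30% of periods; alert: True
```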

Easy deployment

Deploy the Helm chart on any of the major cloud-managed Kubernetes services and Kubernetes distributions.

Choose which features to enable
Get Helm installation instructions tailored to your needs

Instant Prometheus-correlated logs

Prometheus metrics and Grafana Loki logs carry the same labels for your Kubernetes cluster, so you can jump straight from a metric to the logs of the same cluster object.
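
Because metrics and logs share labels, a metric series can be turned into a Loki stream selector by copying its label values. The sketch below assumes the shared labels are `cluster`, `namespace`, `pod`, and `container`; the label set your deployment actually attaches may differ, and `logql_selector` is a hypothetical helper.

```python
# Assumed shared label names; check which labels your metrics and logs actually carry.
SHARED_LABELS = ("cluster", "namespace", "pod", "container")


def escape(value: str) -> str:
    """Escape backslashes and double quotes for a LogQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def logql_selector(series_labels: dict, keys=SHARED_LABELS) -> str:
    """Build a LogQL stream selector from the labels of a Prometheus series."""
    matchers = [f'{k}="{escape(series_labels[k])}"' for k in keys if k in series_labels]
    return "{" + ", ".join(matchers) + "}"


series = {
    "__name__": "container_cpu_usage_seconds_total",
    "cluster": "prod-eu",
    "namespace": "shop",
    "pod": "cart-7d9f8",
    "container": "cart",
    "cpu": "total",
}
print(logql_selector(series))
# {cluster="prod-eu", namespace="shop", pod="cart-7d9f8", container="cart"}
```

Labels that exist only on the metric side, such as `cpu`, are left out, so the selector matches the container's log stream rather than nothing.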

It’s easy to get started

1. Sign up

Create your free Grafana Cloud account.

2. Connect your data

With a few clicks, set up default configurations for prebuilt dashboards and alerting rules.

3. Deploy

Data will stream from your cluster into Grafana Cloud.

The Kubernetes Monitoring integration on Grafana Cloud enables our engineers to have native monitoring. No longer do they have to reach out to our SRE team. Instead, they just click a button on the Grafana Cloud integrations tab, navigate to the out-of-the-box dashboard, and see all the information — CPU usage, logs, metrics — they need to solve the problem themselves. It’s so simple, helps us spot issues fast, and saves us all a lot of custom development time.

James Wojewoda

Lead Site Reliability Engineer | Beeswax

Kubernetes metrics and alerting rules

The Kubernetes Monitoring solution in Grafana Cloud ingests a default set of metrics at a 60-second scrape interval, and its alerting rules help you set up and run alerts for clusters and their workloads.
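
At a 60-second scrape interval, each active series produces 1,440 samples per day (86,400 seconds / 60). The small Python helper below makes that arithmetic explicit; the series counts and the 15-second comparison are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DEFAULT_SCRAPE_INTERVAL = 60  # seconds, as stated above


def samples_per_day(active_series: int, interval_seconds: int = DEFAULT_SCRAPE_INTERVAL) -> int:
    """Samples ingested per day when every series is scraped once per interval."""
    return active_series * (SECONDS_PER_DAY // interval_seconds)


print(samples_per_day(1))                       # 1440
print(samples_per_day(1, interval_seconds=15))  # 5760
print(samples_per_day(10_000))                  # 14400000
```

Shortening the interval multiplies sample volume proportionally, which is one reason a longer default interval keeps ingest modest.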

Key alerting rules included

KubeNodeNotReady

KubeNodeUnreachable

KubeletTooManyPods

KubeNodeReadinessFlapping

KubeletPlegDurationHigh

KubeletPodStartUpLatencyHigh

KubeletClientCertificateExpiration

KubeletServerCertificateExpiration

KubeletClientCertificateRenewalErrors

KubeletServerCertificateRenewalErrors

KubeletDown

KubeVersionMismatch

KubeClientErrors

KubeCPUOvercommit

KubeMemoryOvercommit

KubeCPUQuotaOvercommit

KubeMemoryQuotaOvercommit

KubeQuotaAlmostFull

KubeQuotaFullyUsed

KubeQuotaExceeded

CPUThrottlingHigh

KubePodCrashLooping

KubePodNotReady

KubeDeploymentGenerationMismatch

KubeDeploymentReplicasMismatch

KubeStatefulSetReplicasMismatch

KubeStatefulSetGenerationMismatch

KubeStatefulSetUpdateNotRolledOut

KubeDaemonSetRolloutStuck

KubeContainerWaiting

KubeDaemonSetNotScheduled

KubeDaemonSetMisScheduled

KubeJobCompletion

KubeJobFailed

KubeHpaReplicasMismatch

KubeHpaMaxedOut
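
The three quota alerts above (KubeQuotaAlmostFull, KubeQuotaFullyUsed, KubeQuotaExceeded) compare the used and hard values of `kube_resourcequota`. In the community kubernetes-mixin, as we understand it, AlmostFull fires above 90% and below 100%, FullyUsed at exactly 100%, and Exceeded above 100%. The Python sketch below encodes those thresholds as an assumption, not as this product's confirmed configuration; the quota values are hypothetical.

```python
def quota_alert(used: float, hard: float) -> str:
    """Classify a ResourceQuota from kube_resourcequota used/hard values.

    Thresholds are assumed from the community kubernetes-mixin rules.
    """
    if hard <= 0:
        return "none"  # no hard limit to compare against
    ratio = used / hard
    if ratio > 1:
        return "KubeQuotaExceeded"
    if ratio == 1:
        return "KubeQuotaFullyUsed"
    if ratio > 0.9:
        return "KubeQuotaAlmostFull"
    return "none"


# Hypothetical (used, hard) pairs for requests.cpu in four namespaces.
for used, hard in [(5, 10), (9.5, 10), (10, 10), (12, 10)]:
    print(used, hard, quota_alert(used, hard))
```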

Key metrics included

cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits

cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests

cluster:namespace:pod_memory:active:kube_pod_container_resource_limits

cluster:namespace:pod_memory:active:kube_pod_container_resource_requests

container_cpu_cfs_periods_total

container_cpu_cfs_throttled_periods_total

container_cpu_usage_seconds_total

container_fs_reads_bytes_total

container_fs_reads_total

container_fs_writes_bytes_total

container_fs_writes_total

container_memory_cache

container_memory_rss

container_memory_swap

container_memory_working_set_bytes

container_network_receive_bytes_total

container_network_receive_packets_dropped_total

container_network_receive_packets_total

container_network_transmit_bytes_total

container_network_transmit_packets_dropped_total

container_network_transmit_packets_total

go_goroutines

kube_daemonset_status_current_number_scheduled

kube_daemonset_status_desired_number_scheduled

kube_daemonset_status_number_available

kube_daemonset_status_number_misscheduled

kube_daemonset_updated_number_scheduled

kube_deployment_metadata_generation

kube_deployment_spec_replicas

kube_deployment_status_observed_generation

kube_deployment_status_replicas_available

kube_deployment_status_replicas_updated

kube_horizontalpodautoscaler_spec_max_replicas

kube_horizontalpodautoscaler_spec_min_replicas

kube_horizontalpodautoscaler_status_current_replicas

kube_horizontalpodautoscaler_status_desired_replicas

kube_job_failed

kube_job_spec_completions

kube_job_status_succeeded

kube_namespace_created

kube_node_info

kube_node_spec_taint

kube_node_status_allocatable

kube_node_status_capacity

kube_node_status_condition

kube_pod_container_resource_limits

kube_pod_container_resource_requests

kube_pod_container_status_waiting_reason

kube_pod_info

kube_pod_owner

kube_pod_status_phase

kube_replicaset_owner

kube_resourcequota

kube_statefulset_metadata_generation

kube_statefulset_replicas

kube_statefulset_status_current_revision

kube_statefulset_status_observed_generation

kube_statefulset_status_replicas

kube_statefulset_status_replicas_ready

kube_statefulset_status_replicas_updated

kube_statefulset_status_update_revision

kubelet_certificate_manager_client_expiration_renew_errors

kubelet_certificate_manager_client_ttl_seconds

kubelet_certificate_manager_server_ttl_seconds

kubelet_cgroup_manager_duration_seconds_bucket

kubelet_cgroup_manager_duration_seconds_count

kubelet_node_config_error

kubelet_node_name

kubelet_pleg_relist_duration_seconds_bucket

kubelet_pleg_relist_duration_seconds_count

kubelet_pleg_relist_interval_seconds_bucket

kubelet_pod_start_duration_seconds_count

kubelet_pod_worker_duration_seconds_bucket

kubelet_pod_worker_duration_seconds_count

kubelet_running_container_count

kubelet_running_containers

kubelet_running_pod_count

kubelet_running_pods

kubelet_runtime_operations_duration_seconds_bucket

kubelet_runtime_operations_errors_total

kubelet_runtime_operations_total

kubelet_server_expiration_renew_errors

kubelet_volume_stats_available_bytes

kubelet_volume_stats_capacity_bytes

kubelet_volume_stats_inodes

kubelet_volume_stats_inodes_used

kubernetes_build_info

machine_memory_bytes

namespace_cpu:kube_pod_container_resource_limits:sum

namespace_cpu:kube_pod_container_resource_requests:sum

namespace_memory:kube_pod_container_resource_limits:sum

namespace_memory:kube_pod_container_resource_requests:sum

namespace_workload_pod

namespace_workload_pod:kube_pod_owner:relabel

node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate

node_namespace_pod_container:container_memory_cache

node_namespace_pod_container:container_memory_rss

node_namespace_pod_container:container_memory_swap

node_namespace_pod_container:container_memory_working_set_bytes

node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile

process_cpu_seconds_total

process_resident_memory_bytes

rest_client_request_duration_seconds_bucket

rest_client_requests_total

storage_operation_duration_seconds_bucket

storage_operation_duration_seconds_count

storage_operation_errors_total

volume_manager_total_volumes
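
Recording rules such as `node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate` are built on PromQL's `irate()`, which computes a per-second rate from the last two samples in the range and treats a drop in value as a counter reset. The Python function below is a simplified re-implementation for illustration, not Prometheus code, and the sample data is made up.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix_timestamp_seconds, counter_value)


def irate(samples: List[Sample]) -> float:
    """Per-second rate from the last two samples, in the spirit of PromQL irate()."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    delta = v_last - v_prev
    if delta < 0:
        # Counter reset: assume the counter restarted from zero.
        delta = v_last
    return delta / (t_last - t_prev)


# CPU-seconds counter scraped every 60 s: 9 CPU-seconds in the last minute = 0.15 cores.
print(irate([(0, 100.0), (60, 106.0), (120, 115.0)]))
# After a container restart the counter drops; only the new value counts.
print(irate([(0, 500.0), (60, 3.0)]))
```

Because only the last two samples are used, `irate()` reacts quickly to spikes, which suits per-container CPU panels; `rate()` averages over the whole range instead.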

Ready to get started with Kubernetes Monitoring?

Kubernetes Monitoring is available on all three Grafana Cloud plans. Every plan includes prebuilt dashboards, metrics, and alerting rules.

Cloud Free

Perfect for early-stage and small teams. Free forever.

Up to 3 active users.

10k metrics, 50 GB logs, and 50 GB traces.

Features include:

14-day retention
Grafana OnCall
Synthetic Monitoring
Grafana Alerting

Cloud Pro

Perfect for growing teams at only $8/mo + usage.

Includes all features in Free, plus:

Retention: 13 months for metrics; 30 days for logs & traces
Grafana Machine Learning
SSO/SAML/LDAP
Data source permissions
Cloud SLA and support
Query caching
Reporting and export
Optional add-on Enterprise plugin

Cloud Advanced

Perfect for global teams. Custom pricing.

Includes all features in Pro, plus:

Customized retention
Access to all Enterprise plugins
Audit logging
Enhanced LDAP
Team sync
Custom branding
Dedicated technical account management
