Introduction to Incident Response Management (IRM)

Grafana Cloud’s Incident Response Management (IRM) solution integrates into your incident workflow so you can detect issues, escalate alerts, automate response processes, and identify actionable insights.

Products

Grafana Cloud has an expanding range of products and features that can be integrated alongside Grafana Cloud IRM to further enhance the effectiveness of your incident response workflows.

Note

Grafana Cloud IRM is a paid add-on, billed based on monthly active IRM users. For more details, refer to Understand your Grafana Cloud IRM invoice.

Grafana SLO (Service Level Objectives)

SLOs help teams measure the quality of their service, improve reliability, and make informed decisions. Use SLOs to collect data on the reliability of your systems over time and provide better service to your customers.

To learn more, refer to the Grafana SLO documentation.

Grafana Alerting

Grafana Alerting consolidates Grafana-managed alerts and alerts from Mimir- or Loki-compatible data sources in one place. You can set up Alerting to integrate with Grafana OnCall and Grafana Incident, which helps your team identify and resolve issues quickly.

To learn more, refer to the Grafana Alerting documentation.

Grafana OnCall

Grafana OnCall is a developer-first on-call management solution that lets you automate escalations, define alert rules, and integrate with existing alert sources and third-party tools. Use OnCall to create on-call schedules, notify the right teams, and declare an incident directly from a firing alert.

To learn more, refer to the Grafana OnCall documentation.

Grafana Incident

When an incident occurs, leverage the Incident tool to define roles, automate task assignment, and establish collaboration spaces. Benefit from integrations with your favorite tools, such as GitHub, Slack, and Google Suite.

To learn more, refer to the Grafana Incident documentation.

Grafana Machine Learning

Grafana IRM comes with an expanding range of generative AI and machine learning capabilities designed to facilitate and inform proactive decision-making and incident response.

To learn more, refer to the Grafana Machine Learning documentation.

How do they work together?

When things go wrong, Grafana dashboards are the first place teams look for answers in metrics, logs, and traces, and the place they return to when putting together a postmortem. Grafana sits at the heart of incident response management. With Grafana SLO, Grafana Alerting, Grafana Incident, Grafana OnCall, and Grafana Machine Learning on Grafana Cloud, you can integrate IRM into workflows your team already uses.

Use Grafana IRM to proactively detect issues, keep your services healthy, and easily respond to incidents. Utilize machine learning features throughout your IRM workflows to create alerts, sift through metadata during an incident, and never miss a detail in your post-incident review with Incident Auto-Summary.

Detect, respond, learn diagram that illustrates how Grafana IRM products work together

Detect

SLOs define how reliable your service should be, and SLIs measure how reliable it actually is. By setting reliability targets in the form of SLIs and SLOs, you set stakeholder expectations and ensure transparency. Combine SLOs with Grafana Alerting to generate alerts and send notifications, giving engineers an efficient way to monitor, triage, and respond to issues within their services.

Standard alerts and alert notifications are key indicators of issues during triage, giving engineers the information they need to understand what is going on in their system or service. When paired with SLOs, an SLO alert notifies teams of an issue and provides context on runtime behavior to aid triage.
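To make the relationship between SLIs, SLOs, and SLO alerts concrete, the following Python sketch computes an SLI from request counts, the share of error budget remaining, and a burn rate. It is an illustration of the general arithmetic only, not how Grafana SLO is implemented or configured. The names SloWindow, sli, error_budget_remaining, and burn_rate are hypothetical, and the request counts and the 99.9% target are made-up example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts observed over one SLO window (hypothetical shape)."""
    total_requests: int
    good_requests: int


def _check_target(target: float) -> None:
    # An SLO target of exactly 0 or 1 leaves no meaningful error budget.
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")


def sli(window: SloWindow) -> float:
    """Fraction of requests that met the success criteria."""
    if window.total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully compliant
    return window.good_requests / window.total_requests


def error_budget_remaining(window: SloWindow, target: float) -> float:
    """Share of the error budget still unspent (1.0 = untouched, < 0 = overspent)."""
    _check_target(target)
    if window.total_requests == 0:
        return 1.0
    allowed_bad = (1.0 - target) * window.total_requests
    actual_bad = window.total_requests - window.good_requests
    return 1.0 - actual_bad / allowed_bad


def burn_rate(window: SloWindow, target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the budget is being spent exactly as fast as the
    target permits; higher values mean it will run out early.
    """
    _check_target(target)
    return (1.0 - sli(window)) / (1.0 - target)


if __name__ == "__main__":
    # Made-up example: 50 failed requests out of 100,000 against a 99.9% target.
    window = SloWindow(total_requests=100_000, good_requests=99_950)
    print("SLI:", sli(window))
    print("Error budget remaining:", error_budget_remaining(window, 0.999))
    print("Burn rate:", burn_rate(window, 0.999))
```

An SLO alert, in this framing, fires when the burn rate over some window exceeds a threshold you choose. Which thresholds and windows suit a given service is a policy decision the sketch leaves open.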

Dashboards and insights help you monitor the status of your SLOs and alerts and quickly identify crucial operational details.

Respond

When an alert is generated, leverage Grafana OnCall to ensure a swift and effective response. Let your on-call rotations and automated escalations route alerts to the right teams and notify on-call engineers using their preferred notification methods. With an intuitive API and versatile integration capabilities, the developer-first workflow allows for highly customizable configurations tailored to any use case.
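As a rough mental model of automated escalation, the Python sketch below walks an escalation chain and reports who has been notified for an alert that nobody has acknowledged yet. It does not use the Grafana OnCall API or its configuration format. EscalationStep, notified_targets, the target names, and the wait times are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    """One step of a hypothetical escalation chain."""
    wait_minutes: int  # delay after the previous step before this one fires
    notify: str        # illustrative target: a schedule, team, or person


def notified_targets(chain: list[EscalationStep], minutes_unacknowledged: int) -> list[str]:
    """Return the targets notified so far for an alert nobody has acknowledged."""
    notified = []
    elapsed = 0
    for step in chain:
        elapsed += step.wait_minutes
        if elapsed > minutes_unacknowledged:
            break  # this step and all later ones have not fired yet
        notified.append(step.notify)
    return notified


# Made-up chain: page the primary immediately, then escalate if no one responds.
EXAMPLE_CHAIN = [
    EscalationStep(wait_minutes=0, notify="primary on-call"),
    EscalationStep(wait_minutes=5, notify="secondary on-call"),
    EscalationStep(wait_minutes=10, notify="engineering manager"),
]

if __name__ == "__main__":
    for minutes in (0, 7, 15):
        print(minutes, notified_targets(EXAMPLE_CHAIN, minutes))
```

Modeling each wait as relative to the previous step keeps the chain easy to reorder. A real setup would also stop escalating once someone acknowledges the alert, which this sketch omits.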

After an issue is identified, Grafana Incident makes it easy to create incidents from alerts and immediately begin your response process. Grafana Incident simplifies response and provides a centralized platform for managing incidents so you can promptly assign roles, utilize built-in task management, and automate routine tasks such as creating an incident channel or a virtual meeting space for collaboration. Integrations with familiar tools like GitHub, Slack, and Google Suite enhance communication and coordination throughout the incident resolution process.

With a structured approach to incident management that alleviates the stress and burden of incident response, responders can focus on resolving the issue without additional distractions, such as communicating with stakeholders.

Learn

After an incident is resolved, Grafana Incident provides a centralized platform for a thorough review of incident details. Pull in relevant information from GitHub issues, Slack messages, and other integrated tools to extract actionable insights. Then, leverage ML-powered auto-summary generation to alleviate the post-incident review burden, ensuring no core findings are overlooked.

Grafana’s analytics capabilities give you deeper insights into incident data so you can extract valuable lessons and refine your overall IRM strategy. Dashboards and insights remain essential for monitoring the status of SLOs and alerts, helping you quickly identify operational details and make informed decisions during the post-incident review process. Turning incidents into lessons that drive continuous improvement enhances the overall resilience of your systems and services.