Unified API for any alert from any source

shaharglazner

Posted on November 26, 2023

TL;DR

In this blog post, we will demonstrate the strength of a unified API in consolidating and managing alerts.

We will create a workflow that, when an alert triggers, generates a ServiceNow ticket, enriches it with data from a production database, and notifies the stakeholders.

What's in it for you

This technical blog post will guide you on how to:

  1. Connect with any tool that generates alerts.
  2. Aggregate all alerts in a single interface.
  3. Enhance alerts with additional information from various sources.
  4. Automate processes based on these alerts.

Introduction

Before we delve into the technicalities, let's have a brief introduction.

What is Keep?

Keep is an open-source alert management and automation platform that integrates with your monitoring tools' alerts and provides an abstraction layer.

What's the problem Keep solves?

Despite a trend towards consolidation in the observability space, many organizations still utilize multiple tools to generate alerts.

Grafana's 2023 Observability Survey indicates that over 52% of companies use more than six observability tools, often because of legacy systems, cost considerations, and tool-specific functionality.

Alerting terminology

  1. Providers - These are third-party tools that either trigger alerts, enrich alerts with data, or notify about alerts. Providers can include monitoring tools, databases, ticketing systems, or communication platforms.
  2. Alerts - Essentially, these are events or signals triggered by your monitoring tools.
  3. Workflows - Configurable automated processes that are initiated in response to alerts, designed to streamline your response to incidents by executing predefined actions, such as opening tickets, sending notifications, or initiating scripts.
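To make the terminology concrete, here is a minimal Python sketch of how alerts from different providers could be normalized into one shape. The `Alert` fields, the two payload formats, and the `normalize_*` helpers are hypothetical and chosen only for illustration; they are not Keep's actual schema or code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Alert:
    """A provider-agnostic alert (hypothetical shape, not Keep's schema)."""
    name: str
    severity: str
    source: str
    attributes: dict[str, Any] = field(default_factory=dict)


def normalize_tool_a(payload: dict[str, Any]) -> Alert:
    # Hypothetical tool A reports "title" and a numeric "priority".
    severity = {1: "critical", 2: "high"}.get(payload["priority"], "low")
    return Alert(name=payload["title"], severity=severity, source="tool-a")


def normalize_tool_b(payload: dict[str, Any]) -> Alert:
    # Hypothetical tool B reports "alert_name" and a textual "level".
    return Alert(name=payload["alert_name"], severity=payload["level"].lower(), source="tool-b")


alerts = [
    normalize_tool_a({"title": "Disk almost full", "priority": 1}),
    normalize_tool_b({"alert_name": "High latency", "level": "HIGH"}),
]
# Once normalized, one code path can handle alerts from every source.
critical = [a.name for a in alerts if a.severity == "critical"]
print(critical)
```

The point of the abstraction layer is the last two lines: filtering, enrichment, and workflows only need to understand one alert shape, regardless of which tool produced it.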

Enough talking, let's get started

Install the CLI

# Clone Keep's repo and install Keep CLI using poetry
gh repo clone keephq/keep 
cd keep && poetry install
# or just install it using pip
pip install keepcli
# for other installation options (e.g. docker) see https://docs.keephq.dev/cli/installation

Configure the CLI:

You can easily start using Keep's managed platform without any other prerequisites by running:

# This will launch an oauth2 flow that will create a tenant for you and set you up
keep auth login

If you are using the open-source version of Keep, run keep config to configure the CLI:

keep config
Enter your keep url [http://localhost:8080]: 
Enter your api key (leave blank for localhost) []: 
Config file created at .keep.yaml

Verify everything is OK

keep whoami
Api key valid
{'tenant_id': 'XXXXXX-YYYY-ZZZZ-8b5a-939af9d7f63b'}

1. Connect your tools

Now we are going to connect all the providers we need: Datadog to receive alerts, ServiceNow to create and track tickets, MySQL to enrich alerts with production data, and Slack to notify the right people.

# no providers
keep provider list
+----+------+------+--------------+-------------------+
| ID | Type | Name | Installed by | Installation time |
+----+------+------+--------------+-------------------+
+----+------+------+--------------+-------------------+

# list available providers
keep provider list --available
+-----------------+-------------------------------------------------------+
|     Provider    |                      Description                      |
+-----------------+-------------------------------------------------------+
|       aks       |           Enrich alerts using data from AKS.          |
...
|      zabbix     |        Pull/Push alerts from Zabbix into Keep.        |
|     zenduty     |              Create incident in Zenduty.              |
+-----------------+-------------------------------------------------------+

Now, let's connect Datadog, MySQL, ServiceNow, and Slack:

# For every provider, you can see which authentication details are needed
keep provider connect datadog --help
+----------+--------------+----------+-----------------+
| Provider | Config Param | Required |   Description   |
+----------+--------------+----------+-----------------+
| datadog  |   api_key    |   True   | Datadog Api Key |
|          |   app_key    |   True   | Datadog App Key |
+----------+--------------+----------+-----------------+
# Connect Slack
keep provider connect slack --provider-name slack-prod --webhook-url https://hooks.slack.com/services/T03PMXXXXX/B0656YYYY/yQ7zncdkuhzrGDWILtuZZZZZ
Provider slack-prod installed successfully
Provider id: 82a2c69d26e64d3f8ec81eb25d13f972

# Connect datadog
keep provider connect datadog --provider-name datadog-prod --api-key XXXXXXX --app-key YYYYYYY
Provider datadog-prod installed successfully
Provider id: e33c9960d862453dace829f6a8aecbcf

# Connect mysql
keep provider connect mysql --provider-name mysql-prod --username dbuser --password dbpass --host keepdb
Provider mysql-prod installed successfully
Provider id: d1c3a24621254565970ac6fab74697b7

# Connect ServiceNow
keep provider connect servicenow --provider-name servicenow-prod --service-now-base-url https://dev123456.service-now.com --username user --password password

# Verify the providers connected
keep provider list
+----------------------------------+------------+-----------------+-------------------+----------------------------+
|                ID                |    Type    |       Name      |    Installed by   |     Installation time      |
+----------------------------------+------------+-----------------+-------------------+----------------------------+
| e33c9960d862453dace829f6a8aecbcf |  datadog   |   datadog-prod  | apikey@keephq.dev | 2023-11-08T13:23:29.531775 |
| d1c3a24621254565970ac6fab74697b7 |   mysql    |    mysql-prod   | apikey@keephq.dev | 2023-11-08T13:26:12.249923 |
| 066f2a02326c41819c19d61ed6976b65 | servicenow | servicenow-prod | apikey@keephq.dev | 2023-11-08T13:28:35.930792 |
| 82a2c69d26e64d3f8ec81eb25d13f972 |   slack    |    slack-prod   | apikey@keephq.dev | 2023-11-08T13:19:00.539780 |
+----------------------------------+------------+-----------------+-------------------+----------------------------+
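If you connect the same providers in several environments, the commands above can be generated from data. The following is a minimal Python sketch that only builds the argument lists and does not execute anything. The helper name `build_connect_command` is hypothetical, and the underscore-to-hyphen mapping (for example `api_key` to `--api-key`) is an assumption inferred from the help table and the flags used above.

```python
import shlex


def build_connect_command(provider_type: str, provider_name: str, params: dict[str, str]) -> list[str]:
    """Build the argv for `keep provider connect`, without executing it."""
    argv = ["keep", "provider", "connect", provider_type, "--provider-name", provider_name]
    for key, value in params.items():
        # Assumption: config params such as api_key map to flags such as --api-key.
        argv += [f"--{key.replace('_', '-')}", value]
    return argv


# Placeholder credentials, as in the commands above.
providers = [
    ("datadog", "datadog-prod", {"api_key": "XXXXXXX", "app_key": "YYYYYYY"}),
    ("mysql", "mysql-prod", {"username": "dbuser", "password": "dbpass", "host": "keepdb"}),
]
commands = [build_connect_command(*p) for p in providers]
for argv in commands:
    print(shlex.join(argv))
```

To actually run a command, each list could be passed to `subprocess.run(argv, check=True)`. In practice, read secrets from the environment rather than hard-coding them.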

If we go to the UI at http://localhost:3000, we can see that the providers are installed:

Keep UI

2. Review alerts

In this section, we are going to review the alerts, show how the alert looks in Keep, and demonstrate enrichment and filtering capabilities.

# list all alerts
keep alert list
+---------------------+------------------------------------------------------------------+--------------------------------+----------+-----------+-------------+---------+-------------+---------------------+
|          ID         |                           Fingerprint                            |              Name              | Severity |   Status  | Environment | Service |    Source   |    Last Received    |
+---------------------+------------------------------------------------------------------+--------------------------------+----------+-----------+-------------+---------+-------------+---------------------+
| 7308482322424796476 | 5bcafb4ea94749f36871a2e1169d5252ecfb1c589d7464bd8bf863cdeb76b864 |  Unauthorized access to API    |   high   | Recovered |  undefined  |   None  | ['datadog'] | 2023-11-13T15:32:38 |
| 7308433771057253905 | 39f3a0d2cfe87885be0283c94ffd1cc35be1fd1bdd108c86ddf8e9db5d3bd7f0 |           Test Alert           | critical | Recovered |  undefined  |   None  | ['datadog'] | 2023-11-13T14:44:24 |
...
more alerts
...
+---------------------+------------------------------------------------------------------+--------------------------------+----------+-----------+-------------+---------+-------------+---------------------+

# Filter by attribute
keep alert list --filter service=keep-api
+-----------+------------------------------------------------------------------+----------------------------+----------+--------+-------------+----------+-------------+---------------------------+
|     ID    |                           Fingerprint                            |            Name            | Severity | Status | Environment | Service  |    Source   |       Last Received       |
+-----------+------------------------------------------------------------------+----------------------------+----------+--------+-------------+----------+-------------+---------------------------+
| 120458754 | 5bcafb4ea94749f36871a2e1169d5252ecfb1c589d7464bd8bf863cdeb76b864 | 4xx-5xx Status Code Alert  |  medium  |   OK   |  production | keep-api | ['datadog'] | 2023-05-31T10:59:29+00:00 |
| 122655180 | 5bcafb4ea94749f36871a2e1169d5252ecfb1c389d7464bd8bf863cdeb76b864 | Unauthorized access to API |   high   |   OK   |  production | keep-api | ['datadog'] | 2023-11-08T13:29:31+00:00 |
+-----------+------------------------------------------------------------------+----------------------------+----------+--------+-------------+----------+-------------+---------------------------+


keep alert list --filter severity=critical
+-----------+------------------------------------------------------------------+------------+----------+--------+-------------+----------+-------------+---------------------------+
|     ID    |                           Fingerprint                            |    Name    | Severity | Status | Environment | Service  |    Source   |       Last Received       |
+-----------+------------------------------------------------------------------+------------+----------+--------+-------------+----------+-------------+---------------------------+
| 117493674 | 5bcafb4ea94749f36871a2e1169d5252ecfb1c589d7464bd8bf863cdeb76b862 | Prod Alert | critical |   OK   |  production | tal-test | ['datadog'] | 2023-09-13T11:20:25+00:00 |
+-----------+------------------------------------------------------------------+------------+----------+--------+-------------+----------+-------------+---------------------------+

What's even cooler is that we can filter on ANY alert attribute. Combined with Keep's ability to enrich alerts with attributes from other sources, this enables some very useful workflows.
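Conceptually, a `key=value` filter is just an equality check against an arbitrary alert attribute. Here is a minimal Python sketch of that idea, using rows from the tables above. The `matches` helper is hypothetical and is not how the Keep CLI is implemented.

```python
from typing import Any


def matches(alert: dict[str, Any], filters: dict[str, str]) -> bool:
    """Return True when every key=value filter equals the alert's attribute."""
    return all(key in alert and str(alert[key]) == value for key, value in filters.items())


# Rows taken from the tables above (fingerprints omitted for brevity).
alerts = [
    {"id": 120458754, "name": "4xx-5xx Status Code Alert", "severity": "medium", "service": "keep-api"},
    {"id": 122655180, "name": "Unauthorized access to API", "severity": "high", "service": "keep-api"},
    {"id": 117493674, "name": "Prod Alert", "severity": "critical", "service": "tal-test"},
]

by_service = [a["id"] for a in alerts if matches(a, {"service": "keep-api"})]
by_severity = [a["id"] for a in alerts if matches(a, {"severity": "critical"})]
print(by_service, by_severity)
```

Because the check never hard-codes attribute names, any attribute added later by enrichment becomes filterable with no extra work.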

To make this concrete, let's say we created a ticket in our ticketing system (we will, of course, automate this later).
We want to correlate the alert with the ticket so that we can sync any further changes to the ticket.

We also want information about the customer, which is stored in our customers database.

We can get this information by running:

select * from customers where customer_id = %customer_id%

+----+---------------------+------------+---------------------+--------------+---------------+-----------------------------+--------------------------------------+
| id | name                | tier       | email               | phone_number | address       | notes                       | customer_id                          |
+----+---------------------+------------+---------------------+--------------+---------------+-----------------------------+--------------------------------------+
|  1 | ABC Corporation     | Enterprise | abc@example.com     | 123-456-7890 | 123 Main St   | Customer since 2010         | 05bc71af-820a-11ee-b23f-0242ac110002 |
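When running such a lookup from your own code, it is safer to bind the customer ID as a query parameter than to paste it into the SQL string. Below is a minimal Python sketch of the lookup. An in-memory SQLite database stands in for the production MySQL database so the example is self-contained; that substitution, the reduced column set, and the `get_customer` helper are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the production MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, tier TEXT, email TEXT, customer_id TEXT)"
)
conn.execute(
    "INSERT INTO customers (name, tier, email, customer_id) VALUES (?, ?, ?, ?)",
    ("ABC Corporation", "Enterprise", "abc@example.com", "05bc71af-820a-11ee-b23f-0242ac110002"),
)


def get_customer(customer_id: str):
    """Look up a customer by ID using a bound parameter, not string formatting."""
    cur = conn.execute(
        "SELECT name, tier, email FROM customers WHERE customer_id = ?", (customer_id,)
    )
    row = cur.fetchone()
    return dict(row) if row else None


customer = get_customer("05bc71af-820a-11ee-b23f-0242ac110002")
print(customer)
```

SQLite uses `?` as its placeholder; common MySQL drivers for Python use `%s` instead, but the binding idea is the same.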

Assuming we want to enrich the alert with the customer ID, customer email, and ticket ID:

keep alert enrich --fingerprint 39f3a0d2cfe87885be0283c94ffd1cc35be1fd1bdd108c86ddf8e9db5d3bd7f0 customer_id=1234 ticket_id=INC00001 customer_email=abc@example.com

# Now we can filter by ticket ID:
keep alert list --filter ticket_id=INC00001
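One way to picture enrichment is as a side table keyed by fingerprint that is merged into the alert whenever it is read. The Python sketch below illustrates that idea with shortened, hypothetical fingerprints. It is an assumption about the mechanism for explanatory purposes, not a description of Keep's internals.

```python
from typing import Any

# Alerts as received from the provider, keyed by (shortened, hypothetical) fingerprint.
alerts: dict[str, dict[str, Any]] = {
    "39f3a0d2": {"name": "Test Alert", "severity": "critical"},
    "5bcafb4e": {"name": "Unauthorized access to API", "severity": "high"},
}
# Enrichments are stored separately so the original alert is never overwritten.
enrichments: dict[str, dict[str, Any]] = {}


def enrich(fingerprint: str, **attributes: Any) -> None:
    """Attach extra key=value attributes to the alert with this fingerprint."""
    enrichments.setdefault(fingerprint, {}).update(attributes)


def view(fingerprint: str) -> dict[str, Any]:
    """Return the alert merged with its enrichments."""
    return {**alerts[fingerprint], **enrichments.get(fingerprint, {})}


enrich("39f3a0d2", customer_id="1234", ticket_id="INC00001")
with_ticket = [fp for fp in alerts if view(fp).get("ticket_id") == "INC00001"]
print(with_ticket)
```

Keying on the fingerprint rather than the alert ID means the enrichment can still apply when the same alert fires again.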

3. Create workflows

So far, we connected the providers, reviewed our Datadog alerts, and enriched them with customer data and ServiceNow tickets.

Now we will wrap it up and automate the whole process using Keep Workflows.

Anatomy of a Workflow

Before diving into the CLI commands, let's review the workflow we are going to run. Keep Workflows are very similar to GitHub Actions workflows. We didn't want to reinvent the wheel here, so the syntax should feel familiar.

The full workflow YAML can be found here.

workflow:
  # some metadata
  id: example-workflow
  description: Enriches the alert and creates a ServiceNow ticket

  # The first part is the triggers. We want this workflow to execute only on critical alerts. We can filter on any alert attribute and also use regex.
  triggers:
    - type: alert
      filters:
        - key: severity
          value: critical
  steps:
  # The first step is to enrich the alert based on the SQL query. We want to add the customer name, email, and tier. 
  - name: get-more-details
    provider:
      type: mysql 
      config: " {{ providers.mysql-prod }} "
      # {{ alert.customer_id }} will be rendered at runtime
      with:
        query: "select * from customers where customer_id = '{{ alert.customer_id }}'"
        # Add those fields to the alert so we can use it
        enrich_alert:
          - key: customer_name
            value: results[0].name
          - key: customer_email
            value: results[0].email
          - key: customer_tier
            value: results[0].tier
  # second part - the actions 
  actions:
    # create the servicenow ticket
    - name: create-service-now-ticket
      # If the alert already has a ticket ID, don't create a new one (for example, when the alert was triggered and then resolved, we don't want a second ticket for the resolution). Also, we want to create a ticket only for Enterprise customers.
      if: "not '{{ alert.ticket_id }}' and '{{ alert.customer_tier }}' == 'Enterprise'"
      provider:
        type: servicenow
        config: " {{ providers.servicenow-prod }} "
        with:
          table_name: INCIDENT
          payload:
            short_description: "{{ alert.name }} - {{ alert.description }} [created by Keep]"
            description: "{{ alert.description }}"
          # Enrich the alert with these fields so we will have correlation between the alert and the ticket
          enrich_alert:
            - key: ticket_type
              value: servicenow
            - key: ticket_id
              value: results.sys_id
            - key: ticket_url
              value: results.link
            - key: ticket_status
              value: results.stage
            - key: table_name
              value: "{{ alert.annotations.ticket_type }}"

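Two mechanisms in the workflow above are worth illustrating: trigger filters that decide whether the workflow fires, and `{{ ... }}` placeholders that are filled in from the alert at runtime. Here is a simplified Python sketch of both. The `render` and `trigger_matches` helpers are hypothetical, and treating each filter value as a regular expression that must match the whole attribute is an assumption; Keep's real engine has its own templating and matching rules.

```python
import re
from typing import Any

PLACEHOLDER = re.compile(r"\{\{\s*([\w.-]+)\s*\}\}")


def render(template: str, context: dict[str, Any]) -> str:
    """Replace each {{ dotted.path }} with its value from context ('' if missing)."""
    def lookup(match: re.Match) -> str:
        value: Any = context
        for part in match.group(1).split("."):
            value = value.get(part, "") if isinstance(value, dict) else ""
        return str(value)
    return PLACEHOLDER.sub(lookup, template)


def trigger_matches(alert: dict[str, Any], filters: list[dict[str, str]]) -> bool:
    """True when every filter value, treated as a regex, fully matches the alert attribute."""
    return all(
        re.fullmatch(f["value"], str(alert.get(f["key"], ""))) is not None for f in filters
    )


alert = {"name": "Test Alert", "severity": "critical", "customer_id": "1234"}
context = {"alert": alert}

fires = trigger_matches(alert, [{"key": "severity", "value": "critical"}])
query = render("select * from customers where customer_id = '{{ alert.customer_id }}'", context)
condition = render("not '{{ alert.ticket_id }}'", context)
print(fires, query, condition)
```

Note how a missing attribute such as `ticket_id` renders to an empty string, which is what makes a condition like `not '{{ alert.ticket_id }}'` true for alerts that have no ticket yet.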

Now that we have the workflow, let's apply and run it.

# no workflows
keep workflow list
+--------------------------------------+--------------------------------------+----------------------------+-------------------------------------------------+--------------------------+----------------+
|                  ID                  |             Workflow ID              |         Start Time         |                   Triggered By                  |          Status          | Execution Time |
+--------------------------------------+--------------------------------------+----------------------------+-------------------------------------------------+--------------------------+----------------+
+--------------------------------------+--------------------------------------+----------------------------+-------------------------------------------------+--------------------------+----------------+
# Apply it:
keep workflow apply -f examples/workflows/blogpost.yml
Workflow examples/workflows/blogpost.yml applied successfully
Workflow id: 652fe84e-5239-425b-8271-40accb1af72f
Workflow revision: 1
keep workflow list
+--------------------------------------+-------------------+-----------------------------------+----------+--------------+----------------------------+----------------------------+----------------------------+-----------------------+
|                  ID                  |        Name       |            Description            | Revision |  Created By  |       Creation Time        |        Update Time         |    Last Execution Time     | Last Execution Status |
+--------------------------------------+-------------------+-----------------------------------+----------+--------------+----------------------------+----------------------------+----------------------------+-----------------------+
| 652fe84e-5239-425b-8271-40accb1af72f | blogpost-workflow | Enrich the alerts and open ticket |    10    |     keep     | 2023-11-12T08:08:43.585226 | 2023-11-12T14:34:07.544301 |            None            |          None         |
+--------------------------------------+-------------------+-----------------------------------+----------+--------------+----------------------------+----------------------------+----------------------------+-----------------------+
# Run it with alert as input 
keep workflow run --workflow-id blogpost-workflow --fingerprint 39f3a0d2cfe87885be0283c94ffd1cc35be1fd1bdd108c86ddf8e9db5d3bd7f0
Workflow blogpost-workflow run successfully
Workflow Run ID 33e71955-81f4-4118-9771-7b638f8c59b0

# Let's review the run
keep workflow runs logs 33e71955-81f4-4118-9771-7b638f8c59b0

+-----+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |  ID |         Timestamp          | Message                                                                                                                                                                                                                                                         |
+-----+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 733 | 2023-11-13T16:11:40.462000 | Running step get-more-details                                                                                                                                                                                                                                   |
| 734 | 2023-11-13T16:11:40.463000 | Action get-more-details evaluated to run! Reason: no condition, hence true.                                                                                                                                                                                     |
| 735 | 2023-11-13T16:11:40.524000 | Step get-more-details ran successfully                                                                                                                                                                                                                          |
| 736 | 2023-11-13T16:11:40.525000 | Running action create-service-now-ticket                                                                                                                                                                                                                        |
| 737 | 2023-11-13T16:11:40.525000 | Action create-service-now-ticket evaluated to run! Reason: no condition, hence true.                                                                                                                                                                            |
| 738 | 2023-11-13T16:11:44.784000 | Created ticket: {'result': {'parent': '', 'made_sla': 'true', 'caused_by': '', 'watch_list': '', 'upon_reject': 'cancel', 'sys_updated_on': '2023-11-13 14:11:41', 'child_incidents': '0', 'hold_reason': '', 'origin_table': '', 'task_effective_number': 'INC' |
| 740 | 2023-11-13T16:12:47.552000 | Enriching alert                                                                                                                                                                                                                                                 |
| 741 | 2023-11-13T16:12:47.572000 | Alert enriched                                                                                                                                                                                                                                                  |
| 742 | 2023-11-13T16:12:47.573000 | Action create-service-now-ticket ran successfully                                                                                                                                                                                                               |
| 743 | 2023-11-13T16:12:47.574000 | Finish to run workflow blogpost-workflow                                                                                                                                                                                                                        |
+-----+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

keep workflow runs list
+--------------------------------------+--------------------------------------+----------------------------+-------------------------------+-------------+----------------------------------------------------+----------------+
|                  ID                  |             Workflow ID              |         Start Time         |          Triggered By         |    Status   | Error                                              | Execution Time |
+--------------------------------------+--------------------------------------+----------------------------+-------------------------------+-------------+----------------------------------------------------+----------------+
| 103df0aa-d6be-4290-9938-1563f8005e55 | 75c7eba2-51dc-411d-b39c-a500c98e3893 | 2023-11-13T14:11:37.911898 | manually by apikey@keephq.dev |   success   | None                                               |       69       |
+--------------------------------------+--------------------------------------+----------------------------+-------------------------------+-------------+----------------------------------------------------+----------------+
# Let's make sure the alert was enriched with the ticket id
keep alert get 39f3a0d2cfe87885be0283c94ffd1cc35be1fd1bdd108c86ddf8e9db5d3bd7f0 | jq .ticket_id
"0f9982ec97667110beb0f0571153afa1"
# :)

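The run log above follows a simple pattern: run each step, merge its output into the alert, then run each action whose condition holds. The Python sketch below mimics that control flow. The `run_workflow` function, its log messages, and the stand-in step and action functions are all hypothetical (no database or ServiceNow call is made); it is not Keep's execution engine.

```python
from typing import Any, Callable

Alert = dict[str, Any]


def run_workflow(
    alert: Alert,
    steps: list[tuple[str, Callable[[Alert], dict[str, Any]]]],
    actions: list[tuple[str, Callable[[Alert], bool], Callable[[Alert], dict[str, Any]]]],
) -> list[str]:
    """Run steps, then each action whose condition holds; merge results into the alert."""
    log: list[str] = []
    for name, step in steps:
        log.append(f"Running step {name}")
        alert.update(step(alert))  # enrichment: step output becomes alert attributes
    for name, condition, action in actions:
        if not condition(alert):
            log.append(f"Action {name} skipped")
            continue
        log.append(f"Running action {name}")
        alert.update(action(alert))
    return log


# Stand-ins for the MySQL lookup and the ServiceNow call (no network access).
def get_more_details(alert: Alert) -> dict[str, Any]:
    return {"customer_name": "ABC Corporation", "customer_tier": "Enterprise"}


def create_ticket(alert: Alert) -> dict[str, Any]:
    return {"ticket_type": "servicenow", "ticket_id": "INC00001"}


def needs_ticket(alert: Alert) -> bool:
    return not alert.get("ticket_id") and alert.get("customer_tier") == "Enterprise"


alert: Alert = {"name": "Test Alert", "severity": "critical"}
steps = [("get-more-details", get_more_details)]
actions = [("create-service-now-ticket", needs_ticket, create_ticket)]

first_run = run_workflow(alert, steps, actions)
second_run = run_workflow(alert, steps, actions)  # ticket exists now, so the action is skipped
print(first_run, second_run, alert["ticket_id"], sep="\n")
```

Running the workflow twice shows why the ticket condition matters: the second run sees the `ticket_id` written by the first and does not open a duplicate ticket.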

Voilà! Now, whenever a critical alert is triggered, it is automatically enriched with data from our production database. If the customer is on the Enterprise tier and the alert has no ticket yet, a ServiceNow ticket is created and the alert is updated with the ticket ID. For alerts that do not warrant a ticket, the same pattern can simply notify the relevant people, for example through the Slack provider we connected earlier.

The alert has ticket assigned

Next steps

  1. Join our Slack at https://slack.keephq.dev and start talking about alerting and monitoring.
  2. ⭐️ our repo at https://github.com/keephq/keep
  3. Start playing with Keep (no credit card needed!) at https://platform.keephq.dev
  4. Missing a provider or feature? Just open an issue at https://github.com/keephq/keep and we will add it ASAP (and, of course, contributions are welcome!)