Alerting

Humio supports alerting and can send notifications about error conditions or other events to external systems or internally to another Humio repository. Every repository has its own set of alerts.

The alerting concept in Humio consists of two parts: Alerts and Notifiers.

Alerts

Alerts are standard Live Queries that run continuously and trigger whenever the search result contains one or more rows.

For example, you can configure an alert to trigger whenever there are more than five HTTP 500 responses in the access log.

#type=accesslog statuscode=500
| count(as=internal_server_errors)
| internal_server_errors > 5

If there are five or fewer matching events in the time window, the search result is empty and nothing happens. If there are more than five, the search returns a non-empty result and the alert triggers its notifier.

Types of alerts

You can think of Alerts as one of two types:

  • Single events that affect one or more users’ experience with the product. These are usually not something engineers should be woken up for at night, but they could result in a ticket in your issue tracker.
  • Faulty state, where one or more components have reached a bad state and are unable to function properly. This usually affects most users and is something that should wake engineers up at night.

Creating alerts

The easiest way to create a new alert is by building up your query in the Search view.

  1. Set a live time window for the search.
  2. Select the Save As… > Alert option on the right.
  3. Give it a name, select a notifier, and finally a throttle period. The throttle period is the minimum time before the same alert will be triggered again.

Throttle period

The throttle period controls how often the alert can trigger.

If you set throttling to Throttle all alert notifications, once the alert has triggered, it will not trigger again until the throttle period has passed. If you set throttling to Throttle only alert notifications with identical field values, the alert will not trigger again within the throttle period for events with the same value of the specified field, but it will trigger right away for events with different values.

Field-based throttling

You can use field-based throttling if you want to throttle only certain results from your alert.

Example

If you have an alert that triggers when a machine is running out of disk space, you might want to throttle further alert notifications for the same machine. However, you still want to receive an alert notification if another machine also starts running out of disk space within the throttle period. In that case, set throttling to Throttle only alert notifications with identical field values and select the field in your logs containing the name of the machine.

Say you have such an alert: a search for a specific log event with a time window of 1 hour and a throttle period of 1 hour. At some point, machine1 runs out of disk space, which produces an event in the log, and the alert triggers on this event. The alert search continues to run and finds this event every time, but it does not trigger the alert, since it is throttled. Some time later, machine2 also runs out of disk space. The alert search now finds both events, but only triggers for machine2, since machine1 is throttled. After an hour, if machine1 is still out of disk space (and thus there are newer log events for it), the alert triggers again for machine1.

The field you throttle on should be in the result of the query, not just in the events that are input to the query. If a result from the query does not contain the field, it will be treated as if it had an empty value for the field.
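Continuing the disk-space example, a minimal sketch of such an alert query might look like the following, assuming the machine name is in a field called host and matching events contain the text “disk full” (both the field name and the text are assumptions, not something taken from your own logs):

"disk full"
| groupBy(host)

Because groupBy() includes host in the query result, the alert can throttle on the host field.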

When an alert triggers, Humio stores the value of the throttle field in memory. To limit memory usage, there is a fixed limit on the number of values, which Humio stores per alert. Thus, if you select a throttle field that can assume more values than the limit, your alert might trigger more frequently than indicated by the given throttle period. For self-hosted installations, the limit can be altered with ALERT_MAX_THROTTLE_FIELD_VALUES_STORED in the Humio config.
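For example, to raise the limit you could set the following in the Humio configuration (the value shown is only an illustration, not the default):

ALERT_MAX_THROTTLE_FIELD_VALUES_STORED=1000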

Multiple fields

It is only possible to throttle on a single field. If you need to throttle on multiple fields, add a new field to the alert query that concatenates them.

Example

If your events have a service and a host field and you want to throttle on the combination of the two, you can add a new field by appending the following line to the alert query:

| serviceathost := concat([service, host])

and then throttle on serviceathost.

Relation between throttle period and the time window

If your search finds specific events that you want to trigger the alert on, for example specific errors, set the throttle period to match the time window of the search. If you set the throttle period higher than the time window, you might miss events; if you set it lower, you might get duplicate alerts.

If your search involves an aggregate, you might want a larger time window in some cases, for example if you want to be notified every hour when there are more than 5 errors within a 4-hour search window. You probably do not want to set the time window smaller than the throttle period, as this means that some events will never be evaluated by the alert.
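A minimal sketch of such an aggregate alert query, assuming errors are marked with a loglevel field (the field name and the threshold are only examples); the search time window would be set to 4 hours and the throttle period to 1 hour:

loglevel=ERROR
| count(as=error_count)
| error_count > 5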

For notifiers like Email and Slack, you want a higher throttle period, since these notifiers do not deduplicate notifications.

Notifiers

A notifier is a module that sends notifications when alerts trigger.

Built-in notifiers

Humio currently supports a number of built-in notifier types, including Email, Slack, Webhook, OpsGenie, PagerDuty, and VictorOps.

Configuring a notifier

  • Go to Alerts > Notifiers > New Notifier.
  • Select a type of notifier from the Notifier Type dropdown list.

You must assign all notifiers a name.

For self-hosted installations, remember to set the PUBLIC_URL field in the Humio config. This will ensure that links in notifications will go to the correct URL.
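For example (the hostname here is only a placeholder for your own Humio URL):

PUBLIC_URL=https://humio.example.com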

Custom notifiers

If the built-in notifiers are not enough and you need something custom, Humio supports webhooks that allow you to call an external service over HTTP. You can add headers and customize the body of the message, as seen below.
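As an illustration, a webhook message body could combine the message template placeholders described below. The structure shown here is only a sketch, not the default webhook body:

{
  "alertName": "{alert_name}",
  "repository": "{repo_name}",
  "eventCount": {event_count},
  "link": "{url}",
  "events": {events}
}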

HTTP proxy

If Humio is set up to use an HTTP proxy, it is used by default for all notifiers communicating via HTTP. You can configure an individual notifier not to use the globally configured HTTP proxy.

Message templates

Humio uses notifier templates to create the messages sent by notifiers. They currently apply to the Slack, Email, and Webhook notifiers. The template engine is a simple search-and-replace model, where the {…}-marked placeholders are replaced with context-dependent values.

The placeholders are explained in the table below:

Placeholder Description
{field:$FIELD_NAME} Extracts the value of $FIELD_NAME from the alert result set. If there are multiple rows in the result, Humio uses the first row. Put field names containing spaces in double quotes, {field:"My Field"}.
{field_raw:$FIELD_NAME} Extracts the value of $FIELD_NAME from the alert result set without JSON-escaping it. If there are multiple rows in the result, Humio uses the first row. Put field names containing spaces in double quotes, {field_raw:"My Field"}.
{alert_name} The user-made name of the alert fired.
{alert_description} A user-made description of the alert fired.
{alert_triggered_timestamp} The time at which the alert was triggered.
{alert_id} The id for the alert that was triggered.
{alert_notifier_id} The id of the notifier used to deliver this alert.
{event_count} The number of events in the query result.
{url} A URL to open Humio with the alert’s query.
{query_string} The query that triggered the alert.
{query_result_summary} A summary of data in the query result.
{query_time_start} The query start time (e.g. 10m).
{query_time_end} The query end time (e.g. now).
{query_time_interval} The time interval for the alert’s query (e.g. 10m -> now).
{warnings} Any warnings that were generated by the query.
{repo_name} Query repository name.
{events_str} Events encoded as a string.
{events} Events encoded as a JSON array of event objects.
{events_html} Events encoded as an HTML table inside <table> tags. All fields from all events are shown as columns. If you want fewer fields, remove them in the alert query using e.g. table, select or drop.
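For example, a message body template for a Slack or Email notifier could be built from the placeholders above. This is only an illustration, not a default template:

Alert {alert_name} triggered in repository {repo_name} at {alert_triggered_timestamp}.
{event_count} event(s) matched the query {query_string} in the interval {query_time_interval}.
Open in Humio: {url}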

It is also possible to use these placeholders in the name and description fields of your alert. This is useful if you want to use the same notifier for multiple alerts but want different templates for the different alerts. As an example, you can use different {field:$FIELD_NAME} placeholders in the names of the alerts to extract the values of different fields, and then use {alert_name} in the notifier to get the alert names with the placeholders replaced.

You can also use this feature to avoid writing near-identical alerts when you use a notifier where you cannot specify the message template. Currently, these are the OpsGenie, PagerDuty, and VictorOps notifiers, which all use the alert name as part of the message. The default email subject and email template for the Email notifier also use the alert name.
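For example, assuming the query result contains a host field (an assumption for the sake of illustration), an alert could be named:

Disk space low on {field:host}

A notifier such as PagerDuty would then include the actual machine name in its message via {alert_name}.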

The {field:$FIELD_NAME} placeholder only extracts the value of the field from the first row of the alert query result.