Humio supports alerting. It can be configured to send messages about error conditions or other events to external systems, or internally to another Humio repository. Every repository has its own set of alerts.

You can disable all alerts by setting the ENABLE_ALERTS environment variable to false in the Humio configuration.

Alerts are standard live queries that run continuously and trigger whenever there are one or more rows in the search result. For example, you can configure an alert to trigger whenever there are more than five status 500 responses in the access log. The query for that looks like this:

#type=accesslog statuscode=500
| count(as=internal_server_errors)
| internal_server_errors > 5

If there are five or fewer matching events in the time window, the search returns an empty result and nothing happens. If there are more than five, the search returns a non-empty result, and the alert triggers its associated actions.
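The trigger rule above can be sketched in Python as a rough simulation of the example query. The event dictionaries and the function names alert_query and alert_triggers are hypothetical, and the sketch assumes count() produces a single row even when no events match; it is not how Humio evaluates queries internally.

```python
from typing import Dict, List

Event = Dict[str, object]


def alert_query(events: List[Event]) -> List[Dict[str, int]]:
    """Rough Python equivalent of:
    #type=accesslog statuscode=500
    | count(as=internal_server_errors)
    | internal_server_errors > 5
    """
    # Filter: keep access-log events with status code 500.
    errors = [e for e in events
              if e.get("#type") == "accesslog" and e.get("statuscode") == 500]
    # Aggregate: assume count() yields one row holding the count, even if it is 0.
    rows = [{"internal_server_errors": len(errors)}]
    # Filter the aggregate: keep the row only if the count exceeds 5.
    return [row for row in rows if row["internal_server_errors"] > 5]


def alert_triggers(events: List[Event]) -> bool:
    # An alert triggers whenever its live query returns one or more rows.
    return len(alert_query(events)) > 0
```

With exactly five matching events the last filter drops the only row, so the result is empty and alert_triggers returns False; a sixth matching event makes it return True.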

You can think of Alerts as one of two types:

  • Single events: events that affect one or more users' experience with the product. These usually do not warrant waking engineers up at night, but could result in a ticket in your issue tracker.
  • Faulty state: one or more components have reached a bad state and cannot function properly. This usually affects most users and warrants waking engineers up at night.

Creating Alerts

The easiest way to create a new alert is by building up your query in the Search view.

  1. Don’t forget to set a live time window for the search.
  2. Select the Save As… → Alert option on the right.
  3. Give it a name, select an action, and finally select a throttle period. The throttle period is the minimum time before the same alert can trigger again.

Throttle Period

The throttle period is used to control how often the alert can trigger. When the alert has triggered, it will not trigger again until after the throttle period has passed.

If you set throttling to Throttle all actions, the alert will not trigger again for any result until the throttle period has passed. If you set throttling to Throttle only events with identical field values, the alert is throttled only for results that have the same value in the specified field; a result with a different value triggers the alert right away.
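The two throttle modes can be sketched as follows. AlertThrottle is a hypothetical class, not Humio's implementation: field_name=None models Throttle all actions, and a field name models Throttle only events with identical field values. The sketch also treats a result without the field as having an empty value, as described for field-based throttling.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]


class AlertThrottle:
    """Hypothetical model of the two throttle modes."""

    def __init__(self, period: float, field_name: Optional[str] = None) -> None:
        self.period = period
        self.field_name = field_name  # None means "Throttle all actions"
        self._last_fired: Dict[str, float] = {}  # throttle key -> last trigger time

    def _throttled(self, key: str, now: float) -> bool:
        last = self._last_fired.get(key)
        return last is not None and now - last < self.period

    def evaluate(self, now: float, rows: List[Row]) -> List[Row]:
        """Return the result rows the alert triggers on at time `now`."""
        if not rows:
            return []
        if self.field_name is None:
            # One shared throttle for the whole alert.
            if self._throttled("*", now):
                return []
            self._last_fired["*"] = now
            return rows
        triggered = []
        for row in rows:
            # A row without the field is treated as having an empty value.
            key = row.get(self.field_name, "")
            if not self._throttled(key, now):
                self._last_fired[key] = now
                triggered.append(row)
        return triggered
```

With a 60-minute period, an alert in the first mode that triggers on host a at minute 0 stays silent for host b at minute 10; in the second mode, throttled on host, the same sequence triggers for b at minute 10.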

Field-Based Throttling

Use field-based throttling if you only want to throttle certain results from your alert.

For example, if you have an alert that triggers when a machine is running out of disk space, you might want to throttle further messages for the same machine. However, you still want to receive a message if another machine also starts running out of disk space within the throttle period. In that case, choose Throttle only events with identical field values and select the field in your logs that contains the name of the machine.

Suppose you have such an alert: a search for a specific log event, with a time window of 1 hour and a throttle period of 1 hour. At some point, machine1 runs out of disk space, which produces an event in the log, and the alert triggers on this event. The alert search keeps running and finds this event every time, but does not trigger again, because machine1 is throttled. Later, machine2 also runs out of disk space. The alert search now finds both events, but triggers only for machine2, since machine1 is still throttled. After an hour, if machine1 is still out of disk space (and is therefore still producing new log events), the alert triggers again for machine1.
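The scenario above can be replayed with a small simulation. The event timestamps, the 10-minute evaluation step, and the simulate function are all hypothetical; the sketch only assumes that each evaluation sees the events in the last hour and that each host has its own 1-hour throttle.

```python
from typing import Dict, List, Tuple

WINDOW = 60    # minutes: search time window from the example
THROTTLE = 60  # minutes: throttle period from the example

# Hypothetical "disk full" log events as (minute, host).
events = [(0, "machine1"), (20, "machine2"), (50, "machine1"), (70, "machine1")]


def simulate(events: List[Tuple[int, str]], until: int,
             step: int = 10) -> List[Tuple[int, str]]:
    """Return (minute, host) pairs for every time the alert triggers."""
    last_fired: Dict[str, int] = {}
    fired: List[Tuple[int, str]] = []
    for now in range(0, until + 1, step):
        # The live search sees events in the window (now - WINDOW, now].
        hosts = {host for t, host in events if now - WINDOW < t <= now}
        for host in sorted(hosts):
            last = last_fired.get(host)
            # Throttle per host: trigger only if this host has not fired recently.
            if last is None or now - last >= THROTTLE:
                last_fired[host] = now
                fired.append((now, host))
    return fired


print(simulate(events, until=80))
# [(0, 'machine1'), (20, 'machine2'), (60, 'machine1')]
```

machine1 triggers at minute 0, machine2 at minute 20 while machine1 is still throttled, and machine1 triggers again at minute 60 only because its newer event at minute 50 is still inside the search window.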

The field you throttle on should be in the result of the query, not just in the events that are input to the query. If a result from the query does not contain the field, it will be treated as if it had an empty value for the field.

When an alert triggers, Humio stores the value of the throttle field in memory. To limit memory usage, there is a fixed limit on the number of values that Humio stores per alert. Thus, if you select a throttle field that can take more distinct values than the limit, your alert might trigger more frequently than the throttle period suggests. For self-hosted installations, the limit can be changed with ALERT_MAX_THROTTLE_FIELD_VALUES_STORED in the Humio configuration.
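The effect of the limit can be sketched with a bounded store. BoundedThrottle and max_values are hypothetical (max_values plays the role of ALERT_MAX_THROTTLE_FIELD_VALUES_STORED), and evicting the least recently triggered value first is an assumption made for illustration; the text above does not say which values Humio drops.

```python
from collections import OrderedDict


class BoundedThrottle:
    """Hypothetical field-value throttle that remembers at most max_values values."""

    def __init__(self, period: float, max_values: int) -> None:
        self.period = period
        self.max_values = max_values
        # value -> time of last trigger, ordered from least to most recently triggered
        self._last: "OrderedDict[str, float]" = OrderedDict()

    def allow(self, now: float, value: str) -> bool:
        last = self._last.get(value)
        if last is not None and now - last < self.period:
            return False  # still throttled
        self._last[value] = now
        self._last.move_to_end(value)
        # Assumed eviction policy: forget the least recently triggered value.
        while len(self._last) > self.max_values:
            self._last.popitem(last=False)
        return True
```

With a limit of 2, triggering on a, b and c pushes a out of the store, so a fourth result for a triggers again even though its throttle period has not passed; with a limit large enough to hold all three, the same result is throttled.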

Multiple Fields

It is only possible to throttle on a single field. If you need to throttle on multiple fields, you can simply add a new field that concatenates these fields in the alert query.

For example, if your events have a service and a host field, and you want to throttle on the combination of these, you can add a new field in the alert query by adding the following line to it:

| serviceathost := concat([service, host])

and then throttle on serviceathost.
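The effect of throttling on a concatenated field can be sketched in Python. The sample rows and the helpers concat, add_serviceathost and first_per_key are hypothetical, and the sketch assumes concat joins values with no separator. first_per_key models what field-based throttling lets through when all rows arrive within one throttle period.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def concat(values: Iterable[str]) -> str:
    # Assumption: values are joined with no separator.
    return "".join(values)


def add_serviceathost(row: Row) -> Row:
    # Mirrors: | serviceathost := concat([service, host])
    return {**row, "serviceathost": concat([row["service"], row["host"]])}


def first_per_key(rows: List[Row], key: str) -> List[Row]:
    """Keep the first row for each value of `key` (one throttle period)."""
    seen = set()
    kept = []
    for row in rows:
        value = row.get(key, "")
        if value not in seen:
            seen.add(value)
            kept.append(row)
    return kept


rows = [add_serviceathost(r) for r in [
    {"service": "api", "host": "web1"},
    {"service": "api", "host": "web2"},
    {"service": "db", "host": "web1"},
    {"service": "api", "host": "web1"},  # same combination as the first row
]]
print([r["serviceathost"] for r in first_per_key(rows, "serviceathost")])
# ['apiweb1', 'apiweb2', 'dbweb1']
```

Throttling on host alone would let only two of these rows through. Under the no-separator assumption, two different combinations can also produce the same key (for example "ab" + "c" and "a" + "bc"), so it is worth checking that your field values cannot run together that way.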

Relation Between Throttle Period and Time Window

If your search finds specific events that you want to trigger the alert on, such as specific errors, set the throttle period to match the time window of the search. If you set the throttle period higher than the time window, you might miss events, and if you set it lower, you might get duplicate alerts.
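The arithmetic behind this advice can be sketched under a simplified model: the alert fires at t=0 having evaluated events in (-window, 0], and the earliest it can fire again is t=throttle, evaluating (throttle - window, throttle]. The function name missed_and_duplicated is hypothetical, and the model ignores field-based throttling and later firings.

```python
from typing import Tuple


def missed_and_duplicated(window: float, throttle: float) -> Tuple[float, float]:
    """Simplified model for a search that matches specific events.

    Returns (length of time whose events neither firing evaluates,
             length of time whose events both firings evaluate).
    """
    next_start = throttle - window  # start of the next firing's search window
    missed = max(0.0, next_start)  # events in (0, throttle - window]
    duplicated = max(0.0, -next_start)  # events in (throttle - window, 0]
    return missed, duplicated
```

With a 60-minute window, a 90-minute throttle leaves 30 minutes of events that no firing sees, a 30-minute throttle makes 30 minutes of events appear in two alerts, and a 60-minute throttle gives neither gap nor overlap.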

If your search involves an aggregate, you might want to set the time window larger than the throttle period in some cases. For example, you might want to be notified every hour about whether there have been more than 5 errors within a 4-hour search window. You probably do not want to set the time window smaller than the throttle period, as this means that some events are never evaluated by the alert. For actions like email and Slack, you may want a longer throttle period, since these actions do not deduplicate messages.

Errors & Warnings

If there is an error when running an alert, the error will be logged and also set on the alert, so that it can be seen on the alerts overview page. If an alert has multiple actions attached, and some of the actions fail to run, this will be logged, but no error will be set on the alert. The alert will be considered to have fired, and will be throttled as normal. It will only be considered an error if all actions fail.
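The rule for multiple actions can be sketched as follows. run_actions is a hypothetical function, not a Humio API; it assumes the alert has at least one action and represents each action as a callable that raises an exception on failure.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("alert-sketch")


def run_actions(actions: List[Callable[[], None]]) -> str:
    """Return "fired" if at least one action succeeded, "error" if all failed.

    A "fired" alert is throttled as normal; an "error" would be set on the alert.
    Assumes at least one action.
    """
    failures = 0
    for action in actions:
        try:
            action()
        except Exception:
            failures += 1
            # An individual action failure is only logged.
            logger.exception("alert action failed")
    if failures == len(actions):
        logger.error("all alert actions failed")
        return "error"
    return "fired"
```

A webhook failing alongside a working email action still yields "fired"; only when every action raises does the sketch report "error".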

If the alert query produces warnings, they are logged, but they are not stored on the alert. Many warnings are transient and go away after some time, but some require user intervention, for instance a warning about too many groups in a groupBy function call in the alert query. Some warnings mean that the alert query returns only partial results, which may trigger the alert when it should not have triggered, or make the alert return only some of the events it would otherwise have returned. There are usually many warnings on alert queries right after Humio starts up, for instance indicating that Humio is trying to catch up on ingested data. Because of this, the default behavior is to not fire an alert while its query has warnings, and instead wait for the warnings to go away. You can make alerts fire despite warnings by setting ALERT_DESPITE_WARNINGS in the Humio configuration.
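The default decision can be sketched as a small function. should_fire and its alert_despite_warnings parameter are hypothetical; the parameter only stands in for the ALERT_DESPITE_WARNINGS setting and is not how Humio reads its configuration.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("alert-sketch")


def should_fire(rows: List[Dict[str, object]], warnings: List[str],
                alert_despite_warnings: bool = False) -> bool:
    # Warnings are logged, but not stored on the alert.
    for warning in warnings:
        logger.warning("alert query warning: %s", warning)
    if not rows:
        return False  # empty result: nothing to alert on
    if warnings and not alert_despite_warnings:
        # Default: the result may be partial, so wait for the warnings to go away.
        return False
    return True
```

With the default setting, a non-empty result accompanied by a warning about catching up on ingested data does not fire; passing alert_despite_warnings=True makes the same result fire.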

Incident Management

Humio alerts can be set to trigger various actions, such as notifying an administrator of a potential problem with your servers. Several tools and incident management platforms can be used for this, as well as some security monitoring systems.

Incident Management Systems

You can also use simple tools for sending an email or a chat message to an administrator, to bring a situation to their attention. Below is a list of such tools:

Security Monitoring

For monitoring Humio for security situations (e.g., hacker attempts, denial of service attacks, etc.), there are a few security monitoring systems that can be integrated into Humio. Below is a list of them, with links to pages which explain how to configure them and Humio to work together: