Scheduled Searches

A scheduled search is a static query, set to run on a schedule. At a scheduled interval, the query will run and if its result is non-empty, the scheduled search will trigger its associated actions.

Feature Status

Scheduled Searches is a BETA feature. You should test it before relying on it in production. It is disabled by default.

Use Feature on Cloud

Humio cloud users can contact Humio support to try out the feature.

Use Feature On-prem

For an on-prem deployment of Humio, you can enable the feature yourself, if you have root permissions, by setting the ScheduledSearches feature flag through GraphQL. The following mutations can be used to enable scheduled searches at different levels:

  • Enable for all users:
    mutation{ enableFeature(feature: ScheduledSearches) }
  • Enable for all users within an organization:
    mutation{ enableFeatureForOrg(feature: ScheduledSearches, orgId: "<ORG-ID>") }
  • Enable for a single user:
    mutation{ enableFeatureForUser(feature: ScheduledSearches, userId: "<USER-ID>") }
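
As a rough sketch, the mutations above can be wrapped in the JSON body that GraphQL servers conventionally accept over HTTP POST. The helper name build_payload and the level keys are hypothetical. How the request is sent (endpoint URL, API token of a root user) is not covered on this page and is left out here.

```python
import json

# Mutation text and the ScheduledSearches flag are taken from the list above.
MUTATIONS = {
    "all": "mutation{ enableFeature(feature: ScheduledSearches) }",
    "org": 'mutation{ enableFeatureForOrg(feature: ScheduledSearches, orgId: "%s") }',
    "user": 'mutation{ enableFeatureForUser(feature: ScheduledSearches, userId: "%s") }',
}


def build_payload(level, target_id=None):
    """Return the JSON request body for one mutation (hypothetical helper)."""
    template = MUTATIONS[level]
    if level == "all":
        query = template
    else:
        if not target_id:
            raise ValueError("an ID is required for level %r" % level)
        query = template % target_id
    # A GraphQL-over-HTTP request body carries the operation under "query".
    return json.dumps({"query": query})


print(build_payload("org", "<ORG-ID>"))
```

Replace the placeholder IDs with real values before sending the body to your cluster.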

Note that for any scheduled searches to execute, the ENABLE_SCHEDULED_SEARCHES configuration option must be set to true. See the ENABLE_SCHEDULED_SEARCHES documentation page.

Use Case

Scheduled searches are related to alerts and they are able to trigger the same actions. However, scheduled searches are applicable in other use cases than alerts, such as when:

  • You need to automatically report some search result on a schedule. For instance, you have stakeholders that expect to get an email every Monday at 10:00 containing the most important security events for the previous week.

  • You have an ingest delay on some logs, which results in them never appearing in searches made by alerts. For instance, if an alert looks back in time using a 1 hour time window, it won’t trigger on logs ingested with a 12 hour delay. With a scheduled search, you can choose to run your search at a point in time when you’re fairly certain that every log of interest has been ingested.

  • You need to take delayed action on search results. For instance, if you trigger user bans using an alert, offending users will be banned immediately upon a transgression and can then easily figure out what triggered their ban. Using a scheduled search, you can choose to ban all offending users at the same time every day, so as to obscure the conditions of a ban.

If your situation doesn’t fall into one of these use cases, you should probably use an alert instead. Alerts run as live queries, rather than historic ones, and should thus generally be considered more performant.

To create a scheduled search in the UI:

  1. Go to View/Repo → Alerts → Scheduled Searches → New Scheduled Search.
  2. Fill out the fields.
  3. Save.

Alternatively, you can use the GraphQL API to view, create, update and delete scheduled searches using the associated queries and mutations.

By default, a scheduled search will not trigger any actions if its query result contains a warning. You can alter this behavior by enabling the SCHEDULED_SEARCH_DESPITE_WARNINGS variable in the configuration file. Scheduled searches behave the same as alerts with regard to warnings and errors; see the alerts documentation.

The following sections describe some of the fields you set on a scheduled search.

Schedule

This field allows you to specify the schedule on which your scheduled search should run. The schedule is defined using a UNIX cron expression, as known from the crontab file found in many UNIX-like systems. Scheduled searches are not allowed to run more than once an hour. Therefore the minutes field in the cron expression is restricted to a value in the range [0-59], that is, a fixed minute of the hour rather than an expression that would match several times within the same hour. For instance, 0 10 * * 1 runs the search every Monday at 10:00. If you need help writing an expression for your use case, there are many online tools that generate UNIX cron expressions.
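
The restriction on the minutes field can be checked before saving a schedule. The sketch below assumes that "values in the range [0-59]" means a single number in a standard five-field cron expression. The function name is hypothetical, and Humio performs its own validation, which may differ.

```python
import re


def minute_field_is_allowed(cron_expression):
    """Return True if the minutes field is a single number from 0 to 59.

    Assumption: a five-field cron expression (minute, hour, day of month,
    month, day of week). Only the minutes field is inspected.
    """
    fields = cron_expression.split()
    if len(fields) != 5:
        return False
    minute = fields[0]
    if not re.fullmatch(r"[0-9]{1,2}", minute):
        return False  # rejects "*", "*/15", "0,30", "0-30" and similar
    return 0 <= int(minute) <= 59


print(minute_field_is_allowed("0 10 * * 1"))    # Mondays at 10:00
print(minute_field_is_allowed("*/15 * * * *"))  # would run four times an hour
```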

UTC Offset

The Coordinated Universal Time (UTC) offset defines the offset from UTC at which the search is scheduled. For instance, with a schedule 0 6 * * * and an offset UTC+01:00, the search will be scheduled for 05:00 UTC.
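
The conversion in the example can be reproduced with a few lines of Python. This is only an illustration of the arithmetic. The function name is hypothetical and the date used is arbitrary.

```python
from datetime import datetime, timedelta, timezone


def scheduled_time_in_utc(hour, minute, offset_hours):
    """Convert a scheduled wall-clock time at a fixed UTC offset to UTC."""
    local_zone = timezone(timedelta(hours=offset_hours))
    # The date is arbitrary; only the time of day matters for this example.
    local_time = datetime(2024, 1, 2, hour, minute, tzinfo=local_zone)
    return local_time.astimezone(timezone.utc)


# Schedule 0 6 * * * with offset UTC+01:00
print(scheduled_time_in_utc(6, 0, 1).strftime("%H:%M"))  # 05:00
```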

Time Interval

As for all searches, a time interval must be specified. For scheduled searches the time interval is given by a start and end time relative to the scheduled execution time. For instance, if a scheduled search is executing at midnight Jan 2nd, with a time interval of start = 24h and end = now, the search will consider all logs within the time interval [20xx-01-01T00:00, 20xx-01-02T00:00].
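
To make the relative interval concrete, the sketch below computes the absolute search window from an execution time. It is a simplification: the offsets are plain hour counts, with 0 standing in for "now", and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def search_window(execution_time, start_hours_ago, end_hours_ago=0):
    """Return (window_start, window_end) relative to the execution time."""
    window_start = execution_time - timedelta(hours=start_hours_ago)
    window_end = execution_time - timedelta(hours=end_hours_ago)
    return window_start, window_end


# Executing at midnight Jan 2nd with start = 24h and end = now
start, end = search_window(datetime(2024, 1, 2, 0, 0), 24)
print(start.isoformat(), end.isoformat())
# 2024-01-01T00:00:00 2024-01-02T00:00:00
```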

Backfill Limit

If Humio is down or an error prevents an action from being triggered, you will miss searches that would otherwise have been scheduled and executed. When scheduled searches can run and trigger actions again, Humio will attempt to backfill the searches that were missed.

The backfilling behavior depends on the value given to the backfill limit, which determines how many missed searches will be executed before any new searches are scheduled.

Let us say that we schedule a search every hour, 0 * * * *, and Humio is down between 10:30 and 14:15. This means that the searches at 11:00, 12:00, 13:00 and 14:00 were missed.

Executing the most recent ‘missed’ search is not considered backfilling, as this can also occur under normal operation if there is a slight delay within Humio. Thus, if the backfill limit is set to 0, which is the default, only the search at 14:00 will be executed at startup.

If we increase the limit to 1, execution starts with the search scheduled at 13:00. With a limit of 2 it starts with the search at 12:00, and with a limit of 3 it starts with the search at 11:00. Increasing the value of the backfill limit beyond this point will not have any effect in this example. Note that the missed searches are executed in sequence from oldest to newest.
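
The example can be modelled with a short function. This is a sketch of the behavior as described on this page, not Humio's implementation, and the function name is hypothetical.

```python
def searches_to_run(missed, backfill_limit):
    """Select which missed searches are executed, oldest first.

    missed: scheduled times that were missed, ordered oldest to newest.
    The most recent missed search always runs and does not count
    against the backfill limit.
    """
    if backfill_limit < 0:
        raise ValueError("backfill limit must not be negative")
    if not missed:
        return []
    keep = backfill_limit + 1
    return missed[-keep:]


missed = ["11:00", "12:00", "13:00", "14:00"]
print(searches_to_run(missed, 0))  # ['14:00']
print(searches_to_run(missed, 2))  # ['12:00', '13:00', '14:00']
```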

The backfill limit may not exceed the global maximum backfill limit. This is set using the configuration option, SCHEDULED_SEARCH_BACKFILL_LIMIT.

Spacing Out Searches

Humio will always attempt to run a search exactly according to schedule. This makes scheduled searches predictable, but also risks that many scheduled searches will be configured to run at the same time, which might cause delays. It is common to schedule many jobs for midnight, if they are to be run daily, but if you experience delays in search execution because of a sudden high search load, try to space the searches out over a larger span of time.

If you decide to run a search on another schedule, but wish to keep the same search window, you need to update start and end on your scheduled search. For instance, if your search was running at midnight and searching through the previous day, you would have configured the interval parameters as start=24h and end=now. If you reschedule this search to run at 3 AM instead, you have to update the interval parameters to start=27h and end=3h to search within the same 24 hour time window.
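
The adjustment amounts to adding the delay to both ends of the interval. The sketch below shows the arithmetic and checks that the window is unchanged. Offsets are plain hour counts with 0 standing in for "now", and both function names are hypothetical.

```python
from datetime import datetime, timedelta


def shift_interval(start_hours, end_hours, delay_hours):
    """Return the (start, end) offsets that keep the window when running later."""
    return start_hours + delay_hours, end_hours + delay_hours


def window(run_at, start_hours, end_hours):
    """Absolute search window for a run at the given time."""
    return run_at - timedelta(hours=start_hours), run_at - timedelta(hours=end_hours)


midnight = datetime(2024, 1, 2, 0, 0)
start, end = shift_interval(24, 0, 3)
print(start, end)  # 27 3

# Both schedules cover the same 24 hours.
three_am = midnight + timedelta(hours=3)
print(window(midnight, 24, 0) == window(three_am, start, end))  # True
```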