Query Language

The Humio Query Language is the syntax that lets you compose queries to retrieve, process, and analyze data in Humio.

Before reading this section, we recommend that you work through the tutorial. The tutorial introduces you to queries in Humio, and lets you try out sample queries that demonstrate the basic principles.


The Query Language is built around a ‘chain’ of data-processing commands linked together. Each expression passes its result to the next expression in the sequence, allowing you to create complex queries by combining query expressions.

This architecture is similar to the idea of command pipes, a powerful and flexible mechanism for advanced data analysis in Unix and Linux shells.

Structure of a query

The basic model of a query is that data arrives at the left of the query, and the result comes out at the right of the query. When Humio executes the query, it passes the data through from the left to the right.

graph LR;
  A{Repository} -->|Events| B[Tag Filters];
  B -->|Events| C[Filters];
  C -->|Events| D[Aggregates];
  D -->|Rows| E{Final Result};
Events flow through the query pipeline, starting with the repository on the left. Events are filtered or transformed as they pass through tag filters, filters, and aggregates. After an aggregation, the data usually no longer consists of events, but of one or more simple rows containing the results.

As the data passes through the query, Humio filters, transforms, and aggregates it according to the query expressions.

Expressions are chained using the ‘pipe’ operator |.
This causes Humio to pass the output from one expression (left) into the next expression (right) as input.

For example, the following query has these components:

  • Two tag filters
  • One filter expression
  • Two aggregate expressions

    #host=github #parser=json | // <-- Tag Filters
    repo.name=docker/* | // <-- Filter Expression
    groupBy(repo.name, function=count()) | sort() // <-- Aggregates

Free-Text Filters (grepping)

The most basic query in Humio is to search for a particular string in any field of events. All fields (except for the special @id, @timestamp, @ingesttimestamp fields and the tag fields) are searched, including @rawstring. See the events documentation for more details on @rawstring.

Grepping runs on the fields in the event that are present at the start of the pipeline when performing a search. It does not take into account any fields added or removed within the pipeline.

When grepping is applied in a parser this differs: The event is processed as it is present at the point where the grepping happens. Humio recommends using Field filters whenever possible within a parser to avoid ambiguous matches.

Note: Humio versions before 1.13 searched only the @rawstring field when grepping.

Grepping does not specify the order in which fields are searched. When not extracting fields, the order in which fields are checked is not relevant, since any match lets the event “pass” the filter.

But when extracting fields using a regular expression, a match in a different field can yield non-deterministic extracted values. To keep extracted fields consistent with what older versions would have produced, Humio prefers a match on @rawstring before trying other fields when extracting fields.

You can perform more complex regular expression searches on all fields of an event by using the regex function or the // regex syntax.


Query Description
foo Find all events matching “foo” in any field of the events.
"foo bar" Use quotes if the search string contains white spaces or special characters, or is a keyword.
"msg: \"welcome\"" You can include quotes in the search string by escaping them with backslashes.

You can also use a regular expression on all fields. To do this, write the regex enclosed in forward slashes.

Query Description
/foo/ Find all events matching “foo” in any field of the events.
/foo/i Find all events matching “foo” ignoring case.

Field Filters

Besides the Free-text filters, you can also query specific event fields, both as text and as numbers.


Text fields

Query Description
url = *login* The url field contains login. You can use * as a wild card.
user = *Turing The user field ends with Turing.
user = "Alan Turing" The user field equals Alan Turing.
user != "Alan Turing" The user field does not equal Alan Turing.
url != *login* The url field does not contain login.
user = * Match events that have the field user.
user != * Match events that do not have the field user.
name = "" Match events that have a field called name but with the empty string as value.
user="Alan Turing" You do not need to put spaces around operators (for example, = or !=).
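As a sketch of how these combine (the field names are hypothetical), you can simply write several field filters next to each other; the and between them is implicit:

    url=*login* statuscode!=200 user="Alan Turing"

This matches events whose url contains login, whose statuscode is not 200, and whose user equals Alan Turing.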

Regex filters

In addition to globbing (* appearing in match strings) you can match fields using regular expressions.

Query Description
url = /login/ The url field contains login.
user = /Turing$/ The user field ends with Turing.
loglevel = /error/i The loglevel field matches error case insensitively; for example, it could be Error, ERROR or error.
/user with id (?<id>\S+) logged in/ | top(id) The user id is extracted into a field named id at search time. The id is then used in the top function to find the users that logged in the most. It is possible to extract fields in regular expressions using named groups. See Regex field extraction for details.

Comparison operators on numbers

Query Description
statuscode < 400 Less than
statuscode <= 400 Less than or equal to
statuscode = 400 Equal to
statuscode != 400 Not equal to
statuscode >= 400 Greater than or equal to
statuscode > 400 Greater than
400 = statuscode (!) The field ‘400’ is equal to statuscode.
400 > statuscode This comparison generates an error. You can only perform a comparison between numbers. In this example, statuscode is not a number, and 400 is the name of a field.

The left-hand side of the operator is interpreted as a field name. If you write 200 = statuscode, Humio tries to find a field named 200 and tests whether its value is "statuscode".

If the specified field is not present in an event, then the comparison always fails — unless it is !=. You can use this behavior to match events that do not have a given field, using either not (foo = *) or the equivalent foo != * to find events that do not have the field foo.
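For example, this sketch (assuming a hypothetical user field) counts the events that do not carry the field at all:

    user != * | count()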

Tag Filters

Tag filters are a special kind of field filter. They behave in the same way as regular filters.

In the example shown in the previous section (Basic Query Components), we have separated the tag filters from the rest of the query by a pipe character |.

We recommend including the pipe character before tag filters to improve the readability of your queries.

However, these pipe characters are not mandatory. The Humio query engine recognizes tag filters and uses this information to narrow down the number of datasources to search, which decreases query time.

See the tags documentation for more on tags.

Logical Operators: and, or, not, and !

You can combine filters using the and, or, not Boolean operators, and group them with parentheses. ! can also be used as an alternative to unary not.


Query Description
foo and user=bar Match events with foo in any field and a user field matching bar.
foo bar Since the and operator is implicit, you do not need to include it in this simple type of query.
statuscode=404 and (method=GET or method=POST) Match events with 404 in their statuscode field, and either GET or POST in their method field.
foo not bar This query is equivalent to the query foo and (not bar) .
!bar This query is equivalent to the query not bar .
not foo bar This query is equivalent to the query (not foo) and bar . This is because the not operator has a higher priority than and and or.
foo and not bar or baz This query is equivalent to the query foo and ((not bar) or baz) . This is because Humio has a defined order of precedence for operators. It evaluates operators from the left to the right.
foo or not bar and baz This query is equivalent to the query foo or ((not bar) and baz) . This is because Humio has a defined order of precedence for operators. It evaluates operators from the left to the right.
foo not statuscode=200 This query is equivalent to the query foo and statuscode!=200 .

Negating the result of filter functions

The not and ! operators can also be used to negate filter function expressions, which is syntactically more clean than passing in an explicit negate=true argument. Examples of this are

... | !cidr(ip, subnet="127.0.0/16") | ...
... | !in(field, values=[a, b, c]) | ...
... | !regex("xxx") | ...

Adding new fields

New fields can be created in two ways: by extracting them with regular expressions, or as output fields from functions.

Regex Field Extraction

You can extract new fields from your text data using regular expressions and then test their values. This lets you access data that Humio did not parse when it indexed the data.

For example, if your log entries contain text such as ... disk_free=2000 ..., then you can use a query like the following to find the entries that have less than 1000 free disk space:

regex("disk_free=(?<space>[0-9]+)") | space < 1000

Named capturing groups are used to extract fields in regular expressions. The field space is extracted and is then available after the regex function.
The same can be written using a regex literal

/disk_free=(?<space>[0-9]+)/ | space < 1000

You can apply repeat to field extraction to yield one event for each match of the regex. This allows processing multiple values for a named field, or a field name that matches a pattern, as in this example:

regex("value[^=]*=(?<someBar>\\S+)", repeat=true) | groupBy(someBar)

On an input event with a field value of type=foo value=bar1 valueExtra=bar2 value=bar3, the groupBy sees all three “bar” values.

In order to use field extraction this way, the regex must be a top-level expression, that is, it must appear on its own between pipes. The following doesn’t work:

type=FOO or /disk_free=(?<space>[0-9]+)/ | space < 1000

Since regular expressions require extra computing power, it is best to do as much simple filtering as possible earlier in the query chain before applying the regex function. (For version 1.13+: if the field name is known, it also helps to specify the field to search, to avoid searching all fields.)
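As a sketch of this advice (reusing the disk_free example above), a cheap free-text filter first discards events that cannot possibly match, and the field to search is named explicitly, before the more expensive regex runs:

    "disk_free" | regex("disk_free=(?<space>[0-9]+)", field=@rawstring) | space < 1000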

Fields produced by functions (as-parameters)

Fields can also be added by functions. By default, most functions put their result in a field named after the function, prefixed with an underscore. For example, count() puts its result in a field named _count.
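For example, this sketch (with a hypothetical url field) relies on that default output field to sort the grouped results:

    groupBy(url, function=count()) | sort(_count)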

Most functions that produce fields have a parameter called as. By setting this parameter you can specify the name of the output field, for example:

    count(as=cnt)

This assigns the result of the count to the field named cnt (instead of the default _count).

See also the assignment operator for shorthand syntax for assigning results to a field.

Eval Syntax

The function eval can assign fields while doing numeric computations on the input.

The := syntax is short for eval. Use | between assignments.

... | foo := a + b | bar := a / b |  ...

is short for

... | eval(foo = a + b) | eval(bar = a / b) | ...

Assignment operator

You can use the operator := with functions that take an as-parameter. When what’s on the right hand side of the assignment is a function call, the assignment is rewritten to specify the as= argument which, by convention, is the output field name. For example

... | foo := min(x) | bar := max(x) |  ...

is short for

... | min(x, as=foo) | max(x, as=bar) | ...

Field operator

You can use attr =~ fun() with any function that has a parameter named field. It designates the field=attr argument and lets you write:

... | ip_addr =~ cidr(subnet="") | ...

rather than

... | cidr(subnet="", field=ip_addr) | ...

This also works well with regex and replace . It’s a shorthand, and very convenient.
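For example, these two hedged sketches (with a hypothetical msg field) are equivalent; the field operator supplies the field=msg argument to regex:

    msg =~ regex("error=(?<code>\\d+)")      // shorthand form
    regex("error=(?<code>\\d+)", field=msg)  // equivalent long form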


Conditionals

There is no “if-then-else” syntax in Humio, since the streaming style is not well-suited for procedural-style conditions.

But there are a couple of ways to do conditional evaluation:

Case Statements

Using case statements, you can describe alternative flows in your queries. It is similar to case or cond you might know from many other functional programming languages. It essentially allows you to write if-then-else constructs that work on events streams.

The syntax looks like this:

case {
  expression | expression | ...;
  expression | expression | ...;
  expression | expression | ...;
  * | expression | ...
}

You write a sequence of pipeline clauses separated by a semicolon (;). Humio will apply each clause from top to bottom until one returns a value (matches the input).

You can add a wildcard clause case { ... ; * } which matches all events, serving as the “default case” — essentially the else part of an if-statement. If you don’t add a wildcard clause, any events that don’t match any of the explicit clauses will be dropped. You cannot use an empty clause — you must explicitly write * to match all.


Let’s say we have logs from multiple sources that all have a field named time, and we want to get percentiles of the time fields, but one for each kind of source.

First we try to match some text that distinguishes the different types of line. Then we can create a new field type and assign a value that we can use to group by

| case { "client-side"     | type := "client";
         "frontend-server" | ip != "" | type := "frontend";
         Database          | type := "db" }
| groupBy(type, function=percentile(time))

Match Statements

Using match statements, you can describe alternative flows in your queries where the conditions all check the same field. It is similar to match or switch which you might recognize from many other programming languages. It essentially allows you to write if-then-else constructs that work on events streams. The matches on the field support the filters listed in Field Filters

The syntax looks like this

field match {
  value => expression | expression... ;
  /regex/ => expression | ...;
  * => expression | ...
}

You write a sequence of filter and pipeline clauses to run when the filter matches, separated by a semicolon (;). Humio will apply each clause from top to bottom until one returns a value (matches the input).

You can add a wildcard clause match { ... ; * } which matches all events, serving as the “default case” — essentially the else part of an if-statement. If you don’t add a wildcard clause, any events that don’t match any of the explicit clauses will be dropped. You cannot use an empty clause — you must explicitly write * to match all.


Let’s say we have logs from multiple sources that all have a field that holds the time spent on some operation, but in different fields and units. We want to get percentiles of the time fields all in the same unit and in one field.

logtype match {
    "accesslog" => time:=response_time ;     // Access log is in seconds.
    /server_\d+/ => time:=server_time*1000 ; // These servers log in millis
  }
| groupBy(logtype, function=percentile(time))

Setting a field’s default value

You can use the function default to set the value of a missing or empty field.
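For example, this sketch (assuming a hypothetical location field) fills in a value before grouping, so events that lack the field are not dropped from the result:

    default(field=location, value="unknown") | groupBy(location)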

Composite Function Calls

Whenever a function accepts a function as an argument, there are some special rules. For all variations of groupBy (such as bucket and timechart) that take a function= argument, you can also use a composite function. Composite functions take the form { f1(..) | f2(..) }, which works like the composition of f1 and f2. For example, you can do

groupBy(type, function={ avgFoo := avg(foo) | outFoo := round(avgFoo) })

You can also use filters inside such composite function calls, but not macros.
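For example, this sketch (with hypothetical type and statuscode fields) uses a filter inside the composite function so that only error events are counted per type:

    groupBy(type, function={ statuscode >= 400 | errors := count() })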

User Functions

If you have stored a query as a ‘saved query’, then it can be used as a top-level element of another query, sort of like a function call.

To use a saved query this way you invoke it using the syntax $"$NAME_OF_SAVED_QUERY"() or, if the name does not contain whitespace or special characters you can use $nameOfSavedQuery() without quotes. A typical use for this is to define a filter or extraction ruleset that you can use as a prefix of another query.

Currently user functions do not support parameters, though this will be part of a future release — that is why we put parentheses at the end.


$"My Saved Query"() | $filterOutFalsePositive() | ...


Comments

Queries can have comments. This is useful for long multi-line queries, to add some description:

#type=accesslog // choose the type
| top(url)  // count urls and choose the most frequently used

The Humio query language supports // single line and /* multi line */ comments like Java or C++.

Links

When showing search results in a table, it is possible to create a clickable link. If the value of a field looks like a link, the UI will make it clickable. Links can be constructed using the search language. The format function can be handy for this:

$extractRepo() | top(repo) | format("https://myhumio/%s", field=repo, as=link)


Joins

New in version 1.7, Humio supports the filter query function join(). The join() function returns a combined result set from two queries.

This function provides two join modes: inner and left. If you’re familiar with SQL, what would be the left table in SQL is referred to as the primary in Humio, and the right table becomes the subquery.

Inner joins

Let’s first consider the inner join, as it is the join() function’s default behavior.

An inner join returns the intersection of matching values, based upon the criteria you provide, between the primary query and the subquery.

Left joins

The left join returns everything in the primary and anything that matches from the subquery. In other words, a query using a left join returns the intersection of matching values between the primary query and the subquery (like an inner join), plus all the values of the primary query.


The join() function’s signature looks like this

join({subquery}, field=arg1, key=arg2, repo=arg3, mode=inner|left)

The unnamed subquery argument, and the named field and key arguments, are required. The named repo, include, and mode parameters are optional. There are other options available. Please see the Query Function reference documentation on join for more details.

First, let’s consider a join query against a single repository. In that scenario, there is no need to define the optional repo parameter. The query will look like this

primary query | join({subquery}, field=arg1, key=arg2)

So to break down how the join() function is used

  • The primary query is any valid Humio query. This provides the first set of results you wish to work with.
  • The arguments you pass to the join() function define what results should be combined and how.
  • The subquery is any valid Humio query. This provides the second set of results you wish to combine with those from your primary query.
  • The field parameter’s value (arg1) defines the field name in the primary query. This value will be used to match data in the subquery.
  • The key parameter’s value (arg2) defines the field name in the subquery. This value will be used to match data in the primary query.

If the field names being used in field and key are identical, you can use the simplified syntax

primary query | field =~ join({subquery})

In the above example, the field argument would be the actual field name. For example, if both queries share a field named host that you wish to join on, you would use

primary query | host =~ join({subquery})

Joins across two repositories

In addition to joining two queries against a single repository, the join() function can also be used to return a result set spanning two repositories. To do so, use the optional repo parameter to define the repository (as arg3) you want the subquery to run against.

primary query
| join({subquery}, field=arg1, key=arg2, repo=arg3)

Note that the user running the join() query must have permissions on both repositories.

Limitations of live joins

Joins are implemented by starting two queries: the main query and the subquery. For static queries, the subquery runs first and then the main query runs. For live queries, the historic part is done as in the static case, whereas the main query and subquery will run simultaneously for the live part.

Because of this and the distributed nature of Humio, subqueries in live queries are not guaranteed to “see” events before the main query does. In practice this can be thought of as there being a small delay (typically in the order of 5-6 seconds) from an event being ingested to it hitting the subquery. This can lead to surprising results when making live joins on events that occur very close in time.

For example, the query

count()

will typically result in a larger count than the query below, which joins events to themselves:

join({*}, field=id) | count()

This is because at the time the main query runs for a given event, the subquery has not seen that event yet and the join will therefore not emit the event.

You can avoid this by:

  • not joining events that are ingested close together in time,
  • using static queries, or
  • using repeating queries.

Example 1: An inner join

Let’s say we’re analyzing two repositories to see what kinds of web browsers are generating errors in our web servers. In this example, we want to return the intersection of two types of data: the status codes that return errors (anything 400 or higher) from the accesslog in the current repository, and the web browser type (the user agent) from the weblogs repository.

We can use the following query

#type=accesslog statuscode>=400
| join({useragent != ""}, field=sourceipaddress, key=browseripaddress, repo=weblogs)

Our primary query is

#type=accesslog statuscode>=400

We’re looking in the current working repository for logs of type accesslog where statuscode >= 400. Next, we pipe the primary query to our join() function

join({useragent != ""}, field=sourceipaddress, key=browseripaddress, repo=weblogs)

We pass in {useragent != ""} as the subquery argument to the join() function. This subquery returns any value for the useragent that is not an empty string.

To join the result sets of these two queries together we need a way to match them up. If our accesslog and weblogs both contain IP addresses that would be a good option and is the one we will use in this example. However, in our example the two repositories use different field names for the IP address. The accesslog uses sourceipaddress and the weblogs repository uses browseripaddress. We define that relationship by passing in the field used in the primary query and the key that will be matched against in the subquery.

field=sourceipaddress, key=browseripaddress

To define the external repository that the subquery will run against, we simply pass that repository’s name in with the repo parameter repo=weblogs.

What can we expect back as results? The primary query finds all accesslog entries with an error code. The subquery finds all entries in the weblogs repository with a value present for the useragent. However, being an inner join, the combined results will only be returned for entries where the IP addresses match in both queries.

Example 2: A left join

In this next example, we want to get as much information as possible about people who are accessing hosts. The information we have may or may not contain the location of the user, but if it does, we need to include that in our query results.

Since location information is desirable but optional, while all access log entries are required, a left join query fits our requirements better than an inner join query. Here is what it would look like:

#type=accesslog
| join({username=*}, field=host, key=hostname, repo=auditlogs, mode=left)
| table(timestamp, username, host, location)

In this example, we get all results from our current repository using #type=accesslog as the primary query. Our subquery (username=*) states that we wish to return all usernames. As in the previous example, we need a common point of data to map the relationship between our repositories. We’ll use the host, again, defining the differing field names used in the two repositories for that data. We passed those into the join() function as field=host, key=hostname. We define our external repository with repo=auditlogs.

Since we want all the results from the primary query, even if there is no match with the subquery, we specify a left join using mode=left.

Finally, we display the results in a table by piping the results to table(timestamp, username, host, location).

Best practices for join queries

Humio recommends the following when running join() queries

  • If your join() query key is constantly changing with live data, use live queries to keep the query updated.
  • When debugging an inner join query, temporarily set the join mode to left (join(mode=left)). This will help you debug your query by showing all results matched or unmatched by the join clause.
  • Another suggestion is to first run the subquery alone to make sure the results are what you expect. Once they are, you can confidently include it in a join.
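As a hedged sketch of that workflow (the field, repository, and value names are hypothetical), run the subquery on its own first, then embed it, starting with mode=left so you can also see unmatched events:

    // step 1: verify the subquery's output on its own
    username=*

    // step 2: embed it in the join, debugging with mode=left
    #type=accesslog
    | join({username=*}, field=host, key=hostname, mode=left)

Once the results look right, switch to the join mode your use case actually needs.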