The Humio Query Language is the syntax that lets you compose queries to retrieve, process, and analyze data in Humio. Queries are generally used on the Search page of the Humio User Interface (see UI – Search documentation).
The query language is built around a chain of data-processing commands linked together. Each expression passes its result to the next expression in the sequence, allowing you to create complex queries by combining query expressions. This architecture is similar to command pipes, a powerful and flexible mechanism for advanced data analysis in Unix and Linux shells.
In this reference section on the Humio Query Language, you’ll find explanations of the various components of a query. Below is a list of these sections with brief descriptions:
When querying data in Humio, filters may be used to reduce the results to the relevant data. You can use free-text filters to grep data, or you can filter based on fields, stipulating acceptable field values or using regular expressions for matching field contents.
For filtering, there are several operators available: besides logical operators, there are also comparison operators to narrow search results to what’s most important to you.
To improve result sets, as well as to construct more complex queries, you can add fields when querying data. You do this by extracting and creating fields with regular expressions, or with functions designed for this purpose.
Although Humio’s query language does not provide a typical conditional syntax, there are ways to evaluate data conditionally. You can use a case statement or a match statement.
Humio supports the joining of queries using the join() function. It returns a combined result set from the given queries.
You can use query functions to get values or to reduce results. Humio provides many built-in query functions, plus you can create your own. This includes repeating queries.
For time-related queries, you may want to know about Rate Unit Conversion or about relative time syntax.
The next section presents the structure of a query. However, if you haven’t already, you may want to read the Getting Started tutorial. It links to an interactive tutorial that introduces you to queries in Humio and lets you try sample queries that demonstrate the basic principles.
The basic model of a query is that data arrives at the start of the query, and the result comes out at the end of the query. When Humio executes a query, it passes the data through from one step to the next (see Figure 1).
In the flow chart, events flow through the query pipeline, starting from the repository. Events are filtered or transformed as they pass through filters and aggregates. After an aggregation, the data is usually no longer a set of events, but one or more simple rows containing the results.
As the data passes through the query, Humio will filter, transform, and aggregate it according to the query expressions. Expressions are chained using the pipe operator (i.e., |). This causes Humio to pass the output from one expression into the next expression as input.
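The chaining described above can be modelled outside Humio as a series of functions, each consuming the output of the previous one. The following Python sketch is only an analogy for how the pipe operator hands data along; the `pipe` helper, the stage names, and the sample events are hypothetical and are not part of Humio.

```python
from functools import reduce


def pipe(events, *stages):
    """Feed events through each stage in order, much like `a | b | c`."""
    return reduce(lambda data, stage: stage(data), stages, events)


# Hypothetical filter stage: keeps only the events that match a condition.
def only_errors(events):
    return (e for e in events if e["level"] == "error")


# Hypothetical transform stage: adds a field to every event it receives.
def add_source(events):
    return ({**e, "source": "demo"} for e in events)


# Made-up sample events, standing in for data read from a repository.
events = [
    {"level": "info", "msg": "started"},
    {"level": "error", "msg": "disk full"},
]

result = list(pipe(events, only_errors, add_source))
print(result)
```

Each stage sees only what the previous stage let through, so the order of the stages matters in the same way the order of expressions matters in a query.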
To understand this better, consider the following query:
#host=github #parser=json |
repo.name=docker/* |
groupBy(repo.name, function=count()) |
sort()
On the first line here, there are two tag filters, which narrow the search to events in which the #host equals github and the #parser used was json. Notice that the line ends with a pipe, which means it sends the events that pass the two tag filters to the next expression, on the following line.
The second line uses a filter expression. It limits the results to events taken from repositories whose names start with docker/. When you’re searching only one repository, this wouldn’t be necessary. However, if you’re searching a view that is based on multiple joined repositories, you might want such a filter.
The last line in the example here aggregates the filtered results using the groupBy() function. It groups the events by repository name, then uses the count() function to get the number of events from each docker repository, counting the github and json events only. These results are piped to the sort() function, which, when called without arguments, sorts them by the count in descending order.
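To make the data flow of this example concrete, here is a rough Python emulation of the four steps over a handful of made-up events. It is a sketch of the semantics described above, not how Humio executes queries. The sample events are hypothetical, and the result field name `_count` and the descending sort order are assumptions about the default behaviour of count() and sort().

```python
from collections import Counter
from fnmatch import fnmatchcase

# Made-up events; field names mirror the ones used in the example query.
events = [
    {"#host": "github", "#parser": "json", "repo.name": "docker/docker"},
    {"#host": "github", "#parser": "json", "repo.name": "docker/compose"},
    {"#host": "github", "#parser": "json", "repo.name": "docker/docker"},
    {"#host": "github", "#parser": "json", "repo.name": "torvalds/linux"},
    {"#host": "gitlab", "#parser": "json", "repo.name": "docker/docker"},
]

# Step 1: the two tag filters (#host=github #parser=json).
tagged = [e for e in events if e["#host"] == "github" and e["#parser"] == "json"]

# Step 2: the wildcard field filter (repo.name=docker/*).
docker = [e for e in tagged if fnmatchcase(e["repo.name"], "docker/*")]

# Step 3: groupBy(repo.name, function=count()) turns events into result rows.
counts = Counter(e["repo.name"] for e in docker)
rows = [{"repo.name": name, "_count": n} for name, n in counts.items()]

# Step 4: sort(), assumed here to order rows by the count, highest first.
rows.sort(key=lambda r: r["_count"], reverse=True)

for row in rows:
    print(row["repo.name"], row["_count"])
```

Note that after step 3 the data is no longer a list of events but a small set of rows, one per repository name, which is what the final sort operates on.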