When data arrives at Humio, it needs to be processed. The path data takes from arriving at a Humio node until it appears in search results and is saved to disk is called the ingest flow.
If you are planning a large system or tuning the performance of your Humio cluster, it helps to understand this flow. Once you know what each phase of the ingest flow does, you can give the machines that handle it hardware suited to that work.
In this section we’ll explain the different ingest phases and how nodes participate.
Incoming data goes through three phases: parsing on the arrival node, digest, and store.
These phases may be handled by different nodes in a Humio cluster, but any node can take part in any combination of the three phases.
When a system sends data (logs) to Humio over one of the Ingest APIs or through an ingest listener, the cluster node that receives the request is called the arrival node. The arrival node parses the incoming data (using the configured parsers) and puts the result (called events) in Humio's humio-ingest Kafka queue.
If you are not familiar with Kafka, don't worry.
The events are now ready to be processed by a Digest Node.
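To make the arrival step concrete, here is a minimal sketch of what an arrival node does conceptually. It is not Humio's implementation: the `Event` class, the `parse_json_line` parser, the `arrive` function, and the sample log line are all hypothetical, and an in-memory `queue.Queue` stands in for the humio-ingest Kafka queue.

```python
import json
import queue
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event shape: a timestamp plus the remaining parsed fields."""
    timestamp: str
    fields: dict


def parse_json_line(raw: str) -> Event:
    """Hypothetical parser: turn one JSON log line into an event."""
    record = json.loads(raw)
    # Pull the timestamp out; everything else becomes event fields.
    timestamp = record.pop("ts")
    return Event(timestamp=timestamp, fields=record)


def arrive(raw_lines, parser, ingest_queue):
    """Parse each incoming line and enqueue the resulting event."""
    for raw in raw_lines:
        ingest_queue.put(parser(raw))


# In-memory stand-in for the humio-ingest Kafka queue (an assumption).
ingest_queue = queue.Queue()
arrive(
    ['{"ts": "2024-01-01T00:00:00Z", "level": "info", "msg": "started"}'],
    parse_json_line,
    ingest_queue,
)
```

The point of the sketch is the separation of concerns: the arrival node only parses and enqueues, and everything after that happens on whichever node consumes the queue.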
After the events are placed in the humio-ingest queue, a Digest Node will grab them off the queue as soon as possible. A queue in Kafka is configured with a number of partitions (parallel streams), and each Kafka partition is consumed by a digest node.
A single node can consume multiple partitions, and exactly which node handles which digest partition is defined in the cluster's Digest Rules.
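One way to picture the Digest Rules is as a table from partition number to node. The sketch below is a simplified model under that assumption; the node names, the four partitions, and the `partitions_by_node` helper are hypothetical and do not reflect Humio's actual rule format.

```python
from collections import defaultdict

# Hypothetical digest rules: Kafka partition number -> digest node ID.
digest_rules = {0: "node-1", 1: "node-2", 2: "node-1", 3: "node-3"}


def partitions_by_node(rules):
    """Invert the rules to list the partitions each node consumes."""
    assignment = defaultdict(list)
    for partition, node in sorted(rules.items()):
        assignment[node].append(partition)
    return dict(assignment)


assignment = partitions_by_node(digest_rules)
# node-1 consumes two partitions; node-2 and node-3 consume one each.
```

Viewed this way, a node that appears in many rules carries more of the digest load, which is worth keeping in mind when sizing hardware.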
Digest nodes are responsible for buffering new events and compiling segment files (the files that are written to disk in the Store phase).
Once a segment file is full it is passed on to Storage Nodes in the Store Phase.
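The buffer-then-hand-off behaviour can be sketched as follows. This is a toy model: the `SegmentBuilder` class is hypothetical, and "full" is defined here as a fixed event count purely for illustration, since the text does not say what limit Humio applies.

```python
class SegmentBuilder:
    """Toy digest buffer: collects events and hands off a segment when full."""

    def __init__(self, max_events, on_full):
        self.max_events = max_events  # illustrative "full" threshold (an assumption)
        self.on_full = on_full        # callback that receives a completed segment
        self.buffer = []

    def add(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.max_events:
            # Swap in a fresh buffer before handing the full one off.
            segment, self.buffer = self.buffer, []
            self.on_full(segment)


completed = []  # stands in for the hand-off to the Store phase
builder = SegmentBuilder(max_events=3, on_full=completed.append)
for i in range(7):
    builder.add({"seq": i})
# Two segments of three events are completed; one event is still buffered.
```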
Digest nodes also process the Real-Time part of search results.
Whenever a new event is pulled off the humio-ingest queue, the digest node examines it and updates the result of any matching live searches that are currently in progress. This is what makes events appear in search results almost instantly after arriving in Humio.
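The idea of updating live searches one event at a time can be illustrated with a small sketch. The `LiveCountByField` class, the `digest` function, and the sample events are hypothetical; real live searches are far more general than this single count-by-field aggregate.

```python
from collections import Counter


class LiveCountByField:
    """Toy live search: count matching events, grouped by one field."""

    def __init__(self, predicate, group_field):
        self.predicate = predicate
        self.group_field = group_field
        self.result = Counter()

    def on_event(self, event):
        # Update the running result only if the event matches the search.
        if self.predicate(event):
            self.result[event[self.group_field]] += 1


live_searches = [LiveCountByField(lambda e: e["level"] == "error", "host")]


def digest(event):
    """Feed one newly consumed event to every live search in progress."""
    for search in live_searches:
        search.on_event(event)


for ev in [
    {"level": "error", "host": "a"},
    {"level": "info", "host": "a"},
    {"level": "error", "host": "b"},
    {"level": "error", "host": "a"},
]:
    digest(ev)
```

Because the result is updated incrementally, the search never has to rescan stored data to reflect a new event, which is why results can appear so quickly.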
The final phase of the ingest flow is saving segment files to storage. Once a segment file has been completed in the digest phase, it is saved as a number of replicas; how many depends on how your cluster is configured (see Storage Rules).
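As a rough illustration of replication, the sketch below picks which storage nodes receive a copy of a completed segment. The round-robin placement, the `place_replicas` function, and the node names are assumptions made for the example; the actual placement is governed by your cluster's Storage Rules, which are not shown here.

```python
def place_replicas(segment_id, storage_nodes, replication_factor):
    """Pick distinct storage nodes for a segment (illustrative round-robin)."""
    if replication_factor > len(storage_nodes):
        raise ValueError("replication factor exceeds number of storage nodes")
    start = segment_id % len(storage_nodes)
    return [
        storage_nodes[(start + i) % len(storage_nodes)]
        for i in range(replication_factor)
    ]


targets = place_replicas(
    segment_id=5,
    storage_nodes=["node-1", "node-2", "node-3"],
    replication_factor=2,
)
# Each replica lands on a different node, so losing one node
# does not lose the segment.
```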
Now that we have covered all the phases, let’s put the pieces together and give you a more detailed diagram of the complete ingest flow: