There are several reasons why you might want to add more nodes to your Humio cluster.
This guide takes you through the steps involved in adding a new node to your cluster.
When a new node joins a Humio cluster it will not initially be responsible for processing any incoming data.
There are three core tasks a node can perform: parsing, digesting, and storing data. You can read the ingestion flow documentation if you want to know more about the different node tasks, but for now we will assume that the node we are adding should take its fair share of the entire workload.
We are going to use the Cluster Node Administration UI, but every step can be performed and automated using the Cluster Management GraphQL API.
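As a starting point for automation, the sketch below shows how a GraphQL query against the cluster could be issued from Python. The endpoint path, the bearer-token header, and the exact query fields are assumptions here, not confirmed parts of the API; check your cluster's GraphQL schema before relying on them.

```python
import json
from urllib import request

# Hypothetical sketch of calling the Cluster Management GraphQL API.
# The /graphql path, the Authorization header format, and the field
# names in the query are assumptions -- verify them against your
# cluster's schema.

def build_cluster_query() -> bytes:
    """Build the JSON body for a GraphQL query listing cluster nodes."""
    query = "{ cluster { nodes { id } } }"
    return json.dumps({"query": query}).encode("utf-8")

def fetch_nodes(base_url: str, token: str) -> dict:
    """Send the query to the cluster and return the decoded response."""
    req = request.Request(
        f"{base_url}/graphql",
        data=build_cluster_query(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a running cluster and an API token):
# nodes = fetch_nodes("http://localhost:8080", "<api-token>")
```

The same pattern works for mutations, which is how the UI steps below can be scripted.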
The first step is to start the Humio node and point it at the cluster. You can read about how to configure a node in the cluster installation guide.
The important part is that the KAFKA_SERVERS configuration option points at the Kafka servers for the existing cluster.
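For example, the new node's environment could point at the existing cluster's brokers like this; the hostnames below are placeholders, so substitute the broker addresses your cluster actually uses.

```shell
# Point the new node at the same Kafka brokers the existing
# cluster uses (comma-separated host:port pairs). The hostnames
# here are placeholders.
export KAFKA_SERVERS="kafka1.example.com:9092,kafka2.example.com:9092"
```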
Once the node has successfully joined the cluster it will appear in the Cluster UI’s list of nodes.
Notice that the Storage and Digest columns both say 0 / X. That is because, at this point, the new node's storage is not being used (indicated by the 0 in the Storage column) and it is not being used for digest, i.e. processing events and running real-time queries (indicated by the 0 in the Digest column).
A node configured like this is called an Arrival Node, since its only task is to parse messages arriving at this node or coordinate queries sent to this node.
We want our new node to do some of the digest workload. Digest is the process of looking at incoming events and updating the results of currently running searches (i.e. updating the data displayed in dashboard widgets and search results).
To distribute a fair share of the digest work to the new node you can use the Cluster UI and follow these steps:
This will change the Digest Rules (seen on the right of the screen) to include the new node.
Initially you should not worry too much if you don't understand how Digest Rules and Storage Rules work; suffice it to say that they work like a routing table for internal cluster traffic and determine which node does what.
Once you have clicked the button and waited a few seconds, you should see that the node now has a share of the digest workload assigned to it, indicated by the value of the Digest column being greater than zero.
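The routing-table analogy can be sketched as follows. This is an illustrative model only; the rule structure and node names are hypothetical, not Humio's actual internal format.

```python
# Illustrative model of Digest Rules as a routing table: each digest
# partition maps to the nodes responsible for processing events that
# land in it. The structure and node names are hypothetical.

digest_rules = {
    0: ["node-1"],
    1: ["node-2"],
    2: ["node-3"],  # the newly added node now owns partition 2
}

def digest_nodes_for(partition: int) -> list:
    """Return the nodes that digest events for the given partition."""
    return digest_rules[partition]

# Incoming events are routed by partition, so adding the new node to a
# rule is enough to start sending it a share of the digest workload.
```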
We would also like to use the node's storage for storing data. This means that the node's SSD or disk will be used to store data, and that the node will do part of the work involved in searching (i.e. executing a query on the cluster).
To use the node for storing new events, follow these steps:
This changes the Storage Rules (seen on the right of the screen) to include the node. This means that part of new incoming data (not existing data) will be stored on the node.
Just like Digest Rules, Storage Rules are an advanced topic, and you don't need to fully understand them when getting started. In a nutshell, Storage Rules define where data is stored and how many replicas are kept. You can read a more detailed description of Storage Rules and replication in the storage rules documentation.
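In the same illustrative style as before, a Storage Rule can be modeled as a mapping from a storage partition to its list of replica nodes; again, the structure and names here are hypothetical, not Humio's real format.

```python
# Illustrative model of Storage Rules: each storage partition maps to
# the nodes holding a replica of that partition's data. Structure and
# node names are hypothetical.

storage_rules = {
    0: ["node-1", "node-2"],  # partition 0 kept in two replicas
    1: ["node-2", "node-3"],  # the new node holds a replica of partition 1
}

def replica_count(partition: int) -> int:
    """Number of replicas kept for the given storage partition."""
    return len(storage_rules[partition])

def nodes_storing(partition: int) -> list:
    """Nodes that store (and can search) the given partition."""
    return storage_rules[partition]
```

Because a node that stores a partition also participates in searching it, adding the node to a Storage Rule increases both storage capacity and search capacity.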
Lastly, we would like the node to take over part of the existing data that was already in the cluster before it joined. Contrary to what you might expect, this does not happen automatically: moving a potentially huge amount of data between cluster nodes can adversely impact performance, so you might want to do it during the night or another quiet period.
To move a fraction of the total data stored in the cluster to the node (the fraction shown in the Storage column), follow these steps:
You will see that the “Traffic” column of the node list will indicate that data is being moved to the node.
Steps 2-4 are all optional and in more advanced setups, you will only do some of them. It is recommended that you read the Ingest Flow documentation to understand Digest and Storage in detail.