There are several reasons why you might want to add more nodes to your Humio cluster.
This guide takes you through the steps involved in adding a new node to your cluster.
When a new node joins a Humio cluster it initially won’t be responsible for processing any incoming data.
There are three core tasks a node performs: parsing, digesting, and storing data. You can read the ingestion flow documentation if you want to know more about the different node tasks, but for now we will assume that the node we are adding should take its fair share of the entire workload.
We are going to use the Cluster Node Administration UI, but every step can also be performed and automated using the Cluster Management GraphQL API.
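If you prefer the API route, requests go to the cluster's GraphQL endpoint over HTTP. The sketch below only builds the request; the `/graphql` path, the Bearer-token header, and the `cluster { nodes { id } }` query shape are assumptions to verify against your Humio version's API documentation.

```python
import json

# Assumed query shape for listing cluster nodes -- check against your
# Humio version's Cluster Management GraphQL schema.
LIST_NODES_QUERY = """
query {
  cluster {
    nodes {
      id
    }
  }
}
"""

def build_graphql_request(base_url: str, api_token: str, query: str):
    """Build the URL, headers, and JSON body for a GraphQL request."""
    url = base_url.rstrip("/") + "/graphql"
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"query": query})
    return url, headers, body

# No request is sent here; pass the result to the HTTP client of your choice.
url, headers, body = build_graphql_request(
    "https://humio.example.com", "API_TOKEN", LIST_NODES_QUERY
)
```

Every UI action described below has a corresponding query or mutation, which makes the whole procedure scriptable.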
The first step is to start the Humio node and point it at the cluster. You can read about how to configure a node in the cluster installation guide.
The important part is that the KAFKA_SERVERS configuration option points at the Kafka servers for the existing cluster.
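As a sketch, the relevant line in the new node's configuration might look like this (the hostnames are placeholders for your environment):

```ini
# Point the new node at the same Kafka used by the existing cluster.
KAFKA_SERVERS=kafka1.example.com:9092,kafka2.example.com:9092,kafka3.example.com:9092
```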
Once the node has successfully joined the cluster it will appear in the Cluster UI’s list of nodes.
Notice that the columns Storage and Digest both say 0 / X. That is because at this point the new node's storage will not be used (indicated by the 0 in the Storage column), and it will not take part in digest (processing incoming events and running real-time queries), indicated by the 0 in the Digest column.
A node configured like this is called an Arrival Node, since its only tasks are to parse messages arriving at it and to coordinate queries sent to it.
We want our new node to take on some of the digest workload. Digest is the process of examining incoming events and updating the results of currently running searches, that is, the data displayed in dashboard widgets and search results.
To distribute a fair share of the digest work to the new node, you can use the Cluster UI and follow these steps:
This will change the Digest Rules (seen on the right of the screen) to include the new node.
Once you have clicked the button and waited a few seconds, you should see that the node now has a share of the digest workload assigned to it, indicated by the value of the Digest column being greater than zero.
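Conceptually, a fair share means the digest partitions are spread evenly across all digest nodes. The following is only an illustration of that idea (the partition count and node IDs are made up; Humio computes the real assignment for you):

```python
def assign_digest_partitions(partitions: int, node_ids: list) -> dict:
    """Round-robin partitions over nodes so each node gets a fair share."""
    assignment = {node: [] for node in node_ids}
    for p in range(partitions):
        assignment[node_ids[p % len(node_ids)]].append(p)
    return assignment

# With 24 digest partitions and a third node added, each node handles 8:
shares = assign_digest_partitions(24, [1, 2, 3])
print({node: len(parts) for node, parts in shares.items()})  # {1: 8, 2: 8, 3: 8}
```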
We would also like to use the node for storing data. This means that the node's SSD or disk will be used to store data, and that the node will do part of the work involved in searching (that is, executing a query on the cluster).
To use the node for storing new incoming events, follow these steps:
This changes the Storage Rules (seen on the right of the screen) to include the node. What this means is that part of the new incoming data (not existing data) will be stored on the node.
Just like with Digest Rules, Storage Rules are an advanced topic, and you don’t need to fully understand them when getting started. Storage Rules define where data is stored and in how many replicas. You can read a more detailed description of Storage Rules and Replication in the storage rules documentation.
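To make the replication aspect concrete, here is an illustrative sketch (not Humio's actual placement algorithm) of assigning each storage partition to a configurable number of replica nodes:

```python
def assign_storage_partitions(partitions: int, node_ids: list, replicas: int = 2) -> dict:
    """Place each partition on `replicas` consecutive nodes, round-robin."""
    rules = {}
    n = len(node_ids)
    for p in range(partitions):
        rules[p] = [node_ids[(p + r) % n] for r in range(replicas)]
    return rules

rules = assign_storage_partitions(6, ["node-1", "node-2", "node-3"], replicas=2)
# Each partition ends up stored on two different nodes, e.g.:
# rules[0] == ["node-1", "node-2"]
```

With a replication factor of two, losing any single node leaves at least one copy of every partition available, which is the property the real Storage Rules are designed to give you.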
We would also like the node to take over part of the existing data that was already in the cluster before it joined. This does not happen automatically, because moving a potentially huge amount of data between cluster nodes can adversely impact performance, so you might want to do it during off-peak hours or scheduled downtime.
To move a fraction of the total data stored in the cluster to the node (the fraction shown in the Storage column), follow these steps:
You will see that the Traffic column of the node list will indicate that data is being moved to the node.
Steps 2-4 are optional, and in more advanced setups you will only do some of them. It is recommended that you read the Ingest Flow documentation to understand digest and storage in detail.