Adding a New Node

There are several reasons why you might want to add more nodes to your Humio cluster, including:

  • High Availability
  • Increased Query Performance
  • Increased Storage Capacity
  • Dedicating some nodes to storage and others to query processing or API access

This guide takes you through the steps involved in adding a new node to your cluster.

Steps to add a new cluster node

When a new node joins a Humio cluster it initially won’t be responsible for processing any incoming data.

There are three core tasks a node performs: parsing, digest, and storage. You can read the ingest flow documentation if you want to know more about the different node tasks, but for now we will assume that the node we are adding should take its fair share of the entire workload.

We are going to use the Cluster Node Administration UI, but every step can also be performed and automated using the Cluster Management GraphQL API.
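For readers who want to script these steps, the following is a minimal sketch of how a GraphQL request could be prepared in Python. It only builds the request and does not send it. The base URL, the token, the /graphql path, the bearer-token header, and the example query are all assumptions made for illustration; check the Cluster Management GraphQL API documentation for the actual schema and authentication details.

```python
import json
import urllib.request


def build_graphql_request(base_url: str, api_token: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/graphql",  # assumed endpoint path
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical query: consult your cluster's GraphQL schema for the real fields.
EXAMPLE_QUERY = "{ cluster { nodes { id name } } }"

request = build_graphql_request("https://humio.example.com", "YOUR_API_TOKEN", EXAMPLE_QUERY)
# To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the helper easy to inspect and reuse for whichever queries or mutations the API actually provides.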

Step 1: Starting a new Humio node

The first step is to start the Humio node and point it at the cluster. You can read about how to configure a node in the cluster installation guide.

The important part is that the KAFKA_SERVERS configuration option points at the Kafka servers for the existing cluster.
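A quick way to catch a misconfigured node before starting it is to compare its KAFKA_SERVERS value with the one used by an existing node. The sketch below assumes the value is a comma-separated list of host:port entries; the hostnames and ports are made up for illustration.

```python
def parse_kafka_servers(value: str) -> set[tuple[str, int]]:
    """Split a comma-separated 'host:port' list into a set of (host, port) pairs."""
    servers = set()
    for entry in value.split(","):
        entry = entry.strip()
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        servers.add((host, int(port)))
    return servers


# Hypothetical values copied from an existing node and from the new node.
existing_node = parse_kafka_servers("kafka1:9092,kafka2:9092,kafka3:9092")
new_node = parse_kafka_servers("kafka3:9092, kafka1:9092, kafka2:9092")

# Order and whitespace do not matter; the set of servers should match.
print(existing_node == new_node)  # prints True
```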

Once the node has successfully joined the cluster it will appear in the Cluster UI’s list of nodes.

Notice that the Storage and Digest columns both say 0 / X. At this point the new node’s storage is not being used (indicated by the 0 in the Storage column), and the node is not being used for digest, that is, the processing of events and the running of real-time queries (indicated by the 0 in the Digest column).

A node configured like this is called an Arrival Node, since its only tasks are to parse messages arriving at it and to coordinate queries sent to it.

Step 2: Assigning digest work to the node

We want our new node to take on some of the digest workload. Digest is the process of looking at incoming events and updating the results of currently running searches, which in turn updates the data displayed in dashboard widgets and search results.

To distribute a fair share of the digest work to the new node, you can use the Cluster UI and follow these steps:

  1. Select the node in the list.
  2. Go to Actions > Start using this node for digest of incoming data > Add node to Digest Rules.

This will change the Digest Rules (seen on the right of the screen) to include the new node.

Don’t worry at this stage if you don’t fully understand how Digest Rules and Storage Rules work; they act like a routing table for internal cluster traffic and determine which node does what.

Once you have clicked the button and waited a few seconds, you should see that the node now has a share of the digest workload assigned to it, indicated by the value of the Digest column being greater than zero.
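To make the routing-table idea concrete, here is a toy model of digest partitions being shared out among nodes. It is a simplified sketch, not Humio's actual rule format or balancing algorithm; the partition count of 24, the node IDs, and the round-robin strategy are assumptions chosen for illustration.

```python
def assign_partitions(partition_count: int, node_ids: list[int]) -> dict[int, int]:
    """Map each partition number to exactly one node, round-robin."""
    return {p: node_ids[p % len(node_ids)] for p in range(partition_count)}


def share(table: dict[int, int], node_id: int) -> int:
    """Count how many partitions the table routes to the given node."""
    return sum(1 for owner in table.values() if owner == node_id)


before = assign_partitions(24, [1, 2, 3])     # rules before adding node 4
after = assign_partitions(24, [1, 2, 3, 4])   # rules after adding node 4

print(f"node 4 digest: {share(before, 4)} / 24 -> {share(after, 4)} / 24")
# prints: node 4 digest: 0 / 24 -> 6 / 24
```

In this model the new node goes from owning no partitions to owning an equal share, which mirrors the Digest column changing from zero to a value greater than zero.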

Step 3: Using the node for storage

We would also like to use the node for storing data. This means that the node’s SSD or disk will be used to store data and that the node does part of the work involved in searching (that is, executing a query on the cluster).

To use the node for storing new events, follow these steps:

  1. Select the node in the Cluster UI.
  2. Go to Actions > Start using this node for storing incoming data > Add node to Storage Rules.

This changes the Storage Rules (seen on the right of the screen) to include the node. What this means is that part of the new incoming data (not existing data) will be stored on the node.

Just like with Digest Rules, Storage Rules are an advanced topic, and you don’t need to fully understand them when getting started. Storage Rules define where data is stored and in how many replicas. You can read a more detailed description of Storage Rules and Replication in the storage rules documentation.
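As an illustration of "where data is stored and in how many replicas", the sketch below models storage rules as a table from partition number to a list of nodes. This is a toy model under stated assumptions, not Humio's actual rule format: the partition count, node IDs, and the neighbour-based replica placement are all hypothetical.

```python
def build_storage_rules(partition_count: int, node_ids: list[int], replicas: int) -> dict[int, list[int]]:
    """Assign each storage partition to `replicas` distinct nodes."""
    if replicas > len(node_ids):
        raise ValueError("cannot place more replicas than there are nodes")
    n = len(node_ids)
    # Each partition goes to one node plus the next (replicas - 1) nodes in the list.
    return {
        p: [node_ids[(p + r) % n] for r in range(replicas)]
        for p in range(partition_count)
    }


rules = build_storage_rules(partition_count=6, node_ids=[1, 2, 3, 4], replicas=2)
for partition, nodes in rules.items():
    print(partition, nodes)
```

With two replicas, every partition in this model is held by two different nodes, so new data written to a partition survives the loss of either one.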

Step 4: Taking part of existing data in the cluster

We would also like the node to take over a share of the data that was already in the cluster before it joined. This does not happen automatically, because moving a potentially huge amount of data between cluster nodes can adversely impact performance, and you might prefer to do it during a quiet period or scheduled downtime.

To move a share of the cluster’s existing data onto the node (the fraction shown in the Storage column), follow these steps:

  1. Select the node in the Cluster UI.
  2. Select Actions > Move a share of existing data onto this node > Move data to node.

You will see that the Traffic column of the node list will indicate that data is being moved to the node.
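To get a rough idea of how much data such a move involves, you can multiply the total data in the cluster by the node's storage fraction. The sketch below assumes data is spread evenly across storage partitions and ignores replication; the figures are invented for illustration.

```python
def estimated_data_to_move(total_gb: float, node_partitions: int, total_partitions: int) -> float:
    """Estimate the data (in GB) a node would receive for its share of partitions."""
    if total_partitions <= 0 or not 0 <= node_partitions <= total_partitions:
        raise ValueError("node_partitions must be between 0 and total_partitions")
    return total_gb * node_partitions / total_partitions


# Hypothetical cluster: 1200 GB stored, node responsible for 6 of 24 partitions.
print(estimated_data_to_move(1200, 6, 24))  # prints 300.0
```

An estimate like this can help decide whether the move fits into a quiet period.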


Steps 2-4 are optional; in more advanced setups you may perform only some of them. We recommend reading the ingest flow documentation to understand digest and storage in detail.