Replacing Hardware

If you need to replace a node in your Humio cluster, for whatever reason, you have options.

About cluster node identity

A cluster node is identified in the cluster by its UUID. The UUID is generated automatically the first time a node starts and is stored in $HUMIO_DATA_DIR/cluster_membership.uuid. When moving or replacing a node, you can use this file to ensure the node rejoins the cluster with the same identity.
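On disk, the identity file is simply a text file containing the UUID. A minimal sketch of inspecting and backing it up (a mock data directory and a placeholder UUID stand in for a real node so the snippet is self-contained; the backup path is illustrative):

```shell
# Mock setup so this snippet is self-contained; on a real node,
# HUMIO_DATA_DIR points at the node's actual data directory.
HUMIO_DATA_DIR=$(mktemp -d)
echo "00000000-0000-0000-0000-000000000000" > "$HUMIO_DATA_DIR/cluster_membership.uuid"

# Inspect the node's identity
cat "$HUMIO_DATA_DIR/cluster_membership.uuid"

# Keep a copy somewhere safe before touching the node's storage
cp "$HUMIO_DATA_DIR/cluster_membership.uuid" /tmp/cluster_membership.uuid.bak
```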

Examples

This guide assumes a fairly basic hardware and network setup. If you are using SANs, Blue-Green Deployment, or other advanced techniques, you can still use this guide as a reference when adapting the steps to those configurations.

The node will continue operating with the same storage

If the node will continue to run on the same storage, meaning it keeps its data directory, all you need to do is ensure that the node is not a Digest Node before shutting it down:

  1. Assign another node to any Digest Rules where this node is assigned.

    This can be done using Humio’s Cluster Management UI. You can read more about un-assigning digest in the section about removing a node.

  2. Shutdown the Humio process on the node.

    At this point you can see the node being unavailable in the Cluster Management UI.

  3. Replace the hardware components.

  4. Start the Humio process.

    Your node should rejoin the cluster after a short time, and you will see the node become available in the Cluster Management UI.

  5. Reassign the Digest Rules (if you unassigned any in Step 1).
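Steps 2 and 4 above reduce to stopping and starting the Humio service on the node. A sketch, assuming Humio runs under systemd as a unit named humio (both the service manager and the unit name depend on your installation, so adjust accordingly):

```shell
# Hypothetical unit name "humio"; adjust to match your installation.
stop_humio() {
    sudo systemctl stop humio     # step 2: stop before swapping hardware
}

start_humio() {
    sudo systemctl start humio    # step 4: start and let the node rejoin
    sudo systemctl is-active humio
}
```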

The node will use a new storage unit (slow recovery)

You are moving the node to a different machine, or installing a new disk or SSD.

Two requirements must be fulfilled. First, your cluster must keep multiple replicas of data (replication factor >= 2), and it must be acceptable for the cluster to run with lower replication while the new hardware is being provisioned.

Second, the node must not contain any data for which it is the sole owner (this can occur if you have archive divergence).

You can check this in the Cluster Management UI: red numbers in the Size column indicate data that exists only on this node.

In this case, the cluster can self-heal once the node reappears: the other nodes will discover that it is missing data it was expected to have and will start resending it.

  1. Make a copy of the Node UUID file.

    While you won’t have to copy all the data on the node, you must make a backup of the Node UUID file.

    It is located in $HUMIO_DATA_DIR/cluster_membership.uuid; you will be copying it to the new data folder on the new storage unit.

  2. Assign another node to any Digest Rules where this node is assigned.

    This can be done using Humio’s Cluster Management UI. You can read more about un-assigning digest in the section about removing a node.

  3. Shut down the Humio process.

  4. Copy the Node UUID file from step 1 into the node’s data folder.

  5. Start the Humio process using the new storage.

    Your node should rejoin the cluster after a short time, and you will see the node become available in the Cluster Management UI.

    The other nodes will start resending the missing data. The Too Low segment of the replication status in the header will initially be high, but it will drop as data is replicated.

  6. Reassign the Digest Rules (if you unassigned any in Step 2).
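Steps 1 and 4 above can be sketched as follows. Temp directories stand in for the data directories on the old and new storage units so the snippet is self-contained; real paths and the placeholder UUID will differ:

```shell
# Mock old and new data directories; on real hardware these are the
# data directories on the old and new storage units.
OLD_DATA_DIR=$(mktemp -d)
NEW_DATA_DIR=$(mktemp -d)
echo "00000000-0000-0000-0000-000000000000" > "$OLD_DATA_DIR/cluster_membership.uuid"

# Step 1: back up the Node UUID file before touching the old storage
cp "$OLD_DATA_DIR/cluster_membership.uuid" /tmp/node-uuid.bak

# Step 4: seed the new data directory with the saved identity so the
# node rejoins the cluster as itself on first start
cp /tmp/node-uuid.bak "$NEW_DATA_DIR/cluster_membership.uuid"
```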

The node will use a new storage unit (quick recovery)

If you are moving the node to a new storage unit and have hard replication requirements, or your cluster is only storing data in one replica, you cannot use Slow Recovery.

To limit the node’s downtime, copy the node’s data directory before shutting down the original node. This ensures you only have to copy the most recent data while the node is offline.

  1. Use rsync or a similar tool to copy the data directory to the new storage (this includes the UUID file).

  2. Assign another node to any Digest Rules where this node is assigned.

    This can be done using Humio’s Cluster Management UI. You can read more about un-assigning digest rules in the section about removing a node.

  3. Reassign any archive rules to other cluster nodes.

    This can be done using Humio’s Cluster Management UI. You can read more about un-assigning archive rules in the section about removing a node.

  4. Shut down the Humio process.

  5. Rerun rsync (or your chosen tool) to copy the most recent data to the new storage.

  6. Start the Humio process.

    Your node should rejoin the cluster after a short time, and you will see the node become available in the Cluster Management UI.

  7. Reassign the Digest Rules and Archive Rules (if you unassigned any in Steps 2 and 3).

Storage malfunctions and you’re running with no replication, or the node had data not found on other nodes

In the case where storage cannot be recovered, there are two options:

  1. Restore the node from backup, if you have backups enabled. See Restoring from Backup.

  2. Forcibly remove the node from the cluster. Any data that was not stored in multiple replicas will be lost. See Force Remove a Node.