Ansible is an automation tool that simplifies configuration management, application deployment, cloud provisioning, intra-service orchestration, and more. It communicates with remote machines over SSH to push configuration and gather information, and can be used to manage a Humio cluster.
This document details how to install Humio alongside Java, Zookeeper, and Kafka on each of three Ubuntu Server 16.04 or later nodes using Ansible playbooks. It also covers Ansible Roles used with Humio.
The three-server Humio installation requires, at a minimum for each server, 16 CPU cores, 16 GB of memory, and a 1 Gbit network card. Disk space depends on the amount of ingested data per day and the number of retention days: retention days x GB ingested / compression factor = disk space needed for a single server. For more information on retention of data in Humio, see Configuring Data Retention; on compression, see Index Sizing.
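As a rough illustration with hypothetical numbers: ingesting 100 GB per day with 30 days of retention and a compression factor of 10 works out to 30 x 100 / 10 = 300 GB of disk space per server.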
On AWS, for each server, start with an m5.4xlarge instance running Ubuntu. This instance type provides 16 vCPUs, 64 GB of memory, and up to 10 Gbps of network bandwidth. The m5d/m5ad instance types (with instance storage) can also be used, and generally offer better performance than their m5/m5a equivalents.
Humio supports Ubuntu 16.04 and higher, and RHEL 7. Running the installation using Ansible requires Python 3 on all hosts in the cluster, and Ansible 2.6 or later on the machine from which the playbook will be run. Humio requires a Java Virtual Machine 11 or later, Zookeeper 3.4.x or later, and Kafka 2.2 or later.
As for networking, in addition to port 22 for SSH, the Humio nodes require port 8080 opened to incoming traffic to service requests to the web application and API. As these nodes are to be part of a cluster, they will need to have the following incoming ports open:
Application | Protocol | Port |
---|---|---|
Humio | TCP | 8080 |
Kafka | TCP | 9092 |
SSHD | TCP | 22 |
Zookeeper | TCP | 2181, 2888, 3888 |
The host from which you will run the Ansible playbooks uses outbound TCP port 22 to reach the cluster nodes over SSH. It also needs to reach the apt repositories for installing Ansible, and GitHub.com (HTTPS) for fetching the playbooks, so plan network access accordingly.
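As a minimal sketch, assuming the nodes use Ubuntu's ufw firewall (adapt this to whatever firewall tooling you actually run), the cluster ports could be opened on each node like this:
$ sudo ufw allow 22/tcp
$ sudo ufw allow 8080/tcp
$ sudo ufw allow 9092/tcp
$ sudo ufw allow 2181,2888,3888/tcp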
Choose a host from which to run the Ansible playbooks. This should be a separate host. On that host, install Ansible and clone the Humio Ansible Demo from GitHub. Note, you'll use the directory in that repository called bare_metal. If you're unfamiliar with using Git or GitHub, read the Introduction to GitHub tutorial.
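A rough sketch of those two steps on an Ubuntu control host follows. The repository URL shown is an assumption, so substitute the actual address of the Humio Ansible Demo repository; also, depending on your Ubuntu release, the packaged Ansible may be older than 2.6, in which case install it from the Ansible PPA or pip instead.
$ sudo apt update && sudo apt install ansible
$ git clone https://github.com/humio/ansible-demo.git ~/ansible-demo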
For communicating with the remote hosts, you'll need to generate an SSH key pair; Ansible will use it during the installation. Your SSH public key must be added to the ~/.ssh/authorized_keys file on each host of the cluster for the user that will run Ansible.
On the host where you’ll be running Ansible, execute the following to generate the key:
# ssh-keygen
Make a copy of the SSH public key it generates. Then, using a simple text editor on each remote host, edit the authorized_keys file in the ~/.ssh/ directory and paste in the key. That's just about the last thing you'll have to do on each host. All instructions after this related to configuration files are performed on files from the cloned repository, unless otherwise noted.
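If password authentication is still enabled on the remote hosts, ssh-copy-id can do the same thing without manual editing (the user, host, and key path below are placeholders):
$ ssh-copy-id -i ~/.ssh/id_rsa.pub ubuntu@x.x.x.x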
Change to the directory where you cloned the repository (i.e., ~/ansible-demo/bare_metal) and open the inventory.ini file with a text editor. You'll need to update your Humio cluster fully qualified domain names (FQDNs) or IP addresses (replace the x.x.x.x). You'll also need to define the correct path to the private key file Ansible will use to authenticate to the remote hosts. By default, the Ansible playbook is run as root. However, as many systems disable remote root login by default, you may want to change ansible_user to another user. The user chosen must be able to sudo to root without providing a password.
The first stanza below defines the Humio host machines. Replace the ansible_host values with the actual IP addresses. A typical small cluster has three hosts that each run all three roles shown below. You can add more hosts if you prefer, though it's best to keep the cluster at an odd number of hosts. The cluster_index should simply increment for each host in the cluster.
[humios]
humio1 ansible_host=x.x.x.x cluster_index=1
humio2 ansible_host=x.x.x.x cluster_index=2
humio3 ansible_host=x.x.x.x cluster_index=3
[humios:vars]
ansible_port=22
ansible_user=ubuntu
ansible_ssh_private_key_file=/home/ubuntu/.ssh/id_rsa
[kafkas:children]
humios
[zookeepers:children]
humios
In this second stanza, under [humios:vars], the variables are applied to each host. By default, the Ansible SSH port is the standard SSH port (22). If all hosts share the same non-standard port, you can set it here. If each host has a unique port number, move the setting to the previous section and specify it on a per-host basis.
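For example (the port number here is hypothetical), a per-host override would look like this in the [humios] stanza:
humio1 ansible_host=x.x.x.x cluster_index=1 ansible_port=2222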
Change the value for ansible_user to whatever remote user will be executing the commands on the servers. That user must not require a password, or you'll be prompted repeatedly for one when you run the playbook. The last variable is the local path to the private key file that will be used to connect to the remote systems; the matching public key should be in the ~/.ssh/authorized_keys file on all of the remote machines under the user account specified above.
The [kafkas:children] entry defines a kafkas group that contains the same hosts as [humios]. If your Kafka hosts aren't the same as your Humio hosts, you can remove the :children suffix here and specify new hosts as in the [humios] section above.
The [zookeepers:children] entry defines a zookeepers group that contains the same hosts as [humios]. If your Zookeeper hosts aren't the same as your Humio hosts, you can remove the :children suffix and specify new hosts as in [humios] above.
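As an illustration (host names and addresses are hypothetical), a dedicated Kafka group would replace the [kafkas:children] entry like this; any per-host variables the humio.kafka role expects would also need to be set, so check that role's documentation:
[kafkas]
kafka1 ansible_host=x.x.x.x
kafka2 ansible_host=x.x.x.x
kafka3 ansible_host=x.x.x.x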
You now need to do some network configuration. Run ifconfig on the cluster hosts to get the name of the external network interface. In the example below, you would make a note of ens5 (or whatever your network interface is named) and use it shortly.
$ ifconfig
ens5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet x.x.x.x netmask 255.255.255.0 broadcast x.x.x.255
...
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
...
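On hosts that don't have the net-tools package installed, the iproute2 tools (standard on modern Ubuntu) report the same information, printing one line per interface with its state and addresses:
$ ip -br addr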
Now edit the group_vars/all.yml file and update the humio_network_interface variable to reflect the name of the Humio host machines' network interface. With a simple text editor, edit that file to look something like this:
zookeeper_version: 3.4.14
zookeeper_url: "http://archive.apache.org/dist/zookeeper/zookeeper-{{ zookeeper_version }}/zookeeper-{{ zookeeper_version }}.tar.gz"
humio_version: 1.5.23
humio_network_interface: ens5
kafka_version: 2.1.1
kafka_scala_version: 2.12
You're now ready to install the roles for Ansible. To do that, enter the following from the command line:
# ansible-galaxy install -r requirements.yml
The roles will be installed to a hidden subdirectory in your home directory, ~/.ansible/roles. These roles are periodically updated. In the future, to update the installed roles, run the command above again with the --force option included.
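For reference, that update invocation is simply:
# ansible-galaxy install -r requirements.yml --force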
Now your server is ready to run the playbook. You would do that like so:
$ ansible-playbook site.yml
After the playbook finishes running, you should have a summary that looks something like this, though the number of packages may be different:
NO MORE HOSTS LEFT ******************************************************************************
PLAY RECAP ***************************************************************************************
humio1 : ok=68 changed=68 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
humio2 : ok=142 changed=142 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
humio3 : ok=73 changed=73 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
For each host in the cluster, verify that Humio is up and running through a web browser. Enter the server's host name or IP address, followed by a colon and the port number 8080 (e.g., http://example.com:8080).
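From the command line, a quick per-node check might look like the following; the status endpoint path is an assumption, and any successful HTTP response on port 8080 indicates the node is answering:
$ curl -s http://x.x.x.x:8080/api/v1/status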
While Ansible has a host of ways to set configuration options, a simple way to configure the playbook is via the group_vars/all.yml file.
The options that can be set in this file are the combination of the default options listed in the following files:
If these options are not overridden, the defaults in the listed files will be applied to your cluster when running the playbook.
It’s essential that you use the same Ansible scripts for upgrading as you did for installing. Making a separate set of scripts for upgrading is likely to cause accidental configuration changes to your cluster.
The following suggested changes are based on the example given in the installation section above. This section continues the running example using the bare-metal setup, with three hosts.
First, you’ll need to upgrade the requirements. Do this by entering the following:
ansible-galaxy install -r requirements.yml --force
Then, shut down Humio, Kafka, and Zookeeper.
ansible-playbook playbooks/stop-humio.yml
ansible-playbook playbooks/stop-kafka-zookeeper.yml
Now you should upgrade Zookeeper, which will also install the latest Zulu JDK.
Upgrading from Zookeeper 3.4.x
If you're upgrading from Zookeeper 3.4.x to Zookeeper 3.5.x (we recommend upgrading to at least 3.5.7), you must upgrade in several steps.
First, in group_vars/all.yml, add a line that reads like this:
snapshot_trust_empty: true
Second, execute the following from the command line:
ansible-playbook site.yml -t zookeeper
Now, remove the snapshot_trust_empty entry you just added to group_vars/all.yml. Last, execute the following at the command line:
ansible-playbook site.yml -t zookeeper
If you're not upgrading from Zookeeper 3.4.x and are already using a Zookeeper 3.5 version, but not the most recent release, you can upgrade Zookeeper like so:
ansible-playbook site.yml -t zookeeper
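To confirm the upgraded Zookeeper ensemble is back up and reporting the expected version, one option (assuming nc is installed; the srvr four-letter command is permitted by default) is:
$ echo srvr | nc localhost 2181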
You’re now ready to upgrade Kafka. You can do so by executing this:
ansible-playbook site.yml -t kafka
When that's done, log in to one of the servers and execute the following:
/opt/zookeeper-3.5.6/bin/zkCli.sh -server localhost:2181 ls /brokers/ids | tail -n1
You should see output similar to this, which verifies that Kafka has started:
[1, 2, 3]
Prepare each upgrade step by editing the group_vars/all.yml file so it sets the humio_version you want to upgrade to.
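For example (the version number is illustrative; set the target version for the step you're performing):
humio_version: 1.16.0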
Ensure all Humio nodes are stopped, then execute the upgrade by running:
ansible-playbook site.yml -t humio
Check that Humio has started and the version is correct by going to the bottom of the main Humio UI page, which lists the running Humio version.
Note, if you're upgrading past many Humio releases (e.g., from 1.8.5 to 1.16.0), you may not be able to upgrade in one step. Consult the Humio release notes to see which upgrades are possible; each release lists the minimum compatible version. For instance, you cannot go directly from 1.8.5 to 1.16.0. The 1.16.0 release notes show that its minimum compatible version is 1.12.0. Picking some 1.12.x version, you see that its minimum compatible version is 1.10.0. Picking some 1.10.x version, you see that it is compatible with upgrading from 1.8.5. Your upgrade sequence therefore becomes:
- 1.8.5 → 1.10.x
- 1.10.x → 1.12.x
- 1.12.x → 1.16.0
These upgrade steps must be performed one at a time, and Humio must be stopped between each upgrade.
Ansible is a great way of managing a Humio cluster. This section details the Ansible Galaxy roles used with Humio, as well as a few sample projects that demonstrate how they are used.
Humio actively maintains four roles: humio.java, humio.kafka, humio.zookeeper, and humio.server. Additionally, Humio recommends using the third-party ansible-beats role, covered below.
The Packet.net and AWS EC2 sample projects demonstrate how Ansible Galaxy roles are used.
The easiest way to refer to these roles is with a requirements.yml file in the root of the Ansible project. That file would contain the following:
- src: humio.java
- src: humio.zookeeper
- src: humio.kafka
- src: humio.server
- src: https://github.com/elastic/ansible-beats
The purpose of the humio.java role is to install Azul Zulu OpenJDK, which is required by Zookeeper, Kafka, and Humio. The defaults will work for most users, so no additional configuration is required.
The humio.zookeeper role installs Apache Zookeeper.
The configuration options are listed on the related Humio GitHub page.
We recommend having at least three Zookeeper nodes for high availability.
Kafka is at the heart of a Humio installation, and the humio.kafka role installs it. Although the use of this exact role isn't strictly necessary, it's highly recommended, since the Humio team maintains the configuration defaults for Kafka. For optimal performance, it's a good idea to run one Kafka instance per Humio server in your cluster.
The configuration options are listed on the related Humio GitHub page.
Humio itself is installed using the humio.server role.
The configuration options can be found on the related Humio GitHub page.
We recommend keeping the humio_version variable up to date to maintain compatibility. For more details on configuration, please see Humio's Ansible-Server GitHub repository.
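As a minimal sketch of how the four Humio-maintained roles might be wired together in a playbook, assuming the humios group from the inventory above (the demo repository's own site.yml splits the roles across the humios, kafkas, and zookeepers groups, so treat this only as an illustration):
- hosts: humios
  become: true
  roles:
    - humio.java
    - humio.zookeeper
    - humio.kafka
    - humio.server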
The ansible-beats role is a third-party role maintained by Elastic, the makers of Beats. Currently it's not pushed to Ansible Galaxy as an official role. The configuration of this role is straightforward, but we strongly recommend reading its documentation.
We recommend the following configuration for Humio nodes:
- role: "ansible-beats"
  beat: "filebeat"
  beat_conf:
    "filebeat":
      "inputs":
        - paths:
            - /var/log/humio/*/humio-debug.log
          fields:
            "@type": humio
            "@tags": ["@host", "@type"]
          multiline:
            pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
            negate: true
            match: after
        - paths:
            - /var/log/kafka/server.log
          fields:
            "@type": kafka
            "@tags": ["@host", "@type"]
          multiline:
            pattern: '^\[[0-9]{4}-[0-9]{2}-[0-9]{2}'
            negate: true
            match: after
  output_conf:
    "elasticsearch":
      "hosts": ["localhost"]
      "username": "developer"
Don't forget to replace the value for hosts with the address of your Humio ingest endpoint, and the value for username with an actual ingest token.