Installing a Single Node

This document shows you how to install Humio on a single node running Ubuntu Server 18.04 LTS or later. Humio doesn’t recommend this configuration for production use: production environments should be clustered for failover and redundancy. A single-node installation is intended for learning and proof-of-concept situations, and its setup differs from that of a multi-server installation.

For multi-server installation, please see Running Humio in a Cluster.

System Requirements

In order to install and run Humio on a single server, the server requires a minimum of 16 CPU cores, 16 GB of memory, and a 1 Gbit network card. Disk space will depend on the amount of data ingested per day and the number of retention days. This is calculated as follows:

Retention days x GB ingested per day / compression factor = needed disk space for a single server
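
For example, assuming 30 days of retention, 50 GB of ingested data per day, and a compression factor of 10 (illustrative numbers only; the actual compression factor depends on your data):

30 days x 50 GB / 10 = 150 GB of disk space for a single server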

For more information on retention in Humio, see Retention. For more information on compression in Humio, see Instance Sizing.

On Amazon’s AWS, for a single server, start with an m5.4xlarge instance running Ubuntu. This instance type provides 16 vCPUs, 64 GB of memory, and up to 10 Gbps of network bandwidth. If you’re not going to use Humio Cloud, AWS with Ubuntu is the easiest method for installing Humio.

In addition to port 22 (required to SSH into the server), the Humio node requires TCP port 8080 open to incoming traffic in order to serve requests to the web application and API. If the new node is to be part of a cluster, it will need incoming TCP ports 8080 and, optionally, 9200 open.
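
If you manage the host firewall with ufw (on AWS you would typically open these ports with security groups instead), a minimal sketch might look like the following; adjust it to your own setup:

# ufw allow 22/tcp
# ufw allow 8080/tcp
# ufw allow 9200/tcp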

As for additional software, you’ll have to install the following:

a Java Virtual Machine (JVM), version 11 or later
Apache Zookeeper
Apache Kafka

The next section explains how to prepare the server for this software. After that there are four sections: one for each component listed above and one for Humio itself, each explaining how to install, configure, and start that software.

Server Preparation

Before beginning to install Humio and the needed components, you’ll need to prepare the server. Assuming you’re using AWS and Ubuntu, confirm the Ubuntu version by executing the following from the command-line:

# lsb_release -a

Humio won’t install correctly on Ubuntu versions earlier than 16.04, but you should use at least Ubuntu 18.04. Before installing Zookeeper, Kafka, and Humio, update and upgrade the system with apt-get like so:

# apt-get update
# apt-get upgrade

Then create non-administrative users (humio, zookeeper, kafka) to run Humio, Zookeeper, and Kafka. The humio account will own the Humio files and directories:

# adduser humio --shell=/bin/false --no-create-home --system --group
# adduser zookeeper --shell=/bin/false --no-create-home --system --group
# adduser kafka --shell=/bin/false --no-create-home --system --group

We recommend adding these three users to the DenyUsers setting in your node’s /etc/ssh/sshd_config file to prevent them from logging in over SSH or SFTP, and remember to restart the SSH daemon after making the change.
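
For example, the relevant line in sshd_config might look like the following (a sketch; merge it with any existing DenyUsers entries you have), after which you restart the SSH daemon:

DenyUsers humio zookeeper kafka

# systemctl restart ssh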

Once the system has finished updating and the users are created, you can begin installing and configuring the required components.

Install the JVM

Humio is a Scala-based application that requires a JVM (Java Virtual Machine), version 11 or higher. We recommend using Azul’s JVM, as it is used for Humio Cloud, and so it is well-tested for compatibility. For more information on selecting and configuring the JVM, see JVM Configuration.

First, import Azul’s public key. This can be done by executing the following from the command-line:

# apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 0xB1998361219BD9C9

Next, add the Azul package repository to apt and update the package lists like so:

# apt-add-repository 'deb http://repos.azulsystems.com/ubuntu stable main'
# apt-get update

# apt-get install zulu-13

With the repository in place, the last line above installs version 13.0.2 of Azul’s Zulu JVM. When it’s done, you can confirm that the JVM is installed by executing the first line below:

# java --version

openjdk 13.0.2 2020-01-14
OpenJDK Runtime Environment Zulu13.29+9-CA (build 13.0.2+6-MTS)
OpenJDK 64-Bit Server VM Zulu13.29+9-CA (build 13.0.2+6-MTS, mixed mode, sharing)

Install Zookeeper

Humio uses Kafka to buffer ingest and to sequence events among the nodes of a Humio cluster. Kafka requires Zookeeper for coordination. To install Zookeeper, navigate to the /opt directory and download a recent stable release of Zookeeper, replacing x.x.x in the commands below with the actual version number:

# cd /opt
# wget http://us.mirrors.quenda.co/apache/zookeeper/zookeeper-x.x.x/zookeeper-x.x.x-bin.tar.gz

After the file downloads, untar the Zookeeper file and create a symbolic link to /opt/zookeeper like so:

# tar -zxf zookeeper-x.x.x-bin.tar.gz
# ln -s /opt/zookeeper-x.x.x-bin /opt/zookeeper

Navigate to the /opt/zookeeper directory and create a data directory for Zookeeper:

# cd /opt/zookeeper
# mkdir -p /var/zookeeper/data

Using a simple text editor, create the Zookeeper configuration file, zoo.cfg, in the conf sub-directory. Copy the lines below into that file:

tickTime = 2000
dataDir = /var/zookeeper/data
clientPort = 2181
initLimit = 5
syncLimit = 2
maxClientCnxns=60
autopurge.purgeInterval=1
admin.enableServer=false
4lw.commands.whitelist=*
server.1=127.0.0.1:2888:3888

Create a myid file in the data directory (/var/zookeeper/data) with just the number 1 as its contents. Then you can start Zookeeper to verify that the configuration is working:

# bash -c 'echo 1 > /var/zookeeper/data/myid'
# ./bin/zkServer.sh start

ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-x.x.x/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

You can also verify that Zookeeper is running by logging in through the command line interface like so:

# ./bin/zkCli.sh
Connecting to localhost:2181
2019-06-20 20:56:52,767 [myid:] - INFO [main:Environment@100] - Client
...
2019-06-20 20:56:52,822 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x10000f560b50000, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]

The results you see should look something like the above. To exit, hit Ctrl-c once the status is reported as connected.
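
Before exiting, you can optionally issue a command at the prompt to confirm that the server responds. On a fresh installation, listing the root znode returns only the built-in zookeeper node:

[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]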

There’s a little more configuring to do. Stop Zookeeper and change the ownership of the Zookeeper directories like so, adjusting for the version number you installed:

# ./bin/zkServer.sh stop
# chown -R zookeeper:zookeeper /opt/zookeeper-x.x.x
# chown -R zookeeper:zookeeper /var/zookeeper/data

So that Zookeeper will start when the server is rebooted, you’ll need to create a Zookeeper service file named zookeeper.service in the /etc/systemd/system/ directory. Use a simple text editor to create the file and copy the following lines into it:

[Unit]
Description=Zookeeper Daemon
Documentation=http://zookeeper.apache.org
Requires=network.target
After=network.target

[Service]
Type=forking
WorkingDirectory=/opt/zookeeper
User=zookeeper
Group=zookeeper
ExecStart=/opt/zookeeper/bin/zkServer.sh start /opt/zookeeper/conf/zoo.cfg
ExecStop=/opt/zookeeper/bin/zkServer.sh stop /opt/zookeeper/conf/zoo.cfg
ExecReload=/opt/zookeeper/bin/zkServer.sh restart /opt/zookeeper/conf/zoo.cfg
TimeoutSec=30
Restart=on-failure

[Install]
WantedBy=default.target

Now you’re ready to start the Zookeeper service. Enter the first line below to start it. When it finishes, enter the second line to check that it’s running and there are no errors reported:

# systemctl start zookeeper
# systemctl status zookeeper

# systemctl enable zookeeper

After breaking out of the status by pressing q, enter the last line above to set the Zookeeper service to start when the server boots up.
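
Because the zoo.cfg above whitelists Zookeeper’s four-letter-word commands (4lw.commands.whitelist=*), you can also do a quick health check from the shell, assuming netcat (nc) is installed; a healthy server answers imok:

# echo ruok | nc 127.0.0.1 2181
imok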

Install Kafka

To install Kafka, you’ll need to go to the /opt directory and download the latest release. You can do that with wget like so:

# cd /opt
# wget https://www-us.apache.org/dist/kafka/x.x.x/kafka_x.x.x.x.tgz

You’ll need to adjust the last line above, changing the x.x.x.x placeholder to the latest version number. Once it downloads, untar the file and then create the directories Kafka needs like this:

# tar zxf kafka_x.x.x.x.tgz

# mkdir /var/log/kafka
# mkdir /var/kafka-data
# chown kafka:kafka /var/log/kafka
# chown kafka:kafka /var/kafka-data

# ln -s /opt/kafka_x.x.x.x /opt/kafka

The four lines in the middle here create the directories for Kafka’s logs and data, and change the ownership of those directories to the kafka user. The last line creates a symbolic link to /opt/kafka; adjust it as well, replacing the x.x.x.x placeholder with the version number.

Using a simple text editor, open the Kafka properties file, server.properties, located in the /opt/kafka/config directory. You’ll need to set a few options; note that the lines below are not necessarily in the order in which they appear in the configuration file:

broker.id=1
log.dirs=/var/kafka-data
delete.topic.enable=true

The first line sets the broker.id value to match the server number (myid) you set when configuring Zookeeper. The second sets the data directory. The third line should be added to the end of the configuration file. When you’re finished, save the file and change the owner to the kafka user:

# chown -R kafka:kafka /opt/kafka_x.x.x.x

You’ll have to adjust this to the version you installed. Note that changing the ownership of the symbolic link /opt/kafka doesn’t change the ownership of the files in the directory it points to, which is why the command targets the versioned directory.

Now you’ll need to create a service file for starting Kafka. Use a simple text editor to create a file named kafka.service in the /etc/systemd/system/ directory. Then add the following lines to the service file:

[Unit]
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
LimitNOFILE=800000
Environment="LOG_DIR=/var/log/kafka"
Environment="GC_LOG_ENABLED=true"
Environment="KAFKA_HEAP_OPTS=-Xms512M -Xmx4G"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
Restart=on-failure

[Install]
WantedBy=multi-user.target

Now you’re ready to start the Kafka service. Enter the first line below to start it. When it finishes, enter the second line to check that it’s running and there are no errors reported:

# systemctl start kafka
# systemctl status kafka

# systemctl enable kafka

After breaking out of the status by pressing q, enter the last line above to set the Kafka service to start when the server boots up.
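
To double-check that the broker is reachable, you can list its topics with the script shipped in the Kafka distribution (this assumes a release recent enough to support the --bootstrap-server flag); on a fresh installation the list is empty:

# /opt/kafka/bin/kafka-topics.sh --bootstrap-server 127.0.0.1:9092 --list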

Install Humio

Having installed Zookeeper and Kafka, you’re now ready to install Humio on your server. First, create the Humio system directories and give the humio user ownership of them:

# mkdir -p /opt/humio /etc/humio/filebeat /var/log/humio /var/humio/data

# chown humio:humio /opt/humio /etc/humio/filebeat
# chown humio:humio /var/log/humio /var/humio/data

# cd /opt/humio/

# wget https://repo.humio.com/repository/maven-releases/com/humio/server/x.x.x/server-x.x.x.jar

# ln -s /opt/humio/server-x.x.x.jar /opt/humio/server.jar

The second-to-last line here downloads the latest release from https://repo.humio.com/service/rest/repository/browse/maven-releases/com/humio/server/. You’ll have to adjust that line with the correct directory and file name based on the current version. After you’ve downloaded it, enter the last line here to create a symbolic link to the jar.

Using a simple text editor, create the Humio configuration file, server.conf in the /etc/humio directory. Copy the following content into that file, changing the DNS names or IP addresses where appropriate:

BOOTSTRAP_HOST_ID=1
DIRECTORY=/var/humio/data
HUMIO_AUDITLOG_DIR=/var/log/humio
HUMIO_DEBUGLOG_DIR=/var/log/humio
HUMIO_PORT=8080
ELASTIC_PORT=9200
ZOOKEEPER_URL=127.0.0.1:2181
KAFKA_SERVERS=127.0.0.1:9092
EXTERNAL_URL=http://127.0.0.1:8080
PUBLIC_URL=http://127.0.0.1
HUMIO_SOCKET_BIND=0.0.0.0
HUMIO_HTTP_BIND=0.0.0.0

For more information on each of these environment variables, see the Configuration reference page.

Now create a service file, humio.service, in the /etc/systemd/system/ directory. Add these lines to that file:

[Unit]
Description=Humio service
After=network.service

[Service]
Type=notify
Restart=on-abnormal
User=humio
Group=humio
LimitNOFILE=250000:250000
EnvironmentFile=/etc/humio/server.conf
WorkingDirectory=/var/humio
ExecStart=/usr/bin/java -server -XX:+UseParallelOldGC -Xms4G -Xmx32G -XX:MaxDirectMemorySize=8G -Xss2M --add-exports java.base/jdk.internal.util=ALL-UNNAMED -XX:CompileCommand=dontinline,com/humio/util/HotspotUtilsJ.dontInline -Xlog:gc*,gc+jni=debug:file=/var/log/humio/gc_humio.log:time,tags:filecount=5,filesize=102400 -Dhumio.auditlog.dir=/var/log/humio -Dhumio.debuglog.dir=/var/log/humio -jar /opt/humio/server.jar

[Install]
WantedBy=default.target

Last, change ownership of the Humio files and start the Humio service:

# chown -R humio:humio /opt/humio /etc/humio/filebeat
# chown -R humio:humio /var/log/humio /var/humio/data

# systemctl start humio
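
As with Zookeeper and Kafka, you can check that the service is running and enable it to start when the server boots up:

# systemctl status humio
# systemctl enable humio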

Verify that Humio is up and running by opening http://server_IP_or_hostname:8080 in a web browser.
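
You can also check from the command line on the node itself; assuming your Humio version exposes the status endpoint at /api/v1/status, it responds once the service has finished starting:

# curl -s http://127.0.0.1:8080/api/v1/status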