Installing Humio on bare metal using Ansible

Introduction

This document describes how to use Ansible playbooks to install Humio, along with Java, Zookeeper, and Kafka, on each of three nodes running Ubuntu Server 16.04 or later.

Ansible is an automation tool that simplifies configuration management, application deployment, cloud provisioning, intra-service orchestration, and more. It communicates with remote machines over SSH to send and retrieve information, and can be used to manage a Humio cluster. For information on the Ansible roles used with Humio, please see Descriptions of Ansible Roles used by Humio.

Installation Requirements

Hardware

Each server in your three-server Humio installation requires a minimum of:

  • 16 CPU cores
  • 16 GB of memory
  • 1 Gbit network card

Disk space depends on the amount of ingested data per day and the number of retention days.

Retention days x GB ingested per day / compression factor = disk space needed for a single server
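
For example, with 30 days of retention, 30 GB of data ingested per day, and a compression factor of 9, a single server would need roughly 30 x 30 / 9 = 100 GB of disk space.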

For more information on retention in Humio, see Configuring Data Retention. For more information on compression, see Index Sizing.

On AWS, start each server with an m5.4xlarge instance running Ubuntu. This instance type provides 16 vCPUs, 64 GB of memory, and up to 10 Gbps of network bandwidth. M5d/M5ad instance types (with instance storage) can also be used and generally offer better performance than their M5/M5a counterparts.

Network

In addition to port 22 (required to SSH into the server), the Humio nodes need port 8080 open to incoming traffic in order to serve requests to the web application and API. Because these nodes are part of a cluster, they also need the following incoming ports open:

Application   Protocol   Port
Humio         TCP        8080
Kafka         TCP        9092
SSHD          TCP        22
Zookeeper     TCP        2181, 2888, 3888
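
If the hosts use ufw as a local firewall (an assumption; many environments manage this with cloud security groups or firewalld instead), the inbound rules could be opened as shown below. The Kafka and Zookeeper ports typically only need to be reachable from the other cluster nodes, so you may prefer to restrict those rules to the cluster's IP addresses.

    $ sudo ufw allow 22/tcp
    $ sudo ufw allow 8080/tcp
    $ sudo ufw allow 9092/tcp
    $ sudo ufw allow 2181,2888,3888/tcp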

Outbound Ports (for the host running the Ansible playbooks)

Application   Protocol   Port
SSHD          TCP        22

The host from which you run Ansible must also be able to reach the apt repositories (to install Ansible) and GitHub.com over HTTPS (to fetch the playbooks), so plan network access accordingly.

System

Running this install using Ansible requires:

  • Python 3 on all hosts in the cluster
  • Ansible 2.6 or later installed on the machine where the playbook is run

Humio version 1.5.x requires:

  • Java Virtual Machine 11+
  • Zookeeper 3.4.X+ (recommended)
  • Kafka 2.2+ (recommended)

These requirements are installed by the Ansible playbooks described below.

Supported operating systems:

  • Ubuntu 16.04 and higher
  • RHEL 7

Software Setup

  1. Choose a host from which to run the Ansible playbooks. This should be a host separate from the cluster nodes. On that host, install Ansible and clone the Humio Ansible Demo repository (you'll be using the directory in that repository called bare_metal); example commands are shown below.

    If you’re unfamiliar with using Git or GitHub, then the Introduction to GitHub training is a great place to get started.
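
    For example, on an Ubuntu control host you could install Ansible from the official Ansible PPA and clone the repository like this (the repository URL below assumes the public humio/ansible-demo project; adjust it if your copy lives elsewhere):

    $ sudo apt-get install -y software-properties-common git
    $ sudo apt-add-repository -y ppa:ansible/ansible
    $ sudo apt-get update
    $ sudo apt-get install -y ansible
    $ git clone https://github.com/humio/ansible-demo.git ~/ansible-demo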

  2. Generate an SSH key pair to be used for the Ansible install. Your SSH public key must be added to the ~/.ssh/authorized_keys file on each host of the cluster for the user that will run Ansible.

    On the host where you’ll be running Ansible:

    $ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/ubuntu/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/ubuntu/.ssh/id_rsa.
    Your public key has been saved in /home/ubuntu/.ssh/id_rsa.pub.
    The key fingerprint is:
    SHA256:kFbz/CUrC9oiurULlRfN4Ft0Uc/ZYfAJ00 ubuntu@ip-127-0-0-1
    

    On the remote hosts, edit the authorized_keys file and paste in your ssh public key:

    $ nano ~/.ssh/authorized_keys
    
    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAB.....
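
    Alternatively, if you can already log in to the remote hosts with a password, ssh-copy-id will append the public key to authorized_keys for you (the user and address below are placeholders for your own):

    $ ssh-copy-id -i ~/.ssh/id_rsa.pub ubuntu@x.x.x.x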
    

    From this point on, all instructions to edit configuration files refer to files in the cloned repository unless otherwise noted.

  3. Change to the directory where you cloned the repository and edit the inventory.ini file. Replace the x.x.x.x placeholders with the fully qualified domain names (FQDNs) or IP addresses of your Humio cluster hosts. You'll also need to set the correct path to the private key file Ansible will use to authenticate to the remote hosts. By default, the Ansible playbook runs as root. However, since many systems disable remote root login, you may want to set ansible_user to another user. The user you choose must be able to sudo to root without providing a password.

    $ cd ~/ansible-demo/bare_metal
    
    $ nano inventory.ini
    
    ############################################
    #
    # Define your Humio host machines here. Replace `ansible_host` with
    # their actual IPs. A typical small cluster has 3 hosts that each
    # run with all three roles (humio / kafka / zookeeper). You can add
    # more hosts if you prefer. It's best to keep the cluster an odd
    # number of hosts though. `cluster_index` should just increment for
    # each host in the cluster.
    #
    [humios]
    humio1 ansible_host=x.x.x.x cluster_index=1
    humio2 ansible_host=x.x.x.x cluster_index=2
    humio3 ansible_host=x.x.x.x cluster_index=3
    
    ############################################
    #
    # These variables are applied to each host.
    #
    [humios:vars]
    # By default, the ansible SSH port being used is the default SSH port
    # (`22`). If all hosts share the same port, then you can update
    # it here to something other than `22`. If each host has a
    # unique port number, then you can move it to the above section
    # and specify it on a per-host basis.
    ansible_port=22
    
    # Change this to whatever remote user will be executing the commands
    # on the servers you're installing to. It will need passwordless sudo
    # permissions or you will be prompted repeatedly for a password
    # every time you run the playbook.
    ansible_user=ubuntu
    
    # This is the path to the private key file on your local machine that
    # will be used to connect to the remote systems. This key should be
    # added to the ~/.ssh/authorized_keys file on all of the remote machines
    # under the user account specified above.
    ansible_ssh_private_key_file=/home/ubuntu/.ssh/id_rsa
    
    ############################################
    #
    # This sets up a `kafkas` group that contains the same hosts as the
    # `humios` group. If your kafka hosts aren't the same as your humio
    # hosts, then you can remove the `:children` here and specify new hosts
    # as seen in the `humios` section above.
    [kafkas:children]
    humios
    
    ############################################
    #
    # This sets up a `zookeepers` group that contains the same hosts as the
    # `humios` group. If your zookeeper hosts aren't the same as your humio
    # hosts, then you can remove the `:children` here and specify new hosts
    # as seen in the `humios` section above.
    [zookeepers:children]
    humios
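
    Before continuing, you can check that Ansible can reach every host defined in inventory.ini by using Ansible's built-in ping module. Each host should respond with "pong":

    $ ansible -i inventory.ini all -m ping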
    
  4. Run ifconfig on your cluster hosts. Note the name of your network interface.

    $ ifconfig
    ens5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
            inet x.x.x.x  netmask 255.255.255.0  broadcast x.x.x.255
            inet6 fe80::33:75ff:fe8a:ef46  prefixlen 64  scopeid 0x20<link>
            ether 02:33:75:8a:ef:46  txqueuelen 1000  (Ethernet)
            RX packets 110313  bytes 124262082 (124.2 MB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 39298  bytes 3437926 (3.4 MB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    
    lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
            inet 127.0.0.1  netmask 255.0.0.0
            inet6 ::1  prefixlen 128  scopeid 0x10<host>
            loop  txqueuelen 1000  (Local Loopback)
            RX packets 878  bytes 82140 (82.1 KB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 878  bytes 82140 (82.1 KB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
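
    If ifconfig is not available on a host (on newer Ubuntu releases it ships in the optional net-tools package), ip addr lists the same interface names:

    $ ip addr show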
    
  5. Edit the group_vars/all.yml file and update the humio_network_interface variable to reflect the name of your Humio host machine’s network interface.

    $ nano group_vars/all.yml
    
    ---
    ############################################
    #
    # AnsibleShipyard.ansible-zookeeper variables
    #
    # Currently, only versions in the 3.4.x series are supported.
    # 3.5.x versions may work, but there is a known issue with
    # the Ansible run wherein Zookeeper will bind to port 8080
    # which prevents Humio from starting. If you have to use a
    # newer version of Zookeeper before this has been updated
    # for that, then make sure you change that port by setting
    # `admin.serverPort` in the zoo.cfg.
    zookeeper_version: 3.4.14
    zookeeper_url: "http://archive.apache.org/dist/zookeeper/zookeeper-{{ zookeeper_version }}/zookeeper-{{ zookeeper_version }}.tar.gz"
    
    ############################################
    #
    # humio.server variables
    #
    
    # the version of Humio to use. a list of versions with
    # their changes can be found in the changelog:
    # https://docs.humio.com/release-notes/
    humio_version: 1.5.23
    
    # the name of the network interface Humio will be listening
    # on as displayed in `ifconfig` output
    humio_network_interface: ens5
    
    ############################################
    #
    # humio.kafka variables
    #
    kafka_version: 2.1.1
    kafka_scala_version: 2.12
    
  6. Install the roles for Ansible:

    $ ansible-galaxy install -r requirements.yml
    
    - extracting ansible-beats to /home/ubuntu/.ansible/roles/ansible-beats
    - ansible-beats was installed successfully
    - downloading role 'haproxy', owned by entercloudsuite
    - downloading role from https://github.com/entercloudsuite/ansible-haproxy/archive/1.2.1.tar.gz
    - extracting entercloudsuite.haproxy to /home/ubuntu/.ansible/roles/entercloudsuite.haproxy
    - entercloudsuite.haproxy (1.2.1) was installed successfully
    - downloading role 'ansible-zookeeper', owned by AnsibleShipyard
    - downloading role from https://github.com/AnsibleShipyard/ansible-zookeeper/archive/0.23.0.tar.gz
    - extracting AnsibleShipyard.ansible-zookeeper to /home/ubuntu/.ansible/roles/AnsibleShipyard.ansible-zookeeper
    - AnsibleShipyard.ansible-zookeeper (0.23.0) was installed successfully
    - downloading role 'kafka', owned by humio
    - downloading role from https://github.com/humio/ansible-kafka/archive/0.1.8.tar.gz
    - extracting humio.kafka to /home/ubuntu/.ansible/roles/humio.kafka
    - humio.kafka (0.1.8) was installed successfully
    - adding dependency: humio.java (0.2.0)
    - downloading role 'server', owned by humio
    - downloading role from https://github.com/humio/ansible-server/archive/0.3.3.tar.gz
    - extracting humio.server to /home/ubuntu/.ansible/roles/humio.server
    - humio.server (0.3.3) was installed successfully
    - dependency humio.java already pending installation.
    - downloading role 'java', owned by humio
    - downloading role from https://github.com/humio/ansible-java/archive/0.2.0.tar.gz
    - extracting humio.java to /home/ubuntu/.ansible/roles/humio.java
    - humio.java (0.2.0) was installed successfully
    

    The roles will be installed to ~/.ansible/roles. These roles are periodically updated. To update the installed roles, run the command above with the --force parameter.
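
    For example, to refresh previously installed roles to their latest published versions:

    $ ansible-galaxy install -r requirements.yml --force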

  7. Run the playbook:

    $ ansible-playbook site.yml
    

    After the playbook completes, you should see a summary similar to the following, though the task counts may differ:

    NO MORE HOSTS LEFT  ******************************************************************************
    
    PLAY RECAP ***************************************************************************************
    humio1     : ok=68   changed=68    unreachable=0    failed=0    skipped=0   rescued=0    ignored=0
    humio2     : ok=142  changed=142   unreachable=0    failed=0    skipped=0   rescued=0    ignored=0
    humio3     : ok=73   changed=73    unreachable=0    failed=0    skipped=0   rescued=0    ignored=0
    
  8. For each host in the cluster, verify that Humio is up and running by visiting http://server_IP_or_hostname:8080 in a web browser.
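
    If you prefer to check from the command line, you can query Humio's HTTP status endpoint instead (the /api/v1/status path assumes Humio's built-in health endpoint; replace server_IP_or_hostname with one of your hosts):

    $ curl -s http://server_IP_or_hostname:8080/api/v1/status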