Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Amazon Managed Streaming for Apache Kafka (Amazon MSK) should only be used if you’re deploying Humio in AWS.

When deployed on AWS, Humio is compatible with Amazon MSK. The official AWS MSK documentation can be found here.

Prerequisites

  • Ensure you have the AWS CLI tools installed and configured on your machine, including your Access key and Secret key. This is so you can create custom configurations for your MSK instance.

  • Have a Virtual Private Cloud (VPC) set up for your Availability Zones on AWS with a subnet for each Kafka broker you’d like to have. If you don’t, please follow Steps 1 and 2 on the AWS Documentation here.
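
Both prerequisites can be checked from the command line before you start. The following is a minimal sketch, assuming the AWS CLI is on your PATH. The function names `check_aws_cli` and `list_vpc_subnets` are hypothetical helpers, not part of Humio or AWS; the `aws sts get-caller-identity` and `aws ec2 describe-subnets` calls they wrap are standard AWS CLI commands.

```shell
# Hypothetical helper: confirm the AWS CLI is installed and has working credentials.
check_aws_cli() {
    if ! command -v aws >/dev/null 2>&1; then
        echo "aws CLI not found on PATH" >&2
        return 1
    fi
    # Prints the account ID behind the configured access key;
    # fails if no valid credentials are configured.
    aws sts get-caller-identity --query Account --output text
}

# Hypothetical helper: list subnet IDs and their Availability Zones for a VPC ID ($1).
list_vpc_subnets() {
    aws ec2 describe-subnets \
        --filters "Name=vpc-id,Values=$1" \
        --query 'Subnets[].[SubnetId,AvailabilityZone]' \
        --output text
}
```

For example, `check_aws_cli && list_vpc_subnets vpc-0123456789abcdef0` (the VPC ID is a placeholder) should print one line per subnet, which lets you confirm there is a subnet for each broker you plan to run.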

Creating your custom MSK configuration

You can find documentation for how to add your custom config file to MSK here. If you don’t supply a configuration, MSK applies its own default configuration, which sets some parameters to values that differ from the normal Kafka defaults. Because Humio requires specific values for certain Kafka parameters, you need to create a custom configuration file for MSK to use.

  1. Create a file named `kafka.properties` and add these values to it. The full list of parameters that MSK supports can be found here.

    replica.fetch.max.bytes=104857600
    message.max.bytes=104857600
    compression.type=producer
    unclean.leader.election.enable=false
    
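  2. Create the configuration in MSK from this file. The name and description can be anything, but the name cannot contain spaces. Replace `config-file-path` with the path to your `kafka.properties` file, and set `--kafka-versions` to the Kafka version you plan to run on the cluster. If you are using AWS CLI version 2, you may need `fileb://` instead of `file://`, because `--server-properties` is a binary parameter.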

    aws kafka create-configuration --name "Humio-MSK-Configuration" --description "Custom Humio configuration for MSK" --kafka-versions "2.3.1" --server-properties file://config-file-path
    
  3. You should see a success message similar to this.

    {
        "Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/Humio-MSK-Configuration/abcdabcd-abcd-1234-abcd-abcd123e8e8e-1",
        "CreationTime": "2019-05-21T00:54:23.591Z",
        "Description": "Custom Humio configuration for MSK",
        "KafkaVersions": [
            "2.3.1"
        ],
        "LatestRevision": {
            "CreationTime": "2019-05-21T00:54:23.591Z",
            "Description": "Custom Humio configuration for MSK",
            "Revision": 1
        },
        "Name": "Humio-MSK-Configuration"
    }
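
The steps above can be scripted so that the properties file is checked before it is uploaded. This is a sketch under stated assumptions, not an official procedure: the function names are hypothetical, the four settings are the ones listed in step 1, and the upload uses `fileb://`, which is an assumption made for AWS CLI version 2 compatibility. The `--query Arn --output text` options are standard AWS CLI options that print only the configuration ARN.

```shell
# Settings that step 1 asks you to put into kafka.properties.
REQUIRED_KEYS="replica.fetch.max.bytes message.max.bytes compression.type unclean.leader.election.enable"

# Hypothetical helper: write the properties file to the path given as $1.
write_kafka_properties() {
    cat > "$1" <<'EOF'
replica.fetch.max.bytes=104857600
message.max.bytes=104857600
compression.type=producer
unclean.leader.election.enable=false
EOF
}

# Hypothetical helper: succeed only if every required key appears as "key=" in file $1.
check_kafka_properties() {
    for key in $REQUIRED_KEYS; do
        if ! grep -q "^${key}=" "$1"; then
            echo "missing setting: ${key}" >&2
            return 1
        fi
    done
}

# Hypothetical helper: upload file $1 as an MSK configuration for Kafka version $2
# and print the ARN of the new configuration.
create_msk_configuration() {
    check_kafka_properties "$1" || return 1
    aws kafka create-configuration \
        --name "Humio-MSK-Configuration" \
        --description "Custom Humio configuration for MSK" \
        --kafka-versions "$2" \
        --server-properties "fileb://$1" \
        --query Arn --output text
}
```

A typical run would be `write_kafka_properties kafka.properties` followed by `create_msk_configuration kafka.properties 2.3.1`. Keeping the printed ARN is useful if you later create the cluster from the command line.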
    
    

Creating your MSK Cluster using the AWS console

  1. Once you’re logged into the console, go to AWS MSK Service > Create cluster.

  2. Give the cluster a name and pick the VPC you created for it. Documentation for setting up the VPC can be found here; follow Steps 1 and 2.

  3. Select your Kafka version. The recommended version for Humio is 2.4.0.

  4. Select your Availability Zones and a subnet for each one. We recommend three Availability Zones.

  5. Add the custom configuration that you created earlier under “Creating your custom MSK configuration.” Select “Use a custom configuration” and then select the name you gave the configuration when you created it.

  6. Create your brokers. Kafka brokers use m5 instance types. Specifications for these can be found here under the m5 tab. Define how many brokers you’re going to have per Availability Zone.

  7. Optionally, add some tags for your cluster. You can find more information on tagging here.

  8. Define how much storage each broker is going to have. MSK uses Amazon Elastic Block Store (EBS). The amount of storage you choose should reflect how much data you’re ingesting. There are docs here.

    You cannot decrease the storage after the cluster has been created.

  9. We recommend enabling encryption within the cluster. Encryption between clients and brokers is also possible but requires some additional steps, which are described in the Configuring Humio section under Configuring Encryption between Humio and Kafka.

    Important: Plaintext brokers are accessible on port 9092; brokers using TLS are accessible on port 9094.

  10. If you require TLS client authentication, you can read more about how to set it up here.

  11. Choose your monitoring. Basic monitoring is available for free, but enhanced monitoring comes at an additional cost. More information can be found here.

  12. Advanced settings. When selecting your security group, it’s important to note that your Humio instance must be able to connect to MSK. You can do this either by adding inbound and outbound rules for the IP address of your Humio instance or, if Humio is running on AWS, by placing Humio and MSK in the same security group.

  13. Create your Cluster.
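
If you prefer the command line to the console, the same choices can be passed to `aws kafka create-cluster`. The sketch below is an illustration under assumptions, not a replacement for the console steps: the function names are hypothetical, the configuration revision is fixed at 1 as in the sample output earlier, and client-to-broker traffic is set to TLS. The option names and JSON field names follow the AWS CLI `create-cluster` command.

```shell
# Hypothetical helper: build the --broker-node-group-info JSON.
# $1 instance type, $2 comma-separated subnet IDs, $3 security group ID, $4 EBS volume size in GiB.
broker_node_group_json() {
    subnets_json=$(printf '%s' "$2" | sed 's/,/","/g')
    printf '{"InstanceType":"%s","ClientSubnets":["%s"],"SecurityGroups":["%s"],"StorageInfo":{"EbsStorageInfo":{"VolumeSize":%s}}}\n' \
        "$1" "$subnets_json" "$3" "$4"
}

# Hypothetical helper: create the cluster.
# $1 cluster name, $2 Kafka version, $3 total number of brokers,
# $4 JSON from broker_node_group_json, $5 ARN of the custom configuration.
create_msk_cluster() {
    aws kafka create-cluster \
        --cluster-name "$1" \
        --kafka-version "$2" \
        --number-of-broker-nodes "$3" \
        --broker-node-group-info "$4" \
        --configuration-info "{\"Arn\":\"$5\",\"Revision\":1}" \
        --encryption-info '{"EncryptionInTransit":{"ClientBroker":"TLS","InCluster":true}}'
}
```

The total number of brokers should be a multiple of the number of subnets, so three subnets pair naturally with three brokers. All IDs passed to these functions are placeholders that you replace with values from your own account.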

Configuring Humio

Once your MSK cluster has been created (which can take around 15 minutes), you can then deploy Humio, specifying the correct Kafka and Zookeeper host information.

To find the Kafka and Zookeeper host information, open your MSK cluster in the AWS Console and select View Client Information. Plaintext Kafka brokers are listed on port 9092, and TLS brokers are listed on port 9094.
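
The same host information can also be read with the AWS CLI. This is a small sketch with hypothetical function names; it assumes you know the ARN of your cluster. `BootstrapBrokerString`, `BootstrapBrokerStringTls`, and `ClusterInfo.ZookeeperConnectString` are fields returned by the `get-bootstrap-brokers` and `describe-cluster` commands.

```shell
# Hypothetical helper: print the broker list for cluster ARN $1.
# $2 is BootstrapBrokerString (plaintext, port 9092) or
# BootstrapBrokerStringTls (TLS, port 9094).
get_bootstrap_brokers() {
    aws kafka get-bootstrap-brokers --cluster-arn "$1" --query "$2" --output text
}

# Hypothetical helper: print the Zookeeper connection string for cluster ARN $1.
get_zookeeper_connect() {
    aws kafka describe-cluster --cluster-arn "$1" \
        --query ClusterInfo.ZookeeperConnectString --output text
}
```

The output of the first function is in the comma-separated form that `KAFKA_SERVERS` expects, and the output of the second matches the form used for `ZOOKEEPER_URL`.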

When running Humio on AWS EC2 instances, ensure that the security group rules allow Humio to access MSK and vice versa. We recommend doing this by keeping the two in the same security group. More information can be found in the MSK docs here.

Include your Kafka and Zookeeper host information in your Humio configuration:

KAFKA_SERVERS=b-1.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:9092,b-2.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:9092,b-3.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:9092

ZOOKEEPER_URL=z-2.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:2181,z-3.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:2181,z-1.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:2181
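
Because these values are long and usually pasted by hand, a quick check that every entry uses the port you expect can catch mistakes before Humio starts. The following is a minimal sketch; `check_broker_ports` is a hypothetical helper that only inspects the text of the list and does not contact the hosts.

```shell
# Hypothetical helper: succeed only if every comma-separated entry in $1
# ends with ":<port>", where <port> is $2.
# The body runs in a subshell, so the IFS and set -f changes stay local.
check_broker_ports() (
    set -f      # disable glob expansion while splitting the list
    IFS=,       # split on commas
    [ -n "$1" ] || { echo "empty host list" >&2; exit 1; }
    for entry in $1; do
        case "$entry" in
            *:"$2") ;;   # entry ends with the expected port
            *) echo "unexpected entry: $entry" >&2; exit 1 ;;
        esac
    done
)
```

For example, `check_broker_ports "$KAFKA_SERVERS" 9092` checks a plaintext broker list, and `check_broker_ports "$ZOOKEEPER_URL" 2181` checks the Zookeeper list.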

Configuring Encryption between Humio and Kafka

It’s possible to enable encrypted connections between Humio and Kafka brokers. To do this, create a file on each Humio node that contains this parameter:

    security.protocol=SSL

  1. In your Humio configuration file, add the parameter `EXTRA_KAFKA_CONFIGS_FILE=` and set it to the path of the file that you just created.

  2. Ensure that your `KAFKA_SERVERS` Humio parameter now points to the TLS listeners of the Kafka brokers, which are on port 9094.
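
The encryption steps above can be sketched as two small shell helpers. The function names and any file paths you pass to them are hypothetical; the sketch assumes your existing broker list uses the plaintext port 9092 and simply rewrites it to the TLS port 9094.

```shell
# Hypothetical helper: write the extra Kafka client settings file to path $1.
# EXTRA_KAFKA_CONFIGS_FILE in the Humio configuration should point to this path.
write_extra_kafka_config() {
    printf 'security.protocol=SSL\n' > "$1"
}

# Hypothetical helper: print broker list $1 with every :9092 replaced by :9094.
to_tls_brokers() {
    printf '%s\n' "$1" | sed 's/:9092/:9094/g'
}
```

For example, after `write_extra_kafka_config /path/to/kafka-extra.properties` (a placeholder path), you would set `EXTRA_KAFKA_CONFIGS_FILE=/path/to/kafka-extra.properties` and use the output of `to_tls_brokers "$KAFKA_SERVERS"` as the new `KAFKA_SERVERS` value.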

Once you run Humio, you can visit the administration dashboard to ensure you can see your MSK Kafka Brokers and Zookeeper nodes.