Humio Operator on AWS

The following explains how to quickly set up a Humio cluster using the Humio Operator.

As part of the Quick Start, we will create AWS resources such as MSK, EKS and an S3 bucket using terraform, and then install the Humio Operator using helm. For production installations, follow the full Installation Guide and decide how running Humio fits into your infrastructure.

Prerequisites

Tooling

Authentication and Permissions

Ensure you are logged into AWS through the terminal and have the necessary permissions to create resources such as EKS and MSK clusters and S3 buckets. For additional AWS authentication options, see the authentication section of the terraform AWS provider documentation.

When kubectl authenticates with the cluster later in this guide, it expects aws-iam-authenticator to be installed and uses the AWS authentication configured above.
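Before creating any resources, it can help to confirm that the command-line tools used in this guide are on the PATH. The following is a minimal sketch, not part of the quick-start repo. The helper name missing_tools is hypothetical, and the tool list is an assumption based on the commands that appear later in this guide.

```shell
# missing_tools TOOL...: print the name of every tool that is not on the PATH.
# (Hypothetical helper, not part of the quick-start repo.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed tool list, taken from the commands used throughout this guide.
missing=$(missing_tools git terraform kubectl helm aws aws-iam-authenticator)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
else
  echo "All tools found."
fi

# To confirm the terminal is logged into AWS, the AWS CLI offers:
#   aws sts get-caller-identity
```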

Create AWS Resources

The following will create an EKS cluster with three nodes by default, an MSK cluster with three nodes by default, an S3 bucket where the Humio data will be stored, and a number of dependent resources such as a VPC, subnets, security groups and an internet gateway.

First, clone the operator quick-start repo where the terraform quick start files are stored:

git clone https://github.com/humio/humio-operator-quickstart
cd humio-operator-quickstart/aws

Note: review the default values in the variables.tf file. It’s possible to override these, but be careful, as changing some may have undesirable effects. Overriding the region is a common change, but changing instance types, for example, has downstream consequences such as the resources that can be set for the HumioCluster.

And then init and apply terraform:

terraform init
terraform apply

Once the terraform resources have been applied, configure kubectl to point to the newly created EKS cluster:

export KUBECONFIG=./$(ls kubeconfig_humio-quickstart*)
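The ls-based command above produces an unusable value if zero or several kubeconfig files match, for example after repeated applies. As a hedged alternative, the sketch below picks the file only when exactly one match exists. The function name pick_kubeconfig is hypothetical.

```shell
# pick_kubeconfig [DIR]: print the single kubeconfig_humio-quickstart* file
# in DIR (default: current directory), or fail if there is not exactly one.
# (Hypothetical helper, not part of the quick-start repo.)
pick_kubeconfig() {
  dir=${1:-.}
  # Expand the glob into the function's positional parameters.
  set -- "$dir"/kubeconfig_humio-quickstart*
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one kubeconfig_humio-quickstart* file in $dir" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Usage, from the humio-operator-quickstart/aws directory:
#   KUBECONFIG=$(pick_kubeconfig .) && export KUBECONFIG
```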

And then verify you can authenticate with the EKS cluster and see pods:

kubectl get pods -A

Install Humio Operator Dependencies

It is necessary to have both cert-manager and the nginx-ingress controller if running the Humio Operator with TLS and/or ingress enabled.

Install Cert Manager

kubectl create namespace cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v0.16.0 \
  --set installCRDs=true

Once cert-manager is installed, create a ClusterIssuer which will be used to issue the certificates for our Humio cluster:

export MY_EMAIL=<your email address>
cat >clusterissuer.yaml<<EOF
---
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: $MY_EMAIL
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
EOF
kubectl apply -f clusterissuer.yaml
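The ClusterIssuer above embeds $MY_EMAIL at the moment the heredoc is written, so a placeholder or empty value ends up in the applied resource. The sketch below is a rough sanity check that could be run before generating the file. The function name looks_like_email is hypothetical, and the pattern is deliberately loose rather than a full address validator.

```shell
# looks_like_email VALUE: succeed if VALUE has the rough shape user@host.tld
# and contains no spaces or angle brackets (i.e. is not a leftover placeholder).
# (Hypothetical helper; a loose check, not a full address validator.)
looks_like_email() {
  case $1 in
    *" "*|*"<"*|*">"*) return 1 ;;
    ?*@?*.?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   looks_like_email "$MY_EMAIL" || echo "MY_EMAIL does not look valid" >&2
```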

Install the Nginx Ingress Controller

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.35.0/deploy/static/provider/aws/deploy.yaml

Install the Humio Operator

Now that you have authenticated with the EKS cluster, it’s time to install the Humio Operator.

kubectl create namespace logging
helm repo add humio-operator https://humio.github.io/humio-operator
helm repo update
helm install humio-operator humio-operator/humio-operator \
  --namespace logging \
  --set installCRDs=true

You can check the status of the Humio Operator pod by running:

kubectl get pods -n logging

Prepare for Creating Humio Cluster

Before creating a cluster, we need to set a number of attributes specific to the cluster. We will set these as environment variables and then reference them later when creating the HumioCluster spec.

First, generate an encryption key that will be used by Humio to encrypt the data in the S3 bucket.

kubectl create secret generic bucket-storage --from-literal=encryption-key=$(cat /dev/urandom | env LC_CTYPE=C tr -dc a-zA-Z0-9 | head -c 64) -n logging

Also create a developer user password which we will use to log in once the Humio cluster is up. By default we will start Humio in single-user mode.

kubectl create secret generic developer-user --from-literal=password=$(cat /dev/urandom | env LC_CTYPE=C tr -dc a-zA-Z0-9 | head -c 16) -n logging
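Both secrets above are generated by filtering /dev/urandom down to alphanumeric characters. The same idea can be sketched as a reusable function. The name random_alnum is hypothetical, and this variant reads fixed-size chunks instead of piping an endless stream into head, which is a different construction from the one-liners above but yields the same kind of value.

```shell
# random_alnum LENGTH: print LENGTH random characters from [a-zA-Z0-9].
# (Hypothetical helper, not part of the quick-start repo.)
random_alnum() {
  length=$1
  out=
  # Read fixed-size chunks of random bytes and keep only alphanumerics
  # until enough characters have been collected.
  while [ "${#out}" -lt "$length" ]; do
    chunk=$(head -c 256 /dev/urandom | LC_ALL=C tr -dc 'a-zA-Z0-9')
    out=$out$chunk
  done
  printf '%s\n' "$out" | cut -c "1-$length"
}

# Usage, mirroring the commands above:
#   kubectl create secret generic bucket-storage \
#     --from-literal=encryption-key="$(random_alnum 64)" -n logging
```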

We will need the connection strings for Kafka and Zookeeper, as well as the name of the S3 bucket and Role ARN which has access to write to the bucket. We can obtain those from terraform:

export KAFKA_BROKERS=$(terraform output bootstrap_brokers_tls)
export ZOOKEEPER_CONNECTION=$(terraform output zookeeper_connect_string)
export ROLE_ARN=$(terraform output oidc_role_arn)
export BUCKET_NAME=$(terraform output s3_bucket_name)
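Depending on the terraform version, terraform output may print string values wrapped in double quotes. Terraform 0.14 and later does this and offers a -raw flag to avoid it. If the exported values end up quoted, they would be embedded incorrectly in the HumioCluster spec. The sketch below strips one pair of surrounding quotes when present. The function name unquote is hypothetical.

```shell
# unquote VALUE: remove one leading and one trailing double quote, if present.
# (Hypothetical helper, not part of the quick-start repo.)
unquote() {
  value=$1
  value=${value#\"}
  value=${value%\"}
  printf '%s\n' "$value"
}

# Usage:
#   export BUCKET_NAME=$(unquote "$(terraform output s3_bucket_name)")
# On Terraform 0.14 or later, this is equivalent:
#   export BUCKET_NAME=$(terraform output -raw s3_bucket_name)
```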

Additionally, we’ll need to set hostnames for the HTTP and Elasticsearch ingresses. Use your own domain here. In order to use ingress with Let’s Encrypt certificates, a DNS record must be created later in this process.

export INGRESS_HOSTNAME=humio-quickstart.example.com
export INGRESS_ES_HOSTNAME=humio-quickstart-es.example.com

Also set the region:

export REGION=us-west-2
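The HumioCluster spec in the next section is rendered through a heredoc, so any variable that is unset at that point silently becomes an empty string in the YAML. A small guard can catch this before the file is written. This is a sketch, and the function name require_vars is hypothetical.

```shell
# require_vars NAME...: fail and list every named variable that is unset or empty.
# (Hypothetical helper, not part of the quick-start repo.)
require_vars() {
  status=0
  for name in "$@"; do
    # Look up the variable by name in a POSIX-compatible way.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage, before writing humiocluster.yaml:
#   require_vars KAFKA_BROKERS ZOOKEEPER_CONNECTION ROLE_ARN BUCKET_NAME \
#     INGRESS_HOSTNAME INGRESS_ES_HOSTNAME REGION || echo "fix the above first" >&2
```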

Create a Humio Cluster

Finally, we can configure a yaml file which contains the HumioCluster spec.

cat >humiocluster.yaml<<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: humio-quickstart
  namespace: logging
spec:
  image: "humio/humio-core:1.18.1"
  nodeCount: 3
  targetReplicationFactor: 2
  storagePartitionsCount: 24
  digestPartitionsCount: 24
  extraKafkaConfigs: "security.protocol=SSL"
  tls:
    enabled: true
  autoRebalancePartitions: true
  hostname: ${INGRESS_HOSTNAME}
  esHostname: ${INGRESS_ES_HOSTNAME}
  ingress:
    enabled: true
    controller: nginx
    annotations:
      use-http01-solver: "true"
      cert-manager.io/cluster-issuer: letsencrypt-prod
      kubernetes.io/ingress.class: nginx
  resources:
    limits:
      cpu: "2"
      memory: 12Gi
    requests:
      cpu: "1"
      memory: 6Gi
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - humio
        topologyKey: kubernetes.io/hostname
  dataVolumeSource:
    hostPath:
      path: "/mnt/disks/vol1"
      type: "Directory"
  humioServiceAccountAnnotations:
    eks.amazonaws.com/role-arn: "${ROLE_ARN}"
  environmentVariables:
    - name: S3_STORAGE_BUCKET
      value: "${BUCKET_NAME}"
    - name: S3_STORAGE_REGION
      value: "${REGION}"
    - name: LOCAL_STORAGE_PERCENTAGE
      value: "80"
    - name: LOCAL_STORAGE_MIN_AGE_DAYS
      value: "7"
    - name: S3_STORAGE_ENCRYPTION_KEY
      valueFrom:
        secretKeyRef:
          name: bucket-storage
          key: encryption-key
    - name: USING_EPHEMERAL_DISKS
      value: "true"
    - name: S3_STORAGE_PREFERRED_COPY_SOURCE
      value: "true"
    - name: SINGLE_USER_PASSWORD
      valueFrom:
        secretKeyRef:
          name: developer-user
          key: password
    - name: HUMIO_JVM_ARGS
      value: -Xss2m -Xms2g -Xmx6g -server -XX:MaxDirectMemorySize=6g -XX:+UseParallelOldGC -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=dontinline,com/humio/util/HotspotUtilsJ.dontInline -Xlog:gc+jni=debug:stdout -Dakka.log-config-on-start=on -Xlog:gc*:stdout:time,tags -Dzookeeper.client.secure=false
    - name: "ZOOKEEPER_URL"
      value: "${ZOOKEEPER_CONNECTION}"
    - name: "KAFKA_SERVERS"
      value: "${KAFKA_BROKERS}"
EOF

And then apply it:

kubectl apply -f humiocluster.yaml

Validate the Humio Cluster

Check the status of the HumioCluster by running:

kubectl get humiocluster -n logging

Initially the cluster reports the state Bootstrapping while it starts up. Once all nodes have started, the state changes to Running.
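Rather than re-running the command by hand, the state can be polled until it reaches Running. The following is a sketch under two assumptions: the function name wait_for_state is hypothetical, and the jsonpath .status.state in the usage comment is assumed to be where the operator reports the state shown by kubectl get humiocluster.

```shell
# wait_for_state EXPECTED ATTEMPTS DELAY CMD...: run CMD up to ATTEMPTS times,
# sleeping DELAY seconds between tries, until its output equals EXPECTED.
# (Hypothetical helper, not part of the quick-start repo.)
wait_for_state() {
  expected=$1
  attempts=$2
  delay=$3
  shift 3
  i=0
  state=
  while [ "$i" -lt "$attempts" ]; do
    state=$("$@" 2>/dev/null || true)
    if [ "$state" = "$expected" ]; then
      echo "$state"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out; last state: ${state:-unknown}" >&2
  return 1
}

# Usage (the .status.state path is an assumption):
#   wait_for_state Running 60 10 kubectl get humiocluster humio-quickstart \
#     -n logging -o jsonpath='{.status.state}'
```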

Access the Humio Cluster

Configure DNS

To access the HumioCluster as well as allow cert-manager to generate a valid certificate for the cluster, there must be a DNS record added for $INGRESS_HOSTNAME as well as $INGRESS_ES_HOSTNAME which point to the NLB name of the ingress service. To get the NLB name of the ingress service, run:

export INGRESS_SERVICE_HOSTNAME=$(kubectl get service ingress-nginx-controller -n ingress-nginx -o template --template "{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}")

Configuring the DNS record depends on your DNS provider. If using AWS Route53, create an Alias record which points both names directly to the INGRESS_SERVICE_HOSTNAME. For other providers, create a CNAME which points them to the INGRESS_SERVICE_HOSTNAME.

Logging In

Once the DNS records exist, you can open https://${INGRESS_HOSTNAME} in a browser and log in. Since we are using single-user authentication mode, the username will be developer and the password can be obtained by running:

kubectl get secret developer-user -n logging -o=template --template={{.data.password}} | base64 -D

Note: this command uses base64 -D, but you may need to use base64 --decode if using Linux.
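To avoid choosing the flag by hand, a small wrapper can probe which decode flag the local base64 accepts. This is a sketch, and the function name b64decode is hypothetical.

```shell
# b64decode: decode base64 from stdin, using whichever decode flag the
# local base64 implementation accepts.
# (Hypothetical helper, not part of the quick-start repo.)
b64decode() {
  # Probe with a known value; the probe reads its own input, not ours.
  if printf 'dGVzdA==' | base64 --decode >/dev/null 2>&1; then
    base64 --decode
  else
    base64 -D
  fi
}

# Usage:
#   kubectl get secret developer-user -n logging \
#     -o=template --template='{{.data.password}}' | b64decode
```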

Sending Data to the Cluster

To send data to the cluster, we will create a new Repository, obtain the ingest token, and then configure fluentbit to gather logs from all the pods in our Kubernetes cluster and send them to Humio.

Create Repo, Parser and Ingest Token

Create the Repository using the Humio Operator by running the following.

cat >humiorepository.yaml<<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioRepository
metadata:
  name: quickstart-cluster-logs
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-logs
  description: "Cluster logs repository"
  retention:
    timeInDays: 30
    ingestSizeInGB: 50
    storageSizeInGB: 10
EOF
kubectl apply -f humiorepository.yaml

Next, create a parser which will be assigned to the repository and later on to the ingest token. It is also possible to skip this step and rely on one of the built-in parsers.

cat >humioparser.yaml<<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioParser
metadata:
  name: quickstart-cluster-parser
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-parser
  repositoryName: quickstart-cluster-logs
  parserScript: |
    case {
      kubernetes.pod_name=/fluentbit/
        | /\[(?<@timestamp>[^\]]+)\]/
        | /^(?<@timestamp>.*)\[warn\].*/
        | parseTimestamp(format="yyyy/MM/dd' 'HH:mm:ss", field=@timestamp);
      parseJson();
      * | kvParse()
    }
EOF
kubectl apply -f humioparser.yaml

Now create an Ingest Token using the Humio Operator, assigning it to the repository and parser that were created in the previous steps.

cat >humioingesttoken.yaml<<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioIngestToken
metadata:
  name: quickstart-cluster-ingest-token
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-ingest-token
  repositoryName: quickstart-cluster-logs
  parserName: quickstart-cluster-parser
  tokenSecretName: quickstart-cluster-ingest-token
EOF
kubectl apply -f humioingesttoken.yaml

Since we set tokenSecretName in the Ingest Token spec, the token content is stored as a secret in Kubernetes. We can then fetch the token:

export INGEST_TOKEN=$(kubectl get secret quickstart-cluster-ingest-token -n logging -o template --template '{{.data.token}}' | base64 -D)

Note: this command uses base64 -D, but you may need to use base64 --decode if using Linux.

Ingest Logs into the Cluster

Now we’ll install fluentbit into the Kubernetes cluster and configure the endpoint to point to our $INGRESS_ES_HOSTNAME, and use the $INGEST_TOKEN that was just created.

helm repo add humio https://humio.github.io/humio-helm-charts
helm repo update
cat >humio-agent.yaml<<EOF
humio-fluentbit:
  enabled: true
  humioHostname: $INGRESS_ES_HOSTNAME
  es:
    tls: true
    port: 443
    inputConfig: |-
      [INPUT]
          Name             tail
          Path             /var/log/containers/*.log
          Parser           docker
          # The path to the DB file must be unique and not conflict with another fluentbit running on the same nodes.
          DB               /var/log/flb_kube.db
          Tag              kube.*
          Refresh_Interval 5
          Mem_Buf_Limit    512MB
          Skip_Long_Lines  On
    resources:
      limits:
        cpu: 100m
        memory: 1024Mi
      requests:
        cpu: 100m
        memory: 512Mi
EOF
helm install humio humio/humio-helm-charts \
  --namespace logging \
  --set humio-fluentbit.token=$INGEST_TOKEN \
  --values humio-agent.yaml

Verify Logs are Ingested

  • Go to the Humio UI and click on the quickstart-cluster-logs repository
  • In the search field, enter "kubernetes.container_name" = "humio-operator" and click Run
  • Verify you can see the Humio Operator logs

Cleanup

It’s possible to run an individual kubectl delete on each resource, but since we have created a dedicated EKS cluster, we will delete everything we just created by deleting the cluster resource and then running terraform destroy.

First, delete the cluster so pods no longer write to the S3 bucket:

kubectl delete -f humiocluster.yaml

Prior to running terraform destroy, it will be necessary to ensure the S3 bucket that was created by terraform is emptied. The name of the S3 bucket can be obtained by running:

terraform output s3_bucket_name

Now empty the S3 bucket either through the AWS console or CLI.
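For the CLI route, aws s3 rm with --recursive deletes every object in a bucket. The sketch below wraps it with a dry-run default so the command can be reviewed first. The function name empty_bucket and the DRY_RUN variable are hypothetical. If versioning is enabled on the bucket (not checked here), older object versions would remain and need separate removal.

```shell
# empty_bucket BUCKET: delete all objects in BUCKET.
# Prints the command instead of running it unless DRY_RUN=0 is set.
# (Hypothetical helper, not part of the quick-start repo.)
empty_bucket() {
  bucket=$1
  if [ -z "$bucket" ]; then
    echo "bucket name is empty" >&2
    return 1
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "aws s3 rm s3://$bucket --recursive"
  else
    aws s3 rm "s3://$bucket" --recursive
  fi
}

# Usage: review the printed command, then run it for real:
#   empty_bucket "$BUCKET_NAME"
#   DRY_RUN=0 empty_bucket "$BUCKET_NAME"
```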

Next we’ll need to ensure the nginx-ingress-controller’s service is removed. This way the NLB will be removed from AWS. If this is not done, terraform will get stuck deleting the subnets when performing a terraform destroy:

kubectl delete -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.35.0/deploy/static/provider/aws/deploy.yaml

Once the bucket has been emptied and the nginx-ingress-controller has been deleted, run:

terraform destroy

Also delete the DNS records which were created for $INGRESS_HOSTNAME and $INGRESS_ES_HOSTNAME.