Humio Operator Resources

The Humio Operator manages a number of Humio components such as Repositories, Parsers and Ingest Tokens.

After installing the Operator by following the Operator Installation Guide, you can create a HumioCluster resource along with a number of other resource types.

Creating the Resource

Any of the resources can be created by applying the yaml via kubectl. First, create a resource.yaml file with the desired content, and then run:

kubectl create -f ./resource.yaml

The content of the yaml file may contain any number of resources. The full list of resources and examples are below.
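
As an example, a single file can define a repository and an ingest token for that repository (both resource types are described in detail below), separated by the standard yaml document separator. This is a minimal sketch based on the examples later in this document:

apiVersion: core.humio.com/v1alpha1
kind: HumioRepository
metadata:
  name: example-humiorepository
spec:
  managedClusterName: example-humiocluster
  name: example-humiorepository
  description: "Example Humio Repository"
---
apiVersion: core.humio.com/v1alpha1
kind: HumioIngestToken
metadata:
  name: example-humioingesttoken
spec:
  managedClusterName: example-humiocluster
  name: example-humioingesttoken
  repositoryName: example-humiorepository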

HumioCluster

A HumioCluster resource tells the Humio Operator to create a Humio Cluster. Any number of HumioClusters may be created and managed by the Operator.

The content of the yaml file will depend on how the Humio Cluster should be configured to run. The next parts of this document explain some common cluster configurations.

Ephemeral with S3 Storage

A highly recommended Humio Cluster configuration is to run in ephemeral mode, using S3 for persistent storage. The Humio pods are configured with a hostPath volume, which mounts a directory from the host machine into the pod; this local storage should ideally be backed by NVMe SSDs. This configuration also sets fairly high resource limits and affinity policies that ensure no two Humio pods are scheduled on the same host. It is an ideal storage configuration for production workloads running in AWS.

apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: example-humiocluster
spec:
  image: "humio/humio-core:1.18.1"
  targetReplicationFactor: 2
  storagePartitionsCount: 24
  digestPartitionsCount: 24
  resources:
    limits:
      cpu: "8"
      memory: 56Gi
    requests:
      cpu: "6"
      memory: 52Gi
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: humio_node_type
            operator: In
            values:
            - core
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - humio
        topologyKey: kubernetes.io/hostname
  dataVolumeSource:
    hostPath:
      path: "/mnt/disks/vol1"
      type: "Directory"
  environmentVariables:
    - name: S3_STORAGE_BUCKET
      value: "my-cluster-storage"
    - name: S3_STORAGE_REGION
      value: "us-west-2"
    - name: S3_STORAGE_ENCRYPTION_KEY
      value: "my-encryption-key"
    - name: HUMIO_LOG4J_CONFIGURATION
      value: "log4j2-json-stdout.xml"
    - name: USING_EPHEMERAL_DISKS
      value: "true"
    - name: S3_STORAGE_PREFERRED_COPY_SOURCE
      value: "true"
    - name: HUMIO_JVM_ARGS
      value: -Xss2m -Xms2g -Xmx26g -server -XX:MaxDirectMemorySize=26g -XX:+UseParallelOldGC -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=dontinline,com/humio/util/HotspotUtilsJ.dontInline -Xlog:gc+jni=debug:stdout -Dakka.log-config-on-start=on -Xlog:gc*:stdout:time,tags
    - name: "ZOOKEEPER_URL"
      value: "z-2-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:2181,z-3-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:2181,z-1-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:2181"
    - name: "KAFKA_SERVERS"
      value: "b-2-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:9092,b-1-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:9092,b-3-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:9092"

Ephemeral with GCS Storage

A highly recommended Humio Cluster configuration is to run in ephemeral mode, using GCS for persistent storage. The Humio pods are configured with a hostPath volume, which mounts a directory from the host machine into the pod; this local storage should ideally be backed by NVMe SSDs. This configuration also sets fairly high resource limits and affinity policies that ensure no two Humio pods are scheduled on the same host. It is an ideal storage configuration for production workloads running in GCP.

apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: example-humiocluster
spec:
  image: "humio/humio-core:1.18.1"
  targetReplicationFactor: 2
  storagePartitionsCount: 24
  digestPartitionsCount: 24
  resources:
    limits:
      cpu: "8"
      memory: 56Gi
    requests:
      cpu: "6"
      memory: 52Gi
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: humio_node_type
            operator: In
            values:
            - core
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - humio-core
        topologyKey: kubernetes.io/hostname
  dataVolumeSource:
    hostPath:
      path: "/mnt/disks/vol1"
      type: "Directory"
  extraHumioVolumeMounts:
    - name: gcp-storage-account-json-file
      mountPath: /var/lib/humio/gcp-storage-account-json-file
      subPath: gcp-storage-account-json-file
      readOnly: true
  extraVolumes:
    - name: gcp-storage-account-json-file
      secret:
        secretName: gcp-storage-account-json-file
  environmentVariables:
    - name: GCP_STORAGE_ACCOUNT_JSON_FILE
      value: "/var/lib/humio/gcp-storage-account-json-file"
    - name: GCP_STORAGE_BUCKET
      value: "my-cluster-storage"
    - name: GCP_STORAGE_ENCRYPTION_KEY
      value: "my-encryption-key"
    - name: HUMIO_LOG4J_CONFIGURATION
      value: "log4j2-json-stdout.xml"
    - name: USING_EPHEMERAL_DISKS
      value: "true"
    - name: HUMIO_JVM_ARGS
      value: -Xss2m -Xms2g -Xmx26g -server -XX:MaxDirectMemorySize=26g -XX:+UseParallelOldGC -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=dontinline,com/humio/util/HotspotUtilsJ.dontInline -Xlog:gc+jni=debug:stdout -Dakka.log-config-on-start=on -Xlog:gc*:stdout:time,tags
    - name: "ZOOKEEPER_URL"
      value: "z-2-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:2181,z-3-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:2181,z-1-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:2181"
    - name: "KAFKA_SERVERS"
      value: "b-2-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:9092,b-1-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:9092,b-3-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:9092"

Nginx Ingress with Cert Manager

Configuring Ingress with Cert Manager ensures you have an Ingress resource that can be used to access the cluster, along with a valid certificate provisioned by Cert Manager.

apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: example-humiocluster
spec:
  image: "humio/humio-core:1.18.1"
  environmentVariables:
    - name: "ZOOKEEPER_URL"
      value: "humio-cp-zookeeper-0.humio-cp-zookeeper-headless:2181"
    - name: "KAFKA_SERVERS"
      value: "humio-cp-kafka-0.humio-cp-kafka-headless:9092"
  hostname: "humio.example.com"
  esHostname: "humio-es.example.com"
  ingress:
    enabled: true
    controller: nginx
    annotations:
      use-http01-solver: "true"
      cert-manager.io/cluster-issuer: letsencrypt-prod
      kubernetes.io/ingress.class: nginx

Nginx Ingress with Custom Path

In the case where you want to run Humio under a custom path, use a configuration like the following.

apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: example-humiocluster
spec:
  image: "humio/humio-core:1.18.1"
  environmentVariables:
    - name: "ZOOKEEPER_URL"
      value: "humio-cp-zookeeper-0.humio-cp-zookeeper-headless:2181"
    - name: "KAFKA_SERVERS"
      value: "humio-cp-kafka-0.humio-cp-kafka-headless:9092"
  hostname: "humio.example.com"
  esHostname: "humio-es.example.com"
  path: /logs
  ingress:
    enabled: true
    controller: nginx

Persistent Volumes

In the case where you cannot use bucket storage and ephemeral disks, it’s possible to rely on Kubernetes Persistent Volumes.

Note: persistent volumes backed by network block storage are significantly slower than local disks.

apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: example-humiocluster
spec:
  image: "humio/humio-core:1.18.1"
  targetReplicationFactor: 2
  storagePartitionsCount: 24
  digestPartitionsCount: 24
  resources:
    limits:
      cpu: "8"
      memory: 56Gi
    requests:
      cpu: "6"
      memory: 52Gi
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: humio_node_type
            operator: In
            values:
            - core
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - humio-core
        topologyKey: kubernetes.io/hostname
  dataVolumePersistentVolumeClaimSpecTemplate:
    storageClassName: standard
    accessModes: [ReadWriteOnce]
    resources:
      requests:
        storage: 500Gi
  environmentVariables:
    - name: HUMIO_LOG4J_CONFIGURATION
      value: "log4j2-json-stdout.xml"
    - name: HUMIO_JVM_ARGS
      value: -Xss2m -Xms2g -Xmx26g -server -XX:MaxDirectMemorySize=26g -XX:+UseParallelOldGC -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=dontinline,com/humio/util/HotspotUtilsJ.dontInline -Xlog:gc+jni=debug:stdout -Dakka.log-config-on-start=on -Xlog:gc*:stdout:time,tags
    - name: "ZOOKEEPER_URL"
      value: "z-2-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:2181,z-3-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:2181,z-1-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:2181"
    - name: "KAFKA_SERVERS"
      value: "b-2-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:9092,b-1-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:9092,b-3-my-zookeeper.c4.kafka.us-west-2.amazonaws.com:9092"

TLS

By default, TLS is enabled on each Humio pod. This is the recommended configuration; however, in some cases you may want to disable TLS. To do this, use the configuration below.

Note that if TLS is enabled here, it is assumed that TLS is also used for the connection to Kafka. If TLS on the Humio pods is disabled but the connection to Kafka should use SSL, then the Kafka connection must be configured explicitly to use SSL. See Extra Kafka Configs.

spec:
  tls:
    enabled: false

Extra Kafka Configs

Extra Kafka configs can be set and used by the Humio pods. This is mainly used to toggle TLS when communicating with Kafka. To enable TLS, for example, set the configuration below.

Note that SSL is enabled by default when using TLS for the Humio pods. See TLS.

spec:
  extraKafkaConfigs: "security.protocol=SSL"

Zookeeper

Note that when TLS is enabled for Humio, TLS is by default also enabled for connections to Zookeeper. In some cases, such as with MSK, TLS will be enabled for the Kafka brokers but not for Zookeeper. To disable TLS for Zookeeper, add the following to the HUMIO_JVM_ARGS environment variable: -Dzookeeper.client.secure=false.
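
For example, the flag can be appended to the JVM arguments. This is a shortened sketch; keep the full set of JVM flags from the cluster examples above and add the property at the end:

spec:
  environmentVariables:
    - name: HUMIO_JVM_ARGS
      value: -Xss2m -Xms2g -Xmx26g -server -XX:MaxDirectMemorySize=26g -Dzookeeper.client.secure=false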

Authentication - SAML

When using SAML, it's necessary to follow the SAML instructions. Once the IDP certificate is obtained, create a secret containing that certificate using kubectl:

kubectl create secret generic <cluster-name>-idp-certificate --from-file=idp-certificate.pem=./my-idp-certificate.pem -n <namespace>

Once the secret has been created, a configuration similar to below can be added to enable SAML, adjusting for your cluster URL and IDP token.

spec:
  environmentVariables:
    - name: AUTHENTICATION_METHOD
      value: saml
    - name: AUTO_CREATE_USER_ON_SUCCESSFUL_LOGIN
      value: "true"
    - name: PUBLIC_URL
      value: https://my-cluster.example.com
    - name: SAML_IDP_SIGN_ON_URL
      value: https://accounts.google.com/o/saml2/idp?idpid=idptoken
    - name: SAML_IDP_ENTITY_ID
      value: https://accounts.google.com/o/saml2/idp?idpid=idptoken

Authentication - Single User

If running Humio in single user mode, you will need to set a password for the developer user. This can be done via a plain-text environment variable or via a Kubernetes secret that is referenced by an environment variable. If supplying a secret, you must populate this secret prior to creating the HumioCluster resource; otherwise the pods will fail to start.

By setting a password using an environment variable plain text value:

spec:
  environmentVariables:
    - name: "SINGLE_USER_PASSWORD"
      value: "MyVeryS3cretPassword"

By setting a password using an environment variable secret reference:

spec:
  environmentVariables:
    - name: "SINGLE_USER_PASSWORD"
      valueFrom:
        secretKeyRef:
          name: developer-user-password
          key: password
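
The referenced secret must exist before the HumioCluster resource is created. A minimal way to create it, matching the secret name and key used above:

kubectl create secret generic developer-user-password --from-literal=password=<password> -n <namespace>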

License Management

Humio licenses can be managed with the operator. In order to do so, a Kubernetes secret must be created which contains the value of the license. First create the secret as shown below, where <license> is the license content that is obtained from Humio:

kubectl create secret generic <cluster-name>-license --from-literal=data=<license> -n <namespace>

And then update the HumioCluster resource to use the secret reference:

spec:
  license:
    secretKeyRef:
      name: <cluster-name>-license
      key: data

Custom Service Accounts

ServiceAccount resources may be created prior to creating the HumioCluster resource, and the HumioCluster may then be configured to use them rather than relying on the Humio Operator to create and manage the ServiceAccounts and bindings.

These can be configured via the initServiceAccountName, authServiceAccountName, and humioServiceAccountName fields in the HumioCluster resource. They may be configured to use a shared ServiceAccount or separate ServiceAccounts. It is recommended to keep these separate unless otherwise required.

Shared Service Account

In the following example, we configure all three to share the same ServiceAccount.

To do this, first create the ServiceAccount, ClusterRole, ClusterRoleBinding, Role and RoleBinding, adjusting the namespace to be where the HumioCluster resource is created.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: humio
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: humio
  namespace: default
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: humio
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: humio
subjects:
- kind: ServiceAccount
  name: humio
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: humio
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: humio
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: humio
subjects:
- kind: ServiceAccount
  name: humio
  namespace: default

Now include the following in the HumioCluster resource so it will use the shared ServiceAccount:

apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: example-humiocluster
spec:
  humioServiceAccountName: humio
  initServiceAccountName: humio
  authServiceAccountName: humio

Separate Service Accounts

In the following example, we configure all three to use different ServiceAccount resources.

To do this, create the ServiceAccount, ClusterRole, ClusterRoleBinding for the initServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: humio-init
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: humio-init
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: humio-init
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: humio-init
subjects:
- kind: ServiceAccount
  name: humio-init
  namespace: default

Followed by the ServiceAccount, Role, RoleBinding for the authServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: humio-auth
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: humio-auth
  namespace: default
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: humio-auth
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: humio-auth
subjects:
- kind: ServiceAccount
  name: humio-auth
  namespace: default

And finally the ServiceAccount for the main Humio container.

Note: If using IRSA (IAM Roles for Service Accounts), ensure the appropriate annotations are configured on this ServiceAccount; an annotated example follows the base manifest below.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: humio
  namespace: default
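
An annotated variant for the IRSA case might look like the following sketch; the role ARN is a placeholder and must point at the IAM role that grants the required bucket storage permissions:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: humio
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-humio-role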

Now include the following in the HumioCluster resource so it will use the ServiceAccounts:

apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: example-humiocluster
spec:
  humioServiceAccountName: humio
  initServiceAccountName: humio-init
  authServiceAccountName: humio-auth

HumioRepository

A HumioRepository resource tells the Humio Operator to create a Humio Repository. Any number of HumioRepository resources may be created and managed by the Operator.

The content of the yaml file will depend on how the Humio Repository should be configured. The following shows an example HumioRepository resource.

apiVersion: core.humio.com/v1alpha1
kind: HumioRepository
metadata:
  name: example-humiorepository
  namespace: logging
spec:
  managedClusterName: example-humiocluster
  name: example-humiorepository
  description: "Example Humio Repository"
  retention:
    timeInDays: 30
    ingestSizeInGB: 50
    storageSizeInGB: 10

HumioParser

A HumioParser resource tells the Humio Operator to create a Humio Parser. Any number of HumioParser resources may be created and managed by the Operator.

The content of the yaml file will depend on how the Humio Parser should be configured. The following shows an example HumioParser resource.

apiVersion: core.humio.com/v1alpha1
kind: HumioParser
metadata:
  name: example-humioparser
  namespace: logging
spec:
  managedClusterName: example-humiocluster
  name: example-humioparser
  repositoryName: example-humiorepository
  parserScript: |
    case {
      kubernetes.pod_name=/fluentbit/
        | /\[(?<@timestamp>[^\]]+)\]/
        | /^(?<@timestamp>.*)\[warn\].*/
        | parseTimestamp(format="yyyy/MM/dd' 'HH:mm:ss", field=@timestamp);
      parseJson();
      * | kvParse()
    }

HumioIngestToken

A HumioIngestToken resource tells the Humio Operator to create a Humio Ingest Token. Any number of HumioIngestToken resources may be created and managed by the Operator.

The content of the yaml file will depend on how the Humio Ingest Token should be configured. The following shows an example HumioIngestToken resource.

apiVersion: core.humio.com/v1alpha1
kind: HumioIngestToken
metadata:
  name: example-humioingesttoken
  namespace: logging
spec:
  managedClusterName: example-humiocluster
  name: example-humioingesttoken
  repositoryName: example-humiorepository
  parserName: example-humioparser
  tokenSecretName: example-humioingesttoken-token

By specifying tokenSecretName, the Humio Operator will export the token as this secret name in Kubernetes. If you do not wish to have the token exported, omit this field from the spec.
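
Once the Operator has created the secret, the token value can be read back with kubectl. This is a sketch and assumes the Operator stores the value under a key named token in that secret:

kubectl get secret example-humioingesttoken-token -n logging -o jsonpath='{.data.token}' | base64 --decode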