To configure Humio’s basic functionality, you’ll set environment variables. The example configuration file below contains comments describing each option.

Docker Tip

When running Humio in Docker, you can set the --env-file= flag and keep your configuration in a file. For a quick introduction to setting configuration options, see running Humio as a Docker container.

Docker only loads the environment file when the container is initially created. If you change settings in your environment file, stopping and starting the container will not pick them up. You need to docker rm the container and docker run it again to apply the changes.
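As a rough sketch of that recreate step, the commands could look like the following. The container name `humio`, the file name `humio.env`, and the image name `humio/humio` are placeholders assumed for illustration; reuse the name, image, port, and volume flags from your own original docker run command.

```shell
# Recreate the container so Docker re-reads the environment file.
# Assumed placeholders: container "humio", env file "humio.env",
# image "humio/humio". Add the port and volume flags you normally use.
recreate_humio() {
  docker stop humio
  docker rm humio
  docker run -d --name humio --env-file=humio.env humio/humio
}
```

Call recreate_humio after editing the environment file. Data stored in mounted volumes is not affected by removing the container, but anything stored only inside the container is lost.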

Example configuration

# The stack size should be at least 2M.

# Make Humio write a backup of the data files:
# Backup files are written to mount point "/backup" by default (when run in the Humio Docker containers),
# Otherwise the backup directory can be specified
# By default, data in backup is deleted 7 days after it has been deleted in Humio. This behavior is configurable.


# ID to choose for this server when starting up the first time.
# Leave commented out to autoselect the next available ID.
# If set, the server refuses to run unless the ID matches the state in data.
# If set, must be a positive nonzero integer.
# Numbers in the range of 1 through 511 are recommended.
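The rules for the ID can be expressed as a small pre-flight check. This is only a sketch: the function name `classify_node_id` is hypothetical, and the check covers just the constraints stated in the comments above (a positive nonzero integer, with 1 through 511 recommended).

```shell
# Classify a candidate node ID according to the rules in the comments above.
# Prints "recommended", "allowed", or "invalid".
classify_node_id() {
  case "$1" in
    ''|*[!0-9]*) echo invalid; return ;;
  esac
  if [ "$1" -lt 1 ]; then
    echo invalid
  elif [ "$1" -le 511 ]; then
    echo recommended
  else
    echo allowed
  fi
}
```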

# Set the UUID of this server in the cluster, used as a unique identifier for the
# contents of this local filesystem. Not set by default.

# To autoselect an ID in an environment where the disks are ephemeral,
# Humio can let Zookeeper assign the ID in the case where the local filesystem
# has no data files and no "cluster_membership.uuid" file.
# The option `ZOOKEEPER_PREFIX_FOR_NODE_UUID` (defaulting to "/humio_autouuid_")
# sets the prefix. Using a value that is distinct for each
# rack / availability zone allows rack awareness.
# This is disabled by default. Turn on only for a fresh cluster.
# ZOOKEEPER_URL_FOR_NODE_UUID=host1:2181,host2:2181,host3:2181
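As an illustration of a rack-specific prefix, a node in one rack might use a fragment like the one below. The host names come from the example above; the `rack1_` suffix is a made-up naming convention for this sketch, not a documented format.

```shell
# Node in rack 1. Nodes in another rack would use a different suffix,
# for example /humio_autouuid_rack2_ (hypothetical naming).
ZOOKEEPER_URL_FOR_NODE_UUID=host1:2181,host2:2181,host3:2181
ZOOKEEPER_PREFIX_FOR_NODE_UUID=/humio_autouuid_rack1_
```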

# The URL that other Humio hosts in the cluster can use to reach this server.
# Required for clustering. Example: http://humio01:8080
# Security: We recommend using a TLS endpoint.
# If all servers in the Humio cluster share a closed LAN, using those endpoints may be OK.

# The URL which users/browsers will use to reach the server.
# This URL is used to create links to the server.
# It is important to set this property when using OAuth authentication or alerts.

## How long dashboard queries should be kept running when they are not polled.
## When opening a dashboard, results will be immediately ready if queries are running.
## Default is 3 days.

## Warn when ingest is delayed.
## How far ingest must fall behind before a warning is shown in the search UI.

# Specify the replication factor for the Kafka ingest queue.

# Kafka bootstrap servers list. Used as `bootstrap.servers` towards Kafka.
# Should be set to a comma-separated list of host:port pairs.
# Example: `my-kafka01:9092` or `kafkahost01:9092,kafkahost02:9092`
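A quick way to catch formatting mistakes in such a list before starting Humio is a shape check like the sketch below. The function name `is_host_port_list` is hypothetical, and the pattern is deliberately simple: it accepts letters, digits, dots, and hyphens in host names and does not handle IPv6 literals.

```shell
# Succeeds when $1 is a comma-separated list of host:port pairs.
# Simplified on purpose: no IPv6 literals, no whitespace around commas.
is_host_port_list() {
  printf '%s\n' "$1" |
    grep -Eq '^[A-Za-z0-9.-]+:[0-9]+(,[A-Za-z0-9.-]+:[0-9]+)*$'
}
```

Both examples above pass the check, while a value with a missing port or a trailing comma does not.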

# By default, Humio will create topics and manage the number of replicas in Kafka for the topics being used.
# If you run Humio on top of an existing Kafka or want to manage this outside of Humio, set this to false.

# Deletes events from the ingest queue when they have been saved in Humio.
# It is still important to configure Kafka retention on the ingest queue.
# The Kafka retention defines how long data can be kept on the ingest queue and, thus, how much time Humio has to read the data and store it internally.

# It is possible to add extra Kafka configuration properties by creating a properties file and pointing to it.
# These properties are added to all Kafka producers and consumers in Humio.
# For example, this enables Humio to connect to a Kafka cluster using SSL and SASL.
# Note the file must be mapped into Humio's Docker container if running Humio as a Docker container.

# Add a prefix to the topic names in Kafka.
# Adding a prefix is recommended if you share the Kafka installation with applications other than Humio.
# The default is not to add a prefix.

# Zookeeper servers.
# Defaults to "localhost:2181", which is OK for a single server system, but
# should otherwise be set to a comma-separated list of host:port pairs.
# Example: zoohost01:2181,zoohost02:2181,zoohost03:2181
# Note, there is no security on the Zookeeper connections. Keep them inside a trusted LAN.

# Maximum number of datasources (unique tag combinations) in a repo.
# There will be a sub-directory for each combination that exists.
# (Since v1.1.10)

# Strategy for compression: Compress (fast) in digest pipeline or (highly) later.
# fast: Compress using LZ4 in the digest pipeline. This is what all versions up to 1.5.x did.
# high: Compress using LZ4 in the digest pipeline, then recompress using Zstd when merging mini-segments into proper segments later.
# extreme: Compress using Zstd in the digest pipeline, then recompress using Zstd when merging mini-segments into proper segments later. Extreme is not recommended as the extra compression is not worth the extra CPU time spent.
# Recommended setting depends on the hardware and use case. The rule
# of thumb is that high provides a 2x compression ratio over fast at the
# cost of using more CPU time for decompressing while searching.
# Go for high as the default for fresh installs and keep fast on existing systems to allow rolling back to 1.5.x
# Default: high

# Compression level for data in segment files. Range is [0 ; 9]
# Defaults to 6 for COMPRESSION_TYPE=fast and 9 for COMPRESSION_TYPE=high and extreme.
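The default levels stated above can be summarized as a small lookup. This is a sketch of the documented defaults only; the function name `default_compression_level` is hypothetical.

```shell
# Print the default compression level for a given COMPRESSION_TYPE,
# following the defaults stated in the comment above.
default_compression_level() {
  case "$1" in
    fast) echo 6 ;;
    high|extreme) echo 9 ;;
    *) echo "unknown COMPRESSION_TYPE: $1" >&2; return 1 ;;
  esac
}
```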

# For COMPRESSION_TYPE=high and extreme this sets the compression level of the minisegments.
# Defaults to 0. Range is [0 ; 6]

# Many events have fields with values where one field holds a substring of the value from another.
# This is the case in particular if the event arrives with a @rawstring with the full event which then gets
# parsed using a parser that stores copies of many substrings from @rawstring in other fields.
# Humio removes duplication of these values before storing them in the segment files. This config limits how much CPU time
# is spent on that effort. The default is to follow the compression level.
# Default: 9. Minimum: 0. Maximum: 21. Adding one allows double the time.

# (Approximate) limit on the number of hours a segment file can be open for writing
# before being flushed even if it is not full. (Full is set using BLOCKS_PER_SEGMENT)
# Default: version < 1.4.x had 720, 1.4.x has 24

# How long a mini-segment can stay open. This bounds how much data needs to be replayed from Kafka when a fail-over happens.

# Desired number of blocks (each ~1 MB before compression) in a final segment after merge
# Segments will get closed earlier if expired due to MAX_HOURS_SEGMENT_OPEN.
# Default: version < 1.15.x had 2000, 1.15+ has 8000

# Desired number of blocks (each ~1 MB before compression)
# in a mini-segment before merge. Defaults to 128.
# Mini-segments will get closed earlier if expired due to FLUSH_BLOCK_SECONDS

# Minimum size in KB to target for blocks in a segment. Range: [128; 2048]
# Blocks may flush due to time, size, or pre-filter bits.
# Default value: 384 KB
# From v1.5.14.

# Maximum size in KB to target for blocks in a segment. Range: [128; 2048]
# Blocks may flush due to time, size, or pre-filter bits.
# Default value: 1024 KB. Max value: 2048 KB.
# From v1.5.14.

# Target fill percentage of pre-filter. Default value: 30.
# Percent of the bits to be set in the pre-filters. Range: [10; 100].
# Influences block size: Lower values may trigger smaller blocks. Higher values reduce the efficiency of search.
# From v1.5.14.

# Select roles for node, with current options being "all" or
# "httponly". The latter allows the node to avoid spending CPU time on
# tasks that are irrelevant to a node that has never had any local
# segment files and that will never have any assigned either. Leave as
# "all" unless the node is a stateless http frontend or ingest
# listener only.

# Whether this node should act as a query coordinator. Query coordinators are
# responsible for sending subqueries to storage nodes and combining the results.
# In clusters with "httponly" nodes (as described above), it often makes sense
# to set this to false for non-httponly nodes.

# Queries posted to `/queryjobs` such as those from the Humio UI can either start
# on the local node that receives the request, or get proxied to the node that
# is the most likely to run the query if another instance of the same search
# has already started. This helps sharing identical searches in a cluster.
# Setting this to false makes requests execute locally on the node that receives them.
# From v1.14.0.

# How long the digest worker thread should keep working on
# flushing the contents of in-memory buffers when Humio is told to shut down
# using "sigterm" (normal shutdown). Defaults to 300 seconds, expressed in milliseconds.
# If too low, then the next startup will need to start further back in
# time on the ingest queue.

# Optional: Allow the Humio JVM to exit if it detects more time is spent on GC than on actual computations.
# The kill factor is a multiplier applied to the time spent GC'ing.
# Humio will periodically compute timeSpentOnGC * GC_KILL_FACTOR - realTime.
# If the accumulated sum over all computed intervals exceeds GC_KILL_THRESHOLD_MILLIS, Humio will exit.
# This means that Humio will only exit if GC is consistently taking up a lot of time for a long time.
# The threshold is not set by default.
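The accumulation described above can be sketched with integer arithmetic. This illustrates the stated formula only and is not Humio's implementation: the interval length, whether GC_KILL_FACTOR may be fractional, and whether the running sum is clamped at zero are not stated here, so the sketch assumes an integer factor and no clamping. The function names and the input format are hypothetical.

```shell
# gc_kill_sum FACTOR "GC_MS:REAL_MS GC_MS:REAL_MS ..."
# Prints the accumulated sum of (gc_ms * factor - real_ms) over all intervals.
gc_kill_sum() (
  factor=$1
  sum=0
  for interval in $2; do
    gc_ms=${interval%%:*}      # time spent on GC in this interval
    real_ms=${interval##*:}    # wall-clock length of this interval
    sum=$(( sum + gc_ms * factor - real_ms ))
  done
  echo "$sum"
)

# would_exit SUM THRESHOLD_MILLIS
# Succeeds when the accumulated sum exceeds the threshold.
would_exit() {
  [ "$1" -gt "$2" ]
}
```

With a factor of 5, three one-second intervals that each spend 400 ms on GC accumulate 3000, while an interval with 100 ms of GC contributes -500. This is why only sustained GC pressure pushes the sum past the threshold.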

# Let Humio send emails using the Postmark service
# Create a Postmark account and insert the token here

# Let Humio send emails using an SMTP server. ONLY put a password here
# if you also enable starttls. Otherwise you will expose your password.
# Example using GMail:
# Example using a local clear-text non-authenticated SMTP server

# Use an HTTP proxy for sending alert notifications.
# This can be useful if Humio is not allowed direct access to the internet.

# Allow alert notifiers to not use the HTTP proxy.
# Default is false.

# Select the TCP port to listen for HTTP.

# Select the TCP port for Elasticsearch Bulk API.

# Select the TCP port for exporting Prometheus metrics. This is disabled by default.

# Select the IP to bind the UDP/TCP/HTTP listening sockets to.
# Each listener entity has a listen-configuration. This ENV is used when
# that is not set.

# Select the IP to bind the HTTP listening socket to.
# (Defaults to HUMIO_SOCKET_BIND)

# Verify checksum of segment files when reading them. Defaults to true.
# Allows detecting partial and malformed files.
# (Since v1.1.16)

# S3 access keys for archiving of ingested logs in an export format.
# Optionally point to your own hosting endpoint for the S3 to use for archiving. To use a non-AWS endpoint:
# S3_ARCHIVING_ENDPOINT_BASE=http://my-own-s3:8080
# Number of parallel workers for upload. Default is 1.

# Use the globally configured HTTP proxy for communicating with S3.
# Default is true.

# Bucket storage (S3 variant. For Google variant, replace "S3" with "GCP" in all the following keys.)
# - infinite storage using local disks as cache.
#   See the page on Bucket Storage for more information.
# These two take precedence over all other AWS access methods.
# Also supported, same as (S3_STORAGE_ACCESSKEY, S3_STORAGE_SECRETKEY):

# Optionally point to your own hosting endpoint for the S3 to use for storage
# - in order to use a non-AWS endpoint. Comment out for AWS.
# Number of parallel workers for upload. Defaults to 1.
# The secret can be any UTF-8 string. The suggested value is 64 or more random ASCII characters.
# Optional prefix for all object keys, empty if not set.
# Allows sharing a bucket for more Humio clusters by letting them each write to a unique prefix.
# Note! There is a performance penalty when using a non-empty prefix. Humio recommends an unset prefix.

# Use the globally configured HTTP proxy for communicating with S3.
# Default is true.

# Performance tuning settings for S3/GCP storage
# Size of chunks for upload. The default is 8MB. Min is 5 MB, Max is 8MB.
# Number of parallel chunks at a time for each file (S3 only)
# Number of concurrent uploading files.
# Number of concurrent downloading files.

# Makes Humio assume that the primary data storage may be lost when restarting Humio.
# Setting this to true makes Humio attempt to delay shutdown until all required files have been copied to bucket storage.
# It also affects calculations on replicas to take into account the fact that replicas listed on other hosts cannot be trusted.
# This setting should be set on none or all nodes in the cluster, not on individual nodes.
# This setting requires S3/GCP storage.

# The following settings only take effect when S3/GCP storage is enabled.
# Note! This allows Humio to *delete* files from the local storage
# on the assumption that it can fetch the file from S3/GCP if it needs it at some point.
# Fetching the file from S3/GCP is much slower than using local storage.
# Segment files will be deleted in a least recently used order, in order to hit the configured fill target.
# Percentage of disk full that Humio aims to keep the disk at.
# Not enabled by default.
# Minimum number of days to keep a fresh segment file before allowing
# it to get deleted locally after upload to a bucket.
# Setting such a lower bound can help ensure that recent files are kept on disk,
# even if they would otherwise be evicted due to queries on older data.
# Mini segment files are kept in any case until their merge result also exists.
# (The age is determined using the timestamp of the most recent event in the file)
# Make sure to leave most of the free space as room for the system to
# manage as a mix of recent and old files.
# Note! Min age takes precedence over the fill percentage, so increasing this
# implies the risk of overflowing the local file system!

# Disable shared dashboards (wall monitors).
# The main reason to do this is if your organization requires
# stricter security than what is afforded by the URL shared
# secret used for shared dashboards.

# Users need to be created in Humio before they can log in with external
# authentication methods like SAML/LDAP/OAUTH.
# Set this parameter to true - then users are automatically created in
# Humio when successfully logging in with external authentication methods.
# If false - users must be explicitly created in Humio before they can log in.

# In order for the login mechanism to capture and sync the user's groups from the
# authentication mechanism,
# set the following configuration to true.

# If users are created automatically when logging in, they will have access to their sandbox
# and certain system repos.
# If set to true, users will only be created if the groups synced from the
# authentication mechanism have access to a view or repository.

# Default groups for all users. A comma separated list of group names.
# After login, a user will always be a member of those groups (if they exist), in addition to any groups included from a given IDP.
DEFAULT_GROUPS=group1, group2

# Allows disabling use of personal API tokens. This may be relevant when
# LDAP or SAML is set as the authentication mechanism, as the personal API tokens
# never expire and thus allow a user to access Humio even when the LDAP/SAML
# account has been closed or deleted. Defaults to true.

# Initial partition count for storage partitions.
# Takes effect ONLY on the first start of the first node in the cluster.

# Initial partition count for digest partitions.
# Takes effect ONLY on the first start of the first node in the cluster.

# How big a backlog of events in Humio is allowed before Humio starts responding
# http-status=503 on the http interfaces and rejecting ingesting messages on HTTP?
# Measured in seconds worth of latency from an event arrival at Humio until it has
# been fully processed.
# (Note that typical latency in normal conditions is zero to one second.)
# Set to a large number, such as 31104000 (~1 year as seconds), to avoid
# having this kind of backpressure towards the ingest clients.
# Range: Min=300, Max=2147483647.
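The numbers in this comment can be checked with a little arithmetic. The sketch below shows that the suggested large value corresponds to 360 days, and adds a range check; the function name `in_backlog_range` is hypothetical.

```shell
# "~1 year as seconds" from the comment above is 360 days:
echo $(( 360 * 24 * 60 * 60 ))   # prints 31104000

# Succeeds when the value lies in the documented range [300, 2147483647].
in_backlog_range() {
  [ "$1" -ge 300 ] && [ "$1" -le 2147483647 ]
}
```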

# How big a backlog of events in Humio is allowed before Humio starts
# cancelling live queries in order to catch up with the presumed spike in inbound traffic.
# The check occurs every 30 seconds and cancels the queries that account to the percentage of the
# locally running live queries on each node that had the highest cost since last check.

# How big a backlog of events in Humio is allowed before Humio starts
# dropping stale live queries in order to catch up with the presumed spike in inbound traffic.
# Stale live queries are those that have not been refreshed in any UI for more than the required keep alive interval.
# The check occurs every 30 seconds and cancels the queries that account to the percentage of the
# running live queries on each node that had the highest cost since last check.

# A configuration flag to limit state in Humio searches.
# This is used to limit the number of groups in the groupBy function.
# This is necessary to limit how much memory searches can use and avoid out of memory errors.

# The maximum allowed value for the limit parameter on timechart (and bucket)

# Maximum allowed file size that can be uploaded to Humio, when uploading CSV or JSON files.
# Used to set a limit on how big files can be.

# Limits how many entries are allowed when using the match and lookup functions

# The maximum allowed number of points in a timechart (or bucket result)
# When this is hit, the result will become approximate and discard input.

# SECONDARY_DATA_DIRECTORY enables using a secondary file system to
# store segment files. When to move the files is controlled by
# Secondary storage is not enabled by default.
# Note! When using Docker, make sure to mount the volume
# into the container as well.
# See the page on Secondary Storage for more information.

# These properties define the disk space limits at which Humio will throttle itself to avoid filling the disks.
# When the primary disk cap is hit, Humio will attempt to use the secondary storage instead.
# If both caps are hit, the affected Humio node will pause processing of logs,
# and will avoid downloading segments from other nodes or buckets, until disk space is freed.

# CACHE_STORAGE_DIRECTORY enables a local cache of segment files copied
# from the primary/secondary storage.
# It only makes sense if the local NVME is ephemeral while the
# primary data dir is trustworthy but slow.
# This is generally not recommended as it is more efficient to
# use the fast local drive as your primary storage, use bucket storage
# for the long term stable storage, and set USING_EPHEMERAL_DISKS=true
# Caching degrades performance if turned on in that case.
# Enable caching of files from a slow network file system (EBS) or for a
# file system on spinning disks.
# The cache should be placed on local NVME or similar drives, providing
# more than 200 MB/s/core in the machine.
# CACHE_STORAGE_PERCENTAGE Defaults to 90 and controls how full the cache
# file system is allowed to become.
# Humio manages the files in the cache directory and will delete files
# when there is too little space remaining.
# (Do not add a RAM-disk as cache: RAM is better kept for page cache.)
# Caching is disabled by default as most installs do not benefit from turning it on.
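For the slow-primary-storage case described above, a cache configuration might look like the following fragment. The directory path is a hypothetical mount point on a local NVMe drive; the percentage is the stated default, written out explicitly.

```shell
# Hypothetical mount point on a fast local drive
CACHE_STORAGE_DIRECTORY=/mnt/nvme/humio-cache
# 90 is the default stated above
CACHE_STORAGE_PERCENTAGE=90
```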

# Humio will write thread dumps to humio-threaddumps.log at the interval specified here.
# If not specified, Humio will write thread dumps every 10 seconds.

# Whether to emit a newline into streaming query responses every 10 seconds if there is nothing else to send
# This setting applies to requests made by clients external to Humio
# Defaults to false
# Whether to emit a newline into streaming query responses for requests every 10 seconds if there is nothing else to send
# This setting applies to requests made internally by Humio itself
# Defaults to false
# The keep alive duration to set on HTTP responses for streaming queries.
# Defaults to not being set. If unset, the keep-alive header will not be used.

# The findTimestamp function will only search this number of characters in the string for a timestamp
# If not specified, it will search the first 128 characters

# Controls if Humio should update the MaxMind ip location database automatically.
# This can be disabled if that update has to be done manually, by setting this to false.
# Defaults to true
# When auto update is disabled you must write a MaxMind database file (including city information) to the IpLocationDb.mmdb file which should be located in the humio data directory. Humio will check for changes to this file every five minutes.

# By default the MaxMind database will be fetched from
# These properties allow you to fetch the database directly from MaxMind instead
# Note that the fetched edition must include city information
# If you're using a custom URL for downloading the MaxMind database you can set
# otherwise the default will be used

# The maximum number of field values stored per alert that is using field-based throttling.
# If such alerts trigger with the same field value before the throttle period has elapsed,
# you might want to increase this limit.
# Note! Increasing this limit might increase the memory usage of every node in the cluster.
# Defaults to 100

# Humio can provide auto-balanced partition table suggestions based on zones and replication
# factor settings. Suggestions will only be enabled when DIGEST_REPLICATION_FACTOR and
# STORAGE_REPLICATION_FACTOR settings are set. If ZONE is not set on a host, that node will
# be its own zone.
# Zone label, e.g. 'dc1' or 'dc2'. If not set, a node will be its own zone.
# Sets the replication factor for digest.
# Sets the replication factor for storage.
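A node in a two-datacenter setup might carry a fragment like the one below. The zone label follows the examples in the comment above; the replication factor values are illustrative only, not recommendations.

```shell
# Illustrative values for one node located in zone "dc1"
ZONE=dc1
DIGEST_REPLICATION_FACTOR=2
STORAGE_REPLICATION_FACTOR=2
```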

Performance tuning for long retention settings

Humio’s defaults target data retention times in the range of 1-6 months. If you plan to keep data for much longer, you can reduce the number of files stored on disk by telling Humio to create larger files. As retention “chops off” old data in chunks consisting of one file at a time, this will make those chunks larger.

If your Humio repositories expect to have a retention of more than 6 months on average, you can increase the amount of data in each file, thus reducing the total number of files in the system. Changing these settings on a Humio cluster affects only files created after the change, and it is safe to make such a change at any time.
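To get a feel for the effect on file counts, here is a back-of-the-envelope sketch. It assumes that file size is governed by BLOCKS_PER_SEGMENT (named in the example configuration as what makes a segment full), that each block holds about 1 MB before compression, and that every segment is filled completely. Real clusters also close segments early, so treat the results as rough estimates. The function name is hypothetical.

```shell
# approx_segment_files TOTAL_UNCOMPRESSED_MB BLOCKS_PER_SEGMENT
# Rounds up, since a partly filled segment is still a file.
approx_segment_files() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# 100,000,000 MB (about 100 TB) of uncompressed data:
approx_segment_files 100000000 2000   # prints 50000
approx_segment_files 100000000 8000   # prints 12500
```

Under these assumptions, quadrupling the number of blocks per segment cuts the number of segment files to a quarter.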

# The default value in Humio for installs where the average data retention is 0-6 months.

# Suggest using the default for Humio installs where the average data retention is 6-18 months.

# Suggest using the default for Humio installs where the average data retention is 2+ years.

Java virtual machine parameters

You can supplement or tune the Java virtual machine parameters used when running Humio with the HUMIO_JVM_ARGS environment variable. The defaults are:


Number of CPU cores

You can specify the number of processors for the machine running Humio by setting the CORES property. Humio uses this number when parallelizing queries and other internal tasks.

By default, Humio uses the Java available processors function to get the number of CPU cores. This is usually the optimal number. Be aware that the auto-detected number can be too high when running in a containerized environment where the JVM does not always detect the proper number of cores.

Derived from the number of CPU cores, Humio internally sets QUERY_EXECUTOR_CORES and DIGEST_EXECUTOR_CORES to half that number (but a minimum of 2). This reduces pressure from context switching due to hyperthreading, since the number of CPU cores usually includes hyperthreads. If the number set through CORES is the number of actual physical cores and not hyperthreads, you may want to set these to the same number as CORES. Note that raising these values above the default may lead to an unstable and slow system: context switching costs can grow to a point where no real work gets done under load, even though the system appears to work fine when not fully utilized.
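The derivation of the executor defaults can be sketched as follows. The function name `default_executor_cores` is hypothetical, and rounding down for odd core counts is an assumption of this sketch, since the rounding behavior is not stated here.

```shell
# default_executor_cores CORES
# Half of CORES, but never below 2. Integer division for odd counts is an
# assumption of this sketch.
default_executor_cores() (
  half=$(( $1 / 2 ))
  if [ "$half" -lt 2 ]; then
    half=2
  fi
  echo "$half"
)

default_executor_cores 16   # prints 8
default_executor_cores 2    # prints 2
```

If CORES=8 reflects physical cores rather than hyperthreads, setting QUERY_EXECUTOR_CORES=8 and DIGEST_EXECUTOR_CORES=8 would override the derived value of 4.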

Configuring authentication

Humio supports different ways of authenticating users. Read more in the Authentication Documentation.

Configuring Network Time Protocol (NTP)

Humio requires NTP to be installed, configured, and in-sync across nodes for all clustered deployments.

Public URL

PUBLIC_URL is the URL where the Humio instance is reachable from a browser. Leave out trailing slashes.

This property is only important if you plan to use OAuth Federated Login or Auth0 Login, or if you want Alert Notifications to have consistent links back to the Humio UI.

The URL might only be reachable behind a VPN, but that is no problem, as the user’s browser can access it.
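Since PUBLIC_URL should not end in a slash, a small helper can normalize the value before it goes into your environment file. This is a sketch; the function name and the host name `humio.example.com` are hypothetical.

```shell
# Remove any trailing slashes from a URL.
strip_trailing_slashes() (
  url=$1
  while [ "${url%/}" != "$url" ]; do
    url=${url%/}
  done
  echo "$url"
)

PUBLIC_URL=$(strip_trailing_slashes "https://humio.example.com/")
echo "PUBLIC_URL=$PUBLIC_URL"   # prints PUBLIC_URL=https://humio.example.com
```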

Redirecting the internal logs from Humio to stdout

By default, Humio sends its own logs into the internal Humio repo and into a number of log files. If you would rather avoid the external log files and get the output on stdout, use this setting.

# This must be set through an environment variable along with the rest of the configuration.
# Note that the name is case sensitive and the value must be given exactly as provided here.
# Valid values for built-in configurations:
# LOG4J_CONFIGURATION=log4j2.xml (This is the default when not set)
# LOG4J_CONFIGURATION=log4j2-stdout.xml
# LOG4J_CONFIGURATION=log4j2-stdout-json.xml
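Because the name is case sensitive and the value must match exactly, a pre-flight check for typos can be useful. The sketch below accepts only the three built-in values listed above; the function name is hypothetical, and a custom configuration file outside the jar is not covered by this check.

```shell
# Succeeds when $1 is one of the built-in configuration names listed above.
is_builtin_log4j_config() {
  case "$1" in
    log4j2.xml|log4j2-stdout.xml|log4j2-stdout-json.xml) return 0 ;;
    *) return 1 ;;
  esac
}
```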

# Alternatively, you can provide your own configuration file for log4j2 outside the jar file:
# But be aware that the file needs to be similar to the one within the jar as the internal logging
# into the humio repository depends on some of the configuration in that.
# You can extract the internal version as a starting point using "unzip humio-assembly-0.1.jar log4j2.xml"