Amazon Bucket Storage

Humio supports writing a copy of ingested logs to Amazon S3 and Google Cloud Storage in Humio's native file format, allowing Humio to fetch those files and search them efficiently if the local copies are lost or deleted. This page explains how to set up bucket storage with Amazon S3. For a general overview of the topic, see the Bucket Storage page.

If you use S3 for bucket storage, server-side encryption must be turned off; otherwise, Humio cannot verify the checksum and uploads will fail. The files are encrypted by Humio itself, as described in the security section of the overview page.

Self-Hosting with S3

Humio supports multiple ways of configuring access to the bucket: you can use any of the five methods listed in Using the Default Credential Provider Chain, as well as the following, which takes precedence if set.

If you run your Humio installation outside AWS, you need an IAM user with write access to the buckets used for storage. That user must have programmatic access to S3, so when adding a new user through the AWS console, make sure programmatic access is enabled.

Figure 1, Adding User with Programmatic Access
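
If you prefer the command line, the same user and keys can be created with the AWS CLI. This is a minimal sketch; the user name humio-bucket-storage is only an example:

# Create the IAM user and a set of programmatic access keys.
aws iam create-user --user-name humio-bucket-storage
# The output of this command contains the access key and secret key.
aws iam create-access-key --user-name humio-bucket-storage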

Later in the process, you can retrieve the access key and secret key:

Figure 2, Access Key and Secret Key

Humio needs these keys in the following configuration:

# These two take precedence over all other AWS access methods.
S3_STORAGE_ACCESSKEY=$ACCESS_KEY
S3_STORAGE_SECRETKEY=$SECRET_KEY

# Also supported:
AWS_ACCESS_KEY_ID=$ACCESS_KEY
AWS_SECRET_ACCESS_KEY=$SECRET_KEY

The keys are used for authenticating the user against the S3 service. For more guidance on retrieving S3 access keys, see AWS access keys. For more details on creating a new user, see IAM.

You can give the user write access to a bucket by attaching a policy to the user.

IAM User Example Policy

The following JSON is an example policy configuration:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::BUCKET_NAME"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
		"s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::BUCKET_NAME/*"
            ]
        }
    ]
}

The policy can be used as an inline policy attached directly to the user through the AWS console:

Figure 3, Attach Inline Policy
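
Alternatively, the policy can be attached from the command line. A sketch assuming the policy JSON above has been saved as bucket-policy.json and the user is named humio-bucket-storage (both names are examples):

aws iam put-user-policy \
    --user-name humio-bucket-storage \
    --policy-name humio-bucket-storage \
    --policy-document file://bucket-policy.json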

You must also tell Humio which bucket to use. These settings must be identical on all nodes across the entire cluster.

S3_STORAGE_BUCKET=$BUCKET_NAME
S3_STORAGE_REGION=$BUCKET_REGION
S3_STORAGE_ENCRYPTION_KEY=$ENCRYPTION_SECRET
S3_STORAGE_OBJECT_KEY_PREFIX=/basefolder
USING_EPHEMERAL_DISKS=true

The first option sets the name of the bucket to use. The encryption key given with S3_STORAGE_ENCRYPTION_KEY can be any UTF-8 string; the suggested value is 64 or more random ASCII characters. S3_STORAGE_OBJECT_KEY_PREFIX sets an optional prefix for all object keys, which allows nodes to share a bucket as long as each node writes to a unique prefix. It is empty by default. Note that there is a performance penalty when using a non-empty prefix, so we recommend leaving the prefix unset. If there are any ephemeral disks in the cluster, you must set the last option, USING_EPHEMERAL_DISKS, to true.
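
Since the encryption key can be any UTF-8 string, one way (among many) to generate a suitable value is with openssl:

# Prints a random base64 string of 88 ASCII characters,
# suitable as the value of S3_STORAGE_ENCRYPTION_KEY.
openssl rand -base64 64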

Switching to a Fresh Bucket

You can change the S3_STORAGE_BUCKET and S3_STORAGE_REGION settings to point to a fresh bucket at any point in time. From then on, Humio writes new files to that bucket while still reading from any previously configured buckets. Existing files already written to a previous bucket are not moved to the new bucket. Humio continues to delete files from the old buckets that match the file names that Humio would put there.

Other Options

Configuring for Use With Non-Default Endpoints

You can point Humio to your own endpoint for bucket storage if you host an S3-compatible service.

There are two styles of S3 base URL Humio can use, depending on which URLs your service supports.

Virtual host style (default)

Humio will construct virtual host-style URLs like https://my-bucket.my-own-s3:8080/path/inside/bucket/file.txt.

For this style of access, you need to set your base URL so that it contains a placeholder for the bucket name.

S3_STORAGE_ENDPOINT_BASE=http://{bucket}.my-own-s3:8080

Humio will replace the placeholder with the relevant bucket name at runtime.

Path-style

Some services do not support virtual host style access, and require path-style access. Such URLs have the format https://my-own-s3:8080/my-bucket/path/inside/bucket/file.txt. If you are using such a service, your endpoint base URL should not contain a bucket placeholder.

S3_STORAGE_ENDPOINT_BASE=http://my-own-s3:8080

Additionally, you must set S3_STORAGE_PATH_STYLE_ACCESS to true.
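
Combined, the configuration for a path-style service looks like this:

S3_STORAGE_ENDPOINT_BASE=http://my-own-s3:8080
S3_STORAGE_PATH_STYLE_ACCESS=true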

Hitachi Content Platform compatibility

Bucket storage can be used with the Hitachi Content Platform by setting S3_STORAGE_HCP_COMPAT to true.
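
For example:

S3_STORAGE_HCP_COMPAT=true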

MinIO compatibility

MinIO in its default mode doesn't use MD5 checksums of incoming streams, which makes it incompatible with Humio's client. MinIO provides a workaround: start the server with the --compat option, for example ./minio --compat server /data.
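
As a sketch, pointing Humio at a MinIO instance started this way could look like the following. The host name, port (MinIO listens on 9000 by default), and credentials are placeholders, and path-style access is normally used with MinIO:

S3_STORAGE_ENDPOINT_BASE=http://my-minio-host:9000
S3_STORAGE_PATH_STYLE_ACCESS=true
S3_STORAGE_ACCESSKEY=$MINIO_ACCESS_KEY
S3_STORAGE_SECRETKEY=$MINIO_SECRET_KEY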

HTTP Proxy

If Humio is set up to use an HTTP proxy, it is by default also used for communicating with S3. This can be disabled using the following:

# Use the globally configured HTTP proxy for communicating with S3.
# Default is true.
S3_STORAGE_USE_HTTP_PROXY=false

Performance Tuning

The following settings allow tuning for performance. Note that there may be costs associated with increasing these, as S3 billing also depends on the number of operations executed.

# How many parallel chunks to split each file into when uploading and downloading.
# Defaults to 4, the maximum is 16.
S3_STORAGE_CHUNK_COUNT=4


# Maximum number of files that Humio will run concurrent downloads for at a time.
# Default is the number of hyperthreads / 2
S3_STORAGE_DOWNLOAD_CONCURRENCY=8

# Maximum number of files that Humio will run concurrent uploads for at a time.
# Default is the number of hyperthreads / 2
S3_STORAGE_UPLOAD_CONCURRENCY=8

# Chunk size for uploads and download ranges. Max 8 MB, which is the default.
# Minimum value is 5 MB.
S3_STORAGE_CHUNK_SIZE=8388608

# Prefer to fetch data files from the bucket when possible even if another
# node in the Humio cluster also has a copy of the file.
# In some environments, it may be less expensive to do the transfers this way.
# The transfer from the bucket may be billed at a lower cost than a transfer from
# a node in another region or in another data center.  This preference does not
# guarantee that the bucket copy will be used, as the cluster can
# still make internal replicas directly in case the file is not yet in
# a bucket.
# Default is false.
S3_STORAGE_PREFERRED_COPY_SOURCE=false

Export to Bucket with Amazon

By default, Humio allows downloading the results of a query to a file. The file is generated as an HTTP stream directly from Humio, and the connection can be long-lived, with long periods where no data is transmitted while Humio searches for rare hits in large data sets. This can cause issues for some networks and load balancers.

As an alternative, Humio allows exporting to Amazon S3. The result of the query will be uploaded to the bucket storage provider and the user will be given a URL to download the file once the upload is complete.

Because Humio uses signed URLs for downloads, the user does not need read access to the bucket; Humio, however, needs write access to the target bucket.
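
The write access can be granted in the same way as for bucket storage. A minimal policy sketch for the export bucket could look like the following, where EXPORT_BUCKET_NAME is a placeholder; the exact set of actions your setup requires may differ:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::EXPORT_BUCKET_NAME/*"
            ]
        }
    ]
}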

The following is the most basic example configuration for exporting to Amazon S3:

# The credentials used to authenticate with
S3_EXPORT_ACCESSKEY=$ACCESS_KEY
S3_EXPORT_SECRETKEY=$SECRET_KEY

# The name of the region and bucket to export to
S3_EXPORT_REGION=$BUCKET_REGION
S3_EXPORT_BUCKET=$BUCKET_NAME

Humio supports many other kinds of authentication on AWS, as described earlier on this page.

HTTP Proxy

If Humio is set up to use an HTTP proxy, it is by default also used for communicating with S3. This can be disabled using the following:

# Use the globally configured HTTP proxy for communicating with S3.
# Default is true.
S3_EXPORT_USE_HTTP_PROXY=false