# Collecting Logs from AWS S3 with Humio and FluentD

This document provides a cookbook example of how to collect log files from AWS S3 and ship that data to Humio. There are many ways to do this; this particular example is based on AWS CloudTrail data.

This recipe can be used in any situation where log data is being placed into an AWS S3 bucket and that data needs to be shipped into Humio with minimal latency. It does not address the scenario of collecting historical data from AWS S3.

It makes use of AWS SQS (Simple Queue Service) to provide high scalability and low latency for collection. It does not rely on scanning the AWS S3 bucket (which is why it does not support historical ingestion), as bucket scanning does not work with S3 at large scale.
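To make the flow concrete: for each new object, S3 publishes an event notification message onto the SQS queue, and the collector reads those messages to discover which objects to fetch. FluentD's S3 plugin does this internally; the following Python sketch (the function and sample names are illustrative, but the message shape follows the standard S3 event notification format) shows how bucket/key pairs are extracted from such a notification:

```python
import json

def extract_s3_objects(message_body):
    """Pull (bucket, key) pairs out of an S3 event notification message."""
    event = json.loads(message_body)
    return [
        (rec["s3"]["bucket"]["name"], rec["s3"]["object"]["key"])
        for rec in event.get("Records", [])
    ]

# Minimal example of the notification S3 places on the SQS queue:
sample = json.dumps({
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "my-s3-bucket"},
            "object": {"key": "AWSLogs/123456789012/CloudTrail/eu-west-2/log.json.gz"}
        }
    }]
})
print(extract_s3_objects(sample))
```

Each message tells the collector exactly which new object to download, which is why no bucket scanning is needed.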

The scenario documented here is based on the combination of two FluentD plugins: the AWS S3 input plugin and the core Elasticsearch output plugin.

## Why FluentD

FluentD offers many plugins for input and output, and has proven to be a reliable data shipper for many modern deployments. It is chosen in this example because its configuration is clear and understandable, and it is relatively trivial to deploy and test.


The following assumes that you have a working installation of FluentD on a server. This example was built using CentOS (CentOS Linux release 8.1.1911) and used the gem-based FluentD installation.

## Configure AWS

Assuming that you have an AWS S3 bucket with log data already flowing to it, but no SQS queues configured, you will want to complete the following steps.

This approach uses a dedicated user account with minimal permissions, authenticating with access keys. There are alternative ways to configure the IAM settings if you wish; this is provided as an example.

All items in this example are configured in the same region. This is a requirement for some of the components. The recommendation is to configure everything in the region closest to your Humio or FluentD instances, although this is not critical.

### Create SQS Queue

  • In the AWS Console go to Services → Application Integration → Simple Queue Service
  • Choose Create New Queue → Standard Queue
  • Give the queue a name, for example, humio-cloudtrail-queue
  • Choose Quick-Create Queue (you may want to tune specific queue parameters depending on the volume of data and your environment; that is beyond the scope of this document)

Note the ARN, as you will need this later.

### Configure SQS Permissions for S3 Events

It is necessary to authorize the S3 bucket to push events into the SQS queue. To do this you will need the ARN of your S3 bucket. Go to the SQS menu in AWS:

  1. Select your SQS queue then choose Queue Actions → Add a Permission.
  2. Choose the following settings:
  • Effect: Allow
  • Principal: Everybody (checkbox)
  • Actions: SendMessage
  • Add Conditions:
    • Qualifier: None
    • Condition: ArnLike
    • Key: aws:SourceArn
    • Value: <ARN OF YOUR S3 BUCKET>
  3. Click Add Permission
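The permission added through the console corresponds to an SQS access policy document along these lines (a sketch; substitute the ARNs for your own queue and bucket):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "SQS:SendMessage",
      "Resource": "<ARN OF YOUR SQS QUEUE>",
      "Condition": {
        "ArnLike": {
          "aws:SourceArn": "<ARN OF YOUR S3 BUCKET>"
        }
      }
    }
  ]
}
```

The ArnLike condition restricts the otherwise-open Principal so that only notifications originating from your bucket can be sent to the queue.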

### Set up S3 Events to SQS

Go back to the configuration for the S3 bucket holding the CloudTrail logs.

  1. Choose Properties → Events
  2. Select Add Notification
  3. Give the notification a name, such as cloudtrail-to-humio
  4. Check All object create events
  5. Prefix: AWSLogs/XXXXXXXXXXXX/CloudTrail/ where XXXXXXXXXXXX is your AWS account number
  6. Send to: SQS Queue
  7. Select the SQS queue created earlier (for example, humio-cloudtrail-queue)
  8. Click Save

If you get an error at this point then it’s likely you haven’t set the permissions correctly for S3 to post events to that SQS queue. Review that configuration if needed.

### Create a user account for FluentD

We recommend that you use a dedicated user account for FluentD. This account will have minimal permissions and be used only for running the FluentD connection.

  1. In the AWS Console go to Security, Identity, & Compliance → IAM
  2. Choose Users → Add User
  3. Provide a user name and choose Programmatic Access (checkbox)
  4. Click Next: Permissions
  5. Click Next: Tags
  6. Click Next: Review
  7. Click Create user (ignore the warning about no permissions for the user)

When you finish creating the user be sure to download and save the Access key ID and Secret access key, as you will need them to complete the FluentD configuration.

We will now create two inline policies for this user (the policies will exist only as part of this user account).

  1. With the user selected, on the Permissions tab, select Add Inline Policy.

  2. Select the JSON editor and paste the following (editing the bucket name to suit):

         {
             "Version": "2012-10-17",
             "Statement": [
                 {
                     "Effect": "Allow",
                     "Action": [
                         "s3:Get*",
                         "s3:List*"
                     ],
                     "Resource": [
                         "arn:aws:s3:::<YOUR BUCKET NAME>",
                         "arn:aws:s3:::<YOUR BUCKET NAME>/*"
                     ]
                 }
             ]
         }

     This policy gives full read access to the bucket. It is possible to modify the Resource section to be stricter about how the permissions are granted. This depends on the layout of your S3 bucket.

  3. Click Review Policy
  4. Give it a name like read-access-to-s3-cloudtrail
  5. Click Create Policy

Repeat the above steps to create a second inline policy, this one for managing the SQS queue. The JSON is below; the listed actions are a reasonable minimal set for consuming messages from the queue (adjust the Resource ARN to match your own queue):

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                    "sqs:GetQueueUrl"
                ],
                "Resource": "arn:aws:sqs:eu-west-2:507820635124:humio-demo-sq"
            }
        ]
    }

### Configure AWS CloudTrail to send logs to S3

Finally, in AWS, configure AWS CloudTrail to send logs to the S3 bucket, following the official Amazon CloudTrail documentation.

What is important is that the CloudTrail logs should go to the S3 bucket that is configured as above, and that the prefix for writing those logs to the bucket matches the configuration in the SQS notification setup.
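CloudTrail writes objects under keys of the form AWSLogs/&lt;account&gt;/CloudTrail/&lt;region&gt;/&lt;yyyy&gt;/&lt;mm&gt;/&lt;dd&gt;/…, so the prefix configured in the SQS notification should be a leading substring of those keys. A quick sanity check (the key shown is a hypothetical example):

```python
# The prefix configured in the S3 event notification must match where
# CloudTrail actually writes its objects, or no events will be generated.
prefix = "AWSLogs/123456789012/CloudTrail/"
example_key = ("AWSLogs/123456789012/CloudTrail/eu-west-2/"
               "2020/05/01/123456789012_CloudTrail_eu-west-2.json.gz")
print(example_key.startswith(prefix))  # True: this object triggers a notification
```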

## Create a CloudTrail parser in Humio

CloudTrail data is sent as JSON but it is wrapped in a top level Records array. This means that additional parsing is needed for CloudTrail events to appear individually in Humio. This can be achieved by defining a custom parser in Humio and associating it with the access token for the repository of your choice.
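To illustrate what the parser must do: each CloudTrail log object is a single JSON document whose top-level Records array holds many events, and each element should become its own event in Humio. Here is the same transformation sketched in Python (the sample records are made up):

```python
import json

def split_cloudtrail_records(raw):
    """Split one CloudTrail log object into its individual event records."""
    return json.loads(raw)["Records"]

raw = json.dumps({"Records": [
    {"eventTime": "2020-05-01T12:00:00Z", "eventName": "CreateQueue"},
    {"eventTime": "2020-05-01T12:00:05Z", "eventName": "DeleteQueue"},
]})
events = split_cloudtrail_records(raw)
print(len(events))  # 2 separate events instead of one wrapped document
```

The custom parser below performs this split inside Humio at ingest time.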

To create the custom parser in Humio

  1. In your repository of choice go to Parsers → New Parser

  2. For the name choose json-cloudtrail

  3. For the Parser Script you can use

         parseJson()
         | split(Records, strip=true)
         | @rawstring := rename(@display)
         | parseTimestamp(field=eventTime)
         | drop([@display, _index])

  4. Save the new parser and associate it with the access token for the repository that you will use in the FluentD configuration.

<!-- TODO:  this may not be the most efficient way to split the CloudTrail messages, rather a configuration that does this work in FluentD may be more optimal. Suggestions welcome!-->

## Configure FluentD input

Install the relevant FluentD plugin for communicating with AWS S3 and SQS. On your
FluentD server you can run:

`gem install fluent-plugin-s3 -v 1.0.0 --no-document`

The input configuration is below:

    <source>
      @type s3

      aws_key_id XXXXXXXXXXX
      aws_sec_key XXXXXXXXXXX
      s3_bucket my-s3-bucket
      s3_region eu-west-2
      add_object_metadata true

      <sqs>
        queue_name my-queue-name
      </sqs>

      store_as gzip

      <parse>
        @type json
      </parse>
    </source>

Be sure to configure the plugin with the values relevant for your environment,
including the ID and Key for the AWS user, S3 bucket name and region, and the SQS
queue name.

More details and options for the input plugin are available on GitHub.

## Configure FluentD output

The output for this scenario is the same as the standard output to Humio when using the Elasticsearch plugin for FluentD.

To install the elasticsearch plugin on your FluentD server you can run:

`fluent-gem install fluent-plugin-elasticsearch`

An example output configuration is below:

    <match input.s3>
      @type           elasticsearch
      host            my.humio.instance
      port            9200
      scheme          http
      user            ${cloudtrail}
      password        ${YYYYYYYYYYY}
      logstash_format true
    </match>

Replace `cloudtrail` with your Humio repository name, and `YYYYYYYYYYY` with
your access token.

<div class="notices note" >
    <p>This is filtering on the tag <code>input.s3</code>, which should match all the data coming
from our S3 input plugin, as we did not set or parse any additional tag data.</p>
</div>