Kafka Connect is a framework for connecting Kafka with other systems such as Humio. If you have your data in Kafka, consider this approach for sending data to Humio. We have worked with Confluent, achieved Gold Verification, and are now able to offer our new Kafka Connector, which uses our fast and efficient HEC endpoint! You can get started by visiting the Confluent Hub page or the GitHub repository.
This guide provides step-by-step guidance on how to build, integrate and operate the Humio HEC connector within the Kafka platform.
The purpose of the Humio HEC Sink connector is to read messages from a Kafka topic and submit them as events to the HTTP event collector endpoint of a running Humio system.
The Humio HEC connector uses Maven to build and test itself. The version of Kafka to build against is indicated in the pom.xml file.
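In a typical checkout this is controlled by a Maven version property along the lines of the snippet below (the exact property name in pom.xml may differ):

<kafka.version>2.2.0</kafka.version>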
Out of the box, Kafka 2.2.0 is supported. This can (and should) be changed to match your current Kafka or Confluent Platform version; to check which version this is, refer to the Confluent Platform Versions page.
Scripts are provided to automatically build and package the connector jar. bin/compile.sh compiles and packages the connector, with the resulting “uber jar” located at target/kafka-connect-hec-sink-1.0-SNAPSHOT-jar-with-dependencies.jar.
Alternatively, you can run:
mvn -DskipTests=true clean install
mvn -DskipTests=true assembly:assembly -DdescriptorId=jar-with-dependencies
To install the connector for “plain Kafka”, copy the uber jar kafka-connect-hec-sink-1.0-SNAPSHOT-jar-with-dependencies.jar into the KAFKA_HOME/libs/ folder and set your connector configuration properties (see the configuration section below).
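For example, assuming KAFKA_HOME points at the root of your Kafka installation and the jar has already been built:

cp target/kafka-connect-hec-sink-1.0-SNAPSHOT-jar-with-dependencies.jar $KAFKA_HOME/libs/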
To install the connector for the Confluent Platform, build the uber jar and copy it into the proper directory:
mkdir /CONFLUENT_HOME/share/java/kafka-connect-hec-sink
cp target/kafka-connect-hec-sink-1.0-SNAPSHOT-jar-with-dependencies.jar /CONFLUENT_HOME/share/java/kafka-connect-hec-sink/.
See the Confluent Install Connectors page for more information.
This connector utilizes the kafka-connect-maven-plugin Maven plugin to create a Confluent Hub compatible archive. Use mvn package to create the archive.
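For example, assuming the plugin writes the archive to Maven's default target/components/packages/ directory and the Confluent Hub client is installed, the packaged connector can typically be installed from the local archive (the archive name is a placeholder):

mvn package
confluent-hub install target/components/packages/<generated-archive>.zip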
For an example configuration using standalone mode, refer to config/HECSinkConnector.properties, and for distributed mode, refer to src/test/resources/config/json_sink_connector.json.
||URL to the Humio HEC endpoint, like
||name of the Humio repo you wish to send data to, such as
||The ingest token as supplied by your Humio installation, for example,
||Maximum number of events to send per call to the HEC endpoint. This configuration element must be an integer greater than zero and is required.|
||When set, defines the name of the field which will automatically be set to hold the Kafka topic name the event originated from. It may be useful to use a tag, like
||When set, defines the name of the field which will automatically be set to the partition of the Kafka topic the event originated from. This configuration element must be a non-empty string and is optional.|
||Maximum number of times a call to the HEC endpoint will be retried before failing (and throwing an exception). This configuration element is optional, with a default value of 10 retries.|
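Putting the pieces together, a minimal standalone connector configuration might look like the sketch below. The standard Connect settings (name, connector.class, tasks.max, topics) are generic Kafka Connect properties, connector.class is inferred from the class names mentioned in this guide, humio.hec.ingest_token and humio.hec.buffer_size are the properties referenced elsewhere in this document, and the remaining keys and all values are illustrative placeholders; use the exact property names from the example configuration files shipped with the repository.

name=humio-hec-sink
connector.class=com.humio.connect.hec.HECSinkConnector
tasks.max=1
topics=hectopic
# the keys below are illustrative placeholders; consult the repository's example configs for the exact names
humio.hec.url=<your Humio HEC endpoint URL>
humio.repo=<your Humio repository name>
humio.hec.ingest_token=<your ingest token>
humio.hec.buffer_size=500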
Kafka message keys are currently ignored. Values are converted to HEC-compatible JSON based on the connector configuration (see the examples below). Connectors require key and value converters; these determine how the HEC Sink handles each message and are described below.
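For example, to feed the connector schemaless JSON (one common setup; Avro converters with a Schema Registry can be used instead), the worker or connector configuration might include the standard Connect converter settings:

key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false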
Regarding schema evolution, because this connector converts any given Avro message to JSON on-the-fly (within the constraints of the Avro data types given above) for submission to the Humio HEC endpoint, schema evolution is not an issue, and there are no restrictions other than being limited to the aforementioned supported Avro data types.
Metrics are exposed through JMX as well as dumped to standard out every 30 seconds.
More information about the method used to generate these metrics can be found here.
||Number of active sink tasks.|
||Number of flushes requested by Connect.|
||Number of parsing errors. NOTE: these parse errors are in coercing the sink records into Humio HEC records, not during initial Kafka ingest. See notes regarding parsing errors & logging in the configuration section above.|
||Number of records put.|
||Number of connector task starts.|
||Number of connector task stops.|
||Failed HTTP requests to Humio HEC endpoint.|
||Number of records posted to Humio HEC endpoint.|
||Number of times connector had to wait for a batch to post to Humio as a result of a requested flush from Connect.|
||Batch sizes received from Kafka.|
||Batch sizes submitted to Humio HEC endpoint.|
||Full end-to-end processing time.|
||HEC endpoint POST time.|
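One way to inspect these metrics interactively, assuming the Connect worker is launched through the standard Kafka scripts (which honor the JMX_PORT environment variable), is to enable remote JMX and attach jconsole; the worker command and port below are only an example:

JMX_PORT=9999 bin/connect-standalone.sh config/connect-standalone.properties config/HECSinkConnector.properties
jconsole localhost:9999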
If com.humio.connect.hec.HECSinkTask.put-batch-sizes reflects values consistently smaller than the configuration property humio.hec.buffer_size, and you are sure there is sufficient throughput on the assigned topic partition to fill the buffer, check the Kafka configuration property max.poll.records (its default is 500); you may have to increase it.
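As a sketch, assuming you control the Connect worker configuration, the consumer used by sink tasks can be tuned through the consumer.-prefixed pass-through properties in the worker config (newer Connect versions also allow per-connector consumer.override.* settings when the worker's override policy permits it):

consumer.max.poll.records=2000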
The Humio HEC endpoint supports several more fields which are not explicitly handled by this connector. The techniques outlined below for each field may give you some ideas on how to use these fields by way of Kafka Connect’s Single Message Transformations (SMTs). Alternatively, these fields may also be set (or modified) by way of Humio’s Parsers.
You can use the technique below to rename a field using an SMT; for example, if your messages already have a timestamp in a field named other_time_field, you can rename it to time:
transforms=replacefield
transforms.replacefield.renames=other_time_field:time
transforms.replacefield.type=org.apache.kafka.connect.transforms.ReplaceField$Value
You can use the technique below to leverage an SMT to insert a field with a static value. For example, if you wish to configure events to use a specific time zone, you can set a static timezone field:
transforms=insert_tz
transforms.insert_tz.static.field=timezone
transforms.insert_tz.static.value=Europe/Copenhagen
transforms.insert_tz.type=org.apache.kafka.connect.transforms.InsertField$Value
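Both transformations can be combined by listing them, comma separated, in the transforms property; Connect applies them in the order given:

transforms=replacefield,insert_tz
transforms.replacefield.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.replacefield.renames=other_time_field:time
transforms.insert_tz.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.insert_tz.static.field=timezone
transforms.insert_tz.static.value=Europe/Copenhagen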
We have created a Docker Compose setup to streamline testing of the Humio HEC sink connector. There are two ways you can use it: completely integrated with and managed by the test framework, or running the Docker Compose environment separately and either running the supplied tests or testing the connector itself in standalone mode.
Unit tests currently cover the internal Record object functionality, HECSinkConnectorConfig instantiation, as well as each of the record converters (for example, JsonSchemalessRecordConverter) for schema and data conversion functionality.
The end-to-end test performs the following steps (assuming managed docker):
uses the src/test/resources/docker-compose.yml Docker Compose file;
starts the services defined in src/test/resources/docker-compose.yml and waits for them to successfully start;
extracts the ingest token from the humio-data directory mounted from within the running docker container;
executes a count() query against the Humio docker instance, verifying the correct number of messages has been received;
If any of the above steps fail, the test as a whole fails.
In this context, “managed Docker Compose” means the test framework handles the work of starting, managing, and stopping the Docker Compose environment, as well as automatically configuring all dependencies based on data in it (hostnames, ports, ingest tokens).
Unit & end-to-end integration test
Review src/test/resources/docker/docker-compose.yml and verify the port assignments are unused and available on your machine. If they are not, make sure any ports you change are reflected in references made by other services.
Review src/test/resources/config/json_sink_connector.json, editing configuration properties if necessary.
Run bin/compile.sh if you have not already; this will build the “uber jar” required for the managed instance of Connect.
Run mvn test to run the full test suite (including unit tests).
Results specific to distributed testing will be contained within the EndToEndJsonTest test output. For further insight into its mechanics and what is being tested and how, refer to the source.
In this context, the assumption is that you are managing the Docker Compose environment yourself, with the tests assuming it is running. This is generally only useful if you want to run the connector in standalone mode and generate some load with JsonDataPump, or test it by some other means to suit your needs.
Things to know:
Stop any previously running docker instance with bin/stop-docker-instance.sh, and restart it. This will ensure you start with a blank slate.
Before running the connector in standalone mode, but after docker has been started, run the utility
bin/get-ingest-token.sh. Output will look similar to this:
$ bin/get-ingest-token.sh
defaulting to humio index (repo) sandbox
extracting sandbox ingest key from global data snapshot...
located sandbox key: sandbox_e5W4sRju9jCXqMsEULfKvZnc
LaBtqQmFXSOrKhG4HuYyk4JZiov2BGhuyB2GitW6dgNi
The last line of the output (e.g.,
LaBtqQmFXSOrKhG4HuYyk4JZiov2BGhuyB2GitW6dgNi) should be placed in the file
config/HECSinkConnector.properties as the value for the
humio.hec.ingest_token property. If you see errors here, it probably cannot find data in
./humio-data, which is mounted from the humio service running in docker; stop and restart the docker services with the scripts provided in
bin (see things to know above).
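Using the sample output above, the resulting line in config/HECSinkConnector.properties would look like this:

humio.hec.ingest_token=LaBtqQmFXSOrKhG4HuYyk4JZiov2BGhuyB2GitW6dgNi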
Unit and end-to-end integration test
Execute the tests with the environment variable MANAGE_DOCKER set to false:
$ MANAGE_DOCKER=false mvn test
Stop docker with bin/stop-docker-instance.sh.
Testing the HEC connector with Connect in standalone mode
Review config/HECStandaloneSinkConnector.properties, editing port assignments if necessary.
Review config/HECSinkConnector.properties. If you have not updated the ingest token (see things to know above), do so now.
Start Connect in standalone mode with config/HECStandaloneSinkConnector.properties, ensuring the Kafka broker port is correct (if you haven’t edited the docker-compose.yml file, you’re good to go), and ensure that the rest.port port is not already in use on your machine.
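If the repository does not provide a wrapper script for this step in bin, standalone Connect can typically be launched with the stock Kafka command, passing the worker properties followed by the connector properties (this assumes the two files above play those respective roles):

$KAFKA_HOME/bin/connect-standalone.sh config/HECStandaloneSinkConnector.properties config/HECSinkConnector.properties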
Generate load against the connector’s topic (hectopic, the value in the configuration properties if you’ve not changed it) by running bin/json-data-pump.sh hectopic. Note: this utility will run until killed!