Vector

Vector is a lightweight, high-performance tool for building observability pipelines. It can replace Logstash, Fluentd, Telegraf, Beats, or similar tools, and it has built-in support for shipping logs to Humio through the humio_logs sink.

Vector can be installed on Linux, Windows, and macOS. The Vector documentation describes several installation methods.

Vector can report its own internal metrics through an internal_metrics source. However, at this time, some of the internal metrics can cause parsing performance issues and high system load when they are sent to Humio. For this reason, we recommend that you do not send Vector’s internal_metrics to Humio, and instead send them to your other monitoring systems. Vector can send these metrics to Prometheus, statsd, and more.
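As a rough sketch of that recommendation, the fragment below routes Vector's internal metrics to a Prometheus endpoint rather than to Humio. The component names (my_internal_metrics, my_prometheus) and the listen address are assumptions chosen for illustration, and the sink type name depends on your Vector version, so check the Vector documentation for your release.

```toml
# Sketch only: expose Vector's own metrics for Prometheus to scrape
# instead of sending them to Humio.
[sources.my_internal_metrics]
  type = "internal_metrics"

[sinks.my_prometheus]
  # Assumption: recent Vector releases call this sink "prometheus_exporter";
  # older releases call it "prometheus".
  type = "prometheus_exporter"
  inputs = ["my_internal_metrics"]
  # Assumed listen address for the scrape endpoint.
  address = "0.0.0.0:9598"
```

Note that no Humio sink lists my_internal_metrics in its inputs, so these metrics never reach Humio.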

Configuration

Sending data to Humio with Vector requires only the humio_logs sink, the URL of the Humio cluster, and an ingest token.

The example below configures Vector to read from standard input (stdin) and send each line to a humio_logs sink that points at the Humio cluster. Messages entered at the command line after starting Vector are sent to Humio.

First, create a Vector configuration file named vector.toml. Open it in a text editor and add the following lines:

data_dir = "/var/lib/vector"

[sources.my_stdin_source]
  type = "stdin"

[sinks.my_humio_cluster]
  inputs = ["my_stdin_source"]
  type = "humio_logs"
  encoding.codec = "json"
  host = "${HUMIO_URL}"
  token = "${HUMIO_INGEST_TOKEN}"

By default, Vector sends events to Humio as JSON. Vector version 0.9.1 added the option to send logs to Humio as raw text by setting encoding.codec to text.
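For instance, assuming Vector 0.9.1 or later, the sink from the configuration above could be switched to the raw text format by changing only the codec line:

```toml
[sinks.my_humio_cluster]
  inputs = ["my_stdin_source"]
  type = "humio_logs"
  # Send each event as raw text instead of JSON.
  encoding.codec = "text"
  host = "${HUMIO_URL}"
  token = "${HUMIO_INGEST_TOKEN}"
```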

Now, run Vector with the environment variables HUMIO_URL and HUMIO_INGEST_TOKEN set appropriately and enter test messages:

HUMIO_URL=http://localhost:8080 HUMIO_INGEST_TOKEN=KL95YdaSYEWJ1tV9CPEqWGdMi4FVXghD0xxGrDAU3Wg5 vector --config vector.toml
Mar 04 13:40:19.770  INFO vector: Log level "info" is enabled.
Mar 04 13:40:19.770  INFO vector: Loading configs. path=["vector.toml"]
Mar 04 13:40:19.773  INFO vector: Vector is starting. version="0.8.1" git_version="v0.8.1" released="Wed, 04 Mar 2020 15:11:57 +0000" arch="x86_64"
Mar 04 13:40:19.773  INFO vector::topology: Running healthchecks.
Mar 04 13:40:19.773  INFO vector::topology: Starting source "my_stdin_source"
Mar 04 13:40:19.773  INFO vector::topology: Starting sink "my_humio_cluster"
Mar 04 13:40:19.774  INFO source{name=my_stdin_source type=stdin}: vector::sources::stdin: Capturing STDIN
Mar 04 13:40:19.781  INFO vector::topology::builder: Healthcheck: Passed.
Example Message 1
Example Message 2

If everything started properly, search your Humio repository for the test messages. The messages in Humio will have the following structure. Note that Vector adds the timestamp and host fields to each message.

{"@timestamp":1583349673000,"#type":"none","host":"MacBook-Pro.local","#repo":"vector-example","@timezone":"Z","message":"Example Message 2","@rawstring":"{\"host\":\"Daniels-MacBook-Pro.local\",\"message\":\"Example Message 2\"}","@id":"mENFVMQVJyQ2M5pV4D1sFMB9_1_1_1583349673"}
{"@timestamp":1583349669000,"#type":"none","host":"MacBook-Pro.local","#repo":"vector-example","@timezone":"Z","message":"Example Message 1","@rawstring":"{\"host\":\"Daniels-MacBook-Pro.local\",\"message\":\"Example Message 1\"}","@id":"mENFVMQVJyQ2M5pV4D1sFMB9_1_0_1583349669"}

As a next step, configure Vector to watch log files, or use one of Vector’s many other source types to gather data from other parts of your system.
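A minimal sketch of such a setup, using Vector's file source in place of stdin, might look like the following. The source name my_app_logs and the path /var/log/myapp/*.log are hypothetical; substitute the files you actually want to ship.

```toml
# The file source stores its read positions under data_dir.
data_dir = "/var/lib/vector"

[sources.my_app_logs]
  type = "file"
  # Hypothetical path: replace with your own log files.
  include = ["/var/log/myapp/*.log"]

[sinks.my_humio_cluster]
  inputs = ["my_app_logs"]
  type = "humio_logs"
  encoding.codec = "json"
  host = "${HUMIO_URL}"
  token = "${HUMIO_INGEST_TOKEN}"
```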

Adding Fields

Vector can add fields with static values through its transforms. In the example below, a field called name with the value Name is added to each event sent to Humio:

[transforms.sourcename_transform]
  type = "add_fields"
  inputs = ["sourcename"]
  fields.name = "Name"

For the new field to be added to the event, update the inputs setting of your sink so that it points to the transform you created, as illustrated below.

[sinks.humio_out]
  type = "humio_logs"
  inputs = ["sourcename_transform"]
  encoding.codec = "json"
  token = "${HUMIO_INGEST_TOKEN}"
  host = "${HUMIO_URL}"

See Vector’s documentation on Adding Fields for more information.

Setting the Humio Parser

Humio uses the #type field to select a parser. In the Vector configuration, you can set #type for each input to the name of the Humio parser you would like to use.

To do this, add a transform of type add_fields to the input source and specify your Humio parser under #type, like so:

[transforms.sourcename_transform]
  type = "add_fields"
  inputs = ["sourcename"]
  fields."#type" = "$HUMIO_PARSER"

When you do that, each event carries the tag #type, and Humio automatically uses its value to select the parser at ingest time.
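Putting the pieces together, a complete pipeline that reads a file, tags each event with a parser, and ships it to Humio could be sketched as below. The file path and the parser name my-parser are hypothetical placeholders, and the sketch assumes a Vector version that ships the add_fields transform.

```toml
data_dir = "/var/lib/vector"

[sources.sourcename]
  type = "file"
  # Hypothetical path: replace with your own log files.
  include = ["/var/log/myapp/*.log"]

[transforms.sourcename_transform]
  type = "add_fields"
  inputs = ["sourcename"]
  # Hypothetical parser name: use a parser that exists in your repository.
  fields."#type" = "my-parser"

[sinks.humio_out]
  type = "humio_logs"
  # The sink reads from the transform, not directly from the source.
  inputs = ["sourcename_transform"]
  encoding.codec = "json"
  host = "${HUMIO_URL}"
  token = "${HUMIO_INGEST_TOKEN}"
```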

Multi-Line Events

By default, Vector creates one event for each line in a file. However, you can also split events in different ways. For example, stack traces in many programming languages span multiple lines.

You can specify multiline settings in the Vector configuration. See Vector’s multiline configuration documentation.

Often a log event starts with a timestamp, and we want to read all lines until we see a new line starting with a timestamp. In Vector that can be done like this:

  [sources.source_name.multiline]
    # Example: [4/28/20 14:59:25:783 EDT]
    start_pattern = "^\\[[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}"
    mode = "halt_before"
    condition_pattern = "^\\[[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}"
    timeout_ms = 1000

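Both start_pattern and condition_pattern should match your timestamp format. With mode set to halt_before, Vector keeps appending lines to the current event until it sees a line that matches condition_pattern, which then begins the next event.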
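The multiline table belongs to a file source, so in a full configuration it sits beneath the source it modifies. The sketch below shows that placement; the include path is a hypothetical example.

```toml
[sources.source_name]
  type = "file"
  # Hypothetical path: replace with the log file that contains multi-line events.
  include = ["/var/log/myapp/app.log"]

  [sources.source_name.multiline]
    # Example: [4/28/20 14:59:25:783 EDT]
    start_pattern = "^\\[[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}"
    mode = "halt_before"
    condition_pattern = "^\\[[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}"
    # Flush an unfinished event if no new line arrives within one second.
    timeout_ms = 1000
```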

Wildcard or “glob” log paths

Vector supports using a wildcard or “glob” to match log file pathnames, which helps when aggregating logs from several hosts.

# Ingest logfiles in a /var/log/%HOSTNAME%/%LOG%.log hierarchy.
[sources.testhostlogs]
  type = "file" # required
  include = ["/var/log/*.example.com/*.log"]

When using wildcards, keep the following in mind:

  • Wildcards are re-scanned every 1000ms by default. This is controlled by the Vector glob_minimum_cooldown setting.
  • If you have a directory as part of the glob path, as shown above, be sure that the vector user has both “read” and “execute” permissions on the directories used in the path.

Given the example above, without “read” permission on a directory that matches /var/log/*.example.com/, Vector will not be able to examine the directory contents to find matches for the *.log part of the path. Vector will ignore any directories that it cannot read, so check your file permissions if you are not seeing the expected log entries in Humio.
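If new files or directories need to be picked up on a different schedule, the rescan interval mentioned above can be set on the source. This is a sketch under the assumption that your Vector version accepts the setting under the name glob_minimum_cooldown with a value in milliseconds; some releases may use a different name, so confirm it against the file source documentation for your version.

```toml
[sources.testhostlogs]
  type = "file" # required
  include = ["/var/log/*.example.com/*.log"]
  # Assumed option name and unit (milliseconds): rescan globs every 5 seconds.
  glob_minimum_cooldown = 5000
```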

To debug this and other issues, it’s helpful to examine Vector’s logs with journalctl -fu vector and other suggestions from the Vector Troubleshooting Guide.