Filebeat is a lightweight, open source program that can monitor log files and send data to servers. It has some properties that make it a great tool for sending file data to Humio.
It uses few resources, which is important because the Filebeat agent must run on each server where you want to capture data. It’s also easy to install and run, since Filebeat is written in the Go programming language and built as a single binary. Finally, it handles network problems gracefully: when Filebeat reads a file, it keeps track of the last point it read, and if there is no network connection, Filebeat waits and retries, resuming data transmission when the connection is working again.
Check out Filebeat’s official documentation for more information. You might also read the Getting Started Guide.
To download Filebeat, visit the Filebeat OSS downloads page.
You can find installation documentation for Filebeat at the Filebeat Installation page. Remember to replace the download URL for Filebeat with the URL for the open source version of Filebeat.
Do not use the Elastic non-OSS version of Filebeat. It will not work with Humio.
Humio supports parts of the Elasticsearch bulk ingest API. This API is served both as a sub-path of the standard Humio API and on its own port (defaulting to 9200). Data can be sent to Humio by configuring Filebeat to use the built-in Elasticsearch output.
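If you want to check connectivity before configuring Filebeat, you can exercise the bulk API by hand. The following is a rough, untested sketch assuming the dedicated Elasticsearch port and an ingest token in `$INGEST_TOKEN`; the index name `test` is arbitrary, and the scheme and port depend on your deployment:

# Send one event via the Elasticsearch bulk API (newline-delimited JSON:
# an action line followed by a document line, each newline-terminated).
curl -u "anything:$INGEST_TOKEN" \
     -H "Content-Type: application/x-ndjson" \
     -X POST "https://$YOUR_HUMIO_URL:9200/_bulk" \
     --data-binary $'{"index":{"_index":"test"}}\n{"message":"hello from curl"}\n'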
You can find configuration documentation for Filebeat at the Filebeat configuration page.
The following example shows a simple Filebeat configuration that sends data to Humio, assuming the standard Humio API is hosted on port 8080 and the Elasticsearch API is available on port 9200:
filebeat.inputs:
- paths:
    - $PATH_TO_LOG_FILE
  encoding: utf-8
  fields:
    aField: value

queue.mem:
  events: 8000
  flush.min_events: 1000
  flush.timeout: 1s

output:
  elasticsearch:
    # Using the standard Humio API (preferred)
    hosts: ["$YOUR_HUMIO_URL:8080/api/v1/ingest/elastic-bulk"]
    # Or for Humio cloud
    # hosts: ["https://$HUMIO_CLOUD_URL:443/api/v1/ingest/elastic-bulk"]
    # Or alternatively, using the Elasticsearch port
    # hosts: ["$YOUR_HUMIO_URL:9200"]
    username: anything
    password: $INGEST_TOKEN
    compression_level: 5
    bulk_max_size: 200
    worker: 5
The Filebeat configuration file is located at `/etc/filebeat/filebeat.yml` on Linux.
You must make the following changes to the sample configuration:

- Insert a `paths` section for each log file you want to monitor in `$PATH_TO_LOG_FILE`. You can insert an input configuration (with `paths` and `fields`) for each file that Filebeat should monitor.
- Add other fields in the `fields` section. These fields, and their values, will be added to each event.
- Insert the URL and port in the Elasticsearch output to match your configuration, for example `https://$YOUR_HUMIO_URL:443`, where `$YOUR_HUMIO_URL` is the URL for your Humio Cloud installation. It is important to specify the port number in the URL; otherwise Filebeat defaults to 9200. If you’re using Humio’s US cloud, the Elasticsearch interface is available at `https://cloud-es.us.humio.com:443`. Port 9200 is not supported for Humio’s US cloud.
- Insert an ingest token from the repository as the password. Set the username to anything; it will be logged in the access log of any proxy on the path, so the hostname of the sender is a good option.
- Specify the text encoding to use when reading files using the `encoding` field. If the log files use special, non-ASCII characters, set the encoding here, for example `utf-8` or `latin1`.
- If all your events are fairly small, you can increase `bulk_max_size` from the default of 200 to 300. The default of 200 is fine for most use cases, and the Humio server does not limit the size of the ingest request. But keep `bulk_max_size` low: if requests get too large, they may time out, and Filebeat will then back off, giving worse performance than with a lower `bulk_max_size`.
- Note that the Humio cloud on cloud.humio.com limits requests to 32 MB, measured in bytes, not in number of events. If you go above this limit, you will get “Failed to perform any bulk index operations: 413 Request Entity Too Large”. If this happens, lower `bulk_max_size`, as Filebeat will otherwise keep retrying that request and not move on to other events.
- You may want to increase the number of worker instances (`worker`) from the default of 1 to, say, 5 or 10 to achieve more throughput if Filebeat is not able to keep up with the inputs. To get higher throughput, also increase `queue.mem.events` to, say, 32000 to allow buffering for more workers. A tuning sketch follows this list.
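As a concrete illustration, a throughput-oriented variant of the queue and output settings might look like the following sketch (the numbers are starting points, not measured optima):

# Hypothetical tuning for higher throughput; adjust to your workload.
queue.mem:
  events: 32000          # more buffering so several workers stay busy
  flush.min_events: 1000
  flush.timeout: 1s

output.elasticsearch:
  bulk_max_size: 200     # keep batches small to avoid request timeouts
  worker: 5              # parallel connections to Humio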
An important next step is choosing a parser for your Filebeat events.
When setting up Filebeat, it’s helpful to see what’s going on under the hood. To do that, you need to enable debug logging. Add this to the end of your `filebeat.yml` config file:
logging:
  level: debug
  to_files: true
  to_syslog: false
  files:
    path: /var/log/filebeat
    name: filebeat.log
    keepfiles: 3
If you’re using Filebeat with `systemd`, more recent versions execute Filebeat with the `-e` flag by default. This causes Filebeat to ignore many of these logging options; notably, it will log to `/var/log/messages` regardless of what you’ve specified here. To fix this, remove `Environment="BEAT_LOG_OPTS=-e"` from Filebeat’s `systemd` unit file. See [this GitHub issue](https://github.com/elastic/beats/issues/12024) for more details.
Run Filebeat as a service on Linux with the following commands:
sudo systemctl enable filebeat
sudo systemctl restart filebeat
On Linux, the Filebeat binary is often located at `/usr/share/filebeat/bin/filebeat`. To test your configuration, you can run it directly:

/usr/share/filebeat/bin/filebeat -c /etc/filebeat/filebeat.yml
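To check that the service is running, and to follow its output, you can use the standard systemd tools:

sudo systemctl status filebeat
sudo journalctl -u filebeat -f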
Humio uses parsers to parse the data from Filebeat into events. Parsers can extract fields from the input data thereby adding structure to the log events. For more information on parsers, see parsing.
Take a look at Humio’s built-in parsers.
The recommended way of choosing a parser is by assigning a specific parser to the Ingest API Token used to authenticate the client. This allows you to change parsers in Humio without changing the client. Alternatively, you can specify the parser/type for each monitored file using the `type` field in the `fields` section of the Filebeat configuration.
filebeat.inputs:
- paths:
    - $PATH_TO_LOG_FILE
  encoding: utf-8
  fields:
    "type": $TYPE
If no parser is specified, Humio’s built-in key-value parser (`kv`) will be used. The key-value parser expects the incoming string to start with a timestamp formatted in ISO 8601. It will also look for key-value pairs of the form `a=b` in the string.
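For example, the `kv` parser would read the timestamp from the leading ISO 8601 value in a made-up line like the one below, and extract `level`, `user`, and `action` as fields:

2019-05-17T12:34:56+02:00 level=info user=jdoe action=login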
We do not recommend using the JSON parsing built into Filebeat. Instead, use Humio’s own JSON support. Filebeat processes logs line by line, so its JSON parsing only works if there is one JSON object per line. By using Humio’s built-in json parser you can get JSON fields extracted during ingest. You can also create a custom JSON parser for more control over the fields that are created.
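For instance, a hypothetical log file suited for Humio’s json parser would contain one complete JSON object per line:

{"@timestamp": "2019-05-17T12:34:56Z", "level": "error", "message": "connection refused"}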
It’s possible to add fields with static values using the fields section. These fields will be added to each event.
Filebeat automatically sends the host (`beat.hostname`) and filename (`source`) along with the data. Humio adds these fields to each event. The fields are added as `@host` and `@source` in order not to collide with other fields in the event.
To avoid having the `@host` and `@source` fields, specify `@host` and `@source` in the `fields` section with an empty value, as in the sketch below.
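A minimal sketch of such an input, reusing the placeholder path from above:

filebeat.inputs:
- paths:
    - $PATH_TO_LOG_FILE
  fields:
    # Empty values tell Humio not to add these fields to the events.
    "@host": ""
    "@source": ""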
Humio saves data in Data Sources. You can provide a set of Tags to specify which Data Source the data is saved in. See Tagging for more information about tags and Data Sources.
If a `type` is configured in Filebeat, it is always used as a tag. Other fields can be used as tags by defining the fields as `tagFields` in the parser pointed to by the `type`. In Humio, tags always start with a `#`. When a field is turned into a tag, the name of the field will be prepended with `#`, as the short example below illustrates.
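As a short illustration (the value `accesslog` is hypothetical), a monitored file configured with:

  fields:
    "type": accesslog

would have its events stored with the tag `#type=accesslog` in Humio.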
By default, the Filebeat handling in Humio keeps only a subset of the fields shipped by Filebeat, since the default handling targets just getting the message from the input files into Humio as `@rawstring`, not all the extra fields that Filebeat may add. If you want to get the full set of fields, for instance if you are using Processors in the Filebeat configuration, then turn off the default handling by adding these lines to your Filebeat configuration:
# Skip default Filebeat field handling in Humio by
# not including the word `filebeat` in the index name.
# The parser then gets all fields added by Filebeat.
setup.template.name: "beat"
setup.template.pattern: "beat"
output.elasticsearch.index: "beat"
By default, Filebeat creates one event for each line in a file. However, you can also split events in other ways; for example, stack traces in many programming languages span multiple lines and should usually belong to a single event.
You can specify multiline settings in the Filebeat configuration. See Filebeat’s multiline configuration documentation.
Often a log event starts with a timestamp, and we want to read all lines until we see a new line starting with a timestamp. In Filebeat that can be done like this:
multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
multiline.negate: true
multiline.match: after
The `multiline.pattern` should match your timestamp format.
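As an illustration, with the pattern above, the three made-up lines below become a single event: the two continuation lines do not start with a timestamp, so they are appended to the preceding line that does.

2019-05-17 12:34:56 ERROR Request failed
java.lang.NullPointerException
    at com.example.Handler.handle(Handler.java:42)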
Below is an example combining all of this. The `$YOUR_HUMIO_URL` variable is the URL for your Humio Cloud account.
filebeat:
  inputs:
    - paths:
        - /var/log/nginx/access.log
      fields:
        aField: value
    - paths:
        - humio_std_out.log
      fields:
        service: humio
      multiline:
        pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        negate: true
        match: after

queue.mem:
  events: 8000
  flush.min_events: 1000
  flush.timeout: 1s

output:
  elasticsearch:
    hosts: ["https://$YOUR_HUMIO_URL:8080/api/v1/ingest/elastic-bulk"]
    username: from-me
    password: "some-ingest-token"
    compression_level: 5
    bulk_max_size: 200
    worker: 1

logging:
  level: info
  to_files: true
  to_syslog: false
  files:
    path: /var/log/filebeat
    name: filebeat.log
    keepfiles: 3