Getting Metrics out of Humio

Humio generates a number of metrics that can be used to monitor and operate Humio itself.

The metrics are available through JMX, Prometheus, and the Humio debug logs.

JMX

Humio can expose all metrics over JMX. To enable this, pass the standard JMX options to the JVM by adding them to the HUMIO_JVM_ARGS configuration.

For example:

HUMIO_JVM_ARGS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=5000

Prometheus

Setting the PROMETHEUS_METRICS_PORT configuration option to a port number enables Prometheus to scrape metrics from Humio on that port.
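
As a rough illustration of what a scrape returns, the following Python sketch parses text in the Prometheus exposition format into a dictionary. The sample payload, the metric names as rendered in it, and the parse_prometheus_text helper are all assumptions made for illustration; the names and labels Humio actually exports may differ.

```python
from typing import Dict

# Hypothetical sample of a scrape response; real Humio output may use
# different metric names and labels.
SAMPLE = """\
# HELP humio_primary_disk_usage Percent used on the primary disk
# TYPE humio_primary_disk_usage gauge
humio_primary_disk_usage 42.5
humio_missing_cluster_nodes 0
humio_ingest_bytes{repo="sandbox"} 1024
"""


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Parse simple Prometheus text-format samples into {series: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. Optional
    trailing timestamps are not handled; this is a sketch, not a full parser.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token; the series
        # (name plus optional {labels}) is everything before it.
        series, _, value = line.rpartition(" ")
        samples[series.strip()] = float(value)
    return samples


metrics = parse_prometheus_text(SAMPLE)
print(metrics["humio_primary_disk_usage"])
```

To try this against a live node, pass in the response body fetched from the configured port instead of SAMPLE.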

Humio Debug Logs

The Humio debug log also contains all the metrics. You can find them in the humio repository or the special humio-metrics repository.

Metric Types

There are two types of metrics in Humio: node-level metrics and object-level metrics. Object-level metrics describe individual objects such as repositories, ingest listeners, or storage partitions. An example of an object-level metric is ingest-bytes/<repo>, where <repo> is a placeholder for a concrete repository in a given Humio system.
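
The naming convention can be sketched in a few lines of Python. The split_metric_name helper and the repository name sandbox are hypothetical, and the sketch assumes that the metric name itself never contains a slash.

```python
from typing import Optional, Tuple


def split_metric_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'metric/object' into (metric, object).

    The object part is None for node-level metrics, which have no slash.
    """
    base, sep, obj = name.partition("/")
    return (base, obj) if sep else (base, None)


print(split_metric_name("ingest-bytes/sandbox"))  # object-level metric
print(split_metric_name("primary-disk-usage"))    # node-level metric
```

A helper like this makes it easy to group object-level metrics by repository when post-processing a metrics dump.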

Node Level Metrics

Metric Name Description
backup-disk-usage Percent used on the backup disk. Only present if backup is enabled
bucket-storage-fetch-for-query-queue Count of segment files queued awaiting fetch from Bucket Storage to local data store due to being referred by a query
bucket-storage-pending-upload Total size of segment files pending upload to Bucket Storage
bucket-storage-pending-upload-underreplicated Total size of segment files pending upload to Bucket Storage for files that are not known to have more than one replica in the local cluster
bucket-storage-total-segment-size Total size of segment files stored in Bucket Storage
cluster-time-skew Largest time skew (in milliseconds) between this node and any other node in the cluster
digest-active-datasources Number of active datasources
digest-buffer-target-latency Latency target of in-memory buffer after ingest queue in digest pipeline
digest-coordinator-changes Number of changes to the set of active digest nodes triggered by digest coordination. For a healthy system this is close to zero, except when an administrator alters the desired digest partition scheme
digest-live-latency Latency of live update part of digest pipeline for internal bulks in milliseconds
digest-segment-latency Latency of segment building part of digest pipeline for internal bulks in milliseconds
elastic-search-ingestion-events-in-bulk Number of events found in an Elasticsearch bulk request
elastic-search-ingestion-request-errors Number of ingest errors in the Elasticsearch endpoint since the node started
elastic-search-ingestion-requests Time spent ingesting a bulk request using the Elasticsearch ingest protocol
event-collector-request-errors Number of ingest errors in the http-event-collector endpoint since the node started
event-latency Overall latency of the ingest queue and digest pipeline, measured from insertion into the ingest queue through updating live queries and adding events to blocks for segment files. Time spent in parsers is not included
failed-http-checks Number of nodes that appear to be unreachable using HTTP as seen from this node. A healthy system has zero of these
gcs-storage-read Bytes fetched for raw segment files and aux files from GCS to local data store
gcs-storage-write Bytes stored for raw segment files and aux files using GCS as data store
global-publish-wait-for-value Time spent waiting to see the value being read back from Kafka when pushing an update to the global state
globalsnapshot-size Size of global-snapshot.json file written
hashfilter-included-blocks Number of blocks included using hashfilters in queries and thus read from compressed blocks in segment files
hashfilter-skipped-blocks Number of blocks skipped using hashfilters in queries and thus not read from compressed blocks in segment files
http-requests Timing of all inbound HTTP requests
http-requests-external-size Size of external inbound HTTP requests
http-requests-external-timing Timing of external inbound HTTP requests
http-requests-internal-size Size of internal inbound HTTP requests
http-requests-internal-timing Timing of internal inbound HTTP requests
humio-ingestion-request-errors Number of ingest errors in the humio ingestion endpoint since the node started
ingest-bytes-total Number of bytes uncompressed in flushed blocks for segments being constructed across all repos
ingest-listener-tcp-available TCP ingest listener free slots for lines to be processed (high when idle, zero when overloaded)
ingest-writer-bulksize Histogram of size (bytes) of data for jobs that carry events. Some jobs are no-payload and are not included here
ingest-writer-compressed-bytes Total number of bytes written to Kafka as compressed events into the ingest queue
ingest-writer-jobs Number of jobs pushed to in-memory job queue for digest writers
ingest-writer-queue-add Number of times an ingest queue consumer pushes to in-memory job queue for digest writers, including when the operation fails due to the queue being full
ingest-writer-queue-empty Number of times an ingest queue consumer hit an empty queue while pushing to in-memory job queue for digest writers
ingest-writer-queue-full Number of times an ingest queue consumer hit a full queue while pushing to in-memory job queue for digest writers
ingest-writer-uncompressed-bytes Total number of bytes, before compression, of events written to Kafka into the ingest queue
jvm-hiccup-latency Latency of timed events inside the Humio JVM
kafka-chatter-bytes Number of bytes written to Kafka on the chatter topic
kafka-chatter-put Time spent waiting for an ack from Kafka when publishing to the chatter topic
kafka-ingestqueue-put Time spent waiting for an ack when adding ingest events to the ingest queue
kafka-request-bytes Number of bytes written to Kafka as compressed events for the ingest queue
kafka-request-events Number of events written to Kafka as compressed events for the ingest queue
live-dashboard-query-count Number of live queries on dashboards
livequeries-canceled-due-to-digest-delay Number of live queries that have been canceled due to excessive digest delay
livequeries-rate The rate of the cost of live queries, in cost/s
livequeries-rate-canceled-due-to-digest-delay The rate of the cost of live queries canceled due to excessive digest delay, in cost/s
livequery-count Number of active live (real-time) queries
load-segment-total Time spent reading (waiting for) blocks from segment files
local-query-segments-queue Number of segments currently queued for query
logplex-ingestion-request-errors Number of ingest errors in the logplex endpoint since the node started
mapsegment Time spent on the ‘map’ phase while searching non-real-time segment files
mini-segment-created Number of new mini-segments being created. The number gets incremented when the mini-segment gets closed and added to global
missing-cluster-nodes Number of nodes that this node has decided are now dead. A healthy system has zero of these
primary-disk-usage Percent used on the primary disk
proxied-query-polls Timing of internal requests due to polling of queries not hitting the server coordinating the query
queries Total number of queries started since this node started
query Measure how long it takes for queries to complete
query-delta-total-cost 30s delta of total cost on queries for the entire cluster
query-delta-total-memory-allocation 30s delta of total memory allocation on queries for the entire cluster
query-live-delta-cpu-usage 30s delta of CPU usage on live queries for the entire cluster
query-segments-count Segments being queried that hit local files. Includes those fetched from remote once they arrive
query-segments-count-from-remote Segments being queried that missed local, triggering a fetch from remote
query-static-delta-cpu-usage 30s delta of CPU usage on static queries for the entire cluster
query-thread-limit Number of threads allowed to be executing historical parts of queries. Gets turned down if digest is unable to keep up
read-compressed-bytes Number of bytes read from compressed blocks in segment files
read-prefilter-bytes Number of bytes read from pre-filter files
recompress-millis Number of milliseconds CPU time spent merging and re-compressing segment files
s3-archiving-bytes-per-second Bytes archived in S3 per second
s3-archiving-errors-per-second Errors per second archiving logs in S3
s3-archiving-writes-per-second Successful S3 archival writes per second
s3-storage-read Bytes fetched for raw segment files and aux files from S3 to local data store
s3-storage-write Bytes stored for raw segment files and aux files using S3 as data store
schedulesegments Time spent scheduling segment files for the ‘map’ phase while searching non-real-time segment files
secondary-disk-usage Percent used on the secondary disk. Only present if secondary disk is configured
segment-merge-cpu-time CPU time spent merging segments
serialize-state-bytes Number of bytes serialized for internal query states
serialize-state-time Time spent serializing internal query states
target-segment-blocks Number of blocks in segments created by merging mini-segments
target-segment-compressed-size Size of the file for segments created by merging mini-segments
target-segment-created Number of new segment targets being created. The number gets incremented when the target id is chosen, before any of the mini-segments exist
target-segment-uncompressed-size Number of bytes uncompressed for segments created by merging mini-segments
time-digest CPU time used on digest as a fraction of wall time
time-livequery CPU time used on live queries as a fraction of wall time
timestamp-parsing-failed Total number of timestamp strings that failed to parse as a timestamp since the node started
uploaded-files-cache-entries Number of uploaded files cached in memory
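
Several of the descriptions above state what a healthy value looks like: failed-http-checks and missing-cluster-nodes should be zero, and ingest-listener-tcp-available drops to zero when the listener is overloaded. A minimal Python sketch of a check built on those statements follows. The health_warnings helper and the flat snapshot dictionary are assumptions for illustration, not part of Humio.

```python
from typing import Dict, List


def health_warnings(metrics: Dict[str, float]) -> List[str]:
    """Return warnings for metrics whose values indicate an unhealthy node.

    `metrics` is a hypothetical flat snapshot of {metric name: value}.
    Metrics that are absent from the snapshot are not checked.
    """
    warnings: List[str] = []
    # The table states that a healthy system has zero of these.
    for name in ("failed-http-checks", "missing-cluster-nodes"):
        if metrics.get(name, 0) > 0:
            warnings.append(f"{name} is {metrics[name]:g}, expected 0")
    # Free slots are high when idle and zero when overloaded.
    if metrics.get("ingest-listener-tcp-available", 1) == 0:
        warnings.append(
            "ingest-listener-tcp-available is 0; "
            "the TCP ingest listener has no free slots"
        )
    return warnings


snapshot = {
    "failed-http-checks": 0,
    "missing-cluster-nodes": 1,
    "ingest-listener-tcp-available": 0,
}
for warning in health_warnings(snapshot):
    print(warning)
```

In practice the snapshot would be filled from one of the sources described earlier (JMX, Prometheus, or the debug logs), and the thresholds tuned to the deployment.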

Object Level Metrics

Metric Name Description
current-live-events/<repo> Number of events being processed updating live queries
data-ingester-errors/<repo> Number of events that got an @error tag added to their fields during parsing
event-latency-partition/<repo> Per-partition latency of the ‘humio-ingest’ topic of the ingest+digest pipeline including time spent in parsers, updating live queries and adding events to blocks for segment files
event-latency-repo/<repo> For each repository, overall latency of ingest+digest pipeline including time spent in parsers, updating live queries and adding events to blocks for segment files
ingest-bytes/<repo> Number of bytes uncompressed in flushed blocks for segments being constructed
ingest-eventsize/<repo> Number of bytes uncompressed summed over individual events in blocks in progress
ingest-parsing/<repo> Time spent parsing incoming events
ingest-queue-consumer/<repo> Time spent constructing segment file blocks in memory and writing them to disk, including updating live queries if any
ingest-queue-latency/<repo> Latency of the ingest queue for each partition, measured from insertion into the queue (after the parsers have completed) until the data has been read, but not yet processed, in the digest node
ingest-reader-partition-bytes/<repo> Number of bytes read from Kafka as compressed events from the ingest queue
ingest-reader-partition-events/<repo> Number of events added to segment file blocks being constructed
ingest-reader-polltime/<repo> Time blocked waiting for next message from Kafka from ingest queue
ingest-writer-partition-bytes/<repo> Number of bytes written to Kafka as compressed events into the ingest queue in each partition
kafka-chatter-by-kind-bytes/<repo> Number of bytes written to Kafka on the chatter topic for each kind of chatter
kafka-chatter-by-kind-serialize/<repo> Time spent serializing value being written to Kafka when publishing to the chatter topic for each kind of chatter
notifications/<repo> Time spent shipping a notification from an alert
query-millis/<repo> Number of milliseconds spent processing historical queries
repo-queries/<repo> Number of queries started per repo
tcp-ingest-bytes/<repo> Number of bytes read by tcp ingest listener
udp-ingest-bytes/<repo> Number of bytes read by udp ingest listener
written-events-after-queue/<repo> Number of events added to segment file blocks being constructed
written-events/<repo> Number of events written to the ingest queue after being parsed
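
Object-level metrics can be aggregated across objects. As a sketch, the Python snippet below sums every ingest-bytes/<repo> entry in a snapshot, which can then be compared with the node-level ingest-bytes-total. The sum_per_repo helper, the repository names, and the values are hypothetical, and whether the two figures match exactly on a live system is an assumption.

```python
from typing import Dict


def sum_per_repo(metrics: Dict[str, float], base: str) -> float:
    """Sum every 'base/<repo>' entry in a flat metric snapshot."""
    prefix = base + "/"
    return sum(value for name, value in metrics.items() if name.startswith(prefix))


# Hypothetical snapshot mixing object-level and node-level metrics.
snapshot = {
    "ingest-bytes/sandbox": 1500.0,
    "ingest-bytes/humio": 500.0,
    "ingest-bytes-total": 2000.0,
}
print(sum_per_repo(snapshot, "ingest-bytes"))
```

Matching on the prefix including the slash keeps node-level names such as ingest-bytes-total out of the sum.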