Skip to content

Metrics

Bufstream metrics

Bufstream exposes metrics to monitor Kafka producers, consumers, topics, and the status of the Bufstream agents. It reports all metrics with a cluster.name attribute set by the Helm chart cluster attribute or the cluster setting in the bufstream.yaml config file. We recommend setting this name to a unique value for each cluster—for example, staging for a pre-production cluster and prod for a production cluster.

Available Metrics

Name Type Attributes Description
bufstream.kafka.active_connections Gauge cluster.name Active number of connections to the Bufstream Agent.
bufstream.kafka.active_requests Gauge cluster.name
kafka.api.key
kafka.api.version
Active requests by API key and version.
bufstream.kafka.consumer.group.count Gauge cluster.name
kafka.consumer.group.state
Number of consumer groups by state.
bufstream.kafka.consumer.group.generation Gauge cluster.name
kafka.consumer.group.id
The generation number of a consumer group. Not reported if consumer group aggregation is enabled.
bufstream.kafka.consumer.group.joins Counter cluster.name
kafka.consumer.group.id
The number of joins for a consumer group.
bufstream.kafka.consumer.group.lag Gauge cluster.name
kafka.consumer.group.id
kafka.topic.name
kafka.topic.partition
The lag between a group member's committed offset and the partition's high watermark.
bufstream.kafka.consumer.group.member.count Gauge cluster.name
kafka.consumer.group.id
The number of consumers in a group. Not reported if consumer group aggregation is enabled.
bufstream.kafka.consumer.group.offset Gauge cluster.name
kafka.consumer.group.id
kafka.topic.name
kafka.topic.partition
The latest offset commited for a consumer group. Not reported if consumer group, topic, or partition aggregation is enabled.
bufstream.kafka.fetch.bytes Counter cluster.name
fetch.source
kafka.topic.name
kafka.topic.partition
Amount of data fetched by topic and partition.
bufstream.kafka.fetch.errors Counter cluster.name
kafka.error_code
kafka.topic.name
The number of fetch errors for a given topic. For successful fetches, kafka.error_code will be none.
bufstream.kafka.fetch.record.count Counter cluster.name
fetch.source
kafka.topic.name
kafka.topic.partition
The number of records fetched by topic and partition.
bufstream.kafka.fetch.record.data_enforcement.errors Counter cluster.name
data_enforcement.action
data_enforcement.error_type
kafka.topic.name
kafka.topic.partition
Number of errors fetching from a topic with data enforcement enabled.
bufstream.kafka.produce.bytes Counter cluster.name
kafka.topic.name
kafka.topic.partition
The number of bytes produced by topic and partition.
bufstream.kafka.produce.delay.duration Histogram cluster.name
kafka.topic.name
The delay between record creation time and log append time in seconds.
bufstream.kafka.produce.errors Counter cluster.name
kafka.error_code
kafka.topic.name
The number of produce errors by topic name and error code.
bufstream.kafka.produce.record.data_enforcement.errors Counter cluster.name
data_enforcement.action
data_enforcement.error_type
kafka.topic.name
kafka.topic.partition
Number of errors producing to a topic with data enforcement enabled.
bufstream.kafka.produce.uncompressed_bytes Counter cluster.name
kafka.topic.name
The number of uncompressed bytes produced (by topic).
bufstream.kafka.request.bytes Histogram cluster.name
kafka.api.key
kafka.api.version
The number of bytes in a Kafka request.
bufstream.kafka.request.count Counter cluster.name
kafka.api.key
kafka.api.version
kafka.error_code
Number of processed Kafka requests. Includes the error code for troubleshooting failed requests.
bufstream.kafka.request.latency Histogram cluster.name
kafka.api.key
kafka.api.version
The latency of processed requests in seconds.
bufstream.kafka.response.bytes Histogram cluster.name
kafka.api.key
kafka.api.version
The number of bytes in each Kafka response.
bufstream.kafka.topic.count Gauge cluster.name The number of topics in the cluster.
bufstream.kafka.topic.partition.count Gauge cluster.name
kafka.topic.name
The number of partitions in a topic.
bufstream.kafka.topic.partition.offset.high_water_mark Gauge cluster.name
kafka.topic.name
kafka.topic.partition
A lower bound on the high water mark of a partition. Not reported if topic or partition aggregation is enabled.
bufstream.kafka.topic.partition.offset.last_stable_offset Gauge cluster.name
kafka.topic.name
kafka.topic.partition
A lower bound on the last stable offset of a partition. Not reported if topic or partition aggregation is enabled.
bufstream.kafka.topic.partition.offset.low_water_mark Gauge cluster.name
kafka.topic.name
kafka.topic.partition
A lower bound on the low water mark of a partition. Not reported if topic or partition aggregation is enabled.
bufstream.kafka.topic.partition.retained_bytes Gauge cluster.name
kafka.topic.name
kafka.topic.partition
A lower bound on the size of records in a partition.
bufstream.kafka.topic.partition.retained_records Gauge cluster.name
kafka.topic.name
kafka.topic.partition
An estimate of the number of records retained in a partition.
bufstream.status Gauge cluster.name
status.probe
The result (0 = healthy, 1 = warning, 2 = error) of status probes on each Bufstream agent.

Attributes

Name Description
cluster.name The configured Bufstream cluster name.
data_enforcement.action One of pass_through, reject, or filter.
data_enforcement.error_type One of decode, validate, or internal.
fetch.source The source of the record data (recent, shelf, or archive).
kafka.api.key Kafka API key (string format). For example, Produce or Fetch.
kafka.api.version The API version for the request or response message.
kafka.consumer.group.id The Kafka consumer group ID (group.id).
kafka.consumer.group.state The state of a consumer group (Stable, Empty, PreparingRebalance, CompletingRebalance, Dead).
kafka.error_code The Kafka error code, converted to lower snake case.
kafka.topic.name The name of the topic. If sensitive information redaction is set to OPAQUE, this will contain the topic ID. If topic aggregation is enabled, this will be set to _all_topics_.
kafka.topic.partition The partition index of the topic. Set to -1 if topic or partition aggregation is enabled.
status.probe The name of the Bufstream status probe. For example, storage or etcd.

Controlling metric cardinality

By default, Bufstream reports metrics for individual topics, partitions, and consumer groups. In some environments, reporting metrics at the default cardinality may be too expensive. For example, if topics typically have hundreds of partitions, reporting per-partition metrics requires a significant cost in the monitoring system.

To support environments with high cardinality, Bufstream supports aggregating metrics for topics, partitions, and consumer groups. To configure aggregation in the Helm chart, use the following settings:

observability:
  metrics:
    aggregation:
      # If true, per-topic metrics won't be reported but aggregated across all topics and partitions.
      # Metrics that don't support aggregation such as `bufstream.kafka.topic.partition.offset.high_water_mark` won't be reported.
      topics: false

      # If true, per-partition metrics won't be reported but aggregated across all partitions.
      # Metrics that don't support aggregation such as `bufstream.kafka.topic.partition.offset.high_water_mark` won't be reported.
      partitions: false

      # If true, per-consumer group metrics won't be reported but aggregated across all consumer groups.
      # Metrics that don't support aggregation such as `bufstream.kafka.consumer.group.offset` won't be reported.
      consumerGroups: false

If aggregation is enabled, Bufstream reports the following attributes with metrics:

Aggregation Attribute name Attribute value
Consumer Groups kafka.consumer.group.id _all_groups_
Topics kafka.topic.name _all_topics_
Topics or partitions kafka.topic.partition -1