Release notes

v0.3.5

Release Date: 2025-01-10 | Status: latest

Bug Fixes

  • Fix Kafka service shutdown handling so it waits for in-flight requests to complete before returning
  • Capture transport-level timeouts and cancellations of in-flight Kafka requests, including a new server.error_kind attribute
  • Compute bufstream.kafka.consumer.group.offset.lag metric attributes correctly
  • Only return fatal ILLEGAL_GENERATION errors on consumer group heartbeats and syncs for fenced members, reducing rebalancing noise
  • Fix unit mismatch in the latency calculations that the archiver, sequencer, and consumer groups use when performing background operations

Features and Improvements

  • Limit number of consumer group offset updates per group to control load on the metadata storage
  • Add options to skip archiving/shelving/vacuuming when running the admin clean-topics CLI command
  • Include bufstream.intake.delete.latency and bufstream.kafka.consumer.group.offset.commit.latency metrics
  • Minimize operations against etcd when deleting keys to improve metadata storage performance
  • Permit fetches of unstable offsets via the unstable_offsets=true client ID option
  • Allow override of log size reporting behavior via exact_log_sizes=false client ID option
  • Additional miscellaneous performance improvements

v0.3.4

Release Date: 2025-01-02

Bug Fixes

  • Fix shelving and archiving delays when a partition has no existing archive data or is idle
  • Only expire group members after a heartbeat request

Features and Improvements

  • Process consumer group offset updates in concurrent batches to improve throughput by several orders of magnitude
  • Improve Datadog dashboard's display of cluster-wide metrics, metrics aggregations, and display settings
  • Reduce logging on intentionally canceled fetch calls
  • admin clean-topics now always blocks and vacuums intake files (unless explicitly skipped)
  • Return retriable error codes from certain Kafka APIs so that clients attempt retries in more scenarios
  • Additional miscellaneous performance improvements

v0.3.3

Release Date: 2024-12-19

Bug Fixes

  • Return the correct error code for Heartbeat, SyncGroup, and OffsetCommit responses when the consumer group is empty; the incorrect code could cause some clients to fail to join
  • Fix data enforcement PASS_THROUGH behavior so that invalid produce records are passed through instead of being rejected
  • Ensure caches always respect key expirations
  • Always include error code attribute in request count metrics
  • Fix consumer group lag metric units

Features and Improvements

  • Include first stable release of user-facing metrics, dashboards, and alerts for Datadog and Grafana
  • Update metrics aggregation behavior to allow user control over cardinality
  • Improve storage provider auto-detection logic
  • Add configurable timeout for all Kafka RPCs
  • Add fetch.source attribute to metrics to identify source of potential issues
  • Improve concurrency and performance of Fetch request and admin commands
  • Deprecate the observability.metrics.omit_partition_attribute configuration and the OMITTED redaction option, replacing them with the more fine-grained observability.metrics.aggregation options
  • Additional miscellaneous performance improvements

v0.3.2

Release Date: 2024-12-12

Bug Fixes

  • Reduce logging when metrics cannot resolve a deleted topic name
  • Fix unpopulated fields in DescribeCluster and DescribeLogDirs responses
  • Fix an issue where localhost might resolve to both 127.0.0.1 and ::1 but IPv6 is disabled on the loopback device
  • Fix creation of an empty batch in a compacted archive containing no records

Features and Improvements

  • Improve error messages on agent startup and invalid configuration
  • Limit concurrent archiving operations based on available memory
  • Add a cluster.name attribute to all metrics
  • Remove redundant attributes from metrics and traces
  • Improve aggregation behavior for topic and partition gauges
  • Replace bufstream.schema.validation.invalid_records with bufstream.kafka.produce.record.data_enforcement.errors and bufstream.kafka.fetch.record.data_enforcement.errors metrics
  • Introduce admin resolve command to resolve opaque topic, partition, and consumer group IDs and hashes seen in logs into their user-defined names
  • Additional miscellaneous performance improvements

v0.3.1

Release Date: 2024-12-03

Bug Fixes

  • Fix etcd garbage collection bug which could lead to large database sizes

Features and Improvements

  • Update metrics units to match OTEL recommendations
  • Allow configuring OTLP temporality preference
  • Expose Bufstream health status via bufstream.status metric

v0.3.0

Release Date: 2024-12-03

Bug Fixes

  • Fix routing bug that caused inconsistent load distribution among agents
  • Fix regression in the DescribeCluster API that returned non-unique agent hostnames, affecting certain clients
  • Fix DescribeProducers API to be strongly consistent, making some clients more reliable
  • Make consumer group offset processing more robust when groups are abruptly deleted
  • Redact sensitive record data in debug logs

Features and Improvements

  • Improve compatibility with Kafka 3.9.0 clients
  • Improve consumer group performance
  • Improve Fetch API performance
  • Add idempotent producer memory to match Apache Kafka behavior
  • Bind Kafka listeners to all resolved addresses of a hostname, instead of just the first IPv4 address
  • Improve metadata storage write scalability by ~100x by grouping partition sequencing together
  • Add admin clean-topics, admin get, and admin set CLI commands to flush the Bufstream intake and sequencing system to safely migrate towards grouping partition sequencing (see migration guide below)
  • Add admin usage CLI command for computing write statistics cluster-wide and by topic
  • Add timeout and quiet flags for admin commands
  • Retry startup checks to prevent spurious crashes when cluster initializes
  • Improve reliability of inter-cluster RPCs
  • Improve observability metrics, disabling internal debugging metrics by default
  • Add probes for etcd and OTLP endpoints
  • Use Kubernetes StatefulSet for deployments by default for more stable scaling behavior
  • Other miscellaneous performance improvements

Upgrading to v0.3.x

New clusters automatically opt in to the new partition sequencing groups. Existing clusters must perform this migration manually after upgrading to v0.3.x:

  1. Read these instructions completely before beginning the migration process
  2. Upgrade Bufstream cluster to 0.3.x
  3. Identify your admin URL (default: http://localhost:9089)
  4. Run bufstream admin clean-topics --url=<admin URL> and check results
  5. Optionally, disable Kafka traffic to the cluster to reduce noise
  6. Save an etcd snapshot as backup
  7. Run bufstream admin set sequence-shard-count 64 --url=<admin URL> and check results
  8. Re-enable Kafka traffic to the cluster if disabled
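
The two CLI invocations in steps 4 and 7 can be scripted. The following is a minimal sketch, not an official tool: the function names, the dry_run flag, and the pause before step 7 are illustrative assumptions, and only the two commands and the default admin URL come from the instructions above. It checks exit status only, so the output of each command still needs to be reviewed by an operator.

```python
import shlex
import subprocess

# Default admin URL taken from step 3 of the instructions above.
DEFAULT_ADMIN_URL = "http://localhost:9089"


def migration_commands(admin_url=DEFAULT_ADMIN_URL, shard_count=64):
    """Return the CLI invocations from steps 4 and 7 as argv lists."""
    url_flag = f"--url={admin_url}"
    return [
        ["bufstream", "admin", "clean-topics", url_flag],
        ["bufstream", "admin", "set", "sequence-shard-count",
         str(shard_count), url_flag],
    ]


def run_migration(admin_url=DEFAULT_ADMIN_URL, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    Returns the commands as shell-quoted strings. Steps 5, 6, and 8
    (pausing traffic, the etcd snapshot, resuming traffic) stay manual.
    """
    clean_topics, set_shards = migration_commands(admin_url)
    printed = []
    for argv in (clean_topics, set_shards):
        line = shlex.join(argv)
        printed.append(line)
        print(line)
        if not dry_run:
            if argv is set_shards:
                # Hypothetical pause so the operator can complete step 6.
                input("Save an etcd snapshot, then press Enter to continue: ")
            # check=True stops the script if the command exits non-zero.
            subprocess.run(argv, check=True)
    return printed
```

Calling run_migration() with the default dry_run=True only prints the commands, which is a safe way to confirm the admin URL before running anything against the cluster.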

v0.2.0

Release Date: 2024-11-08

Bug Fixes

  • Fix a data race when flushing the intake cache
  • Log the correct Kafka address on startup
  • Reduce compaction errors by improving etcd locking
  • Fix crash when cluster is shut down unexpectedly
  • Wait for DNS resolution before registering new nodes
  • Fix a bug in epoch calculation that erroneously invalidated in-progress offset updates
  • Improve archiving behavior during cluster auto-scaling

Features and Improvements

  • Support TLS for all cluster communications, including agent-to-agent and among etcd nodes
  • Implement KIP-394
  • Improve etcd performance when reading from archives
  • Improve memory utilization and cluster performance by increasing default cache sizes
  • Add a virtual broker configuration to client IDs, so that Bufstream can present itself as a single broker when necessary
  • Improve error output for bufstream serve failures
  • Support deploying Bufstream as a Kubernetes StatefulSet
  • Expose configurable liveness and readiness probe timeouts
  • Output human readable cluster UUIDs for debugging
  • Update configuration defaults for improved read performance
  • Shut down gracefully if Kafka or HTTP listeners fail to avoid cluster panics
  • Reduce startup and auto-scaling log verbosity
  • Various improvements to CLI reference documentation
  • Other miscellaneous performance optimizations

v0.1.3

Release Date: 2024-10-30

Bug Fixes

  • Fix a bug in v0.1.2's transaction numbering that silently discarded some commits and aborts
  • Return errors when clients attempt to change the outcome of a committed or aborted transaction
  • Fix stuck producers and consumers by polling etcd, rather than relying exclusively on leases
  • Fix a race that led to serving stale high watermarks and last stable offsets on startup
  • Fix compatibility with Java clients by always setting offset to -1 when returning produce errors
  • Improve AKHQ reliability by limiting the size of archive chunks
  • Various fixes to data management for low-throughput partitions
  • Fix archiving of internal usage-tracking topic
  • Miscellaneous fixes to log and metrics output

Features and Improvements

  • Order transaction-related RPCs with logical clocks, preventing re-ordering within Bufstream
  • Support more concurrent producers by decreasing etcd heartbeat frequency
  • Support L4 load balancers by defaulting to advertising only public hosts
  • Add support for zone-local load balancers
  • Improve graceful shutdown logic
  • Improve produce reliability by retrying more transient errors
  • Improve cluster throughput by increasing default cache size
  • Increase cluster throughput and reduce object storage costs by optimizing hedging
  • Guard against overlapping storage between clusters with a fingerprint check on startup
  • Reduce metrics cardinality by decreasing number of histogram buckets
  • Assorted improvements to logging and internal tracing

v0.1.2

Release Date: 2024-08-19 | Status: archived

This release has been archived due to a regression in the transaction processing system. All production workloads should continue to use version 0.1.1.

Bug Fixes

  • Fixes error-handling bug in topic auto-creation
  • Reduces error probability when Bufstream attempts to calculate the last stable or next unstable offset
  • Resolves error when Bufstream attempts to read the kafka.public_address value in the Helm chart
  • Prevents Bufstream from sending empty values for CPU and memory limits by setting a reasonable default in the Helm chart
  • Assigns all transactions a monotonic number so that concurrent complete operations no longer result in transactions completing multiple times for a topic partition -- Bufstream now applies only the first completion for a given transaction number
  • Addresses checkpoint error when Bufstream attempts to archive internal topics
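
The "first completion wins" rule for transaction numbers described above can be illustrated with a toy model. This is a sketch of the general idea only, not Bufstream's implementation: the class and method names are hypothetical, state is kept in memory for a single partition, and a real system would need an atomic check-and-set in shared storage.

```python
class PartitionTransactions:
    """Toy model: the first completion wins for each transaction number."""

    def __init__(self):
        self._next_number = 0
        self._outcomes = {}  # transaction number -> "commit" or "abort"

    def begin(self):
        """Assign the next monotonically increasing transaction number."""
        number = self._next_number
        self._next_number += 1
        return number

    def complete(self, number, outcome):
        """Record an outcome; return False if one was already applied."""
        if outcome not in ("commit", "abort"):
            raise ValueError(f"unknown outcome: {outcome!r}")
        if number in self._outcomes:
            # A later or concurrent completion is ignored.
            return False
        self._outcomes[number] = outcome
        return True

    def outcome(self, number):
        """Return the applied outcome, or None if still in progress."""
        return self._outcomes.get(number)
```

In this model, a second complete() call for the same number returns False and leaves the recorded outcome unchanged.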

Features and Improvements

  • Expands Bufstream's Kafka conformance testing suite
  • Exposes Kafka configs in the Helm chart so that they can be set directly
  • Adds documentation for the Helm chart and recommended defaults
  • Improves debug log output for transaction state changes
  • Uncaches transactional producers to expose state transition errors
  • Allows topic replication factor to be set to -1 -- in cases where the topic replication factor is not set to -1 or 1, Bufstream will return an error
  • Improves compatibility with Redpanda Console when displaying topics and offsets

v0.1.1

Release Date: 2024-08-14

Bug Fixes

  • Fixes off-by-one error in archive requests

Features and Improvements

  • Adds the kafka.exact_log_offsets config option, which, when set to true, always returns the exact offset for fetch requests
  • Updates and documents recommended default values in the Helm chart
  • Improves error handling for produce requests and transactions

v0.1.0

Release Date: 2024-08-09

Bug Fixes

  • Fixes panic when coercing a message payload to Confluent Schema Registry format
  • Respects acks=0 setting on produce and does not wait for or guarantee the success of the produce request

Features and Improvements

  • Adds the exact_log_sizes Helm value, which determines whether exact log sizes should be fetched for all topics and partitions
  • Documents dynamic configuration options
  • Adds configuration options for consumer group session timeout: group.consumer.session.timeout.ms, group.consumer.min.session.timeout.ms, group.consumer.max.session.timeout.ms

v0.0.4

Release Date: 2024-08-06

Bug Fixes

  • Fixed memory leak when uploading objects to S3 storage
  • Removed redundant zone lookups when resolving metadata requests
  • Fixed Fetch response to work with librdkafka-based clients (including the confluent-kafka Python client)
  • Amended various API responses to match expectations of the segmentio/kafka-go and IBM/sarama Go clients

Features and Improvements

  • Allow command-line flags to override YAML configuration
  • Support deleting topics by name with DeleteTopics
  • Expose additional Bufstream-specific broker and topic configuration options
  • Reduce debug log volume
  • Improve cache throughput

v0.0.3

Release Date: 2024-07-25

Bug Fixes

  • Change dataEnforcement key in Helm chart to an empty object so that it does not emit a warning when coalescing values

Features and Improvements

  • Enable retries with backoff by default
  • Return an error if Bufstream cannot resolve the producer ID
  • Set etcd connection wait time to 2 minutes -- Bufstream will return an error and shut down if it cannot establish a connection to etcd within the 2 minute interval
  • Provide traces for all etcd storage errors
  • Improve topic metadata management
  • Change shutdown behavior so that Bufstream waits for the archiver to finish before shutting down
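
The combination of retries with backoff and a bounded connection wait, as in the etcd item above, can be sketched as follows. This is a generic illustration and not Bufstream's code: the function name, the delay values, and the injectable clock and sleep parameters are assumptions, and only the 2-minute limit comes from the notes above.

```python
import time


def connect_with_deadline(connect, deadline_s=120.0, base_delay_s=0.5,
                          max_delay_s=10.0, clock=time.monotonic,
                          sleep=time.sleep):
    """Call connect() until it succeeds or deadline_s seconds have passed.

    Waits between attempts with exponential backoff, capped at
    max_delay_s, and never sleeps past the deadline.
    """
    start = clock()
    delay = base_delay_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect()
        except ConnectionError as err:
            remaining = deadline_s - (clock() - start)
            if remaining <= 0:
                # Give up so the caller can log the error and shut down.
                raise TimeoutError(
                    f"no connection after {attempts} attempts "
                    f"in {deadline_s:.0f}s"
                ) from err
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay_s)
```

The clock and sleep parameters exist only so the behavior can be exercised without real waiting; production code would normally leave them at their defaults.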

v0.0.2

Release Date: 2024-07-10

Features and Improvements

  • Emit build version in Helm chart logs

v0.0.1

Release Date: 2024-07-09

  • Initial release