Release notes
v0.3.23
Release date: 2025-04-30 | Status: latest
Bug fixes
- Partial rollback of KIP-896 deprecations for the Produce RPC. librdkafka clients (as of v2.10.0) erroneously disable compression if the minimum version of the Produce RPC isn't v0. This release reestablishes that minimum, but always returns an error for v0-v2 Produce RPCs.
- Fix Apache Iceberg™ snapshot summary metrics for deleted files and records. Previously, these metrics erroneously reflected the added files and records.
- Correct Apache Iceberg™ deletion flow to remove a Parquet file reference from the table metadata prior to removing the file from object storage. Previously, the cleanup process deleted the file before updating the table.
- Fix bug in data enforcement causing Produce responses to contain partial results. In order to fail open, the enforcement logic "cleans" the Produce request of any failed batches prior to handing it to the actual Kafka logic. However, the enforcement logic was erroneously removing batches without an assigned policy, causing them to be excluded from the result. This release corrects the behavior and ensures responses are complete.
Features and improvements
- Improve broker startup behavior when the metadata store is transitively unavailable. The previous behavior caused the broker to exit non-zero, requiring a restart. Now, brokers will retry with backoff to establish a connection to the store.
- Remove long-deprecated observability configuration options. Though the removed settings are no longer supported in the Bufstream config, the deprecated Helm values will remain supported until the next minor release.
- Additional performance improvements.
v0.3.22
Release date: 2025-04-24
Bug fixes
- Raise limit on inter-broker connections causing head-of-line blocking
Features and improvements
- Reduce operations made to PostgreSQL via batching
- Optimize observability and error handling allocations
- Improve startup resiliency when metadata store is unavailable
- Point Helm chart and documentation at Docker Hub image
- Additional performance improvements
v0.3.21
Release date: 2025-04-22
Note
Breaking changes: This release removes all of the --config.*
CLI flags.
Bug fixes
- Fix OffsetFetch response to conform to client expectations when a partition is unknown to a consumer group
- Allow old data enforcement validators to be evicted from the cache, freeing up their memory
- Fix a bug in SigV4 authorization for Iceberg REST catalogs
- Avoid erroneous logs when
fetch_eager
is disabled
Features and improvements
- Remove
--config.*
flags fromserve
andmigrate
commands - Prevent PostgreSQL connection cycling when transactions are short-circuited
- Add additional tracing around the produce flow
- Reduce
bufstream
binary size by ~30MiB - Additional performance improvements
v0.3.20
Release date: 2025-04-16
Bug fixes
- Partial rollback of changes to in-flight gauges to avoid a deadlock
- Fix the PostgreSQL status probe query which would cause healthchecks to fail
v0.3.19
Release date: 2025-04-15 | Status: archived
This release has been archived due to a permanent start-up bug when using PostgreSQL (other metadata stores are not impacted). All production workloads should upgrade to use version 0.3.20.
Note
Breaking changes: This release removes the following serve
CLI command flags:
--available-memory
: Should be configured instead withBUFSTREAM_AVAILABLE_MEMORY
or the Go runtime'sGOMEMLIMIT
environment variable--etcd.address
: Only impacted the embedded etcd implementation--nokafka
: Vestigial flag removed
Bug fixes
- Fix categorization of certain compound storage errors causing inaccurate observability labeling
Features and improvements
- Release preview of PostgreSQL support on AWS and GCP
- Reduce contention on the metadata service from idle partitions
- Minimize allocations from in-flight gauge metrics and storage key computations
- Remove
--available-memory
,--etcd.address
, and--nokafka
flags fromserve
command - Additional performance improvements
v0.3.18
Release date: 2025-04-08
Bug fixes
- Fix KIP-724 implementation to return an error instead of closing the connection when receiving legacy v0 or v1 message formats
Features and improvements
- Add bearer token and OAuth (including AWS SigV4 and JWT) support to REST Iceberg catalog integrations
- Instrument additional trace spans around Produce API flow
- Include client label attributes on request latency metrics
- Additional performance improvements
v0.3.17
Release date: 2025-04-02
Bug fixes
- Match KIP-32 produce timestamp behavior and support
message.timestamp.difference.max.ms
config setting
Features and improvements
- Release preview of Microsoft Azure cloud support
- Additional performance improvements
v0.3.16
Release date: 2025-03-27
Bug fixes
- Fix metrics aggregation on partition level
Features and improvements
- Update the Prometheus exporter's behavior to always use underscored metric names
v0.3.15
Release date: 2025-03-25
Bug fixes
- Fix possible but rare infinite loop in archive cleaning
- Fix configuring the S3 endpoint in Bufstream's Helm chart values.
Features and improvements
- Improve robustness of handling unexpected Protocol Buffer data when writing to Parquet files
- Allow Iceberg™ tables to self-heal if there are errors during archiving
- Schema enforcement now ensures the specified schema ID on a record matches the subject/topic it's produced to
- Additional performance improvements
v0.3.14
Release date: 2025-03-18
Bug fixes
- Fix
SASL/SCRAM
validation for usernames including,
or=
- Fix
SASL/SCRAM
admin password normalization to match major clients' behavior
Features and improvements
- Add
kafka.partition
column to Parquet and Iceberg™ table schemas - Set identifier fields of Iceberg™ tables as a composite of
kafka.partition
andkafka.offset
fields - Set sort order of Iceberg™ tables as a composite of
kafka.ingest_timestamp
andkafka.offset
fields - Support
delete
retention policies for Iceberg™ tables - Update Kafka API version handling to better reflect our support matrix
- Additional performance improvements
v0.3.13
Release date: 2025-03-11
Bug fixes
- Enhance validation of records to reject corrupted payloads sent by a producer
Features and improvements
- Significantly decrease read traffic to metadata storage
- Update Docker image to move
serve
subcommand to theCMD
stanza - Adjust intake cache size and memory usage to use more of the resources available on the host
- Optimize vacuuming of recently produced records post-archiving
- Add
kafka.authentication.max_receive_bytes
configuration option to limit request sizes prior to authentication - Additional performance improvements
v0.3.12
Release date: 2025-03-04
Bug fixes
- Fix zstd codec pooling to properly release resources when garbage collected
- Fix config validation to not fail when the metrics exporter address is unspecified
- Expanded
SASLHandshake
API minimum version to support librdkafka - Cleanup spurious
SASL/SCRAM
logging
Features and improvements
- Add
OAUTH
mechanism to SASL authentication - Decrease metadata storage read traffic by ~50%
- Improve usability and amount of data shown in the
bufstream client metadata
command - Expand logs around data enforcement with more context to assist debugging
- Reduce string allocations in Kafka API deserialization
- Embedded etcd and local disk storage implementations now default to XDG directories
- Enhance archive creation to avoid offset overlaps between log files
- Additional performance improvements
v0.3.11
Release date: 2025-02-28
Bug fixes
- Fix mTLS client authentication configuration in Helm chart values
v0.3.10
Release date: 2025-02-25
Bug fixes
- Fix config validation on CreateTopics and AlterConfig APIs
- Properly error if
storage.endpoint
config value is set for the GCS provider - Return correct response for non-error empty DescribeClientQuotas API responses
- Fail CSR-based parsing of records that are not Protocol Buffers if the
skip_parse
option is not enabled - Avoid querying the CSR API for out-of-range schema IDs, typically due to malformed data
Features and improvements
- Release preview of mTLS and SASL authentication
- Release preview of Google Cloud Spanner metadata storage
- Switch to use enum logical type instead of int32 for enum fields in Parquet archives
- Include support for partitions in Apache Iceberg™ table archives
- Improve caching of consumer group metadata that is read often and written rarely
- Add more aggressive caching of schema registry lookups used by data enforcement
- Additional performance improvements
v0.3.9
Release date: 2025-02-13
Bug fixes
- Fix regression in schema enforcement coercion behavior
Features and improvements
- Release preview of Apache Parquet and Iceberg™ archiving
- Reduce memory allocations when decoding compressed record batches
- Reduce memory allocations when marshaling Kafka protocol payloads
- Cache topic configurations on produce to minimize metadata queries
- Additional performance improvements
v0.3.8
Release date: 2025-02-11
Note
Bufstream instances will now be referred to as "brokers" instead of "agents" in release notes, documentation, and logs.
Bug fixes
- Fix several issues that would block graceful shutdown of a broker
- Fix panic in data enforcement when a fetch or produce policy is unset
- Return proper
INVALID_REQUEST
error code for Describe/Alter/IncrementalAlterConfig requests targeting unknown resource types
Features and improvements
- Added support for
compression.type
cluster and per-topic configuration to change archival compression mode - Optimized (de)compression behavior for produced records to reduce CPU and memory usage
- Improved performance when creating and destroying a large number of topics sequentially
- Increased default reserved overhead memory and reduced default archival memory
- Set a
bufstream-<version>
user agent when making API calls to a schema registry - Made
intake.sequence_delay_max
a runtime-configurable option viabufstream.intake.sequence.delay.max.ms
- Limited Metadata responses to brokers in the client's zone to minimize cross-zone administrative operations
- Normalized S3 client logs to use same logging mechanism as the rest of Bufstream
- Reduced excessive DEBUG level and S3 SDK log spam
- Additional performance improvements
v0.3.7
Release date: 2025-01-28
Note
Clusters on v0.1.x must first upgrade to a v0.2.x release before upgrading to this or any later release.
Bug fixes
- Fixed archive timestamp calculation when intake files need to be deleted
- Fixed rare data race where shelving would compete with incomplete transactions in a sequence
- Fixed malformed
nonce
log attributes - Treat concurrent archiving as success instead of an error
Features and improvements
- Updated archive storage key format
- Added
kafka.consumer.group.id
attribute tobufstream.kafka.request.count
metrics for Kafka APIs that operate against consumer groups - Support adding
label.<name>=<value>
to a Kafka client ID to expose it as an attribute on thebufstream.kafka.request.count
metric and logs associated with the client (labels must be opted-in via the Bufstream config) - Improved performance when deleting topics
- Increase concurrency of intake fetches used by partition sequencing
- Minimize metadata changes required to add multiple partitions to a transaction
- Improve performance of intake vacuuming via increased concurrency and skipping noop calculations
- Allow changing the archive concurrency via the
bufstream.archive.concurrent.max
runtime configuration option - Added
admin clean intake
to Bufstream CLI admin
commands that taketopic
flags may optionally specify partitions and ranges with the format<topic_name>:<partition_index>
or<topic_name>:<start_index>-<end_index>
(inclusive)
v0.3.6
Release date: 2025-01-14
Bug fixes
- Fix fetching from compacted topics containing offset gaps (introduced by pre-v0.2.0 bugs or from retention-deletions)
- Rate-limit internal administrative tasks to prevent OOM-ing the broker
Features and improvements
- Introduce
bufstream admin repair topics
command to detect and fix errors - Move
bufstream admin clean-topics
tobufstream admin clean topics
- Include
bufstream.storage.keys
andbufstream.storage.bytes
metrics for the metadata storage - Rename
*.errors
metrics to*.requests
to reduce confusion when not filtering out thekafka.error_code=none
attribute - Unify log and metric attribute key names for consistency and easier redaction of sensitive information
- Refine fetch responses to more closely target byte size limits specified by clients
- Improve archive file size calculations to closer match configured target
- Additional performance improvements
v0.3.5
Release date: 2025-01-10
Bug fixes
- Fix Kafka service shutdown handling so it waits for in-flight requests to complete before returning
- Capture transport-level timeouts and cancellations of in-flight Kafka requests, including a new
server.error_kind
attribute - Compute
bufstream.kafka.consumer.group.offset.lag
metric attributes correctly - Only return fatal
ILLEGAL_GENERATION
errors on consumer group heartbeats and syncs for fenced members, reducing rebalancing noise - Fix unit mismatch in latency calculations for archiver, sequencer, and consumer groups used in performing background operations
Features and improvements
- Limit number of consumer group offset updates per group to control load on the metadata storage
- Add options to skip archiving/shelving/vacuuming when running the
admin clean-topics
CLI command - Include
bufstream.intake.delete.latency
andbufstream.kafka.consumer.group.offset.commit.latency
metrics - Minimize operations against etcd when deleting keys to improve metadata storage performance
- Permit fetches of unstable offsets via the
unstable_offsets=true
client ID option - Allow override of log size reporting behavior via
exact_log_sizes=false
client ID option - Additional miscellaneous performance improvements
v0.3.4
Release date: 2025-01-02
Bug fixes
- Fix shelving and archiving delays when a partition has no existing archive data or is idle
- Only expire group members after a heartbeat request
Features and improvements
- Process consumer group offset updates in concurrent batches to improve throughput by several orders of magnitude
- Improve Datadog dashboard's display of cluster-wide metrics, metrics aggregations, and display settings
- Reduce logging on intentionally canceled fetch calls
admin clean-topics
now always blocks and vacuums intake files (unless explicitly skipped)- Return retriable error codes from certain Kafka APIs to encourage clients to attempt retries in certain scenarios
- Additional miscellaneous performance improvements
v0.3.3
Release date: 2024-12-19
Bug fixes
- Return correct error code for
Heartbeat
,SyncGroup
, andOffsetCommit
responses when the consumer group is empty, which may cause some clients to fail to join - Fix data enforcement
PASS_THROUGH
behavior for invalid produce records which were instead being rejected - Ensure caches always respect key expirations
- Always include error code attribute in request count metrics
- Fix consumer group lag metric units
Features and improvements
- Include first stable release of user-facing metrics, dashboards, and alerts for Datadog and Grafana
- Update metrics aggregation behavior to allow user control over cardinality
- Improve storage provider auto-detection logic
- Add configurable timeout for all Kafka RPCs
- Add
fetch.source
attribute to metrics to identify source of potential issues - Improve concurrency and performance of
Fetch
request and admin commands - Deprecate
observability.metrics.omit_partition_attribute
configuration andOMITTED
redaction option replacing with the more fine-grainedobservability.metrics.aggregation
options - Additional miscellaneous performance improvements
v0.3.2
Release date: 2024-12-12
Bug fixes
- Reduce logging when metrics cannot resolve a deleted topic name
- Fix unpopulated fields in
DescribeCluster
andDescribeLogDirs
responses - Fix an issue where
localhost
might resolve to both127.0.0.1
and::1
but IPv6 is disabled on the loopback device - Fix proper creation of an empty batch in a compacted archive containing no records
Features and improvements
- Improve error messages on broker startup and invalid configuration
- Limit concurrent archiving operations based on available memory
- Add a
cluster.name
attribute to all metrics - Remove redundant attributes from metrics and traces
- Improve aggregation behavior for topic and partition gauges
- Replace
bufstream.schema.validation.invalid_records
withbufstream.kafka.produce.record.data_enforcement.errors
andbufsstream.kafka.fetch.record.data_enforcement.errors
metrics - Introduce
admin resolve
command to resolve opaque topic, partition, and consumer group IDs and hashes seen in logs into their user-defined names - Additional miscellaneous performance improvements
v0.3.1
Release date: 2024-12-03
Bug fixes
- Fix etcd garbage collection bug which could lead to large database sizes
Features and improvements
- Update metrics units to match OTEL recommendations
- Allow configuring OTLP temporality preference
- Expose bufstream health status via
bufsteam.status
metric
v0.3.0
Release date: 2024-12-03
Bug fixes
- Fix routing bug that caused inconsistent load distribution among brokers
- Fix regression in the DescribeCluster API that returned non-unique broker hostnames, affecting certain clients
- Fix DescribeProducers API to be strongly consistent, making some clients more reliable
- Make consumer group offset processing more robust when groups are abruptly deleted
- Redact sensitive record data in debug logs
Features and improvements
- Improve compatibility with Kafka 3.9.0 clients
- Improve consumer group performance
- Improve Fetch API performance
- Add idempotent producer memory to match Apache Kafka behavior
- Bind Kafka listeners to all resolved addresses of a hostname, instead of just the first IPv4 address
- Improve metadata storage write scalability by ~100x by grouping partition sequencing together
- Add
admin clean-topics
,admin get
, andadmin set
CLI commands to flush the Bufstream intake and sequencing system to safely migrate towards grouping partition sequencing (see migration guide below) - Add
admin usage
CLI command for computing write statistics cluster-wide and by-topic. - Add
timeout
andquiet
flags foradmin
commands - Retry startup checks to prevent spurious crashes when cluster initializes
- Improve reliability of inter-cluster RPCs
- Improve observability metrics, disabling internal debugging metrics by default
- Add probes for etcd and OTLP endpoints
- Use Kubernetes
StatefulSet
for deployments by default for more stable scaling behavior - Other miscellaneous performance improvements
Upgrading to v0.3.x
New clusters will automatically opt-in to the new partition sequencing groups, however existing clusters will need to manually perform this migration after upgrading the cluster to v0.3.x:
- Read these instructions completely before beginning the migration process
- Upgrade Bufstream cluster to 0.3.x
- Identify your admin URL (default: http://localhost:9089)
- Run
bufstream admin clean-topics --skip-vacuuming --url=<admin URL>
and check results - Optionally, disable Kafka traffic to the cluster to reduce noise
- Save an etcd snapshot as backup
- Run
bufstream admin set sequence-shard-count 64 --url=<admin URL>
and check results - Re-enable Kafka traffic to the cluster if disabled
Older releases
Expand to see previous releases
v0.2.0
Release date: 2024-11-08
Bug fixes
- Fix a data race when flushing the intake cache
- Log the correct Kafka address on startup
- Reduce compaction errors by improving etcd locking
- Fix crash when cluster is shut down unexpectedly
- Wait for DNS resolution before registering new nodes
- Fix a bug in epoch calculation that erroneously invalidated in-progress offset updates
- Improve archiving behavior during cluster auto-scaling
Features and improvements
- Support TLS for all cluster communications, including broker-to-broker and among etcd nodes
- Implement KIP-394
- Improve etcd performance when reading from archives
- Improve memory utilization and cluster performance by increasing default cache sizes
- Add a virtual broker configuration to client IDs, so that Bufstream can present itself as a single broker when necessary
- Improve error output for
bufstream serve
failures - Support deploying Bufstream as a Kubernetes Stateful Set
- Expose configurable liveness and readiness probe timeouts
- Output human readable cluster UUIDs for debugging
- Update configuration defaults for improved read performance
- Shutdown gracefully if Kafka or HTTP listeners fail to avoid cluster panics
- Reduce startup and auto-scaling log verbosity
- Various improvements to CLI reference documentation
- Other miscellaneous performance optimizations
v0.1.3
Release date: 2024-10-30
Bug fixes
- Fix a bug in v0.1.2's transaction numbering that silently discarded some commits and aborts
- Return errors when clients attempt to change the outcome of a committed or aborted transaction
- Fix stuck producers and consumers by polling etcd, rather than relying exclusively on leases
- Fix a race that led to serving stale high watermarks and last stable offsets on startup
- Fix compatibility with Java clients by always setting offset to -1 when returning produce errors
- Improve AKHQ reliability by limiting the size of archive chunks
- Various fixes to data management for low-throughput partitions
- Fix archiving of internal usage-tracking topic
- Miscellaneous fixes to log and metrics output
Features and improvements
- Order transaction-related RPCs with logical clocks, preventing re-ordering within Bufstream
- Support more concurrent producers by decreasing etcd heartbeat frequency
- Support L4 load balancers by defaulting to advertising only public hosts
- Add support for zone-local load balancers
- Improve graceful shutdown logic
- Improve produce reliability by retrying more transient errors
- Improve cluster throughput by increasing default cache size
- Increase cluster throughput and reduce object storage costs by optimizing hedging
- Guard against overlapping storage between clusters with a fingerprint check on startup
- Reduce metrics cardinality by decreasing number of histogram buckets
- Assorted improvements to logging and internal tracing
v0.1.2
Release date: 2024-08-19 | Status: archived
This release has been archived due to a regression in the transaction processing system. All production workloads should continue to use version 0.1.1.
Bug fixes
- Fixes error-handling bug in topic auto-creation
- Reduces error probability when Bufstream attempts to calculate the last stable or next unstable offset
- Resolves error when Bufstream attempts to read the
kafka.public_address
value in the Helm chart - Prevents Bufstream from sending empty values for CPU memory limits by setting a reasonable default in the Helm chart
- Assigns all transactions a monotonic number so that concurrent complete operations no longer result in transactions completing multiple times for a topic partition -- Bufstream now applies only the first completion for a given transaction number
- Addresses checkpoint error when Bufstream attempts to archive internal topics
Features and improvements
- Expands Bufstream's Kafka conformance testing suite
- Exposes Kafka configs in the Helm chart so that they can be set directly
- Adds documentation for the Helm chart and recommended defaults
- Improves debug log output for transaction state changes
- Uncaches transactional producers to expose state transition errors
- Allows topic replication factor to be set to
-1
-- in cases where the topic replication factor is not set to -1 or 1, Bufstream will return an error - Improves compatibility with RedPanda console when displaying topics and offsets
v0.1.1
Release date: 2024-08-14
Bug fixes
- Fixes off-by-one error in archive requests
Features and improvements
- Adds config option
kafka.exact_log_offsets
that when set to true will always return the exact offset for fetch requests - Updates and documents recommended default values in the helm chart
- Improves error handling for produce requests and transactions
v0.1.0
Release date: 2024-08-09
Bug fixes
- Fixes panic when coercing a message payload to Confluent Schema Registry format
- Respects
acks=0
setting on produce and does not wait for or guarantee the success of the produce request
Features and improvements
- Adds helm value
exact_log_sizes
that determines whether exact log sizes should be fetched for all topics and partitions - Documents dynamic configuration options
- Adds configuration options for consumer group session timeout:
group.consumer.session.timeout.ms
,group.consumer.min.session.timeout.ms
,group.consumer.max.session.timeout.ms
v0.0.4
Release date: 2024-08-06
Bug fixes
- Fixed memory leak when uploading objects to S3 storage
- Removed redundant zone lookups when resolving metadata requests
- Fixed
Fetch
response to work withlibrdkafka
-based clients (including theconfluent-kafka
Python client) - Amended various API responses to match expectations of the
segmentio/kafka-go
andIBM/sarama
Go clients
Features and improvements
- Allow command-line flags to override YAML configuration
- Support deleting topics by name with
DeleteTopics
- Expose additional Bufstream-specific broker and topic configuration options
- Reduce debug log volume
- Improve cache throughput
v0.0.3
Release date: 2024-07-25
Bug fixes
- Change
dataEnforcement
key in helm chart to an empty object so that it does not emit a warning when coalescing values
Features and improvements
- Enable retries with backoff by default
- Return an error if Bufstream cannot resolve the producer ID
- Set etcd connection wait time to 2 minutes -- Bufstream will return an error and shut down if it cannot establish a connection to etcd within the 2 minute interval
- Provide traces for all etcd storage errors
- Improve topic metadata management
- Change shut down behavior such that Bufstream will wait for the archiver to finish before shutting down
v0.0.2
Release date: 2024-07-10
Features and improvements
- Emit build version in helm chart logs
v0.0.1
Release date: 2024-07-09
- Initial release