Release notes
v0.3.5
Release Date: 2025-01-10 | Status: latest
Bug Fixes
- Fix Kafka service shutdown handling so it waits for in-flight requests to complete before returning
- Capture transport-level timeouts and cancellations of in-flight Kafka requests, including a new `server.error_kind` attribute
- Compute `bufstream.kafka.consumer.group.offset.lag` metric attributes correctly
- Only return fatal `ILLEGAL_GENERATION` errors on consumer group heartbeats and syncs for fenced members, reducing rebalancing noise
- Fix unit mismatch in the latency calculations used for archiver, sequencer, and consumer group background operations
Features and Improvements
- Limit number of consumer group offset updates per group to control load on the metadata storage
- Add options to skip archiving/shelving/vacuuming when running the `admin clean-topics` CLI command
- Include `bufstream.intake.delete.latency` and `bufstream.kafka.consumer.group.offset.commit.latency` metrics
- Minimize operations against etcd when deleting keys to improve metadata storage performance
- Permit fetches of unstable offsets via the `unstable_offsets=true` client ID option
- Allow override of log size reporting behavior via the `exact_log_sizes=false` client ID option (see the sketch after this list)
- Additional miscellaneous performance improvements
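These client ID options are passed through the Kafka client.id string. A minimal sketch, assuming comma-separated key=value pairs appended to the client ID (only the option names come from these notes; the separator syntax is an assumption):

```properties
# Hypothetical client configuration: option names are from this release,
# but the exact client.id syntax for attaching them is an assumption.
client.id=my-consumer,unstable_offsets=true,exact_log_sizes=false
```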
v0.3.4
Release Date: 2025-01-02
Bug Fixes
- Fix shelving and archiving delays when a partition has no existing archive data or is idle
- Only expire group members after a heartbeat request
Features and Improvements
- Process consumer group offset updates in concurrent batches to improve throughput by several orders of magnitude
- Improve Datadog dashboard's display of cluster-wide metrics, metrics aggregations, and display settings
- Reduce logging on intentionally canceled fetch calls
- `admin clean-topics` now always blocks and vacuums intake files (unless explicitly skipped)
- Return retriable error codes from certain Kafka APIs to encourage clients to attempt retries in certain scenarios
- Additional miscellaneous performance improvements
v0.3.3
Release Date: 2024-12-19
Bug Fixes
- Return correct error code for `Heartbeat`, `SyncGroup`, and `OffsetCommit` responses when the consumer group is empty, which may cause some clients to fail to join
- Fix data enforcement `PASS_THROUGH` behavior for invalid produce records, which were instead being rejected
- Ensure caches always respect key expirations
- Always include error code attribute in request count metrics
- Fix consumer group lag metric units
Features and Improvements
- Include first stable release of user-facing metrics, dashboards, and alerts for Datadog and Grafana
- Update metrics aggregation behavior to allow user control over cardinality
- Improve storage provider auto-detection logic
- Add configurable timeout for all Kafka RPCs
- Add `fetch.source` attribute to metrics to identify the source of potential issues
- Improve concurrency and performance of `Fetch` requests and admin commands
- Deprecate the `observability.metrics.omit_partition_attribute` configuration and `OMITTED` redaction option, replacing them with the more fine-grained `observability.metrics.aggregation` options
- Additional miscellaneous performance improvements
v0.3.2
Release Date: 2024-12-12
Bug Fixes
- Reduce logging when metrics cannot resolve a deleted topic name
- Fix unpopulated fields in `DescribeCluster` and `DescribeLogDirs` responses
- Fix an issue where `localhost` might resolve to both `127.0.0.1` and `::1` even though IPv6 is disabled on the loopback device
- Properly create an empty batch in a compacted archive containing no records
Features and Improvements
- Improve error messages on agent startup and invalid configuration
- Limit concurrent archiving operations based on available memory
- Add a `cluster.name` attribute to all metrics
- Remove redundant attributes from metrics and traces
- Improve aggregation behavior for topic and partition gauges
- Replace `bufstream.schema.validation.invalid_records` with `bufstream.kafka.produce.record.data_enforcement.errors` and `bufstream.kafka.fetch.record.data_enforcement.errors` metrics
- Introduce `admin resolve` command to resolve opaque topic, partition, and consumer group IDs and hashes seen in logs into their user-defined names
- Additional miscellaneous performance improvements
v0.3.1
Release Date: 2024-12-03
Bug Fixes
- Fix etcd garbage collection bug which could lead to large database sizes
Features and Improvements
- Update metrics units to match OTEL recommendations
- Allow configuring OTLP temporality preference
- Expose Bufstream health status via the `bufstream.status` metric
v0.3.0
Release Date: 2024-12-03
Bug Fixes
- Fix routing bug that caused inconsistent load distribution among agents
- Fix regression in the DescribeCluster API that returned non-unique agent hostnames, affecting certain clients
- Fix DescribeProducers API to be strongly consistent, making some clients more reliable
- Make consumer group offset processing more robust when groups are abruptly deleted
- Redact sensitive record data in debug logs
Features and Improvements
- Improve compatibility with Kafka 3.9.0 clients
- Improve consumer group performance
- Improve Fetch API performance
- Add idempotent producer memory to match Apache Kafka behavior
- Bind Kafka listeners to all resolved addresses of a hostname, instead of just the first IPv4 address
- Improve metadata storage write scalability by ~100x by grouping partition sequencing together
- Add `admin clean-topics`, `admin get`, and `admin set` CLI commands to flush the Bufstream intake and sequencing system to safely migrate towards grouping partition sequencing (see migration guide below)
- Add `admin usage` CLI command for computing write statistics cluster-wide and by topic
- Add `timeout` and `quiet` flags for `admin` commands (see the sketch after this list)
- Retry startup checks to prevent spurious crashes when the cluster initializes
- Improve reliability of inter-cluster RPCs
- Improve observability metrics, disabling internal debugging metrics by default
- Add probes for etcd and OTLP endpoints
- Use Kubernetes `StatefulSet` for deployments by default for more stable scaling behavior
- Other miscellaneous performance improvements
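A brief sketch of the new admin commands, reusing the --url flag shown in the migration guide below; the --timeout and --quiet value formats are assumptions:

```sh
# Compute cluster-wide and per-topic write statistics.
bufstream admin usage --url=http://localhost:9089

# The new timeout and quiet flags apply to admin commands (flag syntax assumed).
bufstream admin clean-topics --url=http://localhost:9089 --timeout=10m --quiet
```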
Upgrading to v0.3.x
New clusters automatically opt in to the new partition sequencing groups; existing clusters, however, need to perform this migration manually after upgrading to v0.3.x:
- Read these instructions completely before beginning the migration process
- Upgrade Bufstream cluster to 0.3.x
- Identify your admin URL (default: http://localhost:9089)
- Run `bufstream admin clean-topics --url=<admin URL>` and check the results
- Optionally, disable Kafka traffic to the cluster to reduce noise
- Save an etcd snapshot as a backup
- Run `bufstream admin set sequence-shard-count 64 --url=<admin URL>` and check the results (see the consolidated sketch after this list)
- Re-enable Kafka traffic to the cluster if disabled
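Putting the commands above together, a minimal sketch of the migration using the default admin URL:

```sh
# Assumes the default admin URL; substitute your own if it differs.
ADMIN_URL=http://localhost:9089

# Flush the Bufstream intake and sequencing system, then review the output.
bufstream admin clean-topics --url=$ADMIN_URL

# Optionally pause Kafka traffic and save an etcd snapshot before continuing.

# Switch the cluster to grouped partition sequencing, then review the output.
bufstream admin set sequence-shard-count 64 --url=$ADMIN_URL
```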
v0.2.0
Release Date: 2024-11-08
Bug Fixes
- Fix a data race when flushing the intake cache
- Log the correct Kafka address on startup
- Reduce compaction errors by improving etcd locking
- Fix crash when cluster is shut down unexpectedly
- Wait for DNS resolution before registering new nodes
- Fix a bug in epoch calculation that erroneously invalidated in-progress offset updates
- Improve archiving behavior during cluster auto-scaling
Features and Improvements
- Support TLS for all cluster communications, including agent-to-agent and among etcd nodes
- Implement KIP-394
- Improve etcd performance when reading from archives
- Improve memory utilization and cluster performance by increasing default cache sizes
- Add a virtual broker configuration to client IDs, so that Bufstream can present itself as a single broker when necessary
- Improve error output for `bufstream serve` failures
- Support deploying Bufstream as a Kubernetes StatefulSet
- Expose configurable liveness and readiness probe timeouts
- Output human readable cluster UUIDs for debugging
- Update configuration defaults for improved read performance
- Shut down gracefully if Kafka or HTTP listeners fail, avoiding cluster panics
- Reduce startup and auto-scaling log verbosity
- Various improvements to CLI reference documentation
- Other miscellaneous performance optimizations
v0.1.3
Release Date: 2024-10-30
Bug Fixes
- Fix a bug in v0.1.2's transaction numbering that silently discarded some commits and aborts
- Return errors when clients attempt to change the outcome of a committed or aborted transaction
- Fix stuck producers and consumers by polling etcd, rather than relying exclusively on leases
- Fix a race that led to serving stale high watermarks and last stable offsets on startup
- Fix compatibility with Java clients by always setting offset to -1 when returning produce errors
- Improve AKHQ reliability by limiting the size of archive chunks
- Various fixes to data management for low-throughput partitions
- Fix archiving of internal usage-tracking topic
- Miscellaneous fixes to log and metrics output
Features and Improvements
- Order transaction-related RPCs with logical clocks, preventing re-ordering within Bufstream
- Support more concurrent producers by decreasing etcd heartbeat frequency
- Support L4 load balancers by defaulting to advertising only public hosts
- Add support for zone-local load balancers
- Improve graceful shutdown logic
- Improve produce reliability by retrying more transient errors
- Improve cluster throughput by increasing default cache size
- Increase cluster throughput and reduce object storage costs by optimizing hedging
- Guard against overlapping storage between clusters with a fingerprint check on startup
- Reduce metrics cardinality by decreasing number of histogram buckets
- Assorted improvements to logging and internal tracing
v0.1.2
Release Date: 2024-08-19 | Status: archived
This release has been archived due to a regression in the transaction processing system. All production workloads should continue to use version 0.1.1.
Bug Fixes
- Fixes error-handling bug in topic auto-creation
- Reduces error probability when Bufstream attempts to calculate the last stable or next unstable offset
- Resolves an error when Bufstream attempts to read the `kafka.public_address` value in the Helm chart
- Prevents Bufstream from sending empty values for CPU and memory limits by setting a reasonable default in the Helm chart
- Assigns all transactions a monotonic number so that concurrent complete operations no longer result in transactions completing multiple times for a topic partition -- Bufstream now applies only the first completion for a given transaction number
- Addresses checkpoint error when Bufstream attempts to archive internal topics
Features and Improvements
- Expands Bufstream's Kafka conformance testing suite
- Exposes Kafka configs in the Helm chart so that they can be set directly
- Adds documentation for the Helm chart and recommended defaults
- Improves debug log output for transaction state changes
- Uncaches transactional producers to expose state transition errors
- Allows topic replication factor to be set to `-1` -- in cases where the topic replication factor is not set to -1 or 1, Bufstream will return an error
- Improves compatibility with the Redpanda Console when displaying topics and offsets
v0.1.1
Release Date: 2024-08-14
Bug Fixes
- Fixes off-by-one error in archive requests
Features and Improvements
- Adds config option `kafka.exact_log_offsets` that, when set to true, always returns the exact offset for fetch requests (see the sketch after this list)
- Updates and documents recommended default values in the Helm chart
- Improves error handling for produce requests and transactions
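A minimal configuration sketch for the option above, assuming standard YAML nesting of the `kafka.exact_log_offsets` key path:

```yaml
# Assumed YAML layout derived from the dotted key path named in these notes.
kafka:
  # When true, fetch requests always return the exact offset.
  exact_log_offsets: true
```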
v0.1.0
Release Date: 2024-08-09
Bug Fixes
- Fixes panic when coercing a message payload to Confluent Schema Registry format
- Respects the `acks=0` setting on produce and does not wait for or guarantee the success of the produce request (see the sketch below)
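For context on the `acks=0` behavior above, a minimal producer sketch using the `confluent-kafka` Python client mentioned elsewhere in these notes; the broker address is a placeholder:

```python
from confluent_kafka import Producer

# With acks=0 the client does not wait for, and the broker does not guarantee,
# acknowledgement of the produce request.
producer = Producer({
    "bootstrap.servers": "localhost:9092",  # placeholder Bufstream Kafka address
    "acks": "0",
})

producer.produce("example-topic", value=b"fire-and-forget")
producer.flush()  # drains the local queue; delivery is still not guaranteed
```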
Features and Improvements
- Adds Helm value `exact_log_sizes` that determines whether exact log sizes should be fetched for all topics and partitions
- Documents dynamic configuration options
- Adds configuration options for consumer group session timeout: `group.consumer.session.timeout.ms`, `group.consumer.min.session.timeout.ms`, and `group.consumer.max.session.timeout.ms` (see the sketch below)
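A hypothetical sketch of setting the three options above; these notes name the options but not where they are set, so the placement under the Kafka configs exposed by the Helm chart (see v0.1.2) and the values shown are assumptions:

```yaml
# Hypothetical placement and values; only the option names come from these notes.
kafka:
  configs:
    group.consumer.session.timeout.ms: "45000"
    group.consumer.min.session.timeout.ms: "10000"
    group.consumer.max.session.timeout.ms: "300000"
```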
v0.0.4
Release Date: 2024-08-06
Bug Fixes
- Fixed memory leak when uploading objects to S3 storage
- Removed redundant zone lookups when resolving metadata requests
- Fixed `Fetch` response to work with `librdkafka`-based clients (including the `confluent-kafka` Python client)
- Amended various API responses to match expectations of the `segmentio/kafka-go` and `IBM/sarama` Go clients
Features and Improvements
- Allow command-line flags to override YAML configuration
- Support deleting topics by name with `DeleteTopics`
- Expose additional Bufstream-specific broker and topic configuration options
- Reduce debug log volume
- Improve cache throughput
v0.0.3
Release Date: 2024-07-25
Bug Fixes
- Change the `dataEnforcement` key in the Helm chart to an empty object so that it does not emit a warning when coalescing values
Features and Improvements
- Enable retries with backoff by default
- Return an error if Bufstream cannot resolve the producer ID
- Set etcd connection wait time to 2 minutes -- Bufstream will return an error and shut down if it cannot establish a connection to etcd within the 2 minute interval
- Provide traces for all etcd storage errors
- Improve topic metadata management
- Change shutdown behavior so that Bufstream waits for the archiver to finish before shutting down
v0.0.2
Release Date: 2024-07-10
Features and Improvements
- Emit build version in helm chart logs
v0.0.1
Release Date: 2024-07-09
- Initial release