Quickstart
What is Bufstream?
Bufstream is a fully self-hosted drop-in replacement for Apache Kafka® that writes data to S3-compatible object storage. It’s 100% compatible with the Kafka protocol, including support for exactly-once semantics (EOS) and transactions. Bufstream enforces data quality and governance requirements directly at the broker with Buf’s Advanced Semantic Intelligence™ engine powered by Protovalidate. Data written to S3 by Bufstream is encoded in Parquet format and includes Apache Iceberg™ metadata, reducing Time-to-Insight in popular data lakehouse products like Snowflake or ClickHouse.
In this demonstration, you'll run a Bufstream agent locally, use the franz-go library to publish and consume messages, and explore Bufstream's schema enforcement features.
Requirements
Getting started
Before starting, perform the following steps to prepare your environment to run the demo.
- Clone this repo:
- Start downloading Docker images (optional):
Try out Bufstream
The Bufstream demo simulates a small application that produces and consumes EmailUpdated event messages. A producer publishes these events to a Bufstream topic, and a separate consumer then fetches them from the same topic and "verifies" the change. The demo app uses an off-the-shelf Kafka client (franz-go) to interact with Bufstream.
The demo attempts to publish and consume three payloads:
- A semantically valid, correctly formatted version of the EmailUpdated message.
- A correctly formatted, but semantically invalid version of the message.
- A malformed message.
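The three payload categories above can be illustrated with a minimal Go sketch. It is not the demo's code. It assumes a JSON stand-in for the Protobuf EmailUpdated message, with hypothetical id and new_address fields (only old_address is named in this guide), and it treats "semantically valid" as "new_address parses as an email address".

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/mail"
)

// EmailUpdated is a hypothetical JSON stand-in for the demo's Protobuf message.
// The id and new_address fields are assumptions made for this sketch.
type EmailUpdated struct {
	ID         string `json:"id"`
	OldAddress string `json:"old_address"`
	NewAddress string `json:"new_address"`
}

// classify reports which of the three payload categories a raw record value
// falls into: "malformed", "semantically invalid", or "valid".
func classify(payload []byte) string {
	var event EmailUpdated
	// A payload that cannot be decoded at all is malformed.
	if err := json.Unmarshal(payload, &event); err != nil {
		return "malformed"
	}
	// A payload that decodes but carries a bad address is semantically invalid.
	if _, err := mail.ParseAddress(event.NewAddress); err != nil {
		return "semantically invalid"
	}
	return "valid"
}

func main() {
	payloads := [][]byte{
		[]byte(`{"id":"1","old_address":"old@example.com","new_address":"new@example.com"}`),
		[]byte(`{"id":"2","old_address":"old@example.com","new_address":"not-an-email"}`),
		{0x00, 0xff, 0x13}, // bytes that are not a valid encoding
	}
	for _, p := range payloads {
		fmt.Println(classify(p))
	}
}
```

Without broker-side enforcement, all three payloads are accepted by the topic, and the problems only surface when a consumer tries to decode or check them.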
Replace Apache Kafka
- Boot up the Bufstream and demo apps:
- The app logs both the publishing of the events on the producer side and shortly after, the consumption of these events. The consumer deserializes the first two messages correctly, while the final message fails with a deserialization error.
At this point, you've used Bufstream as a drop-in replacement for Apache Kafka.
Data quality enforcement
So far, Bufstream hasn't applied any quality enforcement to the data passing through it: addresses can be invalid or malformed. To ensure the quality of data flowing through Bufstream, you can configure policies that require data to conform to a predetermined schema and pass semantic validation, and that can even redact sensitive information on the fly.
We'll demonstrate this functionality using the Buf Schema Registry, but Bufstream also works with any registry that implements Confluent's REST API.
- Uncomment the data_enforcement block in configs/bufstream.yaml.
- Uncomment the --csr command options under the demo service in docker-compose.yaml.
- To pick up the configuration changes, terminate the Docker apps via Ctrl+C and restart them with docker compose up.
- The app again logs both the attempted publishing and consumption of the three events. The first event successfully reaches the consumer. But with Bufstream's data quality enforcement enabled, the second and third messages are rejected.
- For the message that reaches the consumer, notice the empty old_address in the logs. This field has been redacted by Bufstream as it's labeled with the debug_redact option.
By configuring Bufstream with a few data quality and governance policies, you've ensured that consumers receive only well-formed, semantically valid data.
Cleanup
After completing the demo, you can stop Docker and remove credentials from your machine with the following commands: