Skip to content

Quickstart

What is Bufstream?

Bufstream is a fully self-hosted drop-in replacement for Apache Kafka® that writes data to S3-compatible object storage. It’s 100% compatible with the Kafka protocol, including support for exactly-once semantics (EOS) and transactions. Bufstream enforces data quality and governance requirements directly at the broker with Buf’s Advanced Semantic Intelligence™ engine powered by Protovalidate. Data written to S3 by Bufstream is encoded in Parquet format and includes Apache Iceberg™ metadata, reducing Time-to-Insight in popular data lakehouse products like Snowflake or ClickHouse.

In this demonstration, you'll run a Bufstream agent locally, use the franz-go library to publish and consume messages, and explore Bufstream's schema enforcement features.

Requirements

Getting started

Before starting, perform the following steps to prepare your environment to run the demo.

  1. Clone this repo:

    $ git clone https://github.com/bufbuild/bufstream-demo.git
    $ cd bufstream-demo
    
  2. Start downloading Docker images: [optional]

    $ docker compose pull
    

Try out Bufstream

The Bufstream demo application simulates a small application that produces and consumes EmailUpdated event messages. The demo publishes these events to a Bufstream topic, followed by a separate consumer that fetches from the same topic and "verifies" the change. The demo app uses an off-the-shelf Kafka client (franz-go) to interact with Bufstream.

The demo attempts to publish and consume three payloads:

  • A semantically valid, correctly formatted version of the EmailUpdated message.
  • A correctly formatted, but semantically invalid version of the message.
  • A malformed message.

Replace Apache Kafka

  1. Boot up the Bufstream and demo apps:

    $ docker compose up --build
    
  2. The app logs both the publishing of the events on the producer side and shortly after, the consumption of these events. The consumer deserializes the first two messages correctly, while the final message fails with a deserialization error.

At this point, you've used Bufstream as a drop-in replacement for Apache Kafka.

Data quality enforcement

So far, Bufstream hasn't applied any quality enforcement on the data passing through it—addresses can be invalid or malformed. To ensure the quality of data flowing through Bufstream, you can configure policies that require data to conform to a pre-determined schema, pass semantic validation, and even redact sensitive information on the fly.

We'll demonstrate this functionality using the Buf Schema Registry, but Bufstream also works with any registry that implements Confluent's REST API.

  1. Uncomment the data_enforcement block in configs/bufstream.yaml.

  2. Uncomment the --csr command options under the demo service in docker-compose.yaml.

  3. To pick up the configuration changes, terminate the Docker apps via ctrl+c and restart them with docker compose up.

  4. The app again logs both the attempted publishing and consumption of the three events. The first event will successfully reach the consumer. But with Bufstream's data quality enforcement enabled, the second and third messages are rejected.

  5. For the message that reaches the consumer, notice the empty old_address in the logs. This field has been redacted by Bufstream as it's labeled with the debug_redact option.

By configuring Bufstream with a few data quality and governance policies, you've ensured that consumers receive only well-formed, semantically valid data.

Cleanup

After completing the demo, you can stop Docker and remove credentials from your machine with the following commands:

docker compose down --rmi all