Broker-side semantic validation#

To enable validation, including Protovalidate rules, add a schema provider and update topic or broker-level configuration.

TL;DR#

If you're using the Buf Schema Registry or developing locally, enabling validation is a two-step process:

  1. Configure a schema provider. Don't forget to set topic configurations like buf.registry.value.schema.module and buf.registry.value.schema.message.
  2. Configure a validation mode for the topic: reject sends errors to producers, and dlq sends invalid messages to a dead-letter queue (DLQ):

    bufstream kafka config topic set --topic demo --name bufstream.validate.mode --value reject
    

Overview#

More than just a drop-in Kafka replacement, Bufstream's built from the ground up to understand the shape of the data traversing its topics. We call this broker-side schema awareness, and it brings some interesting capabilities. Chief among these is broker-side semantic validation—its ability to block bad data from entering topics in the first place.

Broker-side enforcement#

Rather than trusting clients to send valid messages and manage their own schemas, Bufstream ensures messages match a known schema and pass their semantic validation rules. This guarantees consumers always receive well-formed data, eliminating a large class of data outages.

This feature works with binary-encoded Protobuf messages, the Buf Schema Registry, local Buf workspaces, or any Confluent API-compatible schema registry.

Semantic validation#

Fields in a schema are more than just data types. They have semantic properties:

  • They can be optional or required.
  • Fields like email addresses aren't just strings—they must match a specific format.
  • Numbers, like a user's age, may be a uint32, but you can safely assume a new user can't be more than 150 years old.

That's why Bufstream goes beyond simple Confluent schema ID validation (making sure messages use a schema ID) and schema validation (type-checking message data against its schema) to provide semantic validation, deeply inspecting messages to ensure they meet Protovalidate rules in their schema.
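The three levels differ in how deeply a message is inspected. Here's a rough Python sketch of the distinction (illustrative helper functions, not Bufstream's implementation; level 1 uses the Confluent wire format, a `0` magic byte followed by a 4-byte big-endian schema ID):

```python
def check_schema_id(payload: bytes, known_ids: set[int]) -> bool:
    """Level 1 -- schema ID validation: the payload merely references a
    known schema (Confluent wire format: 0 magic byte + 4-byte ID)."""
    return (
        len(payload) >= 5
        and payload[0] == 0
        and int.from_bytes(payload[1:5], "big") in known_ids
    )

def check_schema(decode) -> bool:
    """Level 2 -- schema validation: the bytes actually parse as the
    schema's type (decode is any callable that raises on bad input)."""
    try:
        decode()
        return True
    except ValueError:
        return False

def check_semantics(message, rules) -> bool:
    """Level 3 -- semantic validation: the decoded message also satisfies
    its Protovalidate rules, expressed here as plain predicates."""
    return all(rule(message) for rule in rules)
```

A message can pass levels 1 and 2 while still failing level 3, which is why semantic validation catches problems the other two can't.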

With Protovalidate, you can verify that a user's email is a valid email address, that an age is provided and within a sensible range, and even use a CEL expression to state that first_name should only be set if a last_name is provided:

Semantic validation for a User
message User {
    option (buf.validate.message).cel = {
        id: "first_name.requires.last_name",
        message: "If a first name is provided, a last name must also be provided.",
        expression: "this.first_name.size() == 0 || this.last_name.size() > 0"
    };

    string email = 1 [
        (buf.validate.field).string.email = true
    ];
    uint32 age = 2 [
        (buf.validate.field).required = true,
        (buf.validate.field).uint32.gte = 0,
        (buf.validate.field).uint32.lte = 150
    ];
    string last_name = 3;
    string first_name = 4;
}
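To make those rules concrete, here's a minimal Python sketch of the same checks (plain functions, not the protovalidate library; the regex is a simplified stand-in for Protovalidate's full email rule):

```python
import re

# Simplified stand-in for Protovalidate's email rule (the real rule is stricter).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_user(email, age, first_name="", last_name=""):
    """Mirror the User message's Protovalidate rules as plain checks,
    returning a list of violations (empty means the message is valid)."""
    violations = []
    if not EMAIL_RE.match(email):
        violations.append("email: must be a valid email address")
    if age is None:
        violations.append("age: required")  # (buf.validate.field).required
    elif not (0 <= age <= 150):
        violations.append("age: must be between 0 and 150")
    # CEL rule: this.first_name.size() == 0 || this.last_name.size() > 0
    if len(first_name) > 0 and len(last_name) == 0:
        violations.append("first_name.requires.last_name")
    return violations
```

A fully populated user passes, while a first name without a last name trips the CEL rule.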

Using Bufstream's broker-side semantic validation means you can rely on topic data to meet your business rules and domain logic, eliminating an entire class of data errors before they materialize in downstream systems.

Enabling semantic validation#

Add a schema provider#

Start by making sure you've configured a schema provider: a Buf Schema Registry, your local development environment, or a Confluent Schema Registry API.

With a provider in place, also set the topic configurations it needs, such as buf.registry.value.schema.module and buf.registry.value.schema.message, so Bufstream knows which schema applies to each topic.

Set a validation mode#

When a producer sends a batch containing an invalid message, Bufstream can do one of two things: reject the batch, sending an error to its producer, or send the invalid messages to a dead-letter queue (DLQ) topic.
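The two behaviors can be sketched as a routing decision (illustrative Python, not Bufstream internals; the `route_batch` name and signature are made up for this example):

```python
# Illustrative sketch of reject vs. dlq handling; not Bufstream's implementation.
def route_batch(messages, is_valid, mode, topic):
    """Return (accepted, dlq) message lists, or raise when mode is 'reject'."""
    invalid = [m for m in messages if not is_valid(m)]
    if mode == "reject":
        if invalid:
            # The whole batch is rejected; the producer receives an error.
            raise ValueError(f"{len(invalid)} invalid message(s); batch rejected")
        return messages, []
    if mode == "dlq":
        accepted = [m for m in messages if is_valid(m)]
        # Invalid messages are routed to the auto-created '<topic>.dlq' topic.
        return accepted, [(topic + ".dlq", m) for m in invalid]
    raise ValueError(f"unknown validation mode: {mode}")
```

The key difference: `reject` fails loudly at the producer, while `dlq` keeps valid data flowing and sets invalid messages aside for inspection.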

Configuring a validation mode requires setting topic or broker-level configurations. Bufstream supports reading and updating topic configuration values from any Kafka API-compatible tool, including browser-based interfaces like AKHQ and Redpanda Console.

Rejecting messages#

To reject batches of messages that don't pass validation rules, set bufstream.validate.mode to reject. This configuration is available at the topic or broker level:

Configure a topic to reject invalid messages
bufstream kafka config topic set --topic demo --name bufstream.validate.mode --value reject

Using a dead-letter queue#

A dead-letter queue (DLQ) is a separate topic for messages that fail validation, letting you inspect or reprocess them without blocking valid data.

If you'd like invalid messages to be sent to a DLQ, set bufstream.validate.mode to dlq. This automatically creates a topic of the same name with .dlq appended. To control this topic's configuration, manually create and configure it like any other topic.

Configure a topic to use a DLQ for invalid messages
bufstream kafka config topic set --topic demo --name bufstream.validate.mode --value dlq

Customize the DLQ topic by setting buf.validate.dlq.topic:

Use a custom DLQ topic name
bufstream kafka config topic set --topic demo --name buf.validate.dlq.topic --value email-updated-dlq

Messages sent to dead-letter queues are wrapped in a Record message that contains the original key and value and preserves metadata like topic name, partition, and headers.
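As a rough illustration of what such a wrapper preserves, here's a hypothetical dataclass (Bufstream's actual Record message is defined by its own Protobuf schema; the names below are illustrative):

```python
from dataclasses import dataclass, field

# Hypothetical shape of a DLQ record wrapper; Bufstream's real Record
# message is a Protobuf type, not this Python class.
@dataclass
class DlqRecord:
    original_topic: str
    partition: int
    headers: dict = field(default_factory=dict)
    key: bytes = b""
    value: bytes = b""

def wrap_for_dlq(topic, partition, headers, key, value):
    """Preserve the original message's metadata alongside its key and value."""
    return DlqRecord(topic, partition, dict(headers), key, value)
```

Because the original topic, partition, and headers travel with the message, a consumer of the DLQ can trace each failure back to its source and reprocess it after fixing the data.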