Broker-side semantic validation#
To enable validation, including Protovalidate rules, add a schema provider and update topic or broker-level configuration.
TL;DR#
If you're using the Buf Schema Registry or developing locally, enabling validation is a two-step process:
- Configure a schema provider. Don't forget to set topic configurations like buf.registry.value.schema.module and buf.registry.value.schema.message.
- Configure a validation mode for the topic: reject sends errors to producers, and dlq sends invalid messages to a dead-letter queue (DLQ).
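For example, with a topic named demo whose schema provider is already configured, either mode can be set with the Bufstream CLI (both commands are explained in detail below):
# Reject batches containing invalid messages:
bufstream kafka config topic set --topic demo --name bufstream.validate.mode --value reject
# Or route invalid messages to a dead-letter queue instead:
bufstream kafka config topic set --topic demo --name bufstream.validate.mode --value dlq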
Overview#
More than just a drop-in Kafka replacement, Bufstream is built from the ground up to understand the shape of the data traversing its topics. We call this broker-side schema awareness, and it enables capabilities a schema-unaware broker can't provide. Chief among these is broker-side semantic validation: the ability to block bad data from entering topics in the first place.
Broker-side enforcement#
Rather than trusting clients to send valid messages and manage their own schemas, Bufstream ensures messages match a known schema and pass their semantic validation rules. This guarantees consumers always receive well-formed data, eliminating a large class of data outages.
This feature works with binary-encoded Protobuf messages, the Buf Schema Registry, local Buf workspaces, or any Confluent API-compatible schema registry.
Semantic validation#
Fields in a schema are more than just data types. They have semantic properties:
- They can be optional or required.
- Fields like email addresses aren't just strings: they must match a specific format.
- Numbers, like a user's age, may be a uint32, but you can safely assume a new user can't be more than 150 years old.
That's why Bufstream goes beyond simple Confluent schema ID validation (making sure messages use a schema ID) and schema validation (type-checking message data against its schema) to provide semantic validation, deeply inspecting messages to ensure they meet Protovalidate rules in their schema.
With Protovalidate, you can verify that a user's email is a valid email address, that an age is provided and within a sensible range, and even use a CEL expression to state that first_name should only be set if a last_name is provided:
syntax = "proto3";

import "buf/validate/validate.proto";

message User {
  // CEL rule spanning multiple fields: first_name may only be set when
  // last_name is also set.
  option (buf.validate.message).cel = {
    id: "first_name.requires.last_name",
    message: "If a first name is provided, a last name must also be provided.",
    expression: "this.first_name.size() == 0 || this.last_name.size() > 0"
  };

  // Must be a syntactically valid email address.
  string email = 1 [
    (buf.validate.field).string.email = true
  ];
  // Must be set and within a sensible range for a person's age.
  uint32 age = 2 [
    (buf.validate.field).required = true,
    (buf.validate.field).uint32.gte = 0,
    (buf.validate.field).uint32.lte = 150
  ];
  string last_name = 3;
  string first_name = 4;
}
Using Bufstream's broker-side semantic validation means you can rely on topic data to meet your business rules and domain logic, eliminating an entire class of data errors before they materialize in downstream systems.
Enabling semantic validation#
Add a schema provider#
Start by making sure you've configured a schema provider: the Buf Schema Registry, a local Buf workspace, or any Confluent API-compatible schema registry. As part of that configuration, set the topic configurations that associate each topic with its schema, such as buf.registry.value.schema.module and buf.registry.value.schema.message when using the Buf Schema Registry.
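As a sketch of that mapping, assuming a demo topic whose values are EmailUpdated messages published to a hypothetical buf.build/example/email module (substitute your own module and fully qualified message name):
# Both the module and message values below are hypothetical examples.
bufstream kafka config topic set --topic demo --name buf.registry.value.schema.module --value buf.build/example/email
bufstream kafka config topic set --topic demo --name buf.registry.value.schema.message --value example.email.v1.EmailUpdated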
Set a validation mode#
When a producer sends a batch containing an invalid message, Bufstream can do one of two things: reject the batch, sending an error to its producer, or send the invalid messages to a dead-letter queue (DLQ) topic.
Configuring a validation mode requires setting topic or broker-level configurations. Bufstream supports reading and updating topic configuration values from any Kafka API-compatible tool, including browser-based interfaces like AKHQ and Redpanda Console.
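For example, Kafka's own CLI tooling can read and update these values. A minimal sketch, assuming Bufstream is reachable at localhost:9092 (the address varies by deployment):
# Read the topic's current configuration:
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name demo --describe
# Update the validation mode with a standard Kafka tool instead of the bufstream CLI:
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name demo --alter --add-config bufstream.validate.mode=reject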
Rejecting messages#
To reject batches of messages that don't pass validation rules, set bufstream.validate.mode to reject. This configuration is available at the topic or broker level:
bufstream kafka config topic set --topic demo --name bufstream.validate.mode --value reject
Using a dead-letter queue#
A dead-letter queue (DLQ) is a separate topic for messages that fail validation, letting you inspect or reprocess them without blocking valid data.
If you'd like invalid messages to be sent to a DLQ, set bufstream.validate.mode to dlq.
This automatically creates a DLQ topic named after the source topic with .dlq appended (for example, demo.dlq).
To control this topic's configuration, manually create and configure it like any other topic, as shown in the sketch below.
bufstream kafka config topic set --topic demo --name bufstream.validate.mode --value dlq
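As a sketch of pre-creating and configuring the DLQ topic yourself with Kafka's topic tooling (the address, partition count, and retention value are illustrative assumptions):
# demo.dlq is the automatically chosen DLQ name for the demo topic.
kafka-topics.sh --bootstrap-server localhost:9092 --create --topic demo.dlq --partitions 1 --config retention.ms=604800000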
To send invalid messages to a different topic instead, set buf.validate.dlq.topic to the desired topic name:
bufstream kafka config topic set --topic demo --name buf.validate.dlq.topic --value email-updated-dlq
Messages sent to dead-letter queues are wrapped in a Record message that captures the original key and value while preserving metadata like topic name, partition, and headers.
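Any Kafka consumer can read the DLQ topic to inspect or reprocess these records. A minimal sketch using Kafka's console consumer (again assuming a localhost:9092 listener); note that the payloads are binary-encoded Protobuf Record wrappers, not plain text:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic demo.dlq --from-beginning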