Stanislav Kozlovski's deep dive into the Kafka creation story appeared last weekend. It detailed how LinkedIn solved its data integration problems with Kafka, and revealed three remaining challenges—schema evolution, plug-and-play integration, and ownership/governance—that had to be addressed with additional tools and review processes.
At Buf, we've seen the same three problems and solved them with the Buf CLI, Bufstream, and the Buf Schema Registry.
LinkedIn moved from XML to Avro and developed "an exact compatibility model to allow us to programmatically check schema changes for backwards compatibility against existing production schemas," a check they ran at compile time and again at schema registration.
That sounds a lot like the Buf CLI's breaking change detection, except that instead of running only at production schema registration, Buf shifts enforcement left, all the way to your editor and development environment. Run buf breaking before any commit, and it'll tell you whether any change breaks backwards compatibility: no "compatibility model" maintenance required.
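As a sketch of what that looks like day to day (the branch name and module name below are assumptions, not part of the original story):

```shell
# From the root of your Protobuf module, compare the working
# tree against the schemas on your main branch:
buf breaking --against '.git#branch=main'

# Or compare against the last version pushed to the Buf Schema
# Registry (hypothetical module name):
buf breaking --against buf.build/acme/orders
```

If any change is backwards-incompatible, buf breaking prints an error per violation and exits non-zero, so it slots straight into pre-commit hooks and CI.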
LinkedIn created a system where schema-based data flowed automatically to their data warehouse, but there was no quality enforcement within Kafka itself. Because Kafka validation is client-side, it’s easy to accidentally make incompatible changes in application code, breaking downstream consumers.
Bufstream fixes this. It was built from the ground up to understand the shape of data traversing its topics, and that broker-side schema awareness unlocks a key capability: Bufstream can block bad data from entering topics in the first place, running schema enforcement and semantic validation server-side, on the broker. If a record fails validation, the entire batch is rejected or the offending record is routed to a dead-letter queue (DLQ). Consumers can rely on data in Bufstream topics always matching the topic's schema and your semantic constraints. And with Iceberg integration, your Iceberg tables are your Kafka storage, so only valid data lands in your data lake.
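A minimal sketch of what those semantic constraints can look like, expressed as protovalidate annotations on a schema (the package, message, and field names here are illustrative, not from the original post):

```protobuf
syntax = "proto3";

package orders.v1;

import "buf/validate/validate.proto";

// Hypothetical event schema. With constraints like these, a
// broker that understands protovalidate can reject a record
// before it ever lands in the topic.
message OrderPlaced {
  // Must be a well-formed UUID, not just any string.
  string order_id = 1 [(buf.validate.field).string.uuid = true];

  // Quantity must be strictly positive.
  int32 quantity = 2 [(buf.validate.field).int32.gt = 0];

  // Currency code must be exactly three characters.
  string currency = 3 [(buf.validate.field).string.len = 3];
}
```

The point is that these rules live in the schema itself, so every producer and consumer shares one definition of "valid" instead of re-implementing checks in application code.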
LinkedIn moved governance discussions between application, processing, and analytics teams into a mandatory system that's "built into the automatic code review process".
Again, that sounds a lot like Buf! The Buf Schema Registry scales schema governance from laptops to entire organizations: its GitHub Actions and CI/CD integrations provide breaking change detection, linting, and custom policy enforcement within your existing pull requests and review processes. Engineers commit code as usual, and the BSR catches breaking changes at build time, before they reach production.
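As a sketch, a minimal GitHub Actions workflow wiring these checks into every pull request might look like this (the repository URL and branch name are assumptions for illustration):

```yaml
name: buf
on: pull_request

jobs:
  buf:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bufbuild/buf-setup-action@v1
      # Style and structure checks.
      - run: buf lint
      # Compare against the schemas on the default branch of the
      # upstream repository (hypothetical URL).
      - run: buf breaking --against 'https://github.com/acme/schemas.git#branch=main'
```

Because both commands exit non-zero on failure, a breaking schema change fails the pull request check automatically, with no extra review tooling to maintain.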
LinkedIn achieved something remarkable: "A single engineer was able to implement and maintain the process that does data loads for all topics; no incremental work or co-ordination is needed as teams add new topics or change schemas." They proved schema evolution, integration, and governance were solvable problems, and we agree.
At Buf, we've built the tools that complete LinkedIn's original vision. Get in touch to talk about your use case or check out one of our on-demand workshops:
Let's help you finish what Kafka started.