Healthcare data can be messy. When you're dealing with physician compensation data from hundreds of health systems across the country, each with its own regulations, formats, and quirks, the traditional “fix it when it breaks” approach to data becomes a recipe for disaster. The problem gets even harder when the data is created by people, as it is with survey responses.
Dwight Whitlock, Platform Engineering Lead at Clinician Nexus, learned this firsthand. His team processes survey-based compensation, performance, and job classification data from the majority of the country's health systems, ranging from individual survey submissions with five records to massive datasets with 5 million records. The challenge? Every health system sends data differently—some as CSV files, others as complex Excel spreadsheets with many different sheets, each with inconsistent schemas.
Whitlock compares addressing data quality issues to preventing cancer.
"My sister is a physician and is always telling my parents ways to be healthier and prevent cancer," Whitlock explains. "It's a lot harder to treat cancer than to do things that help prevent it. Streaming systems obviously are not cancer, but they share that common attribute where prevention can save a whole lot of time and effort and complexity versus treating it after the fact."
This preventative-medicine approach to data engineering is exactly what drove Clinician Nexus to adopt Bufstream and the Buf Schema Registry (BSR).
Clinician Nexus is committed to delivering high-quality data and continuously improving their data engineering practices. But they saw challenges that most data teams face: only the engineers who wrote the code could understand how field mappings worked or what validation rules were applied, even though that knowledge should be accessible to everyone.
Beyond knowledge silos, they saw other areas for improvement. Breaking changes could go undetected until production, validation logic might be scattered across different systems, and their transactional architecture required over-provisioning infrastructure to accommodate highly variable workloads. This is the reality for most data teams—fragmented approaches to schema management that limit effectiveness.
What Clinician Nexus wanted to improve:
We really liked the breaking change detection, the linting and formatting, and everything that that ecosystem (Buf) offers.
When Whitlock joined Clinician Nexus in December 2022, his instructions were clear: "We want to build a data mesh. We're probably going to be within Databricks, and we also want to use Kafka. And our application teams are going to be using Protobuf with gRPC services. If you can make everything play nicely with each other, that'd be great."
That's when they discovered Buf. “We were glad we found Buf pretty early on because we were able to get started with breaking change detection before we made it to production,” said Whitlock.
The world has moved to Protobuf in the last decade, and Buf's platform for schema-driven development was exactly what Clinician Nexus needed. The Buf CLI's breaking change detection and linting capabilities immediately addressed their highest-priority goals. Moving beyond the CLI, using Buf tools throughout their entire stack is shifting Clinician Nexus from reactive problem-solving to proactive governance:
When we catch those issues in our lower environment or in pull requests, that's a huge time savings.
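To make the idea of breaking change detection concrete, here is a toy sketch in Python. It is not how the Buf CLI works internally, and it covers only two kinds of change (a deleted field and a changed field type); in practice the check is run with the `buf breaking` command against an earlier version of the schema. The `Field` class, the `find_breaking_changes` helper, and the compensation-style field names are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """A simplified stand-in for a Protobuf field (hypothetical model)."""
    number: int  # the field number, which identifies the field on the wire
    name: str
    type: str


def find_breaking_changes(old, new):
    """Compare two versions of a message and describe changes that
    would break existing consumers. Only deletions and type changes
    are checked in this sketch."""
    new_by_number = {f.number: f for f in new}
    problems = []
    for f in old:
        current = new_by_number.get(f.number)
        if current is None:
            problems.append(f"field {f.number} ({f.name}) was deleted")
        elif current.type != f.type:
            problems.append(
                f"field {f.number} ({f.name}) changed type "
                f"from {f.type} to {current.type}"
            )
    return problems


# Hypothetical "before" and "after" versions of a compensation record.
old_schema = [
    Field(1, "physician_id", "string"),
    Field(2, "base_salary", "int64"),
    Field(3, "specialty", "string"),
]
new_schema = [
    Field(1, "physician_id", "string"),
    Field(2, "base_salary", "string"),  # type changed
    Field(4, "region", "string"),       # new field: not breaking
]

problems = find_breaking_changes(old_schema, new_schema)
for problem in problems:
    print(problem)
```

Running a check like this on every pull request is what moves the failure from production to review time: the change to `base_salary` and the removal of `specialty` are reported before anything is merged, while the added `region` field passes.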
The team processes about a terabyte of data per year in massive, unpredictable bursts. With Bufstream's autoscaling, they can now quickly scale up to handle these bursts, then automatically reduce capacity (and therefore costs) when traffic decreases.
Clinician Nexus is particularly excited about Bufstream's direct-to-Iceberg feature, which eliminates the need for pipelines that read from Kafka and write to tables. Instead, the broker handles this continuously in the background.
Perhaps the most underappreciated benefit of their Buf implementation is how the BSR facilitates communication between technical and non-technical teams. "It's a communication facilitator that automates the manual work that makes developers not want to communicate," Whitlock notes.
Before the BSR, when business users wanted to understand schema changes or history, Clinician Nexus had to submit tickets to a DBA and wait for responses. Now, business users are added as code owners to relevant schemas in GitHub. When an engineer tries to delete a field, the business user gets a notification and can prevent the change—without needing to understand the underlying technology.
The BSR provides a user-friendly interface for discovering schemas, understanding rules, and viewing change history. The descriptions it displays come straight from the comments added to messages and fields in the schema. That keeps the schema as the single source of truth, visible to everyone and propagated all the way to Unity Catalog in Databricks.
When asked what he'd tell someone considering Buf, Whitlock cut to the core:
“I’d start by asking if they’ve ever built against a REST/JSON API—then ask how often it’s broken their pipeline. Have they ever tried generating a Swagger client from outdated docs and keeping that in sync across a team? That’s usually when people start nodding—or getting angry. It’s painful. That lack of strong typing and reliable version control creates real fragility.”
Clinician Nexus demonstrates the advantages of applying a schema-first approach across the entire stack. Backed by Buf technologies, the company has solved data quality issues, saved time and costs and, most importantly, shifted from reactive fire-fighting to proactive governance. In healthcare data processing, where accuracy impacts critical business decisions, that shift isn't just convenient—it's essential.