From Reactive to Proactive: How Clinician Nexus Built a Schema-Driven Data Platform with Buf


Healthcare data can be messy. When you're dealing with physician compensation data from hundreds of health systems across the country—each with their own regulations, formats, and quirks—the traditional “fix it when it breaks” approach to data becomes a recipe for disaster. This issue becomes even more challenging when working with human-created data, such as from surveys.

Dwight Whitlock, Platform Engineering Lead at Clinician Nexus, learned this firsthand. His team processes survey-based compensation, performance, and job classification data from the majority of the country's health systems, ranging from individual survey submissions with five records to massive datasets with 5 million records. The challenge? Every health system sends data differently—some as CSV files, others as complex Excel spreadsheets with many different sheets, each with inconsistent schemas.

Whitlock compares addressing data quality issues to preventing cancer.

"My sister is a physician and is always telling my parents ways to be healthier and prevent cancer," Whitlock explains. "It's a lot harder to treat cancer than to do things that help prevent it. Streaming systems obviously are not cancer, but they share that common attribute where prevention can save a whole lot of time and effort and complexity versus treating it after the fact."

This preventative medicine approach to data engineering is exactly what drove Clinician Nexus to adopt Bufstream and the Buf Schema Registry.

Before Buf

Clinician Nexus is committed to delivering high-quality data and continuously improving their data engineering practices. But they saw challenges that most data teams face: Only the engineers who wrote the code could understand how field mappings worked or what validation rules were applied – knowledge that should be accessible to everyone.

Beyond knowledge silos, they saw other areas for improvement. Breaking changes could go undetected until production, validation logic might be scattered across different systems, and their transactional architecture required over-provisioning infrastructure to accommodate highly variable workloads. This is the reality for most data teams—fragmented approaches to schema management that limit effectiveness.

What Clinician Nexus wanted to improve:

Making field mapping and data quality rules visible to business users. "Data quality rules were a black box to our business users," explained Whitlock. For example, teams couldn't understand how fields and schemas map to a header name without digging through engineering code. If analysts wanted to see changes over time, “they'd have to submit a ticket to a DBA and it would take however long to get an answer back."

Catching breaking changes before they reach production. Clinician Nexus’ first iteration on Protobuf was a monorepo with no breaking change detection. As at most organizations, schemas could appear at runtime without proper vetting, and incompatible changes could be deployed without anyone knowing until systems broke.

Centralizing and clarifying validation logic. "Previously, if we were only applying data quality rules within the lakehouse, then our application team didn't understand what was being applied." Teams struggled with the smaller, menial tasks of day-to-day data engineering, such as working out why a schema looks the way it does and what context lies behind it.

Streamlining their architecture to eliminate bottlenecks, optimize provisioning, and strengthen data quality guarantees. Before streaming, Clinician Nexus’ survey ingestion used a traditional transactional application and database. Even after moving to streaming with Amazon MSK, their setup was still "prone to poison pill messages – both of bad schema and bad data quality." To support traffic spikes they had to overprovision the cluster, even though it often sat idle.


Enter Buf

When Whitlock joined Clinician Nexus in December 2022, his instructions were clear: "We want to build a data mesh. We're probably going to be within Databricks, and we also want to use Kafka. And our application teams are going to be using Protobuf with gRPC services. If you can make everything play nicely with each other, that'd be great."

That's when they discovered Buf. “We were glad we found Buf pretty early on because we were able to get started with breaking change detection before we made it to production,” said Whitlock.

The world has moved to Protobuf in the last decade, and Buf's platform for schema-driven development was exactly what Clinician Nexus needed. The Buf CLI's breaking change detection and linting capabilities immediately addressed their highest-priority goals. Moving beyond the CLI, using Buf tools throughout their entire stack is shifting Clinician Nexus from reactive problem-solving to proactive governance:

The Buf Schema Registry (BSR) gave all stakeholders direct access to field mapping and data quality rules. "With a schema registry that's integrated with our version control, we have one place with really good rendered documentation, that our engineers and our business users can go and understand why a schema is the way it is, the history of changes to it, who made those changes," explained Whitlock. “Now it's a whole lot easier for people to find the answer to their own problems."
Buf's breaking change detection prevented production issues through build-time schema governance. "We really liked the breaking change detection, the linting and formatting, and everything that that ecosystem offers," said Whitlock. “Any unexpected issue, which can happen a lot when dealing with schemas across different teams, is especially painful for us. When we catch those issues in our lower environment or in pull requests, that's a huge time savings."
Protovalidate centralized their validation rules in schemas. "We're using Protovalidate within the context of our streaming pipeline… all producers and consumers know the quality contract." Protovalidate puts validation rules directly in the schema where everyone can see them. No more hunting through code to figure out what's supposed to be valid—the contract is right there in the schema definition.
Bufstream provides broker-side schema awareness and cost savings. Unlike traditional Kafka, where poison pill messages can break everything downstream, Bufstream's semantic validation checks the contents of every message and blocks bad messages before they cause problems. Beyond data quality improvements, Bufstream eliminates over-provisioning by scaling in response to demand rather than requiring always-on infrastructure.
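
The idea of keeping validation rules next to the data contract, and rejecting bad messages before they reach consumers, can be sketched in a few lines of Python. This is a toy illustration only: it is not Protovalidate's API or Bufstream's implementation, and the record shape, field names, and rules below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical record shape; these fields are illustrative and are not
# taken from Clinician Nexus's actual schemas.
@dataclass
class CompensationRecord:
    specialty: str
    annual_compensation: float
    fte: float


# Each rule pairs a human-readable description with a predicate, so the
# "quality contract" is declared in one place rather than scattered in code.
Rule = tuple[str, Callable[[CompensationRecord], bool]]

RULES: list[Rule] = [
    ("specialty must not be empty", lambda r: len(r.specialty.strip()) > 0),
    ("annual_compensation must be positive", lambda r: r.annual_compensation > 0),
    ("fte must be in (0, 1]", lambda r: 0 < r.fte <= 1),
]


def violations(record: CompensationRecord) -> list[str]:
    """Return the description of every rule the record fails."""
    return [description for description, ok in RULES if not ok(record)]


def accept(records: list[CompensationRecord]):
    """Split records into accepted ones and (record, violations) rejects."""
    accepted, rejected = [], []
    for record in records:
        errors = violations(record)
        if errors:
            rejected.append((record, errors))
        else:
            accepted.append(record)
    return accepted, rejected
```

Because the rules live beside the record definition, producers, consumers, and reviewers all read the same contract, and a rejected record carries the reasons it failed instead of surfacing later as a downstream error.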


Saving money on infrastructure… while improving data quality and analytics

The team processes about a terabyte of data per year in massive, unpredictable bursts. With Bufstream's autoscaling, they can now quickly scale up to handle these bursts, then automatically reduce capacity (and therefore costs) when traffic decreases.

Clinician Nexus is particularly excited about Bufstream's direct-to-Iceberg feature, which eliminates the need for pipelines that read from Kafka and write to tables. Instead, the broker handles this continuously in the background.

Breaking down communication barriers

Perhaps the most underappreciated benefit of their Buf implementation is how the BSR facilitates communication between technical and non-technical teams. "It's a communication facilitator that automates the manual work that makes developers not want to communicate," Whitlock notes.

Before the BSR, when business users wanted to understand schema changes or history, they had to submit tickets to a DBA and wait for responses. Now, business users are added as code owners to relevant schemas in GitHub. When an engineer tries to delete a field, the business user gets a notification and can prevent the change without needing to understand the underlying technology.
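
To make the field-deletion scenario concrete, here is a minimal sketch of a check that compares two versions of a schema and flags deletions and type changes. It is an assumption-laden toy, not how the Buf CLI implements breaking change detection (which covers many more rules); the `Field` type, the dictionary-based schema model, and the example field names are all hypothetical.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    type: str


# A schema version is modeled as {field_number: Field}. This is a
# simplification for illustration; real Protobuf schemas carry far more detail.
Schema = dict[int, Field]


def breaking_changes(old: Schema, new: Schema) -> list[str]:
    """List changes in `new` that would break readers built against `old`."""
    problems = []
    for number, field in sorted(old.items()):
        if number not in new:
            problems.append(f"field {number} ({field.name}) was deleted")
        elif new[number].type != field.type:
            problems.append(
                f"field {number} ({field.name}) changed type "
                f"from {field.type} to {new[number].type}"
            )
    # Fields that exist only in `new` are additions and are not flagged here.
    return problems


old_schema = {
    1: Field("specialty", "string"),
    2: Field("annual_compensation", "double"),
}
new_schema = {
    1: Field("specialty", "string"),
    3: Field("region", "string"),
}

for problem in breaking_changes(old_schema, new_schema):
    print(problem)
```

Run in a pull request check, a report like this gives a non-technical code owner something readable to approve or reject before the change merges.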

The BSR provides a user-friendly interface for discovering schemas, understanding rules, and viewing change history. The descriptions shown there come straight from the comments added to messages and fields in the schema. This keeps the schema as the single source of truth, visible to all and propagating all the way to the Unity Catalog in Databricks.

Words of advice

When asked what he'd tell someone considering Buf, Whitlock cut to the core:

“I’d start by asking if they’ve ever built against a REST/JSON API—then ask how often it’s broken their pipeline. Have they ever tried generating a Swagger client from outdated docs and keeping that in sync across a team? That’s usually when people start nodding—or getting angry. It’s painful. That lack of strong typing and reliable version control creates real fragility.”

Clinician Nexus demonstrates the advantages of applying a schema-first approach across the entire stack. Backed by Buf technologies, the company has solved data quality issues, saved time and costs and, most importantly, shifted from reactive fire-fighting to proactive governance. In healthcare data processing, where accuracy impacts critical business decisions, that shift isn't just convenient—it's essential.

Customer Overview

Website:

cliniciannexus.com

Industry:

Healthcare

Buf products:

Buf CLI

Buf GitHub Action

Buf Schema Registry (BSR)

Bufstream

Protovalidate

Buf in action:

Stops breaking changes before they reach production
Validates each message against schema-driven validation rules
Serves as a single source of truth for data quality
Bufstream quickly scales up to handle bursts, then automatically reduces capacity (and costs) when traffic decreases