At Buf, we're building toward a different future—one where a single schema definition in Protobuf can govern your entire data flow, from API contracts to streaming topics to analytical tables. This universal schema approach doesn't just reduce complexity; it transforms data governance by ensuring that data producers' domain knowledge and validation rules flow seamlessly to every downstream consumer.
Consider the typical data journey in today's lakehouse architectures. Raw data enters through APIs and streaming platforms, often semi-structured and loosely validated. Data engineers then spend significant time in bronze and silver layers of medallion architectures, cleaning and structuring this data before it becomes analytics-ready. But what if clean, well-structured data could land directly in your analytical tables from the start?
Today, we're announcing a major step toward this vision. Bufstream now supports Databricks Managed Iceberg Tables in private preview, bringing together Buf's schema-first approach with Databricks' industry-leading data governance and optimization capabilities.
Managed Iceberg Tables represent Databricks' most significant advancement in open lakehouse architecture to date. These tables will fundamentally change how organizations think about data lake table management by bringing the simplicity of managed database tables to the open, interoperable world of Apache Iceberg.
The key innovation lies in Unity Catalog taking full ownership of the Iceberg table lifecycle. Unlike traditional external Iceberg tables where you manage your own catalog and optimization processes, Managed Iceberg Tables delegate these responsibilities entirely to Databricks' platform. This means Unity Catalog becomes your Iceberg REST catalog, handling not just metadata management but also intelligent optimization, governance, and maintenance operations.
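Concretely, that means any Iceberg REST client can be pointed at Unity Catalog directly. Here's a minimal sketch using pyiceberg; the workspace URL, endpoint path, token, and catalog name are all placeholders, so check the Databricks documentation for the exact values in your workspace:

```python
from pyiceberg.catalog import load_catalog

# Unity Catalog acts as the Iceberg REST catalog. The URI below is
# illustrative; use the Iceberg REST endpoint for your workspace.
catalog = load_catalog(
    "unity",
    **{
        "type": "rest",
        "uri": "https://my-workspace.cloud.databricks.com/api/2.1/unity-catalog/iceberg-rest",
        "token": "<personal-access-token>",
        "warehouse": "my_catalog",  # the Unity Catalog catalog name
    },
)

# Standard Iceberg catalog operations now go through Unity Catalog.
table = catalog.load_table("analytics.orders")
print(table.schema())
```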
What makes this particularly powerful is the dual-access model: while Unity Catalog manages the tables, they remain fully accessible to any Iceberg-compatible engine through standard REST APIs. This eliminates the traditional trade-off between managed convenience and open access. Your Spark jobs, Trino queries, Flink streams, and other Iceberg clients can all read and write to these tables seamlessly, while benefiting from Databricks' enterprise-grade optimization and governance capabilities.
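For example, a Spark job running entirely outside Databricks can register Unity Catalog as an ordinary Iceberg REST catalog. This sketch assumes the Iceberg Spark runtime package (the version shown is illustrative) and the same placeholder endpoint and credentials as above:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Illustrative Iceberg runtime version; match it to your Spark version.
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1")
    .config("spark.sql.catalog.unity", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.unity.type", "rest")
    .config("spark.sql.catalog.unity.uri",
            "https://my-workspace.cloud.databricks.com/api/2.1/unity-catalog/iceberg-rest")
    .config("spark.sql.catalog.unity.token", "<personal-access-token>")
    .config("spark.sql.catalog.unity.warehouse", "my_catalog")
    .getOrCreate()
)

# The managed table reads (and, per the dual-access model, writes) like any
# other Iceberg table.
spark.sql("SELECT COUNT(*) FROM unity.analytics.orders").show()
```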
The managed aspect extends beyond basic operations to include Databricks' AI-powered Predictive Optimization, which will automatically optimize file sizes, clustering, and retention policies based on actual usage patterns. This represents a significant operational advantage over self-managed Iceberg deployments, where teams must manually tune these parameters and maintain optimization schedules.
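To make that concrete, Databricks exposes Predictive Optimization as a setting you can toggle per table. The sketch below assumes a Databricks notebook (where `spark` is predefined) and a placeholder table name; exact syntax and availability can vary by workspace:

```python
# Opt a managed table into Predictive Optimization so Databricks handles
# file compaction and maintenance automatically, instead of hand-tuned jobs.
spark.sql(
    "ALTER TABLE my_catalog.analytics.orders ENABLE PREDICTIVE OPTIMIZATION"
)
```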
Perhaps most importantly, Managed Iceberg Tables integrate natively with Unity Catalog's fine-grained governance model, providing consistent security policies, data lineage, and access controls across both Databricks and external engines. This unified governance layer addresses one of the biggest challenges in multi-engine data architectures: maintaining consistent security and compliance policies across different compute platforms.
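Because the privileges live in Unity Catalog rather than in any single engine, a grant issued once applies everywhere the table is read, including through the Iceberg REST endpoint. A sketch, again from a Databricks notebook, with placeholder principal and table names:

```python
# Unity Catalog privileges govern the managed Iceberg table regardless of
# which engine accesses it.
spark.sql("GRANT SELECT ON TABLE my_catalog.analytics.orders TO `analysts`")
spark.sql("REVOKE MODIFY ON TABLE my_catalog.analytics.orders FROM `analysts`")
```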
The combination of Bufstream and Databricks Managed Iceberg Tables creates an unprecedented data pipeline that ensures data quality from source to analytics: Protobuf schemas and their validation rules are enforced as records are produced, Bufstream writes topic data directly to Iceberg tables, and Unity Catalog governs and optimizes everything from there.
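From the producer's perspective, none of this requires new tooling. The sketch below uses a stock Kafka client with a hypothetical generated `orders_pb2` module (from an `Order` message annotated with validation rules) and a placeholder broker address; since Bufstream speaks the Kafka protocol, schema enforcement and Iceberg materialization happen behind this ordinary produce call:

```python
from confluent_kafka import Producer

from gen import orders_pb2  # hypothetical buf/protoc-generated code

# Bufstream is a drop-in Kafka-compatible broker; the address is a placeholder.
producer = Producer({"bootstrap.servers": "bufstream.example.com:9092"})

order = orders_pb2.Order(id="ord-123", amount_cents=4200)

# The broker can check this payload against the schema and its validation
# rules before it is written to the topic -- and thus before it ever reaches
# the Iceberg table.
producer.produce("orders", value=order.SerializeToString())
producer.flush()
```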
This integration represents more than just technical compatibility—it's a fundamental shift toward treating schemas as the foundation of your entire data architecture. When your API schemas, streaming schemas, and table schemas are unified, data governance becomes automatic, data quality improves dramatically, and your teams can move faster with confidence.
Ready to experience schema-driven data excellence? If you'd like to see your Protobuf messages automatically appear as optimized, governed tables within Databricks, contact us to join the private preview.