Export reference

This page contains technical reference information about Bufstream's Iceberg Export (continuous export) integration. For the zero-copy archives mode, see the Iceberg Archives reference.

Export data flow

Bufstream's Iceberg Export continuously reads committed records from Kafka topic-partitions and writes them into a separate Iceberg table. Like the Archives mode, Bufstream queries the configured schema registry at the start of each export job to fetch the latest message schema, uses it to compute an Iceberg schema, and stores the schema state in the metadata store so that allocated field IDs are reused consistently across exports.

Exports are committed as Iceberg snapshots on a configurable schedule (bufstream.export.iceberg.commit.freq.ms). After Bufstream starts exporting a topic, the first snapshot appears within one commit interval of the first record produced to the topic.
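For example, the commit interval might be set like this. This is a hypothetical configuration fragment: only the key name comes from this page, while the value and where the setting lives in your deployment are assumptions — consult your Bufstream configuration reference for the exact placement.

```properties
# Assumed placement and format; only the key name is from this page.
# Commit an Iceberg snapshot for exported data every 5 minutes.
bufstream.export.iceberg.commit.freq.ms=300000
```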

The exported table is independent of the Kafka topic's retention policy. Data in the exported table is not removed when records expire from the topic.

No backfill

Bufstream does not backfill historical data into the export table. Only records produced to the topic after export is configured are written to the table. If export is disabled and then re-enabled, records produced during the gap are not exported.

Table management

Unlike Iceberg Archives, the exported table is a standalone copy of the data and is compatible with managed Iceberg catalogs. The catalog may perform table maintenance (compaction, re-partitioning) without any conflict with Bufstream.

Bufstream still manages the table schema, keeping it in sync with the Protobuf schema in your schema registry. It's advisable to let Bufstream create the table; do not create it manually first.

Invalid records

Records that cannot be parsed according to the schema are skipped and not written to the exported table. This differs from Iceberg Archives, which must store all records—including invalid ones—to preserve Kafka consumer semantics.

Table schema

Because the exported Parquet files are a copy of the data intended for analytics use only, the schema omits Kafka-specific metadata fields that are required in the Archives table for reconstructing the stream for consumers.

The following sections describe the fields of the Iceberg Export table schema.

key

struct

Represents the key in the original published record.

Because Protobuf schemas are not supported for record keys, the key struct contains only a __raw__ field holding the raw key bytes.

val

struct

Represents the value in the original published record.

When a Protobuf message schema is associated with the value, additional fields are present that mirror the structure of the message. Unknown Protobuf fields encountered during deserialization are silently dropped.

Unlike Iceberg Archives, the val struct does not include __prefix__ or __err__ fields. In the Archives (zero-copy) mode, these fields are required to preserve full Kafka consumer semantics: every record—including malformed ones—must be stored and reproducible, even if the producer and consumer are using different schema versions. In Export mode, records that cannot be parsed are simply skipped (see Invalid records), so there is no need to store error information in the table.

If no Protobuf schema is associated with the value, the val struct contains only a __raw__ field holding the raw value bytes.
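To make the row shapes concrete, here is an illustrative sketch of exported rows as Python dicts. The `Order` message, its fields, and the header contents are hypothetical; only the column and `__raw__` field names come from this page.

```python
from datetime import datetime, timezone

# Value has an associated Protobuf schema: val mirrors the message fields
# (here, a hypothetical Order message with `id` and `amount`).
row_with_schema = {
    "key": {"__raw__": b"order-1"},  # keys are always raw bytes
    "val": {"id": 1, "amount": 9.99},
    "headers": [{"key": "source", "value": b"checkout"}],
    "timestamp": datetime(2024, 1, 1, tzinfo=timezone.utc),
}

# No schema associated with the value: val collapses to a single __raw__ field
# holding the raw value bytes.
row_without_schema = {
    "key": {"__raw__": b"order-1"},
    "val": {"__raw__": b"\x08\x01"},
    "headers": [],
    "timestamp": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
```

Note that neither row carries `__prefix__`, `__err__`, or `__unrecognized__` fields; those exist only in the Archives table.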

headers

list of structs

Key-value pairs that Kafka producers attach to records.

key

string

value

bytes

timestamp

timestamp

A microsecond-precision timestamp associated with the record. By default this is the record's event timestamp, as set by the Kafka producer. Set bufstream.export.iceberg.use.ingest.time to TRUE to use the ingestion timestamp instead.
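As a small downstream sketch, assuming you read this column as a raw microsecond epoch integer (for example, straight out of a Parquet file) rather than an already-decoded timestamp:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def micros_to_datetime(us: int) -> datetime:
    """Convert a microsecond-precision epoch value to an aware UTC datetime."""
    return EPOCH + timedelta(microseconds=us)

print(micros_to_datetime(1_700_000_000_000_000))  # 2023-11-14 22:13:20+00:00
```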

Mapping Protobuf schemas to Iceberg schemas

Export uses the same Protobuf-to-Iceberg type mapping as Archives. See Mapping Protobuf schemas to Iceberg schemas in the Archives reference. Note that Iceberg structs in Export tables do not include a __unrecognized__ field; that synthesized field is Archives-only.

Recursive types

Iceberg schemas can't represent recursive types. When generating the Iceberg schema for a recursive Protobuf type, the message field that's the point of recursion is treated as if it were an empty message with no fields. All record data for that field is dropped.
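The truncation behavior can be sketched as follows. This is not Bufstream's actual code — it models messages as simple dicts, with a hypothetical recursive `TreeNode` type, just to show where the recursion is cut:

```python
# Sketch only: deriving an Iceberg-like schema from a recursive
# Protobuf-style message definition, cutting the cycle at the point
# of recursion.

MESSAGES = {
    # Hypothetical recursive type: a tree node containing itself.
    "TreeNode": {"label": "string", "child": "TreeNode"},
}

SCALARS = {"string": "string", "int64": "long"}

def iceberg_struct(message: str, seen: tuple = ()) -> dict:
    """Return a nested dict standing in for an Iceberg struct type."""
    fields = {}
    for name, typ in MESSAGES[message].items():
        if typ in SCALARS:
            fields[name] = SCALARS[typ]
        elif typ == message or typ in seen:
            # Point of recursion: treat the field as an empty message.
            # Any record data under this field is dropped on export.
            fields[name] = {}
        else:
            fields[name] = iceberg_struct(typ, seen + (message,))
    return fields

print(iceberg_struct("TreeNode"))  # {'label': 'string', 'child': {}}
```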

Compacted topics

Bufstream doesn't currently support using Iceberg Export for compacted topics.

Apache Avro™ and JSON

If a topic uses Protobuf as its message data format, Bufstream ensures that the Iceberg schema mirrors the Protobuf message schema. For other message formats such as Avro and JSON, Bufstream represents the message keys and values as opaque binary columns.