Buf Schema Registry (BSR)

Reflection API overview

The Protobuf binary format is compact and efficient, and it has clever features that allow for a wide variety of schema changes to be both backward- and forward-compatible.

However, it is not possible to make meaningful sense of the data without a schema. Not only is it not human-friendly, since all fields are identified by an integer instead of a semantic name, but it also uses a very simple wire format which re-uses various value encoding strategies for different value types. This means it is not even possible to usefully interpret encoded values without a schema — for example, one cannot know (with certainty) if a value is a text string, a binary blob, or a nested message structure.
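To make this ambiguity concrete, here is a minimal sketch of a schema-less decoder, written in Python with only the standard library. The names `read_varint` and `parse_fields` and the sample payload are illustrative assumptions, not part of any Buf or Protobuf API. The sketch splits a buffer into raw fields and then shows that the same length-delimited value can be read equally well as text, as a nested message, or as a run of packed integers.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf: bytes) -> list:
    """Split a buffer into (field_number, wire_type, raw_value) records."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes, message, ...
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("truncated field")
        fields.append((number, wire_type, value))
    return fields


# Illustrative payload: field 1, length-delimited, two bytes of content.
payload = b"\x0a\x02Hi"
number, wire_type, raw = parse_fields(payload)[0]

print(parse_fields(payload))  # [(1, 2, b'Hi')]
print(raw.decode("utf-8"))    # Hi             (read as a text string)
print(parse_fields(raw))      # [(9, 0, 105)]  (read as a nested message)
print(list(raw))              # [72, 105]      (read as packed one-byte varints)
```

All of these readings are valid on the wire; only the schema says which one the producer intended.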

But there exists a category of systems and use cases where it is necessary or useful to decode the data at runtime, by a process or user agent that does not have prior (compile-time) knowledge of the schemas:

  1. RPC debugging: It is useful for a human to be able to inspect, interpret, and modify RPC requests and responses (with tools like tcpdump, Wireshark, or Charles proxy). But without the schema, these payloads are inscrutable byte sequences.
  2. Persistent store debugging (including message queues): This is similar to the above use case, but the human is looking at data blobs in a database or durable queue. A key difference is that the human is likely to observe messages produced over a longer period of time, using many versions of the schema as it evolved.
  3. Data pipeline schemas and transformations: This is less for human interaction and more for data validation and transformation. A producer may be pushing binary blobs of encoded Protobuf into a queue or publish/subscribe system. The system may want to verify that the blob is actually valid for the expected type of data, which requires a schema. The consumer may need the data in an alternate format; the only way to transform the binary data into an alternate format is to have the schema. Further, the only way to avoid dropping data is to have a version of the schema that is no older than the version used by the publisher. (Otherwise, newly added fields may not be recognized and then silently dropped during a format transformation.)
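The data-loss risk in the third case can be sketched in a few lines of Python. Everything here is hypothetical: the two schema versions are reduced to plain maps from field number to field name, and `to_json` stands in for a real format transformation. The point is only that a consumer holding an older schema has no name for a newer field, so that field disappears from the output.

```python
import json

# Hypothetical field-number-to-name maps for two versions of one message type.
SCHEMA_V1 = {1: "user_id", 2: "email"}
SCHEMA_V2 = {1: "user_id", 2: "email", 3: "plan"}  # version 2 adds field 3


def to_json(records, schema):
    """Convert decoded (field_number, value) pairs to JSON using a schema.

    Returns the JSON text and the field numbers the schema did not recognize.
    """
    named, dropped = {}, []
    for number, value in records:
        name = schema.get(number)
        if name is None:
            dropped.append(number)  # no name available, so the value is lost
        else:
            named[name] = value
    return json.dumps(named, sort_keys=True), dropped


# A message written by a publisher that already uses version 2.
records = [(1, 42), (2, "a@example.com"), (3, "pro")]

print(to_json(records, SCHEMA_V1))
# ('{"email": "a@example.com", "user_id": 42}', [3])
print(to_json(records, SCHEMA_V2))
# ('{"email": "a@example.com", "plan": "pro", "user_id": 42}', [])
```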

All of these cases call for a mechanism by which the schema for a particular message type can be downloaded on demand, for interpreting the binary data.

The Buf Reflection API provides exactly that mechanism. It provides a means of downloading the schema for any module in the BSR, and even any specific version of a module.

The Reflection API is currently in beta. It should be considered unstable and possibly impermanent.

In addition to querying for the schema by module name and version, this API also allows the caller to signal which part of the schema they are interested in, such as a specific message type or a specific service or method. This is used to filter the schema, allowing the client to ignore parts of a module that it does not need. In many cases, the client only needs a small subset of the module's schema (especially for large modules), so this can greatly reduce the amount of data that a client needs to download.

API Usage

The Buf Reflection API can be found in the public BSR: buf.build/bufbuild/reflect (sources are in GitHub). You can see the available generated SDKs for it here.

It contains a single RPC service: buf.reflect.v1beta1.FileDescriptorSetService. This service contains a single endpoint named GetFileDescriptorSet, which is for downloading the schema for a particular module (optionally, at a specific version). The response is in the form of a FileDescriptorSet. You can find reference documentation for all the request and response fields in the BSR.

For the general mechanics of how to use APIs exposed by the BSR, see Invoking the BSR APIs.

The endpoint accepts a module name, in <bsr-domain>/<owner>/<repo> format. For example, buf.build/connectrpc/eliza is the module name for the Eliza service (a demo service for Connect). The domain of the BSR is "buf.build" (the public BSR); the owner is the "connectrpc" organization; and the repo name is "eliza".

Here's an example API request for downloading the buf.build/connectrpc/eliza module:

> POST /buf.reflect.v1beta1.FileDescriptorSetService/GetFileDescriptorSet HTTP/1.1
> Host: buf.build
> Authorization: Bearer <insert-buf-token-here>
> Content-Type: application/json
> Connect-Protocol-Version: 1
> {"module": "buf.build/connectrpc/eliza"}

Assuming a valid BSR token is used in the Authorization header, this will return a FileDescriptorSet that describes the files in the requested module, which describe the Eliza RPC service and all related message types.

The above request does not contain a version field, which means it returns the latest version. This is the same as asking for "version": "main". The version can also refer to a commit, either via the commit name or an associated tag, or it can refer to the name of a branch.

These are the same ways one can pin a particular version in the deps section of a buf.yaml file.

See the Overview section on dependencies for more.

Filtering the Schema

The request may also include a field named symbols that is an array of fully-qualified names. If present and non-empty, the returned schema will be pruned to only include the data required to describe the requested symbols. All other content in the module will be omitted. This is particularly useful with large modules, to reduce the amount of schema data that a client needs to download. For example, suppose a client needs the schema for a single service that is defined in a large module containing many services. The client can name the service of interest in the symbols field and get back only what it needs. Here's an example that returns only the google.longrunning.Operations service from the buf.build/googleapis/googleapis module:

> POST /buf.reflect.v1beta1.FileDescriptorSetService/GetFileDescriptorSet HTTP/1.1
> Host: buf.build
> Authorization: Bearer <insert-buf-token-here>
> Content-Type: application/json
> Connect-Protocol-Version: 1
> {
>    "module": "buf.build/googleapis/googleapis",
>    "version": "75b4300737fb4efca0831636be94e517",
>    "symbols": ["google.longrunning.Operations"]
> }

This currently returns a response that is about 11k. If the symbols field is left out of the request, the response is about 10x that size.
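The same request can be assembled programmatically. The following is a minimal sketch in Python using only the standard library; the helper name `build_request`, its parameters, and the use of `https://` in front of the host are assumptions made for illustration. The endpoint path, the headers, and the JSON field names (module, version, symbols) are taken from the HTTP examples above. The sketch only constructs the request and does not send it.

```python
import json
import urllib.request

# Endpoint path as it appears in the HTTP examples above.
ENDPOINT = "/buf.reflect.v1beta1.FileDescriptorSetService/GetFileDescriptorSet"


def build_request(token, module, version=None, symbols=None, host="buf.build"):
    """Build (but do not send) a GetFileDescriptorSet request.

    `version` and `symbols` are optional; when omitted they are left out of
    the JSON body, matching the first example request in this document.
    """
    body = {"module": module}
    if version:
        body["version"] = version
    if symbols:
        body["symbols"] = list(symbols)
    return urllib.request.Request(
        url=f"https://{host}{ENDPOINT}",  # assumption: the BSR is reached over HTTPS
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Connect-Protocol-Version": "1",
        },
    )


request = build_request(
    token="<insert-buf-token-here>",
    module="buf.build/googleapis/googleapis",
    version="75b4300737fb4efca0831636be94e517",
    symbols=["google.longrunning.Operations"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it; a valid BSR token is needed for the call to succeed.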

Dynamic Messages

Once you have downloaded a set of descriptors, the next step is deciding what to do with them. Having the whole schema allows for building dynamic messages, which are backed by a descriptor at runtime instead of by generated code.

The general shape of this solution is two-fold:

  1. Convert FileDescriptorProto instances to "rich" data structures that are cross-linked and indexed. This makes it easy to traverse type references in the schema. This process also validates the schema, to make sure it is not missing any necessary elements and is valid per the rules of the Protobuf language.
  2. Use a "rich" descriptor that describes a message to construct a dynamic message. This message acts in most ways like a regular generated message: you can unmarshal message data from an array of bytes, marshal the message's data back to bytes, and examine the message's field values. Since it is not a generated type, however, you can't access fields in the usual way (through generated accessors), because your code doesn't know at compile time what fields the message has.
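As a rough illustration of the first step, the sketch below indexes message descriptors by fully-qualified name. It is a toy written in Python with the standard library, not the API of any Protobuf runtime: it assumes the descriptor set has already been loaded as a Python dict in Protobuf JSON form (lowerCamelCase keys such as `messageType` and `nestedType`), and the helper names and the `example.v1` sample schema are hypothetical. A real runtime would also resolve type references between files and validate the schema, which this sketch does not attempt.

```python
def index_messages(descriptor_set: dict) -> dict:
    """Map fully-qualified message names to their descriptor dicts."""
    index = {}

    def visit(prefix, message):
        full_name = f"{prefix}.{message['name']}" if prefix else message["name"]
        index[full_name] = message
        for nested in message.get("nestedType", []):
            visit(full_name, nested)  # nested types are scoped by their parent

    for file in descriptor_set.get("file", []):
        for message in file.get("messageType", []):
            visit(file.get("package", ""), message)
    return index


def fields_by_number(index: dict, full_name: str) -> dict:
    """Return {field_number: (name, type)} for one message in the index."""
    return {
        field["number"]: (field["name"], field["type"])
        for field in index[full_name].get("field", [])
    }


# Hypothetical descriptor set with one file, one message, and one nested message.
sample = {
    "file": [{
        "name": "example/v1/greeting.proto",
        "package": "example.v1",
        "messageType": [{
            "name": "Greeting",
            "field": [{"name": "text", "number": 1, "type": "TYPE_STRING"}],
            "nestedType": [{
                "name": "Meta",
                "field": [{"name": "lang", "number": 1, "type": "TYPE_STRING"}],
            }],
        }],
    }],
}

index = index_messages(sample)
print(sorted(index))
# ['example.v1.Greeting', 'example.v1.Greeting.Meta']
print(fields_by_number(index, "example.v1.Greeting"))
# {1: ('text', 'TYPE_STRING')}
```

A lookup table like the one returned by `fields_by_number` is what lets a decoder attach names and types to the numbered fields it finds in encoded data.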

The power of a dynamic message is that it enables an "appliance" that can process message data of arbitrary types in cross-cutting ways. A particularly powerful and common use case is to examine fields and field options to redact sensitive data (such as PII), convert to JSON, and then store in a data warehouse for use with business intelligence tools. Without a dynamic message, you have to write a bespoke message processor that must be recompiled and re-deployed whenever any of the message definitions are changed. With a dynamic message, you can compile and deploy the service once, but you must then provide the service with updated message definitions as they change; that's where the BSR and the Buf Reflection API come in!

To get a sense of how the API can be used to perform functionality described in the above paragraph, take a look at our example client library.

Unfortunately, not all languages/runtimes have support for descriptors and dynamic messages. Here are ones that do, with links to relevant API documentation.

There are other languages (C#, PHP) that include some support for descriptors, but only for runtime reflection; they do not provide dynamic message support. There may also be third-party language runtimes that offer this support.