What are Images?

Throughout the documentation, you will see many references to Images. We'll go over what Images are, how they are used, and the various options associated with them here.

Protobuf plugins: how they work

First we need to provide a short overview of how plugins work.

When you invoke the following command:

protoc -I . --go_out=gen/go foo.proto

The following is (roughly) what happens:

  • protoc compiles the file foo.proto (and any imports) and internally produces a FileDescriptorSet, which is just a list of FileDescriptorProto messages. These messages contain all information about your .proto files, including optionally source code information such as the start/end line/column of each element of your .proto file, as well as associated comments.
  • The FileDescriptorSet is turned into a CodeGeneratorRequest, which contains the FileDescriptorProtos that protoc produced for foo.proto and any imports, a list of the files specified (just foo.proto in this example), as well as any options provided after the = sign of --go_out or with --go_opt.
  • protoc then looks for a binary named protoc-gen-go, and invokes it, giving the serialized CodeGeneratorRequest as stdin.
  • protoc-gen-go runs, and either errors or produces a CodeGeneratorResponse, which specifies what files are to be generated and their content. The serialized CodeGeneratorResponse is written to stdout of protoc-gen-go.
  • On success of protoc-gen-go, protoc reads stdout and then writes these generated files.
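
The plugin side of this exchange can be sketched in a few lines. The example below is a hypothetical, dependency-free illustration and not how protoc-gen-go is implemented. It ignores the contents of the request and hand-encodes a CodeGeneratorResponse using the field numbers from plugin.proto (file = 15 on the response, and name = 1 and content = 15 on each File). The file name summary.txt and the helper names are made up for this sketch; a real plugin would use a Protobuf runtime library to parse the request and build the response.

```python
import sys


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field (wire type 2)."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def build_response(files: dict) -> bytes:
    """Serialize a CodeGeneratorResponse from a {file name: content} mapping."""
    response = b""
    for name, content in files.items():
        # CodeGeneratorResponse.File: name = 1, content = 15.
        generated = encode_bytes_field(1, name.encode("utf-8"))
        generated += encode_bytes_field(15, content.encode("utf-8"))
        # CodeGeneratorResponse: repeated File file = 15.
        response += encode_bytes_field(15, generated)
    return response


def main() -> None:
    """Read the serialized request from stdin, write the response to stdout."""
    request = sys.stdin.buffer.read()
    # Hypothetical output: one text file reporting the request size.
    summary = f"received a {len(request)}-byte CodeGeneratorRequest\n"
    sys.stdout.buffer.write(build_response({"summary.txt": summary}))
```

To try it, the script would call main() and be installed on the PATH as an executable named, say, protoc-gen-summary (a made-up name), which protoc would look for when given --summary_out.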

The generators built into protoc, such as --java_out and --cpp_out, work in roughly the same manner, except that generation happens inside protoc instead of in an external binary.

FileDescriptorSets are the primitive used throughout the Protobuf ecosystem to represent a compiled Protobuf schema. They are also the primary artifact that protoc produces.

That is to say, everything you do with protoc, and every plugin you use, talks in terms of FileDescriptorSets. Of note, they are also how gRPC Reflection works under the hood.

How do I create FileDescriptorSets with protoc?

protoc provides the --descriptor_set_out flag, aliased as -o, to allow writing serialized FileDescriptorSets. For example, given a single file foo.proto, you can write a FileDescriptorSet to stdout as follows:

protoc -I . -o /dev/stdout foo.proto

The resulting FileDescriptorSet will contain a single FileDescriptorProto with name foo.proto.

By default, FileDescriptorSets will not include any imports not specified on the command line, and will not include source code information. Source code information is useful for generating documentation inside your generated stubs, and for things like linters and breaking change detectors. As an example, assume foo.proto imports bar.proto. To produce a FileDescriptorSet that includes both foo.proto and bar.proto, as well as source code information:

protoc -I . --include_imports --include_source_info -o /dev/stdout foo.proto

What are Images then?

An Image is Buf's custom extension to FileDescriptorSets. As of this writing, the definition is stored in bufbuild/buf.

Images are FileDescriptorSets, and FileDescriptorSets are Images. Due to the forwards and backwards compatible nature of Protobuf, we're able to add an additional field to FileDescriptorSet while maintaining compatibility in both directions - existing Protobuf plugins will just drop this field, and Buf does not require this field to be set to work with Images.

message Image {
  // file matches the file field of a FileDescriptorSet.
  repeated google.protobuf.FileDescriptorProto file = 1;
  // bufbuild_image_extension is the ImageExtension for this image.
  //
  // The prefixed name and high tag value is used to all but guarantee there
  // will never be any conflict with Google's FileDescriptorSet definition.
  // The definition of a FileDescriptorSet has not changed in 11 years, so
  // we're not too worried about a conflict here.
  optional ImageExtension bufbuild_image_extension = 8042;
}
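
The compatibility argument can be illustrated at the wire-format level. The sketch below is a simplified, hypothetical reader (not Buf's or protoc's parser) that only knows about field 1 and skips any other length-delimited field, which is what lets a FileDescriptorSet consumer read an Image. The payloads are placeholder bytes rather than real FileDescriptorProtos, and all function names are invented for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field (wire type 2)."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def read_known_files(data: bytes) -> list:
    """Collect the payloads of field 1, skipping unknown fields."""
    files = []
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = decode_varint(data, pos)
        payload = data[pos:pos + length]
        pos += length
        if field_number == 1:
            files.append(payload)
        # Any other field number, such as 8042, is unknown here and ignored.
    return files


# A stand-in FileDescriptorSet: two placeholder entries in field 1.
descriptor_set = encode_bytes_field(1, b"foo") + encode_bytes_field(1, b"bar")
# A stand-in Image: the same entries plus an extension in field 8042.
image = descriptor_set + encode_bytes_field(8042, b"ext")

print(read_known_files(descriptor_set))
print(read_known_files(image))
```

Both calls return the same two entries, since the reader never looks inside field 8042.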

Images are the primitive of Buf. As a result, FileDescriptorSets are also the primitive of Buf.

Linting and breaking change detection internally operate on Images that Buf either produces on the fly or reads from an external location. Images are a stable, widely used way to represent a compiled Protobuf schema. For the breaking change detector, Images are the storage format used if you want to manually store the state of your Protobuf schema. See the breaking change documentation for more details.

We use the ImageExtension of an Image to store additional information that is useful to Buf to perform its operations. Currently, the only additional information stored is the indexes within the file array of the FileDescriptorProtos that are imports.

// ImageExtension contains extensions to Images.
//
// The fields are not included directly on the Image so that we can both
// detect if extensions exist, which signifies this was created by buf
// and not by protoc, and so that we can add fields in a freeform manner
// without worrying about conflicts with google.protobuf.FileDescriptorSet.
message ImageExtension {
  // image_import_refs are the image import references for this specific Image.
  //
  // A given FileDescriptorProto may or may not be an import depending on
  // the image context, so this information is not stored on each FileDescriptorProto.
  repeated ImageImportRef image_import_refs = 1;
}

// ImageImportRef is a reference to an image import.
//
// This is a message type instead of a scalar type so that we can add
// additional information about an import reference in the future, such as
// the external location of the import.
message ImageImportRef {
  // file_index is the index within the Image file array of the import.
  //
  // This signifies that file[file_index] is an import.
  // This field must be set.
  optional uint32 file_index = 1;
}

Right now, the only possible imports are the Well-Known Types. All other files are specified through your build configuration, but it is always possible to include the Well-Known Types in your .proto files with Buf, and is usually possible to include the Well-Known Types with protoc in a standard installation. It's widely accepted that a Protobuf compiler should always provide these.

Currently, we use this information in the linter and breaking change detector. For the linter, we do not want to lint imports - they are not part of your Protobuf schema that you care about for linting. The linter filters any imports before running the lint checkers. If the ImageExtension field is not present, Buf cannot deduce what FileDescriptorProtos are imports, and lints everything.
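
The filtering step can be sketched as follows. This is a minimal illustration that models an Image as a plain Python dict keyed by the proto field names shown above; it is not Buf's implementation, and the function name files_to_lint and the sample file names are assumptions made for the example.

```python
def files_to_lint(image: dict) -> list:
    """Return the files of a dict-modelled Image that are not imports."""
    extension = image.get("bufbuild_image_extension")
    if extension is None:
        # Without the extension, imports cannot be identified,
        # so every file is treated as lintable.
        return list(image["file"])
    import_indexes = {
        ref["file_index"] for ref in extension.get("image_import_refs", [])
    }
    return [
        file
        for index, file in enumerate(image["file"])
        if index not in import_indexes
    ]


# Hypothetical Image: file[0] is marked as an import, file[1] is not.
image = {
    "file": [
        {"name": "google/protobuf/timestamp.proto"},
        {"name": "foo.proto"},
    ],
    "bufbuild_image_extension": {"image_import_refs": [{"file_index": 0}]},
}

print([file["name"] for file in files_to_lint(image)])  # ['foo.proto']
```

Dropping the bufbuild_image_extension key from the dict makes the function return both files, which mirrors the "lints everything" behavior described above.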

For the breaking change detector, we check imports by default; however, you can exclude imports with the --exclude-imports flag. As with the linter, if the ImageExtension field is not present, Buf does not know what an import is, so --exclude-imports is a no-op.

Creating images

Images are created using buf image build. Given that you are in the root of your repository, and you have a proper configuration:

$ buf image build -o image.bin

The resulting image is written to the file image.bin. Of note, the FileDescriptorProtos are deliberately ordered to mimic the ordering that protoc would produce, both when imports are written and when they are not.

By default, Buf produces an Image with both imports and source code info. You can strip each of these:

$ buf image build --exclude-imports --exclude-source-info -o image.bin

In general, we do not recommend stripping these, as this information can be useful for various operations. However, source code info in particular takes a lot of additional space, generally around 5x as much, so if you know you do not need this data, stripping it can be worthwhile.

Images can be output in one of two formats:

  • Binary
  • JSON

Either format can be compressed using Gzip or Zstandard.

Per the Inputs documentation, buf image build can deduce the format by the file extension:

$ buf image build -o image.bin
$ buf image build -o image.bin.gz
$ buf image build -o image.bin.zst
$ buf image build -o image.json
$ buf image build -o image.json.gz
$ buf image build -o image.json.zst
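
The extension-based deduction can be pictured with a small sketch. This is only an illustration of the idea, not Buf's actual logic (the Inputs documentation is the authority), and the function name and the returned labels ("bin", "json", "gzip", "zstd", "none") are assumptions chosen for the example.

```python
def deduce_format(path: str):
    """Return (format, compression) deduced from an output path's extension."""
    compression = "none"
    # Peel off an optional compression suffix first.
    for suffix, name in ((".gz", "gzip"), (".zst", "zstd")):
        if path.endswith(suffix):
            compression = name
            path = path[: -len(suffix)]
            break
    if path.endswith(".bin"):
        return "bin", compression
    if path.endswith(".json"):
        return "json", compression
    raise ValueError(f"cannot deduce format from {path!r}")


for output in ("image.bin", "image.bin.gz", "image.json.zst"):
    print(output, deduce_format(output))
```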

The special value - is used to denote stdout. You can manually set the format. For example:

$ buf image build -o -#format=json

When combined with jq, this also allows for introspection. For example, to see a list of all packages:

$ buf image build -o -#format=json | jq '.file[] | .package' | sort | uniq | head
"google.actions.type"
"google.ads.admob.v1"
"google.ads.googleads.v1.common"
"google.ads.googleads.v1.enums"
"google.ads.googleads.v1.errors"
"google.ads.googleads.v1.resources"
"google.ads.googleads.v1.services"
"google.ads.googleads.v2.common"
"google.ads.googleads.v2.enums"
"google.ads.googleads.v2.errors"

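The same kind of introspection works in any language with a JSON parser. The sketch below mirrors the jq pipeline in Python, relying only on the file and package keys that the jq expression above uses. The sample document is invented for the example, and unlike the jq version, files with no package are skipped rather than reported as null.

```python
import json


def list_packages(image_json: str) -> list:
    """Return the sorted, de-duplicated packages in a JSON-encoded Image."""
    image = json.loads(image_json)
    packages = {
        file["package"] for file in image.get("file", []) if "package" in file
    }
    return sorted(packages)


# Hypothetical stand-in for the output of a JSON-format image build.
sample = json.dumps(
    {
        "file": [
            {"name": "a.proto", "package": "acme.weather.v1"},
            {"name": "b.proto", "package": "acme.weather.v1"},
            {"name": "c.proto", "package": "acme.geo.v1"},
            {"name": "d.proto"},
        ]
    }
)

print(list_packages(sample))  # ['acme.geo.v1', 'acme.weather.v1']
```
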
Images built by Buf always include the ImageExtension field. However, if you want a pure FileDescriptorSet without this field set, to mimic protoc entirely, use the --as-file-descriptor-set flag:

$ buf image build -o image.bin --as-file-descriptor-set

The ImageExtension field will not affect Protobuf plugins or any other operations; they will merely see it as an unknown field. However, we provide the option in case you want it.

Using protoc output as Buf input

Since Buf's primitive is the Image, and FileDescriptorSets are Images, protoc output can be used directly as Buf input. As an example for lint:

$ protoc -I . --include_source_info -o /dev/stdout foo.proto | buf check lint --input -

We discuss this further in the relevant sections of our documentation.

Protoc lint and breaking change detection plugins

Since Buf talks in terms of FileDescriptorSets, it's trivial for us to provide the Protobuf plugins protoc-gen-buf-check-lint and protoc-gen-buf-check-breaking as well.