The Protobuf Language Specification

Josh Humphries on Sep 12, 2022 | 3 min read

At Buf, our goal is to improve the way software systems integrate by making schema-driven development a "pit of success". We've put our money on Protobuf as the winning way to describe those schemas. We are expanding on the work of the Protobuf team by providing the community with a complete language spec.

Protobuf is the most stable and widely adopted IDL today. By building on Protobuf, we are standing on the shoulders of giants: those who built it, battle-tested it, and brought it to its current mature state. The official documentation, Google's developer site, is a great source of reference material. However, it does not contain a complete and thorough spec for the language. There are pages on the site that provide specs for the proto2 and proto3 syntax, but they are incomplete and sometimes inaccurate.

In an unrelated discussion about Go being "Google's Language", Ian Lance Taylor (one of the senior members of the Go team) wrote the following:

"A programming language is a type of shared software infrastructure. It's most useful when everybody is using the same language, so code written by person A can be reused by person B. That means that programming languages are most useful when we all agree on exactly what the language is. All successful languages have either a single specification or a single primary implementation. (Go and C++ are examples of language based on a specification; Perl, at least before Perl 6, is an example of a language based on an implementation). These serve as the definition of what the language is: whatever the specification says or whatever the implementation does."

Protobuf is solidly in the latter camp: it has a single primary implementation in the form of the compiler, protoc. Unfortunately, this has held back the ecosystem around Protobuf. There is a multitude of tools and libraries that try to parse Protobuf source, but most of them are based on the incomplete specs from Google's developer site. None of them can correctly predict, 100% of the time, which source files protoc will actually accept or reject.

In the interest of a vibrant ecosystem and community building around Protobuf, we are excited to correct these omissions. As of today, Protobuf is a fully defined language:

🎉 protobuf.com/docs/language-spec 🎉

This means that users and tool makers now have a comprehensive source for what the Protobuf language is. With this knowledge, the quality of tools can vastly improve, and the existing tools can be made 100% correct.

As a side note, this spec grew out of our work on the compiler that powers the buf CLI. We've built that compiler to accurately match protoc, and we've tested it against an extremely large corpus of real Protobuf sources, including some that use the most esoteric language syntax and features. In the course of making Buf this robust alternative, we've learned a lot about the actual behavior of protoc, read all of its C++ code, and discovered how it behaves (sometimes surprisingly) in all manner of scenarios. We're excited to be able to share the Protobuf language specification as a formalized result of all of this hard work.
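
For readers curious what "testing against a corpus" can look like in practice, here is a minimal sketch of the general idea: run a reference implementation and a candidate over the same sources and report every file where their accept/reject verdicts differ. This is not Buf's actual test harness. The function `find_disagreements` and the two toy checkers are hypothetical names invented for illustration; a real setup would replace them with calls into protoc and the compiler under test.

```python
from typing import Callable, Dict, List

# A "verdict" function takes source text and returns True if it accepts it.
Verdict = Callable[[str], bool]


def find_disagreements(
    corpus: Dict[str, str], reference: Verdict, candidate: Verdict
) -> List[str]:
    """Return the names of sources where the two verdicts differ."""
    return sorted(
        name
        for name, source in corpus.items()
        if reference(source) != candidate(source)
    )


# Toy stand-ins (assumptions, NOT real parsers): they only look at the
# last character so the example stays self-contained.
def toy_reference(source: str) -> bool:
    text = source.rstrip()
    return text.endswith(";") or text.endswith("}")


def toy_candidate(source: str) -> bool:
    # Deliberately stricter than the reference, to produce a disagreement.
    return source.rstrip().endswith(";")


corpus = {
    "a.proto": 'syntax = "proto3";',
    "b.proto": 'syntax = "proto3"; message M {}',
    "c.proto": 'syntax = "proto3"',
}

disagreements = find_disagreements(corpus, toy_reference, toy_candidate)
print(disagreements)
```

Every name in the returned list is a file where the candidate either rejects something the reference accepts or accepts something it rejects. Both directions matter when the goal is to match an existing implementation exactly.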
