Advanced CEL topics
Understanding how a tool works can be just as important as understanding how to use it. On this page, you'll learn more about Common Expression Language (CEL), an open source technology at the core of Protovalidate: what CEL and its "expressions" are, who uses CEL, and how Protovalidate uses it.
When you're finished, you should have a better understanding of CEL, be able to explain how Protovalidate works, and maybe even have ideas about using CEL in other projects.
What this page isn't
This page is not a tutorial for how to write CEL-based Protovalidate rules. If that's what you're looking for, see custom or predefined rules.
What is CEL?
CEL is a miniature programming language designed to be embedded within applications that need to evaluate, compile, and run brief "one-liners." In other words, it's a way to let other programs safely provide small pieces of code that need to run within your own.
A mental model that may be helpful is to think of CEL as the runtime for a formula in a spreadsheet. The spreadsheet's author writes the formula, and "something" within the spreadsheet application executes the formula to provide a result. CEL is that "something."
Who uses CEL?
CEL is used in popular technologies across the modern internet, a sign that it's a production-worthy tool that can operate at scale.
- Google Cloud Platform uses CEL throughout its services. If you've written expressions for conditional IAM, its Secure Web Proxy, or Firebase security rules, you've used CEL.
- In Kubernetes, instead of using webhooks to validate custom resources, you can use CEL in a custom resource definition (CRD) to validate the values provided to a resource and its state transitions.
- Envoy proxy lets you use CEL to declare conditions, such as authorization filters, that must be evaluated at runtime.
- The KrakenD API gateway allows you to filter requests and conditionally return responses with CEL expressions.
What's a CEL expression?
CEL's "one-liners" are expressions: code that combines constants, variables, functions, and operators to produce a value. (Expressions are the opposite of statements, which are instructions that don't produce a value.)
One way to think of expressions is that they're the right side of an assignment. In Go, for example, the right side of the assignment isLong := len(this) > 5 is the expression len(this) > 5.
Because CEL's syntax is C-style, its expressions are easy to read and write for anyone familiar with Go, Java, Python, C++, JavaScript, TypeScript, or many other languages. Adapting the Go example above to CEL, where the native CEL function size() returns the length of a string, the CEL expression this.size() > 5 returns false when this is 'foo' (foo has a length of 3, which is less than 5).
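One detail worth knowing when comparing the Go and CEL versions: CEL's size() counts a string's Unicode code points, while Go's len() counts bytes. The two agree for ASCII text such as foo but diverge for text such as héllo. The Go sketch below mimics CEL's string size() with utf8.RuneCountInString; the helper names celSize and longerThanFive are hypothetical and not part of any CEL library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// celSize mimics CEL's size() for strings: it counts Unicode code points,
// not bytes. Hypothetical helper, for illustration only.
func celSize(s string) int64 {
	return int64(utf8.RuneCountInString(s))
}

// longerThanFive mirrors the CEL expression this.size() > 5.
func longerThanFive(this string) bool {
	return celSize(this) > 5
}

func main() {
	fmt.Println(longerThanFive("foo"))                  // false: 3 code points
	fmt.Println(len("h\u00e9llo"), celSize("h\u00e9llo")) // 6 5: bytes vs. code points
}
```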
How does CEL work?
At first glance, it's tempting to think that CEL can't scale because it appears to be dynamic. It can: an application using CEL relies on the CEL compiler to parse and type-check expressions and compile them into CEL programs before running them. Because compilation is separate from evaluation, it's simple to build a cache of already-encountered expressions or to proactively warm up a cache with a library of expressions.
Let's take a look at how this works, using the prior example of checking a string's length.
Though implementations vary across its supported languages, they all follow the same basic workflow.
First, the application compiles CEL expressions into CEL programs by:
- Receiving one or more CEL expressions to evaluate. In this case, it's someString.size() > 5.
- Creating a CEL environment and compiler, which are classes or types provided by the language-specific CEL implementation.
- Asking the CEL compiler to compile expressions, handling any compiler failures. For any expressions using variables, such as someString, the compiler is provided the name and type of the variable.
- Receiving the result of the CEL compiler (a CEL abstract syntax tree, or AST).
- Creating a CEL program (another class or type provided by the CEL library) based on the compiled AST.
After CEL expressions are compiled into CEL programs, the program can be provided input and run (evaluated, in CEL's terminology) at any time, returning the result of its expression.
It's not too different from writing any other compiled program: the source code is compiled, turned into an executable program, and then a runtime executes the program with any provided input.
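To make the compile-then-evaluate workflow concrete without depending on a particular CEL library, here's a toy Go sketch. Its "compiler" understands only one expression shape, NAME.size() > N, and every name in it (Env, Compile, Program, Eval) is hypothetical rather than a real CEL API. It still follows the same steps: declare variables and their types in an environment, compile (rejecting expressions that don't check), then evaluate the resulting program against input as many times as needed.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"unicode/utf8"
)

// Type is a hypothetical, tiny type system for declared variables.
type Type int

const (
	StringType Type = iota
	IntType
)

// Env declares the variables (name and type) that expressions may use.
type Env struct{ vars map[string]Type }

// Program is the compiled, reusable form of an expression.
type Program struct {
	varName string
	limit   int64
}

// The only syntax this toy compiler accepts: NAME.size() > N
var sizeGreaterThan = regexp.MustCompile(`^(\w+)\.size\(\) > (\d+)$`)

// Compile parses and checks expr, failing on unsupported syntax, undeclared
// variables, or variables whose type doesn't support size().
func (e Env) Compile(expr string) (*Program, error) {
	m := sizeGreaterThan.FindStringSubmatch(expr)
	if m == nil {
		return nil, fmt.Errorf("unsupported expression %q", expr)
	}
	t, ok := e.vars[m[1]]
	if !ok {
		return nil, fmt.Errorf("undeclared reference to %q", m[1])
	}
	if t != StringType {
		return nil, fmt.Errorf("size() not defined for %q", m[1])
	}
	limit, err := strconv.ParseInt(m[2], 10, 64)
	if err != nil {
		return nil, err
	}
	return &Program{varName: m[1], limit: limit}, nil
}

// Eval runs the program against a set of variable bindings.
func (p *Program) Eval(vars map[string]any) (bool, error) {
	s, ok := vars[p.varName].(string)
	if !ok {
		return false, fmt.Errorf("no string bound to %q", p.varName)
	}
	return int64(utf8.RuneCountInString(s)) > p.limit, nil
}

func main() {
	env := Env{vars: map[string]Type{"someString": StringType}}
	prg, err := env.Compile("someString.size() > 5") // compile once
	if err != nil {
		panic(err)
	}
	for _, in := range []string{"foo", "protovalidate"} { // evaluate many times
		result, _ := prg.Eval(map[string]any{"someString": in})
		fmt.Println(in, result) // "foo false", then "protovalidate true"
	}
}
```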
For more details on how to use CEL directly, see the CEL tutorials for Go, Java, and C++.
Protovalidate and CEL
Why it uses CEL
Protovalidate is the spiritual successor to protoc-gen-validate (PGV), a protoc plugin that generates polyglot message validation functions. When developers use their Protobuf files and PGV to generate code, PGV creates idiomatic Validate methods for the generated types.
p := new(Person)
err := p.Validate() // err: First name is required
Because it relies on code generation, PGV's rules have to be implemented in each supported language. When UUID was added as a well-known string rule, the code change had to consistently implement the definition of a UUID string in Go, Java, and C++.
With that in mind, you can probably guess why Protovalidate uses CEL. CEL evaluates an expression the same way in every language it supports, so a library of CEL expressions for common validation cases becomes a cross-platform validation library.
Instead of defining each rule in each language, Protovalidate defines a library of CEL expressions for common rules that work across all of its supported languages.
Where it uses CEL
Unlike protoc-gen-validate, Protovalidate isn't a protoc plugin, and it doesn't rely on any code generation of its own. The core of Protovalidate is simply one Protobuf file that uses proto2 syntax to define options (annotations).
In validate.proto, you can see the definition for every standard Protovalidate rule. For example, Protovalidate doesn't have to define validation for a UUID string in Go, Java, and C++. Instead, Protovalidate stores it once as (you probably guessed it) a CEL expression:
bool uuid = 22 [
(predefined).cel = {
id: "string.uuid"
message: "value must be a valid UUID"
expression: "!rules.uuid || this == '' || this.matches('^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$')"
}
]
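To see what that expression does, here's a hedged Go translation. CEL's matches() uses RE2 regular expression syntax, and Go's regexp package also implements RE2, so the pattern carries over unchanged. The stringRules type and the evalUUIDRule function are hypothetical: they mirror the variables the expression reads (this for the field value, rules for the field's StringRules) rather than Protovalidate's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// uuidPattern is the same RE2 pattern used by the string.uuid CEL expression.
var uuidPattern = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// stringRules is a hypothetical stand-in for the StringRules message.
type stringRules struct{ UUID bool }

// evalUUIDRule mirrors the CEL expression
//   !rules.uuid || this == '' || this.matches('^...$')
// and returns true when the value passes.
func evalUUIDRule(this string, rules stringRules) bool {
	return !rules.UUID || this == "" || uuidPattern.MatchString(this)
}

func main() {
	rules := stringRules{UUID: true}
	fmt.Println(evalUUIDRule("3fa85f64-5717-4562-b3fc-2c963f66afa6", rules)) // true
	fmt.Println(evalUUIDRule("not-a-uuid", rules))                           // false
	fmt.Println(evalUUIDRule("", rules))                                     // true: this expression alone lets '' pass
}
```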
Because uuid is defined as a bool field in the StringRules message, you can mark any string field as a UUID with a single annotation, such as (buf.validate.field).string.uuid = true on a string field of a Contact message, without worrying about inconsistent UUID checking across Go, Java, or other systems.
Since the definition of uuid is part of the StringRules message, Contact's Protobuf descriptor carries the annotation, and the rule's backing CEL expression comes along in the descriptors it depends on: the compiled schema contains all of its own validation rules within its own metadata.
How it uses CEL
Because you've already learned how CEL works, you can probably extrapolate how your application and Protovalidate use CEL.
Your application:
- Depends on the Protovalidate library that supports its language.
- Creates a Protovalidate Validator (a class or type provided by the Protovalidate library).
- Optionally warms up the Validator's cache of compiled CEL programs.
- Asks the Validator to validate a Protobuf message. The Validator then:
  - Uses its cache to look up all CEL programs that should be run for the message's type.
  - Runs each program, binding either the message or each field's value as a variable named this.
  - Collects the results of each program.
  - Transforms those results into the Violation and Violations messages defined by Protovalidate.
- Handles the idiomatic response from the Validator: Go uses an error, Java uses a ValidationResult class, and so on.
It's easier than it sounds.
If that sounds like a lot, and you're just interested in using Protovalidate in RPC APIs, don't fret.
Buf provides quickstarts with either open-source or example interceptors that do all of this for you. They're available for Connect and Go, gRPC and Go, gRPC and Java, and gRPC and Python.
What CEL unlocks
Because Protovalidate relies on CEL expressions that are compiled into schema metadata, it's not limited to its standard library of CEL-based validation expressions. CEL lets Protovalidate do something code-generated validators such as PGV couldn't: it lets you write your own validation expressions.
Custom CEL expressions
With Protovalidate, you can write your own validation rules once in your Protobuf files, and then immediately use them across any supported language.
Protovalidate calls these custom rules. Simple to implement, they're nothing more than an association of a CEL expression with a given field or message:
message SampleMessage {
string must_be_five = 1 [(buf.validate.field).cel = {
id: "must.be.five"
message: "this must be five letters long"
// A CEL expression defines the rule.
expression: "this.size() == 5"
}];
}
Reusable rule libraries
Protovalidate defines its standard rules in a Protobuf file. By extending its messages, you can do the same thing. This means you can develop organization-specific libraries of your own rules, publish them to the Buf Schema Registry, and then reuse them across your enterprise.
Creating these predefined rules is similar to creating custom rules, using proto2 syntax and extending Protovalidate's rule messages:
extend buf.validate.StringRules {
optional bool must_be_five = 80048952 [
(buf.validate.predefined).cel = {
id: "must.be.five"
message: "this must be five letters long"
// A CEL expression defines the rule.
expression: "this.size() == 5"
}
];
}
CEL extensions it adds
You've already seen that CEL allows variable values to be bound at runtime. Protovalidate takes advantage of this, providing variables like this, rule, and rules to your CEL expressions.
CEL doesn't stop with variables, however: brand-new functions and overloads can be added to CEL itself. A CEL environment declares each function's name and CEL types, and the host language provides the implementation that a CEL program delegates to when it calls that function.
Protovalidate leverages this to provide common validation functions that aren't built into CEL. For example, every language-specific Protovalidate implementation consistently implements isNan(), a function that you can use to check for NaN values. In protovalidate-go's source code, you can see this function's declaration, naming, binding to the CEL double type, and delegation to Go's math.IsNaN():
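cel.Function("isNan",
	// A member overload: called as value.isNan() on a CEL double.
	cel.MemberOverload(
		"double_is_nan_bool",        // unique overload ID
		[]*cel.Type{cel.DoubleType}, // argument (receiver) type
		cel.BoolType,                // result type
		// The Go implementation that CEL delegates to at evaluation time.
		cel.UnaryBinding(func(value ref.Val) ref.Val {
			num, ok := value.Value().(float64)
			if !ok {
				return types.UnsupportedRefValConversionErr(value)
			}
			return types.Bool(math.IsNaN(num))
		}),
	),
)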
This introduces cross-platform concerns: if Go's math.IsNaN() follows different semantics than the type-specific isNaN() methods of Java's Double and Float types, consistency could suffer. Protovalidate addresses this through a suite of conformance tests that all supported implementations must pass.
All of Protovalidate's CEL extensions are documented in the Protovalidate reference.
What you can do with CEL
Hopefully this introduction to CEL's workings and its relationship with Protovalidate has given you not just a better understanding of Protovalidate but also added CEL itself to your toolbox. If you find yourself in a situation where you need to support simple expression evaluation across platforms, or even provide a safe runtime to end users, CEL is a well-supported, extensible choice.
Learning more
- Learn more about using CEL with Protovalidate to write custom and predefined rules.
- Find out how to use CEL in your own Go, Java, or C++ applications with a CEL code lab.
- Take a deep dive into the CEL language reference.