Deploy Bufstream to Azure with Azure Database for PostgreSQL#
This page walks you through installing Bufstream into your Azure deployment, using PostgreSQL for metadata storage.
Data from your Bufstream cluster never leaves your network or reports back to Buf. Bufstream metadata is stored in PostgreSQL, while the actual data produced to the Bufstream cluster is stored in an Azure storage container.
Prerequisites#
To deploy Bufstream on Azure, you need the following before you start:
- A Kubernetes cluster (v1.27 or newer)
- An Azure Storage account and blob storage container
- An Azure Database for PostgreSQL flexible server (version 14 or higher)
- A Bufstream managed identity, with read/write permission to the storage container above
- Helm (v3.12.0 or newer)
Terraform module#
We also provide a Terraform module at https://github.com/bufbuild/terraform-modules-bufstream. Starting from an empty Azure subscription, it sets up all necessary components. If some of the required components already exist, it adds only the missing ones.
If you're setting up from an empty subscription, you need the following permissions:
- Network Contributor
- Azure Kubernetes Service Contributor
- Storage Account Contributor
- Storage Blob Data Contributor
- PostgreSQL Flexible Server Contributor
- DNS Zone Contributor
- Managed Identity Contributor
The install script also requires the Azure CLI and Azure Kubelogin to be installed.
Create an Azure Kubernetes Service (AKS) cluster#
Create an AKS cluster if you don't already have one. An AKS cluster involves many settings that vary depending on your use case. See the official documentation for details.
Set up Workload Identity Federation (WIF) for Bufstream#
You can authenticate to Azure Blob Storage with storage account shared access keys, or you can use Kubernetes WIF via Microsoft Entra Workload ID with Azure Kubernetes Service. See the official documentation for details.
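WIF depends on the AKS cluster exposing an OIDC issuer, which later steps on this page query through oidcIssuerProfile.issuerUrl. As a minimal sketch, assuming an existing cluster that doesn't have these features enabled yet, the standard Azure CLI flags can be wrapped in a small helper. The function name and the example arguments are hypothetical and not part of the Bufstream documentation.

```shell
# Hypothetical helper: enable the OIDC issuer and workload identity on an
# existing AKS cluster. Arguments: <resource group> <cluster name>.
enable_aks_workload_identity() {
  az aks update \
    --resource-group "$1" \
    --name "$2" \
    --enable-oidc-issuer \
    --enable-workload-identity
}

# Example call (placeholder names, not values from this guide):
# enable_aks_workload_identity my-resource-group my-aks-cluster
```

If the cluster was created with both features already enabled, this step can be skipped.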
Create a storage account#
If you don't already have one, create a new resource group.
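The command for this step doesn't appear on this page. A minimal sketch, assuming the standard az group create command and reusing the group name and region that the later commands expect, could look like the following. The helper name is hypothetical.

```shell
# Hypothetical helper: create a resource group.
# Arguments: <group-name> <region>.
create_resource_group() {
  az group create --name "$1" --location "$2"
}

# Example call (placeholder values, not values from this guide):
# create_resource_group my-resource-group eastus
```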
Then, create a new storage account within the group:
$ az storage account create \
--name <account-name> \
--resource-group <group-name> \
--location <region> \
--sku Standard_RAGRS \
--kind StorageV2 \
--min-tls-version TLS1_2 \
--allow-blob-public-access false
Create a storage container#
Create a storage container inside the storage account created above:
$ az storage container create \
--name <container-name> \
--account-name <account-name> \
--auth-mode login
Create an Azure Database for PostgreSQL flexible server#
Create a new Azure Database for PostgreSQL flexible server:
$ az postgres flexible-server create \
--name <server-name> \
--resource-group <group-name> \
--location <region> \
--version 16 \
--password-auth enabled \
--admin-user postgres \
--admin-password <postgres-user-password> \
--tier generalpurpose \
--sku-name Standard_D4ds_v5 \
--storage-type premium_lrs \
--storage-size 32 \
--performance-tier p20 \
--storage-auto-grow enabled \
--high-availability zoneredundant \
--public-access all \
--create-default-database enabled \
--database-name bufstream
For more details about instance creation, see the official docs. For a more secure setup, using private access instead of a public IP is recommended.
Create a managed identity and assign role to Microsoft Entra Workload ID for WIF#
The managed identity must be given the Storage Blob Data Contributor
role with access to the target container.
$ az identity create \
--name <identity name> \
--resource-group <group name> \
--location <region>
$ export MANAGED_IDENTITY_CLIENT_ID="$(az identity show --resource-group <group name> --name <identity name> --query 'clientId' -otsv)"
$ az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee $MANAGED_IDENTITY_CLIENT_ID \
--scope "/subscriptions/<azure-subscription-id>/resourceGroups/<group-name>/providers/Microsoft.Storage/storageAccounts/<account-name>/blobServices/default/containers/<container-name>"
$ export AKS_OIDC_ISSUER="$(az aks show --name <aks cluster name> --resource-group <aks cluster resource group name> --query "oidcIssuerProfile.issuerUrl" -otsv)"
$ az identity federated-credential create \
--name bufstream \
--identity-name <identity name> \
--resource-group <group name> \
--issuer "${AKS_OIDC_ISSUER}" \
--subject "system:serviceaccount:bufstream:bufstream-service-account" \
--audience api://AzureADTokenExchange
$ echo $MANAGED_IDENTITY_CLIENT_ID # Save and use for Helm values below
Create a namespace#
Create a Kubernetes namespace named bufstream in the k8s cluster for the Bufstream
deployment to use.
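The namespace command isn't shown on this page. As a sketch, assuming the namespace is called bufstream (the name used by the later kubectl and helm commands), it could be created like this. The helper name is hypothetical.

```shell
# Hypothetical helper: create the namespace for the Bufstream deployment.
# Optional argument: <namespace>, defaulting to bufstream.
create_bufstream_namespace() {
  kubectl create namespace "${1:-bufstream}"
}

# Example call:
# create_bufstream_namespace
```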
Deploy Bufstream#
Configure Bufstream's Helm values#
Bufstream is configured using Helm values that are passed to the bufstream
Helm chart.
To configure the values:
- Create a Helm values file named bufstream-values.yaml, which is required by the helm install command in step 4. This file can be in any location, but we recommend creating it in the same directory where you run the helm commands.
- Add the values from the steps below to the bufstream-values.yaml file. Skip to Install the Helm chart for a full example chart.
Configure object storage#
Bufstream attempts to acquire credentials from the environment using WIF.
To configure storage, set the following Helm values, filling in your Blob Storage variables:
storage:
  use: azure
  azure:
    # Azure storage account container name.
    bucket: <container name>
    # Azure storage account endpoint to use, for example https://<account-name>.blob.core.windows.net
    endpoint: <endpoint>
bufstream:
  deployment:
    podLabels:
      azure.workload.identity/use: "true"
  serviceAccount:
    annotations:
      azure.workload.identity/client-id: <managed identity client id>
The federated identity credential is associated with the k8s service account named bufstream-service-account.
Alternatively, you can use a shared key pair.
- Fetch the shared key for the storage account. We recommend using only the first key returned. Use the second key only when you are rotating keys.
- Create a k8s secret containing the storage account's shared key.
- Add the accessKeyId to the configuration:

storage:
  use: azure
  azure:
    # Azure storage account container name.
    bucket: <container name>
    # Azure storage account endpoint to use, for example https://<account-name>.blob.core.windows.net
    endpoint: <endpoint>
    # Azure storage account name to use for auth instead of the metadata server.
    accessKeyId: <account-name>
    # Kubernetes secret containing a `secret_access_key` (as the Azure storage account key) to use instead of the metadata server.
    secretName: bufstream-storage
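The commands for the first two steps of the shared key option aren't shown on this page. The sketch below is one way to fill them in, assuming the standard az storage account keys list command and a secret named bufstream-storage with a secret_access_key entry, as the configuration comments above describe. Both helper names are hypothetical.

```shell
# Hypothetical helper: print the first shared key of a storage account.
# Arguments: <group-name> <account-name>.
fetch_storage_key() {
  az storage account keys list \
    --resource-group "$1" \
    --account-name "$2" \
    --query '[0].value' \
    --output tsv
}

# Hypothetical helper: store the key in the bufstream-storage secret under
# the secret_access_key entry. Argument: <storage account key>.
create_storage_secret() {
  kubectl create secret --namespace bufstream generic bufstream-storage \
    --from-literal=secret_access_key="$1"
}

# Example call (placeholder names, not values from this guide):
# create_storage_secret "$(fetch_storage_key my-resource-group mystorageaccount)"
```

Passing the key straight from one helper to the other keeps it out of the shell history, although it still appears in the process arguments while kubectl runs.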
Configure PostgreSQL#
Get the endpoint address of the PostgreSQL instance:
$ az postgres flexible-server show \
--name <server-name> \
--resource-group <resource-group> \
--query "{endpoint:fullyQualifiedDomainName}" \
--output table
Create a secret with the DSN to connect to the PostgreSQL instance:
$ kubectl create secret --namespace bufstream generic bufstream-postgres \
--from-literal=dsn='postgresql://postgres:<postgres-user-password>@<endpoint-address>:5432/bufstream?sslmode=require'
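Then, configure Bufstream to connect to PostgreSQL by setting metadata.use to postgres and metadata.postgres.secretName to bufstream-postgres in the bufstream-values.yaml file.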
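Written out as YAML, the metadata settings match the full example later on this page. The sketch below appends them to the values file. It assumes that bufstream-values.yaml is in the current directory and doesn't have a top-level metadata key yet. If it does, edit the file by hand instead.

```shell
# Append the PostgreSQL metadata settings to the values file.
# Assumes bufstream-values.yaml has no top-level metadata key yet.
cat <<'EOF' >> bufstream-values.yaml
metadata:
  use: postgres
  postgres:
    secretName: bufstream-postgres
EOF
```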
Install the Helm chart#
Proceed to the zonal deployment steps if you want to deploy Bufstream with zone-aware routing. If not, follow the instructions below to deploy the basic Helm chart.
To make Bufstream brokers automatically detect their zone, set discoverZoneFromNode to true in the bufstream-values.yaml Helm values file.
After following the steps above, the set of Helm values should be similar to the example below:
storage:
  use: azure
  azure:
    # Azure storage account container name.
    bucket: my-container
    endpoint: https://mystorageaccount.blob.core.windows.net
bufstream:
  deployment:
    podLabels:
      azure.workload.identity/use: "true"
  serviceAccount:
    annotations:
      azure.workload.identity/client-id: <managed identity client id>
metadata:
  use: postgres
  postgres:
    secretName: bufstream-postgres
discoverZoneFromNode: true
Using the bufstream-values.yaml
Helm values file, install the Helm chart for the cluster and set the correct
Bufstream version:
$ helm install bufstream oci://us-docker.pkg.dev/buf-images-1/buf/charts/bufstream \
--version "<version>" \
--namespace=bufstream \
--values bufstream-values.yaml
If you change any configuration in the bufstream-values.yaml
file, apply the changes by running helm upgrade with the same arguments as the install command above. Re-running helm install fails once the release exists.
Deploy Bufstream with zone-aware routing#
Specify a list of target zones#
First, specify a list of target zones in a ZONES
variable, which the later commands use.
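The variable definition isn't shown on this page. As a sketch, assuming a space-separated list (which is what the for loops in the later steps iterate over), it could be set as follows. The zone names are placeholders. Because the per-zone node affinity matches on the topology.kubernetes.io/zone label, the real values should be the label values on your cluster's nodes.

```shell
# Placeholder zone names; replace them with the values of the
# topology.kubernetes.io/zone label on your cluster's nodes.
# One way to list them:
#   kubectl get nodes --label-columns topology.kubernetes.io/zone
ZONES="<zone1> <zone2> <zone3>"

# Preview the per-zone values file names that the later steps create.
for ZONE in $ZONES; do
  echo "bufstream-${ZONE}-values.yaml"
done
```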
Create WIF Association for all zones#
If you're using WIF, you'll need to create a federated identity credential for each service account in each zone.
$ export AKS_OIDC_ISSUER="$(az aks show --name <aks cluster name> --resource-group <group name> --query "oidcIssuerProfile.issuerUrl" -otsv)"
$ for ZONE in $ZONES; do
az identity federated-credential create \
--name bufstream-${ZONE} \
--identity-name <identity name> \
--resource-group <group name> \
--issuer "${AKS_OIDC_ISSUER}" \
--subject system:serviceaccount:bufstream:bufstream-service-account-${ZONE} \
--audience api://AzureADTokenExchange
done
Create Helm values files for each zone#
Then, use this script to iterate through the availability zones saved in the ZONES
variable and create a Helm values file for each zone:
$ for ZONE in $ZONES; do
  cat <<EOF > "bufstream-${ZONE}-values.yaml"
nameOverride: bufstream-${ZONE}
name: bufstream-${ZONE}
zone: ${ZONE}
bufstream:
  serviceAccount:
    name: bufstream-service-account-${ZONE}
  deployment:
    replicaCount: 2
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - ${ZONE}
kafka:
  publicAddress:
    host: bufstream-${ZONE}.bufstream.svc.cluster.local
    port: 9092
EOF
done
Using the example ZONES variable above creates three values files: bufstream-<zone1>-values.yaml, bufstream-<zone2>-values.yaml, and bufstream-<zone3>-values.yaml.
Install the Helm chart for each zone#
After following the steps above and creating the zone-specific values files, the collection of Helm values should be structurally similar to the example below:
# bufstream-values.yaml
storage:
  use: azure
  azure:
    # Azure storage account container name.
    bucket: my-container
    endpoint: https://mystorageaccount.blob.core.windows.net
bufstream:
  deployment:
    podLabels:
      azure.workload.identity/use: "true"
  serviceAccount:
    annotations:
      azure.workload.identity/client-id: <managed identity client id>
metadata:
  use: postgres
  postgres:
    secretName: bufstream-postgres

# bufstream-<zone1>-values.yaml
nameOverride: bufstream-<zone1>
name: bufstream-<zone1>
zone: <zone1>
bufstream:
  serviceAccount:
    name: bufstream-service-account-<zone1>
  deployment:
    replicaCount: 2
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - <zone1>
kafka:
  publicAddress:
    host: bufstream-<zone1>.bufstream.svc.cluster.local
    port: 9092
To deploy a zone-aware Bufstream using the bufstream-values.yaml
Helm values file, install the Helm chart for the cluster, set the target
Bufstream version, and supply the ZONES
variable:
$ for ZONE in $ZONES; do
helm install "bufstream-${ZONE}" oci://us-docker.pkg.dev/buf-images-1/buf/charts/bufstream \
--version "<version>" \
--namespace=bufstream \
--values bufstream-values.yaml \
--values "bufstream-${ZONE}-values.yaml"
done
If you change any configuration in the bufstream-values.yaml
file, apply the changes by running the loop above with helm upgrade in place of helm install. Re-running helm install fails once the releases exist.
Create a regional service for the cluster#
Create a regional service that provides a single bootstrap address for Bufstream across all the zones.
$ cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: Service
metadata:
  labels:
    bufstream.buf.build/cluster: bufstream
  name: bufstream
  namespace: bufstream
spec:
  type: ClusterIP
  ports:
  - name: connect
    port: 8080
    protocol: TCP
    targetPort: 8080
  - name: admin
    port: 9089
    protocol: TCP
    targetPort: 9089
  - name: kafka
    port: 9092
    protocol: TCP
    targetPort: 9092
  selector:
    bufstream.buf.build/cluster: bufstream
EOF
Running CLI commands#
Once you've deployed, you can run the Bufstream CLI commands directly using kubectl exec bufstream <command>
on the running Bufstream pods.
You don't need to install anything else.