Deploy Bufstream to Azure

This page walks you through installing Bufstream into your Azure deployment by setting your Helm values and installing the provided Helm chart. See the Azure configuration page for defaults and recommendations about resources, replicas, storage, and scaling.

Data from your Bufstream cluster never leaves your network or reports back to Buf.

Prerequisites

To deploy Bufstream on Azure, you need the following before you start:

  • A Kubernetes cluster (v1.27 or newer)
  • An Azure Storage account and blob storage container
  • A Bufstream managed identity with read/write permission to the storage container above
  • Helm (v3.12.0 or newer)

Create an Azure Kubernetes Service (AKS) cluster

Create an AKS cluster if you don't already have one. An AKS cluster involves many settings that vary depending on your use case. See the official documentation for details.
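
If you need a starting point, the following sketch creates a small zone-spread cluster with the OIDC issuer and workload identity enabled (used for WIF below); the resource group, cluster name, node count, and zone numbers are placeholders to adjust for your environment:

$ az aks create \
  --resource-group <group-name> \
  --name <aks-cluster-name> \
  --node-count 3 \
  --zones 1 2 3 \
  --enable-oidc-issuer \
  --enable-workload-identity \
  --generate-ssh-keys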

Set up Workload Identity Federation (WIF) for Bufstream

You can authenticate to Azure Blob Storage with storage account shared access keys, or you can use Kubernetes WIF via Microsoft Entra Workload ID with Azure Kubernetes Service. See the official Microsoft Entra Workload ID documentation for details.
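
If your existing AKS cluster doesn't yet have the OIDC issuer and workload identity enabled, a minimal sketch to turn them on (the resource group and cluster names are placeholders):

$ az aks update \
  --resource-group <aks-cluster-resource-group-name> \
  --name <aks-cluster-name> \
  --enable-oidc-issuer \
  --enable-workload-identity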

Create a storage account

If you don't already have one, create a new resource group:

$ az group create \
  --name <group-name> \
  --location <region>

Then, create a new storage account within the group:

$ az storage account create \
  --name <account-name> \
  --resource-group <group-name> \
  --location <region> \
  --sku Standard_RAGRS \
  --kind StorageV2 \
  --min-tls-version TLS1_2 \
  --allow-blob-public-access false

Create a storage container

Create a storage container inside the storage account created above:

$ az storage container create \
    --name <container-name> \
    --account-name <account-name> \
    --auth-mode login

Create a managed identity and assign a role for Microsoft Entra Workload ID (WIF)

The managed identity must be given the Storage Blob Data Contributor role with access to the target container.

$ az identity create \
  --name <identity name> \
  --resource-group <group name> \
  --location <region>

$ export MANAGED_IDENTITY_CLIENT_ID="$(az identity show --resource-group <group name> --name <identity name> --query 'clientId' -otsv)"

$ az role assignment create \
    --role "Storage Blob Data Contributor" \
    --assignee $MANAGED_IDENTITY_CLIENT_ID \
    --scope "/subscriptions/<azure-subscription-id>/resourceGroups/<group-name>/providers/Microsoft.Storage/storageAccounts/<account-name>/blobServices/default/containers/<container-name>"

$ export AKS_OIDC_ISSUER="$(az aks show --name <aks cluster name> --resource-group <aks cluster resource group name> --query "oidcIssuerProfile.issuerUrl" -otsv)"

$ az identity federated-credential create \
  --name bufstream \
  --identity-name <identity name> \
  --resource-group <group name> \
  --issuer "${AKS_OIDC_ISSUER}" \
  --subject "system:serviceaccount:bufstream:bufstream-service-account" \
  --audience api://AzureADTokenExchange

$ echo $MANAGED_IDENTITY_CLIENT_ID # Save and use for helm values below

Create a namespace

Create a Kubernetes namespace in the cluster for the Bufstream deployment to use:

$ kubectl create namespace bufstream

Deploy etcd

Bufstream requires an etcd cluster. To set up an example deployment of etcd on Kubernetes, use the Bitnami etcd Helm chart with the following values:

$ helm install \
  --namespace bufstream \
  bufstream-etcd \
  oci://registry-1.docker.io/bitnamicharts/etcd \
  -f - <<EOF
replicaCount: 3
persistence:
  enabled: true
  size: 10Gi
  storageClass: ""
autoCompactionMode: periodic
autoCompactionRetention: 30s
removeMemberOnContainerTermination: false
resourcesPreset: none
auth:
  rbac:
    create: false
    enabled: false
  token:
    enabled: false
metrics:
  useSeparateEndpoint: true
customLivenessProbe:
  httpGet:
    port: 9090
    path: /livez
    scheme: "HTTP"
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 15
  failureThreshold: 10
customReadinessProbe:
  httpGet:
    port: 9090
    path: /readyz
    scheme: "HTTP"
  initialDelaySeconds: 20
  timeoutSeconds: 10
extraEnvVars:
  - name: ETCD_LISTEN_CLIENT_HTTP_URLS
    value: "http://0.0.0.0:8080"
EOF

Check that etcd is running after installation.
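
For example, you can confirm that the pods are ready and the rollout has finished; this assumes the bufstream-etcd release name and bufstream namespace used above:

$ kubectl --namespace bufstream get pods -l app.kubernetes.io/name=etcd

$ kubectl --namespace bufstream rollout status statefulset/bufstream-etcd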

Warning

etcd is sensitive to disk performance, so we recommend using the Azure Disks Container Storage Interface (CSI) driver with Premium SSD v2 disks.

The storage class in the example above can be changed by setting the persistence.storageClass value to a custom storage class using those disks.
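
A minimal sketch of such a storage class, assuming the Azure Disks CSI driver (disk.csi.azure.com) is installed in the cluster and Premium SSD v2 (PremiumV2_LRS) is available in your region:

$ cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-ssd-v2
provisioner: disk.csi.azure.com
parameters:
  skuName: PremiumV2_LRS
  # Premium SSD v2 doesn't support host caching
  cachingMode: None
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
EOF

You would then set persistence.storageClass: premium-ssd-v2 in the etcd values above.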

Deploy Bufstream

1. Authenticate Helm

To get started, authenticate Helm with the Bufstream OCI registry using the keyfile that was sent alongside this documentation. The keyfile should contain a base64-encoded string.

$ cat keyfile | helm registry login -u _json_key_base64 --password-stdin \
  https://us-docker.pkg.dev/buf-images-1/bufstream

2. Configure Bufstream's Helm values

Bufstream is configured using Helm values that are passed to the bufstream Helm chart. To configure the values:

  1. Create a Helm values file named bufstream-values.yaml, which is required by the helm install command in step 4. This file can be in any location, but we recommend creating it in the same directory where you run the helm commands.

  2. Add the values from the steps below to the bufstream-values.yaml file. Skip to Install the Helm chart for a full example chart.

Configure object storage

Bufstream attempts to acquire credentials from the environment using WIF.

To configure storage, set the following Helm values, filling in your Blob Storage variables:

bufstream-values.yaml
storage:
  use: azure
  azure:
    # Azure storage account container name.
    bucket: <container name>
    # Azure storage account endpoint to use—for example, https://<account-name>.blob.core.windows.net
    endpoint: <endpoint>
bufstream:
  deployment:
    podLabels:
      azure.workload.identity/use: "true"
  serviceAccount:
    annotations:
      azure.workload.identity/client-id: <managed identity client id>

The Kubernetes service account that the federated identity credential is associated with is named bufstream-service-account.
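
Once the chart is installed (step 3 below), you can confirm that the service account carries the expected client ID annotation; a quick check, assuming the default service account name:

$ kubectl --namespace bufstream get serviceaccount bufstream-service-account \
  -o jsonpath='{.metadata.annotations.azure\.workload\.identity/client-id}'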

Alternatively, you can use a shared key pair.

  1. Fetch the shared key for the storage account. We recommend using only the first key returned; reserve the second key for key rotation.

    $ az storage account keys list \
      --resource-group <group-name> \
      --account-name <account-name>
    
  2. Create a k8s secret containing the storage account's shared key:

    $ kubectl create secret --namespace bufstream generic bufstream-storage \
      --from-literal=secret_access_key=<Azure storage account key>
    
  3. Add the accessKeyId to the configuration:

    storage:
      use: azure
      azure:
        # Azure storage account container name.
        bucket: <container name>
        # Azure storage account endpoint to use—for example, https://<account-name>.blob.core.windows.net
        endpoint: <endpoint>
        # Azure storage account name to use for auth instead of the metadata server.
        accessKeyId: <account-name>
        # Kubernetes secret containing a `secret_access_key` (as the Azure storage account key) to use instead of the metadata server.
        secretName: bufstream-storage
    

Configure etcd

Then, configure Bufstream to connect to the etcd cluster:

bufstream-values.yaml
metadata:
  use: etcd
  etcd:
    # etcd addresses to connect to
    addresses:
    - host: "bufstream-etcd.bufstream.svc.cluster.local"
      port: 2379

3. Install the Helm chart

Proceed to the zonal deployment steps if you want to deploy Bufstream with zone-aware routing. If not, follow the instructions below to deploy the basic Helm chart.

Add the following to the bufstream-values.yaml Helm values file to make Bufstream brokers automatically detect their zone:

discoverZoneFromNode: true

After following the steps above, the set of Helm values should be similar to one of the examples below, depending on how you authenticate to Blob Storage. If you're using WIF:

bufstream-values.yaml
storage:
  use: azure
  azure:
    # Azure storage account container name.
    bucket: my-container
    endpoint: https://mystorageaccount.blob.core.windows.net
bufstream:
  deployment:
    podLabels:
      azure.workload.identity/use: "true"
  serviceAccount:
    annotations:
      azure.workload.identity/client-id: <managed identity client id>
metadata:
  use: etcd
  etcd:
    # etcd addresses to connect to
    addresses:
    - host: "bufstream-etcd.bufstream.svc.cluster.local"
      port: 2379
discoverZoneFromNode: true

If you're using a shared key pair instead:

bufstream-values.yaml
storage:
  use: azure
  azure:
    bucket: my-container
    endpoint: https://mystorageaccount.blob.core.windows.net
    accessKeyId: mystorageaccount
    secretName: bufstream-storage
metadata:
  use: etcd
  etcd:
    # etcd addresses to connect to
    addresses:
    - host: "bufstream-etcd.bufstream.svc.cluster.local"
      port: 2379
discoverZoneFromNode: true

Using the bufstream-values.yaml Helm values file, install the Helm chart for the cluster and set the correct Bufstream version:

$ helm install bufstream oci://us-docker.pkg.dev/buf-images-1/bufstream/charts/bufstream \
  --version "<version>" \
  --namespace=bufstream \
  --values bufstream-values.yaml

If you change any configuration in the bufstream-values.yaml file, re-apply the chart to the existing release to pick up the changes.
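
Because helm install can only be run once per release name, subsequent changes are applied with helm upgrade; a sketch using the same chart reference and values file:

$ helm upgrade bufstream oci://us-docker.pkg.dev/buf-images-1/bufstream/charts/bufstream \
  --version "<version>" \
  --namespace=bufstream \
  --values bufstream-values.yaml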

Deploy Bufstream with zone-aware routing

1. Specify a list of target zones

First, specify a list of target zones in a ZONES variable, which is used by the following commands.

$ ZONES=(<zone1> <zone2> <zone3>)
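
On AKS, node zone labels (topology.kubernetes.io/zone) typically have the form <region>-<zone-number>. You can check your cluster's actual values with kubectl and set the variable accordingly; for example, in eastus2 this might look like:

$ kubectl get nodes -L topology.kubernetes.io/zone

$ ZONES=(eastus2-1 eastus2-2 eastus2-3)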

2. Create WIF Association for all zones

If you're using WIF, you'll need to create a federated identity credential for each service account in each zone.

$ export AKS_OIDC_ISSUER="$(az aks show --name <aks cluster name> --resource-group <group name> --query "oidcIssuerProfile.issuerUrl" -otsv)"

$ for ZONE in "${ZONES[@]}"; do
  az identity federated-credential create \
    --name bufstream-${ZONE} \
    --identity-name <identity name> \
    --resource-group <group name> \
    --issuer "${AKS_OIDC_ISSUER}" \
    --subject system:serviceaccount:bufstream:bufstream-service-account-${ZONE} \
    --audience api://AzureADTokenExchange
done

3. Create Helm values files for each zone

Then, use this script to iterate through the availability zones saved in the ZONES variable and create a Helm values file for each zone:

$ for ZONE in "${ZONES[@]}"; do
  cat <<EOF > "bufstream-${ZONE}-values.yaml"
nameOverride: bufstream-${ZONE}
name: bufstream-${ZONE}
zone: ${ZONE}
bufstream:
  serviceAccount:
    name: bufstream-service-account-${ZONE}
  deployment:
    replicaCount: 2
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
              - ${ZONE}
kafka:
  publicAddress:
    host: bufstream-${ZONE}.bufstream.svc.cluster.local
    port: 9092
EOF
done

With the example ZONES variable above, this creates three values files: bufstream-<zone1>-values.yaml, bufstream-<zone2>-values.yaml, and bufstream-<zone3>-values.yaml.

4. Install the Helm chart for each zone

After following the steps above and creating the zone-specific values files, the collection of Helm values should be structurally similar to the example below:

bufstream-values.yaml
storage:
  use: azure
  azure:
    # Azure storage account container name.
    bucket: my-container
    endpoint: https://mystorageaccount.blob.core.windows.net
bufstream:
  deployment:
    podLabels:
      azure.workload.identity/use: "true"
  serviceAccount:
    annotations:
      azure.workload.identity/client-id: <managed identity client id>
metadata:
  use: etcd
  etcd:
    # etcd addresses to connect to
    addresses:
    - host: "bufstream-etcd.bufstream.svc.cluster.local"
      port: 2379
bufstream-<zone1>-values.yaml
nameOverride: bufstream-<zone1>
name: bufstream-<zone1>
zone: <zone1>
bufstream:
  serviceAccount:
    name: bufstream-service-account-<zone1>
  deployment:
    replicaCount: 2
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
              - <zone1>
kafka:
  publicAddress:
    host: bufstream-<zone1>.bufstream.svc.cluster.local
    port: 9092

To deploy a zone-aware Bufstream using the bufstream-values.yaml Helm values file, install the Helm chart for the cluster, set the target Bufstream version, and supply the ZONES variable:

$ for ZONE in "${ZONES[@]}"; do
  helm install "bufstream-${ZONE}" oci://us-docker.pkg.dev/buf-images-1/bufstream/charts/bufstream \
    --version "<version>" \
    --namespace=bufstream \
    --values bufstream-values.yaml \
    --values "bufstream-${ZONE}-values.yaml"
done

If you change any configuration in the values files, re-apply the chart for each zone to pick up the changes (as above, use helm upgrade for an existing release).

5. Create a regional service for the cluster

Create a regional service, which provides a bootstrap address for Bufstream across all of the zones.

$ cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: Service
metadata:
  labels:
    bufstream.buf.build/cluster: bufstream
  name: bufstream
  namespace: bufstream
spec:
  type: ClusterIP
  ports:
  - name: connect
    port: 8080
    protocol: TCP
    targetPort: 8080
  - name: admin
    port: 9089
    protocol: TCP
    targetPort: 9089
  - name: kafka
    port: 9092
    protocol: TCP
    targetPort: 9092
  selector:
    bufstream.buf.build/cluster: bufstream
EOF
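
Once the service exists, in-cluster clients can use the regional address as their Kafka bootstrap server; a quick check of the service and its DNS name (standard Kubernetes <service>.<namespace>.svc.cluster.local naming):

$ kubectl --namespace bufstream get service bufstream
# In-cluster bootstrap address: bufstream.bufstream.svc.cluster.local:9092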

Running CLI commands

Once you've deployed, you can run Bufstream CLI commands directly on the running Bufstream pods by using kubectl exec to invoke bufstream <command>. You don't need to install anything else.
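
For example, a sketch that lists the Bufstream pods (the selector label mirrors the regional service above) and prints the CLI help from one of them; substitute a real pod name from the first command's output:

$ kubectl --namespace bufstream get pods -l bufstream.buf.build/cluster=bufstream

$ kubectl --namespace bufstream exec -it <bufstream-pod-name> -- bufstream --help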