Bufstream live migration for Postgres#
This document describes the process for migrating Bufstream’s metadata store to the latest architecture with zero downtime. This migration is only applicable to clusters using Postgres for metadata storage.
The migration is controlled manually by advancing through a sequence of steps using the bufstream CLI. Bufstream includes guardrails to prevent invalid cluster states.
Cluster health must be monitored continuously during the migration. While some steps can be reverted or re-run, there is a point of no return after which full reversal is only possible through backup restoration.
How it works#
Bufstream uses Postgres as both a key-value store and a message queue. The new storage architecture implements these two functions in a simpler, more efficient way. Because the new architecture requires a new set of tables, key-value data must be copied and message queues must be atomically switched over.
To achieve this without downtime, the migration steps your cluster through a sequence of phases called read-write modes. Two intermediate dual-writing modes keep both sets of tables in sync while data is gradually moved over, leaving the old tables available as a fallback until a cutover point is reached.
The cluster progresses through four modes:
- Mode 1: Read and write old tables (starting point)
- Mode 2: Read old tables, write to both
  - All key-value updates are dual-written, and an active pass over all key-value entries ensures all keys are synchronized before progressing.
- Mode 3: Read new tables, write to both
  - Each internal message queue is atomically closed and reopened in the new backing table.
- Mode 4: Read and write new tables (migration complete)
Before you begin#
Verify the following before starting the migration:
- Bufstream has been upgraded to a version ≥ 0.4.14 and < 0.5.0, and all brokers are confirmed to be running the same version. Run `bufstream --version` via `kubectl exec` to check each pod’s version.
- Bufstream and Postgres have sufficient resources:
  - Connection capacity: Verify that the sum of the connections allocated to Bufstream brokers (`metadata.postgres.pool.maxConnections` times the number of brokers) is below Postgres’s configured connection limit.
    For example, if you have eight brokers and Postgres’s `max_connections` is 1000, then setting `metadata.postgres.pool.maxConnections` to 100 means Bufstream may use up to 8 × 100 = 800 connections, which is below the limit of 1000.
    While the migration doesn’t allocate beyond your configured limit, it does increase usage of existing connections. If connection pools are misconfigured, operations may time out and cause cascading failures. If you’re close to the limit, increase Postgres’s `max_connections` or reduce Bufstream’s connection pool size before proceeding.
  - Postgres CPU and memory: The migration’s dual-write modes increase demand on Postgres. Before the migration, ensure Postgres is operating at under 65% CPU utilization and 80% memory utilization on average.
  - Bufstream memory: If brokers frequently experience OOMKilled events or consistently use more than 80% of available memory, increase memory allocation before proceeding.
- Auto-scaling is disabled: Running additional brokers doesn’t negatively impact migration as long as Postgres can support the additional connections. Nevertheless, keeping the broker count static during the migration is recommended because:
  - Broker restarts and terminations, while tolerated, may slow migration progress.
  - Temporary migration-related metric changes may trigger spurious scaling events.
- Cluster is healthy: See Monitor cluster health for details.
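The connection-budget arithmetic above can be scripted as a quick preflight. This is an illustrative sketch, not a Bufstream tool; the broker count, pool size, connection limit, and the `app=bufstream` pod label are example values and assumptions to replace with your own:

```shell
#!/bin/sh
# Illustrative preflight: total Bufstream connections must stay below
# Postgres's max_connections. All values below are examples.
BROKERS=8               # number of Bufstream broker pods
POOL_MAX=100            # metadata.postgres.pool.maxConnections per broker
PG_MAX_CONNECTIONS=1000 # Postgres max_connections

TOTAL=$((BROKERS * POOL_MAX))
echo "Bufstream may use up to $TOTAL of $PG_MAX_CONNECTIONS connections"

if [ "$TOTAL" -ge "$PG_MAX_CONNECTIONS" ]; then
  echo "Connection budget too tight; adjust pool size or max_connections" >&2
  exit 1
fi

# Version check (assumes an app=bufstream pod label; adjust for your install):
# kubectl get pods -l app=bufstream -o name |
#   xargs -I{} kubectl exec {} -- bufstream --version
```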
Dry-run command#
Bufstream provides a command that verifies your cluster is ready for migration. It checks your cluster’s version and configuration, then scans your metadata store to verify that the migration will cover all of your data.
The command is read-only and optional, but you’re encouraged to run it before performing the migration.
Note
The dry-run check may show false-positive “missed keys” warnings if any metadata changes while it runs (new topics, partitions, or consumer groups). If this happens, re-run the command; it should not warn consistently about the same key.
Note that in the real migration, any new keys created during the process are already written to both stores via the dual-write system.
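As a sketch, the dry-run might be invoked like the following. The `migrate dry-run` subcommand name and the pod name are assumptions for illustration; check `bufstream --help` in your installed version for the exact command:

```shell
# Run in any broker pod. The subcommand path below is an assumption;
# consult `bufstream --help` for the exact name in your version.
kubectl exec -it bufstream-us-west1-a-0 -- bufstream migrate dry-run
```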
Monitor migration status#
Run this command periodically (for example, using `watch`) as you perform the migration:
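A hedged sketch of such an invocation, assuming a hypothetical `migrate status` subcommand and pod name (verify the exact command with `bufstream --help`):

```shell
# Re-runs the (assumed) status subcommand every 10 seconds.
watch -n 10 kubectl exec bufstream-us-west1-a-0 -- bufstream migrate status
```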
This prints a formatted summary of the migration status:
Cluster Mode: 1 (V1_ONLY)
Stability Window: 2m0s
Last Action: never
Stability Window Remaining: 0s (ready for next action)
Broker Modes:
bufstream-us-west1-a-0 (us-west1-a): 1 (V1_ONLY)
bufstream-us-west1-a-1 (us-west1-a): 1 (V1_ONLY)
bufstream-us-west1-b-0 (us-west1-b): 1 (V1_ONLY)
bufstream-us-west1-b-1 (us-west1-b): 1 (V1_ONLY)
Migration Action Revisions:
mode_2_rev: 0
mode_3_rev: 0
mode_4_rev: 0
sync_kv_started_rev: 0
sync_kv_done_rev: 0
migrate_queues_started_rev: 0
migrate_queues_done_rev: 0
Monitor cluster health#
Metrics and logs are your early warning system during the migration. Watch for the following between each step, and if you see significant degradation in performance, consider reverting the last step and investigating before continuing:
- Metrics:
  - Broker CPU and memory are stable. You can expect Postgres CPU to increase during the migration.
  - Kafka error count is not increasing (obtained via `bufstream.kafka.request.count` with non-empty `kafka.error_code`).
  - Consumer lag is bounded (`bufstream.kafka.consumer.group.lag`).
  - Producer throughput is consistent for constant-sized workloads (`bufstream.kafka.produce.bytes`).
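If your metrics land in Prometheus, the error-count check can be expressed as a query. This sketch assumes a reachable Prometheus endpoint and that dots in Bufstream metric names are exported as underscores; both are assumptions about your metrics pipeline:

```shell
# Rate of Kafka requests with a non-empty error code over the last 5 minutes.
# The endpoint URL and the metric/label naming are assumptions for your setup.
PROM="http://prometheus.monitoring.svc:9090"
curl -sG "$PROM/api/v1/query" \
  --data-urlencode 'query=sum(rate(bufstream_kafka_request_count{kafka_error_code!=""}[5m]))'
```

A flat (or zero) result between steps is the healthy signal; a climbing rate suggests reverting the last step and investigating.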
- Logs:
- Bufstream: there should be no new ERROR-level messages (there will be INFO-level logs related to the migration itself).
- Clients: the migration is not expected to impact clients, so any changes in client-side logs may indicate a problem.
See the Bufstream metrics reference for a full list of available metrics.
Note
Metrics will fluctuate during the migration. In particular, Postgres CPU utilization will increase noticeably, and Bufstream latencies may increase slightly. You can expect these metrics to settle, at or below their pre-migration averages, after the migration completes.
Migration steps#
Run these commands in order. You can execute them in any Bufstream broker pod using `kubectl exec`.
Note
After each step, Bufstream enforces a 2-minute stability window during which further migration actions are blocked. Wait for this window to complete before proceeding.
Step 1: Enter dual-write mode (Mode 2)#
The cluster writes to both old and new table schemas while reading from the old tables. New data is written to the new tables immediately; existing data is migrated in the next step.
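A sketch of advancing the cluster, assuming a hypothetical `set-mode` subcommand (the real command name may differ; check `bufstream --help`, and note that Bufstream’s guardrails reject invalid transitions regardless of how the command is invoked):

```shell
# Illustrative only; the subcommand name is an assumption.
kubectl exec bufstream-us-west1-a-0 -- bufstream migrate set-mode 2
```

Steps 3 and 5 would advance to Modes 3 and 4 in the same way.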
Step 2: Migrate key-value data#
Copies all existing key-value data from the old tables to the new tables in a background job. The cluster remains available during the job. Use the status command to monitor progress.
Rollback: Cancel the job, then revert to Mode 1 if needed. If you revert to Mode 1, Bufstream will require you to run sync-kv again before continuing the migration.
Tuning sync-kv#
The sync-kv job may take several minutes to hours depending on your cluster’s configuration and age. By default, the job is tuned to finish as quickly as possible with minimal performance impact. However, there are two flags that you can use to optimize for your needs:
`--concurrency int32`
Controls the number of data-copying threads within each broker. Defaults to 1. Increasing concurrency will copy data faster, but will also increase Bufstream latency substantially. You are encouraged to use the default concurrency unless you need faster completion and can tolerate higher latency.
`--rate-limit float`
Controls the maximum keys copied per second per broker. By default, there is no limit. If your application is extremely latency-sensitive, you may use a very low rate limit (e.g., < 10) to keep the migration’s impact to a minimum.
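Putting the flags together, a tuned start might look like this. `--concurrency` and `--rate-limit` are the documented flags; the surrounding command path and pod name are assumptions — check `bufstream --help` for the exact invocation:

```shell
# Faster copy at the cost of some latency: 2 threads per broker,
# capped at 500 keys/second per broker. Command path is an assumption.
kubectl exec bufstream-us-west1-a-0 -- \
  bufstream sync-kv start --concurrency 2 --rate-limit 500
```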
Cancelling and resuming sync-kv#
Once initialized (which may take a few minutes), the sync-kv job is resilient to broker restarts as well as termination of the sync-kv start command. It will run to completion unless you explicitly cancel it.
After cancelling sync-kv, you can resume without losing progress by starting the job again. You may wish to cancel and resume if, for example, you need to change the concurrency, rate limit, or broker count.
If you terminate sync-kv start and rerun it without cancelling first, your client will reconnect to the in-progress job and continue displaying its logs and progress.
Reverting to Mode 1 resets your progress, since Mode 1 is not a dual-write mode.
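For example, to change tuning mid-run (the command path is an assumption; `sync-kv start` and cancellation are the operations described above):

```shell
# Cancel the in-progress job, then resume with new settings.
# Progress is retained as long as you stay in a dual-write mode.
kubectl exec bufstream-us-west1-a-0 -- bufstream sync-kv cancel
kubectl exec bufstream-us-west1-a-0 -- \
  bufstream sync-kv start --concurrency 2
```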
Step 3: Enter second dual-write mode (Mode 3)#
The cluster begins reading from the new tables while still writing to both. Key-value data has been fully migrated. Message queues are atomically switched over upon first write; the next step ensures they all transition.
Rollback: Revert to Mode 2.
You cannot revert to Mode 1 after this step.
Step 4: Migrate message queues#
Moves all message queues from the old tables to the new tables. The cluster remains available throughout.
Rollback: Cancel the job.
If needed, you may revert to Mode 2 after cancelling this job.
You cannot revert to Mode 1 after this step.
Note
Unlike sync-kv, the duration of the migrate-queues job does not depend on the amount of data stored in your cluster. It typically finishes in seconds to minutes. However, note that terminating the migrate-queues start command does not stop the job; you must explicitly cancel it.
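A sketch of this step; `migrate-queues start` and cancellation are the documented operations, while the full command path and pod name are assumptions (check `bufstream --help`):

```shell
# Start the queue migration. Terminating this command does NOT stop the job.
kubectl exec bufstream-us-west1-a-0 -- bufstream migrate-queues start
# To stop it explicitly if a rollback is needed:
kubectl exec bufstream-us-west1-a-0 -- bufstream migrate-queues cancel
```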
Step 5: Finalize migration (Mode 4)#
The cluster switches exclusively to the new tables. This step is irreversible without a backup restore.
No rollback available. Contact Buf if you experience problems after this step.