Configure Bufstream for AWS
This page describes Bufstream's defaults and provides specific recommendations for configuring your Bufstream cluster in AWS to get the best combination of price and performance.
Resources and replicas
For initial Bufstream deployments, we recommend 3 replicas, each with a minimum resource request of 2 cores and 8 GiB of memory.
The bufstream Helm chart defaults to these settings, which you can adjust with the following Helm values:
bufstream:
  deployment:
    # Number of replicas to deploy
    replicaCount: 3
    resources:
      requests:
        cpu: 2
        memory: 8Gi
      limits:
        # Optional
        # cpu: 2
        memory: 8Gi
Bufstream runs best on network-optimized instance types, such as m6in.
If you choose larger instances or other types, the cost of running Bufstream will change according to the region and instance type you select.
Network-intensive workloads
Because Bufstream doesn't have a local disk, most I/O occurs over the network to support Kafka produce and fetch requests. Bufstream uses compression, compaction, and caching to minimize the load on the instances. However, because Bufstream needs to write to and read from remote storage, it puts more load on the network than out-of-the-box Kafka.
Horizontal Pod Autoscaler
We recommend configuring your Bufstream deployment with a minimum deployment size of 6 replicas, and a node group that runs across multiple Availability Zones (AZs). For example, for a node group over 3 AZs, a minimum replica count of 6 keeps cross-AZ network charges down if an instance is unavailable (such as during a deploy). Properly configured clients are directed to Bufstream instances in the same zone. We also recommend maintaining a ratio of 1:4 vCPU to GiB of memory. For example, for a 6-replica deployment, a 1 GiB/s workload may demand 16 cores and 64 GiB of memory.
If you run Bufstream using the instance type recommended above, we also recommend autoscaling based on CPU usage with a 50% average usage target. Adjusting the autoscaling threshold impacts the overall cost of your cluster and its ability to respond to bursty workloads effectively. You can configure autoscaling for the Bufstream deployment using the following Helm values:
bufstream:
  deployment:
    autoscaling:
      enabled: true
      # Optional, replicas and target % cpu usage
      minReplicas: 6
      maxReplicas: 18
      targetCPU: "50"
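To make these values concrete, the following is a rough sketch of a Kubernetes HorizontalPodAutoscaler (autoscaling/v2) that expresses the same policy: 6 to 18 replicas with a 50% average CPU utilization target. The resource name bufstream and the assumption that the target is a Deployment are illustrative only. The object the chart actually renders may differ.

```yaml
# Illustrative only: an HPA equivalent to the autoscaling values above.
# The names "bufstream" and the Deployment target are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: bufstream
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bufstream
  minReplicas: 6
  maxReplicas: 18
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          # Scale to keep average CPU usage near 50% of requests
          type: Utilization
          averageUtilization: 50
```

Because utilization is measured against the CPU request, changing the request size also changes the point at which the autoscaler adds replicas.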
Object storage
Because Bufstream doesn't store data on a local disk, all data from the cluster is written to object storage (S3). Though Bufstream's only requirement is an isolated bucket to write to, we recommend configuring additional settings.
Object retention
Bufstream manages the object lifecycle directly, including deleting expired or compacted objects, so we don't recommend setting a retention policy for the S3 bucket. If you do set a retention policy, it must be longer than the maximum retention of any topic in your Bufstream cluster to guard against data loss.
Bucket lifecycle
To support multi-part uploads, you must configure a lifecycle policy to clean up failed or partially successful uploads. This policy stops failed uploads from polluting the bucket and increasing storage costs. We recommend cleaning up incomplete uploads after a maximum of 7 days or the topic retention value.
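One way to express such a lifecycle rule is with a CloudFormation template. This is a minimal sketch, assuming you manage the bucket with CloudFormation. The bucket name my-bufstream-bucket and the rule ID are hypothetical placeholders, and the 7-day value mirrors the recommendation above.

```yaml
# Sketch: S3 bucket with a rule that aborts incomplete multi-part uploads.
# Bucket name and rule Id are placeholders, not Bufstream requirements.
Resources:
  BufstreamBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: my-bufstream-bucket
      LifecycleConfiguration:
        Rules:
          - Id: abort-incomplete-multipart-uploads
            Status: Enabled
            AbortIncompleteMultipartUpload:
              # Clean up parts of uploads that never completed
              DaysAfterInitiation: 7
```

This rule only removes the parts of uploads that were never completed. It doesn't expire completed objects, so it doesn't conflict with Bufstream managing object retention itself.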
Bucket permissions
For Bufstream to interact with your bucket, you need to update the configuration with the appropriate permissions. Bufstream needs to perform the following bucket operations:
s3:GetObject: Read existing objects
s3:PutObject: Create new objects
s3:DeleteObject: Remove old objects according to retention and compaction rules
s3:ListBucket: List objects in the bucket
s3:AbortMultipartUpload: Allow failing of multi-part uploads that will not succeed
For more information about S3 bucket permissions and actions, consult the AWS S3 documentation.
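The operations listed above could be granted with an IAM policy along these lines. This is a sketch in CloudFormation YAML, assuming a bucket named my-bufstream-bucket (a placeholder). How you attach the policy to the identity Bufstream runs as depends on your cluster setup and isn't shown here.

```yaml
# Sketch: managed policy granting the bucket operations listed above.
# The bucket name and resource names are hypothetical placeholders.
Resources:
  BufstreamBucketPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Object-level actions apply to keys inside the bucket
          - Sid: BufstreamObjectAccess
            Effect: Allow
            Action:
              - s3:GetObject
              - s3:PutObject
              - s3:DeleteObject
              - s3:AbortMultipartUpload
            Resource: arn:aws:s3:::my-bufstream-bucket/*
          # ListBucket applies to the bucket itself, not its objects
          - Sid: BufstreamListBucket
            Effect: Allow
            Action:
              - s3:ListBucket
            Resource: arn:aws:s3:::my-bufstream-bucket
```

The two statements are separate because s3:ListBucket is evaluated against the bucket ARN, while the other actions are evaluated against object ARNs.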
Reducing produce latency
Produce latency can be decreased by reducing Bufstream's configured flush interval. Lowering the flush interval will incur greater cost in the form of more frequent writes to object storage. The default value is 100.
Metadata storage (etcd)
Bufstream requires an etcd cluster in which to persist cluster metadata.
We recommend configuring etcd with the following settings:
Because etcd is sensitive to disk performance, we recommend using gp3 or io1/io2 EBS disks.
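If your cluster doesn't already offer a gp3-backed storage class, a definition like the following is one way to provide it. This sketch assumes the AWS EBS CSI driver is installed in the cluster. The class name etcd-gp3 is a hypothetical placeholder that you would then reference from etcd's persistence settings.

```yaml
# Sketch: StorageClass that provisions gp3 EBS volumes for etcd.
# Assumes the AWS EBS CSI driver; the name "etcd-gp3" is a placeholder.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: etcd-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
# Delay volume creation until the pod is scheduled, so the volume
# is created in the same Availability Zone as the node
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```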
Permissions
Bufstream uses the Bitnami etcd package. Most Bitnami containers run as non-root, so privileged tasks like mounting volumes may fail during deployment because the containers don't have the privileges needed to modify ownership of the filesystem.
If a Bufstream deploy fails as a result of etcd attempting to mount a persistent volume, you will see the following error in your logs:
To resolve this error and grant the Bitnami etcd containers the right permissions, add the following to your Helm values to allow the container to change the owner and group of etcd's mountpoint to one with appropriate filesystem permissions:
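As a sketch of what such a setting can look like: the Bitnami etcd chart documents a volumePermissions.enabled value, which runs an init container that changes the owner and group of the persistent volume's mountpoint. The nesting under a top-level etcd key is an assumption about how the Bufstream chart passes values to the etcd chart, so verify it against your chart version before applying.

```yaml
# Sketch only: the top-level "etcd" key is an assumption about how
# the Bufstream chart forwards values to the Bitnami etcd chart.
etcd:
  volumePermissions:
    # Run an init container that fixes ownership of the data volume
    enabled: true
```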
To learn more about this Helm value or other configurable permissions in Bitnami's etcd chart, consult the Bitnami etcd README. For more information about troubleshooting Bitnami Helm chart issues, read the troubleshooting guide.