Performance Tuning Guide

Redistill ships with sensible defaults, but you can unlock maximum throughput and consistently low latency by tuning shard count, batch size, buffers, memory, and network settings to match your workload.

Recommended defaults

Based on extensive benchmarking (for example, on an AWS c7i.8xlarge instance), a good starting point for most production deployments is:

[server]
num_shards = 2048
batch_size = 256
buffer_size = 16384
buffer_pool_size = 2048
max_connections = 10000

[performance]
tcp_nodelay = true
tcp_keepalive = 60

[memory]
max_memory = 0
eviction_policy = "allkeys-lru"

This configuration achieved ≈6.87M GET ops/s and 2.74M SET ops/s in upstream benchmarks with sub-millisecond p50 latency.

Shard count (num_shards)

Goal: balance parallelism and memory overhead. Each shard adds overhead but reduces contention.

  • 256 shards – lower memory, more contention; small datasets & moderate concurrency.
  • 2048 shards – recommended balance for most workloads.
  • 4096 shards – maximum GET throughput; higher memory use.
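The three tiers above can be sketched as a selection heuristic. The function and its thresholds are hypothetical: they simply encode the bullets above and are not part of any Redistill API.

```python
def pick_num_shards(expected_keys: int, concurrent_clients: int,
                    read_ratio: float = 0.5) -> int:
    """Map a workload's shape to one of the guide's three shard-count tiers.

    Thresholds (1M keys, 256 clients, 90% reads) are illustrative assumptions.
    """
    # Small dataset, moderate concurrency: favor low memory overhead.
    if expected_keys < 1_000_000 and concurrent_clients <= 256:
        return 256
    # Read-heavy at scale: pay the memory cost for maximum GET throughput.
    if read_ratio > 0.9:
        return 4096
    # Everything else: the recommended balance.
    return 2048
```

For instance, `pick_num_shards(10_000_000, 500, read_ratio=0.95)` selects 4096, matching the read-heavy caching profile shown later in this guide.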

Batch size (batch_size)

Goal: match your client pipeline depth to minimize syscalls without adding latency.

Pipeline depth (-P)    Recommended batch_size
≤16                    16
17–64                  64–128
65–128                 256
>128                   512
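The table maps directly to a lookup function. Where the table gives a range (64–128), the sketch below picks the upper end, an arbitrary choice within the table's guidance:

```python
def batch_size_for_pipeline(pipeline_depth: int) -> int:
    """Pick a batch_size from the client pipeline depth (-P), per the table."""
    if pipeline_depth <= 16:
        return 16
    if pipeline_depth <= 64:
        return 128   # table allows 64-128; the upper end is chosen here
    if pipeline_depth <= 128:
        return 256
    return 512
```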

Buffer pool size & TCP

buffer_pool_size controls the number of reusable response buffers. Too small increases allocations; too large wastes memory.

  • Start with 2048 buffers and increase for very high connection counts (>1000).
  • Enable tcp_nodelay for low-latency interactive workloads.
  • Use reasonable tcp_keepalive (60–300s) to keep long-lived connections healthy.
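A quick sanity check on the defaults above: the pool's resident footprint is just buffer_size × buffer_pool_size, assuming each pooled buffer is allocated at the configured buffer_size.

```python
# Resident footprint of the response-buffer pool, assuming each pooled
# buffer is allocated at the configured buffer_size (illustrative only).
def buffer_pool_bytes(buffer_size: int, pool_size: int) -> int:
    return buffer_size * pool_size

# The recommended defaults (16 KiB buffers x 2048 buffers) cost 32 MiB:
pool_cost = buffer_pool_bytes(16384, 2048)
```

At 32 MiB, the default pool is cheap relative to typical max_memory budgets, so growing it for high connection counts is usually a safe trade.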

Workload-specific configurations

Read-heavy caching (>90% GETs)

[server]
num_shards = 4096
batch_size = 256
buffer_pool_size = 2048

[memory]
max_memory = 8589934592  # 8GB
eviction_policy = "allkeys-lru"

Balanced 50/50 read-write

[server]
num_shards = 2048
batch_size = 256
buffer_pool_size = 2048

[memory]
max_memory = 4294967296  # 4GB
eviction_policy = "allkeys-lru"

Low-latency interactive

[server]
num_shards = 2048
batch_size = 16

[performance]
tcp_nodelay = true

Memory planning & eviction

An approximate formula for total memory:

Total ≈ num_keys × (key_size + value_size + 100B) + shard_overhead + buffer_pool
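This formula can be turned into a small estimator. The ≈100 B per-key overhead comes from the formula itself; the per-shard overhead value below is an illustrative assumption, since the guide does not quantify it.

```python
def estimate_memory_bytes(
    num_keys: int,
    key_size: int,
    value_size: int,
    num_shards: int = 2048,
    buffer_size: int = 16384,
    buffer_pool_size: int = 2048,
    per_key_overhead: int = 100,     # ~100 B per key, from the formula above
    per_shard_overhead: int = 1024,  # assumed 1 KiB/shard; purely illustrative
) -> int:
    """Total ~= num_keys x (key + value + overhead) + shard overhead + buffer pool."""
    return (
        num_keys * (key_size + value_size + per_key_overhead)
        + num_shards * per_shard_overhead
        + buffer_pool_size * buffer_size
    )

# 10M keys with 32 B keys and 256 B values lands just under 4 GiB, so the
# balanced profile's max_memory = 4294967296 would be a tight fit.
total = estimate_memory_bytes(10_000_000, 32, 256)
```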

Guidelines:

  • Set max_memory to ≈70–80% of available RAM.
  • Use allkeys-lru for most caches; use noeviction when data loss is unacceptable and handle errors in your app.
  • Monitor used_memory and evicted_keys via INFO or the HTTP health endpoint.

Benchmarking & troubleshooting

Quick benchmarking

redis-benchmark -h localhost -p 6379 \
  -t set,get \
  -n 2000000 \
  -c 500 \
  -P 128 \
  --csv
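To track the numbers over time, the --csv output can be parsed into a dictionary. This sketch assumes the classic two-column format (test name, requests/sec); newer redis-benchmark releases prepend a header row and extra latency columns, so the parser skips the header and reads only the first two fields.

```python
import csv
import io

def parse_benchmark_csv(text: str) -> dict[str, float]:
    """Extract test-name -> requests/sec from redis-benchmark --csv output.

    Assumes the first two columns are test name and rps; a leading
    header row (emitted by newer redis-benchmark versions) is skipped.
    """
    results = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].lower() == "test":
            continue
        results[row[0]] = float(row[1])
    return results

# Sample output; the figures mirror the upstream numbers quoted earlier.
sample = '"SET","2740000.00"\n"GET","6870000.00"\n'
rates = parse_benchmark_csv(sample)
```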

Common issues

  • High latency: ensure tcp_nodelay=true, reduce batch_size, and increase num_shards.
  • Low throughput: increase batch_size and num_shards, and ensure clients are pipelining.
  • Memory pressure: set max_memory and eviction policy; add TTLs.

For deeper system-level tuning (Linux kernel parameters, NUMA pinning, huge pages) and more detailed recipes, see the full Performance Tuning Guide in the upstream docs.