Benchmark

Busbar’s claim is that it adds only microseconds of overhead — small enough to disappear under the jitter of the provider call it fronts. This page is the falsifiable artifact behind that claim. Read the Result tab for the 10-second version; flip to Reproduce to run it yourself and get the same shape on your hardware.

We measure a difference, on a fixed, named machine so the run can be reproduced: drive identical load against the same Anthropic model over two paths and subtract. Anthropic’s own latency is present in both paths and cancels in the subtraction, so with − without is Busbar’s added cost, measured against real provider jitter. We publish both absolute paths (without Busbar and with Busbar) at p50 / p99 / p99.9, plus the per-percentile difference, for non-streaming full-response latency and for streaming TTFT (time to first token). Nothing is hidden: you see each path’s real numbers and can check the subtraction yourself.
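The subtraction can be sketched in a few lines of Python. This is a minimal illustration, not the Busbar harness: the `percentile` and `busbar_delta` helpers, the nearest-rank percentile method, and the sample latencies are all assumptions made for the example.

```python
# Illustrative sketch only: helper names, the nearest-rank method, and the
# sample latencies are assumptions, not the Busbar harness or real results.

# Percentiles expressed per-mille so the rank math stays in integers.
PERCENTILES = {"p50": 500, "p99": 990, "p99.9": 999}


def percentile(samples, per_mille):
    """Nearest-rank percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = -(-per_mille * len(ordered) // 1000)  # integer ceiling
    return ordered[max(rank, 1) - 1]


def busbar_delta(without, with_busbar):
    """Per-percentile difference: with Busbar minus without Busbar."""
    return {
        name: percentile(with_busbar, pm) - percentile(without, pm)
        for name, pm in PERCENTILES.items()
    }


# Made-up latencies in microseconds: the "with" path is the direct path
# plus a constant 40 µs, so the delta is 40 at every percentile.
without = [300_000 + 100 * i for i in range(1000)]
with_busbar = [v + 40 for v in without]

if __name__ == "__main__":
    print(busbar_delta(without, with_busbar))
```

In a real run the two lists would come from separate requests, so the deltas would carry sampling noise rather than matching exactly as they do in this constructed case.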

Test rig (the reproduction anchor): AWS c7g.xlarge (4 vCPU Graviton3, Amazon Linux 2023 arm64) in us-east-1, running the released v1.0.0-rc.5 aarch64 binary against claude-haiku-4-5 on api.anthropic.com. The load generator, Busbar, and the direct baseline all run on that one box — same machine, same egress on both paths.

Full response (non-streaming) — latency, ms

| Path | p50 | p99 | p99.9 |
|---|---|---|---|
| Without Busbar: client → Anthropic | run to fill | run to fill | run to fill |
| With Busbar: client → Busbar → Anthropic | run to fill | run to fill | run to fill |
| Busbar adds (with − without) | run to fill | run to fill | run to fill |

Streaming TTFT — latency, ms

| Path | p50 | p99 | p99.9 |
|---|---|---|---|
| Without Busbar: client → Anthropic | run to fill | run to fill | run to fill |
| With Busbar: client → Busbar → Anthropic | run to fill | run to fill | run to fill |
| Busbar adds (with − without) | run to fill | run to fill | run to fill |

On the tail and provider jitter. The absolute p99/p99.9 in the Without and With rows include Anthropic’s own network jitter: slow upstream responses inflate the tail on both paths alike. That is exactly why the headline figure is the delta: the same provider variance sits in both measurements and, given enough samples, subtracts out, leaving Busbar’s own added cost. We show the absolute rows too, so the provider tail is visible rather than hidden. Read the jitter and the Busbar delta side by side, and check the subtraction yourself. (Sample size: each cell is N requests at fixed concurrency; the exact counts are printed by the harness and recorded with the run.)
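How much weight a tail cell can bear depends on N. As a rough sketch, assuming a hypothetical `tail_samples` helper and example run sizes (not the counts the harness prints), this shows how few requests sit in the slowest slice at each percentile:

```python
# Hypothetical helper and example run sizes; not the harness's real counts.
PERCENTILES = {"p50": 500, "p99": 990, "p99.9": 999}  # per-mille


def tail_samples(n_requests, per_mille):
    """Count of requests in the slowest (1000 - per_mille)/1000 share of a run."""
    return n_requests * (1000 - per_mille) // 1000


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        counts = {name: tail_samples(n, pm) for name, pm in PERCENTILES.items()}
        print(n, counts)
```

With 1,000 requests, a single slow upstream response decides p99.9, so tail deltas from small runs should be read with that in mind.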

The takeaway. Against a real provider the absolute call time is Anthropic’s (hundreds of ms); the delta is Busbar. The one-line read once filled: Busbar’s added p50 is a tiny fraction of the provider call, and it grows no tail of its own — the delta’s p99/p99.9 track its p50 rather than ballooning. Where the delta widens, that is provider jitter leaking into the subtraction, not a Busbar pause.

Why the tail stays tight — no garbage collector. Busbar is a single Rust binary with no GC. Nothing in the request path pauses to sweep memory, so the latency it adds is near-constant request to request: p99 lands close to p50, and even p99.9 does not balloon. A proxy on a garbage-collected runtime (a Python, Node, or JVM gateway) pays an occasional GC pause that lands on some requests — those become the tail, so its p99/p99.9 swell well above its p50 even when its median looks fine. The number that hurts a user is the tail, and the tail is where a no-GC proxy wins. That is why we report p50 / p99 / p99.9, not p50 alone: a median hides exactly the tail behavior that distinguishes the two architectures.
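A toy model makes the median-versus-tail point concrete. The numbers below are invented for illustration (a constant 50 µs of added latency, and a hypothetical runtime that adds a 5 ms pause to one request in 50); they are not measurements of Busbar or of any real gateway.

```python
# Toy model with invented numbers; not a measurement of any real proxy.
PERCENTILES = {"p50": 500, "p99": 990, "p99.9": 999}  # per-mille


def percentile(samples, per_mille):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(samples)
    rank = -(-per_mille * len(ordered) // 1000)  # integer ceiling
    return ordered[max(rank, 1) - 1]


def summarize(samples):
    """p50 / p99 / p99.9 of the added latency, in microseconds."""
    return {name: percentile(samples, pm) for name, pm in PERCENTILES.items()}


BASE_US = 50       # assumed constant added latency, microseconds
PAUSE_US = 5_000   # assumed collector pause, microseconds
PAUSE_EVERY = 50   # assumed: one request in 50 lands on a pause

# Proxy with no pauses: every request pays the same added cost.
no_gc = [BASE_US for _ in range(1000)]
# Proxy with periodic pauses: every 50th request also pays PAUSE_US.
with_gc = [
    BASE_US + (PAUSE_US if i % PAUSE_EVERY == 0 else 0) for i in range(1000)
]

if __name__ == "__main__":
    print("no pauses:  ", summarize(no_gc))
    print("with pauses:", summarize(with_gc))
```

Both models share the same p50 in this sketch; only the p99 and p99.9 columns reveal the pause, which is the reason for reporting all three percentiles.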

Honest competitive note. No apples-to-apples third-party figure exists to cite, because nobody publishes one. LiteLLM (Python/FastAPI) adds overhead in the millisecond range with a GC tail by construction, but publishes no reproducible self-host overhead benchmark. OpenRouter is a SaaS hop — its “overhead” is a public-internet round-trip to their servers, not a proxy cost, and there is no self-host to measure. A reproducible self-hosted overhead number is uniquely Busbar’s; we would rather ship the harness with an honest placeholder than a number nobody can check.