Benchmark
Busbar’s claim is that it adds only microseconds of overhead — small enough to disappear under the jitter of the provider call it fronts. This page is the falsifiable artifact behind that claim. Read the Result tab for the 10-second version; flip to Reproduce to run it yourself and get the same shape on your hardware.
We measure a difference, on a fixed, named machine so the run is exactly reproducible:
drive identical load against the same Anthropic model over two paths and subtract. Anthropic’s
own latency is present in both paths and cancels in the subtraction, so with − without is
Busbar’s added cost — measured against real provider jitter. We publish both absolute paths —
without Busbar and with Busbar — at p50 / p99 / p99.9, and the per-percentile difference, for
non-streaming full-response latency and for streaming TTFT (time to first byte). Nothing is
hidden: you see each path’s real numbers and can check the subtraction yourself.
Test rig (the reproduction anchor): AWS c7g.xlarge (4 vCPU Graviton3, Amazon Linux 2023
arm64) in us-east-1, running the released v1.0.0-rc.5 aarch64 binary against
claude-haiku-4-5 on api.anthropic.com. The load generator, Busbar, and the direct baseline all
run on that one box — same machine, same egress on both paths.
Full response (non-streaming) — latency, ms
| Path | p50 | p99 | p99.9 |
|---|---|---|---|
| Without Busbar — client → Anthropic | run to fill | run to fill | run to fill |
| With Busbar — client → Busbar → Anthropic | run to fill | run to fill | run to fill |
| Busbar adds (with − without) | run to fill | run to fill | run to fill |
Streaming TTFT — latency, ms
| Path | p50 | p99 | p99.9 |
|---|---|---|---|
| Without Busbar — client → Anthropic | run to fill | run to fill | run to fill |
| With Busbar — client → Busbar → Anthropic | run to fill | run to fill | run to fill |
| Busbar adds (with − without) | run to fill | run to fill | run to fill |
On the tail and provider jitter. The absolute p99/p99.9 in the Without and With rows include Anthropic’s own network jitter — a slow upstream response inflates the tail on both paths equally. That is exactly why the headline figure is the delta: the same provider variance sits in both measurements and subtracts out, leaving Busbar’s own added cost. We show the absolute rows too, so the provider tail is visible rather than hidden — read the jitter and the Busbar delta side by side, and check the subtraction yourself. (Sample size: each cell is N requests at fixed concurrency; the exact counts are printed by the harness and recorded with the run.)
The takeaway. Against a real provider the absolute call time is Anthropic’s (hundreds of ms); the delta is Busbar. The one-line read once filled: Busbar’s added p50 is a tiny fraction of the provider call, and it grows no tail of its own — the delta’s p99/p99.9 track its p50 rather than ballooning. Where the delta widens, that is provider jitter leaking into the subtraction, not a Busbar pause.
Why the tail stays tight — no garbage collector. Busbar is a single Rust binary with no GC. Nothing in the request path pauses to sweep memory, so the latency it adds is near-constant request to request: p99 lands close to p50, and even p99.9 does not balloon. A proxy on a garbage-collected runtime (a Python, Node, or JVM gateway) pays an occasional GC pause that lands on some requests — those become the tail, so its p99/p99.9 swell well above its p50 even when its median looks fine. The number that hurts a user is the tail, and the tail is where a no-GC proxy wins. That is why we report p50 / p99 / p99.9, not p50 alone: a median hides exactly the tail behavior that distinguishes the two architectures.
Honest competitive note. No apples-to-apples third-party figure exists to cite, because nobody publishes one. LiteLLM (Python/FastAPI) adds overhead in the millisecond range with a GC tail by construction, but publishes no reproducible self-host overhead benchmark. OpenRouter is a SaaS hop — its “overhead” is a public-internet round-trip to their servers, not a proxy cost, and there is no self-host to measure. A reproducible self-hosted overhead number is uniquely Busbar’s; we would rather ship the harness with an honest placeholder than a number nobody can check.
The figures come from one fixed machine and a published, copy-paste sequence — so anyone can re-run
exactly what we ran. The harness (bench/latency/) is Python stdlib plus the released binary;
no special build.
1. Launch the rig — an AWS c7g.xlarge (Graviton3, 4 vCPU) on Amazon Linux 2023 (arm64) in
us-east-1. The AMI is resolved from the SSM public parameter, so the image is pinned to the same
family for every reproducer:
AMI=$(aws ssm get-parameter --region us-east-1 \ --name /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64 \ --query 'Parameter.Value' --output text)
aws ec2 run-instances --region us-east-1 \ --image-id "$AMI" --instance-type c7g.xlarge \ --key-name YOUR_KEYPAIR --security-group-ids YOUR_SG_ALLOWING_SSH \ --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=busbar-bench}]'# then: aws ec2 describe-instances ... for the public IP, and `ssh ec2-user@<ip>`2. On the box — install the released binary, fetch the harness at the tag, point it at Anthropic, and run:
curl -fsSL https://getbusbar.com/install.sh | sh # released aarch64 busbar (rc.5)# the binary under test is the release above; the harness landed one commit after the# tag, so clone the default branch for it:git clone --depth 1 https://github.com/MattJackson/busbarAI && cd busbarAI
export ANTHROPIC_API_KEY=sk-ant-... # your key; spends real tokensREQS=10000 CONC=12 BUSBAR_BIN=$HOME/busbar \ bench/latency/run-anthropic.sh # prints p50/p99/p99.9, both paths + deltaIt drives both paths against the same Anthropic model — direct (x-api-key straight to
api.anthropic.com) and through Busbar (Anthropic ingress → same upstream) — and prints the
per-percentile busbar − direct table. That delta is the number in the Result tab. Load is modest
by default (REQS=300 CONC=4, max_tokens=16) because it spends real tokens; scale with
REQS=1000 CONC=8 bench/latency/run-anthropic.sh. Terminate the instance when done
(aws ec2 terminate-instances --instance-ids <id>).
Isolating pure overhead (optional). The real-provider delta is dominated by Anthropic’s
jitter. To see Busbar’s overhead with zero provider noise, the harness also has a local
mock-upstream mode (bench/latency/run.sh) — see bench/latency/README.md. It needs a
publicly-trusted cert for a loopback domain, because the release binary trusts only public CA roots
for upstream TLS.