Changelog

All notable changes to Busbar are documented here.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Three independent features land together: pluggable routing policies, deeper Prometheus observability, and native inbound TLS/mTLS. The request path, wire protocols, breaker FSM, and governance contract are unchanged. This release also folds in a multi-round security and correctness audit and an internal provider-containment refactor.

  • Pluggable routing policies (route: per pool). A pool can declare a route: key that produces an ordered preference over its members. The ranked list feeds the existing failover loop — if the policy’s first choice is tripped or at capacity, Busbar walks to the next; a policy can never strand a request.

    Five built-in native policies, selected with route: <name>:

    • weighted — default smooth weighted round-robin (SWRR); no behavioral change from rc.4.
    • cheapest — prefer the member with the lowest operator-declared cost_per_mtok.
    • fastest — prefer the member with the lowest rolling-EWMA latency.
    • least_busy — prefer the member with the most available concurrency permits.
    • usage — prefer the member with the most rate-limit headroom (fraction of the caller key’s RPM/TPM budget still available this window), steering traffic away from candidates approaching a provider 429.

    Members missing a signal are demoted to the back of the preference list but never dropped, so incomplete signal data cannot strand a lane.

    Two additional transports for operator-defined logic:

    • webhook — POSTs a stable JSON projection of the request and candidates to an operator-run HTTP sidecar (any language, any runtime); the sidecar returns a ranked { "order": [...] }.
    • script — evaluates an operator-supplied Rhai script compiled once at config load. Gated behind the script-policy Cargo feature (off by default), keeping the default binary free of the Rhai dependency.

    Both transports honor a per-pool timeout_ms; a timeout or transport error falls back to the pool’s on_error setting (abstain | weighted | reject | first) and never blocks or fails the client request.

    Zero-cost default path. A pool with route: weighted — including any pool that omits the route: key entirely — resolves to no policy object at config load. The hot path is a single branch that is never entered for default pools: no allocation, no signal projection, no I/O, identical throughput to rc.4.

  • Four new Prometheus gauges (scrape-time). Refreshed on each /metrics scrape from in-process reads, not on the request hot path. All label values are drawn from operator-controlled configuration; no client-supplied input appears as a label:

    • busbar_key_spend_cents — per-virtual-key accumulated spend in cents for the current budget window (label: key = virtual-key id). Only emitted when governance is enabled.
    • busbar_key_budget_remaining_cents — max_budget_cents minus current spend for keys that carry a budget cap. Suitable for Prometheus burn-rate alerting. Only emitted for capped keys.
    • busbar_key_tokens_total — accumulated tokens consumed by each virtual key in the current budget window (label: key).
    • busbar_lane_state — per-(pool, lane-index) circuit-breaker health: 0 = healthy (Closed), 1 = half-open (cooling, probe admitted), 2 = tripped (Open or hard-down). Labels: pool and lane (numeric index). Read-only; does not trigger FSM transitions.
  • Native inbound TLS and optional mutual TLS. Busbar now terminates TLS on the client-to-Busbar hop natively, without a reverse proxy. Add a tls: block to config.yaml:

    tls:
      cert_file: /etc/busbar/tls/fullchain.pem
      key_file: /etc/busbar/tls/privkey.pem
      client_ca_file: /etc/busbar/tls/ca.pem # optional — enables mTLS

    When client_ca_file is present, Busbar requires a client certificate signed by that CA; connections without a valid cert are rejected at the TLS handshake, before any HTTP or bearer-token processing. Omitting tls: entirely leaves the plain-HTTP path unchanged.

  • mTLS client-cert enforcement. With client_ca_file set, unauthenticated connections are rejected at the TLS layer — before HTTP routing or governance checks — providing zero-trust transport without a service mesh.
  • TLS handshake timeout. A 10-second wall-clock cap on each incoming TLS handshake prevents a client from parking a file descriptor and task indefinitely before authentication (slowloris / handshake-flood mitigation). A timed-out or failed handshake drops only that connection; the server continues serving other clients.
  • Webhook response size cap. The webhook routing transport reads sidecar responses under a 64 KiB cap. A slow or hostile sidecar cannot drive unbounded memory allocation; an oversized response is an error and falls back to on_error.
  • Rhai script operation budget. The script transport evaluates operator scripts under a per-invocation Rhai operation count limit and a hard wall-clock deadline (run on the blocking pool so a runaway script cannot pin an async worker). No module resolver, no file or network host functions are registered in the sandboxed engine.
  • Startup fail-fast for TLS config errors. PEM cert, key, or CA load/parse failures abort startup with a message naming the offending file; key material is never logged. A single-connection handshake failure is logged at debug level only.
  • Weight-zero drain bypass on the session-affinity path. A pool member set to weight: 0 (an operator draining a lane) could still receive requests that carried an existing session-affinity stickiness, sidestepping the drain. Affinity resolution now applies the same weight-zero exclusion as fresh routing; regression test added.
  • Anthropic outbound User-Agent. Corrected the User-Agent header shape emitted on the Anthropic upstream hop.
  • SSRF guard covers the Oracle Cloud metadata address. The trusted-upstream net guard now blocks 192.0.0.192 alongside the other link-local / cloud-metadata ranges.
  • Additional cross-cutting correctness fixes from a deep audit pass (streaming-translation vtable flag propagation, request-id header constant) and the multi-round security and correctness review (rounds R3–R12).
  • Provider containment (internal). All provider-name branches were removed from the protocol-agnostic core and relocated behind the ProtocolReader/ProtocolWriter vtable, so provider-specific behavior lives entirely in src/proto/* (safe defaults plus per-provider overrides). No user-visible behavior change — architecture only.
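
The missing-signal demotion described for the built-in routing policies above can be illustrated with a minimal sketch of a cheapest-first ranking. The `Member` struct and `rank_cheapest` function are hypothetical names chosen for illustration and are not Busbar's internal types. The sketch assumes ties and signal-less members keep their configured order.

```rust
use std::cmp::Ordering;

/// Hypothetical pool member; field names are illustrative only.
#[derive(Debug, Clone)]
struct Member {
    name: &'static str,
    /// Operator-declared cost; `None` models a member missing the signal.
    cost_per_mtok: Option<f64>,
}

/// Rank members cheapest-first. Members without a cost are demoted to the
/// back of the list but never dropped, so a failover loop can still reach them.
fn rank_cheapest(members: &[Member]) -> Vec<&'static str> {
    let mut order: Vec<&Member> = members.iter().collect();
    // `sort_by` is stable, so equal entries keep their configured order.
    order.sort_by(|a, b| match (a.cost_per_mtok, b.cost_per_mtok) {
        (Some(x), Some(y)) => x.total_cmp(&y),
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    });
    order.into_iter().map(|m| m.name).collect()
}
```

The returned list is only a preference order. In the design described above, the existing failover loop still decides which candidate is actually usable.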

A continuation of the rc.3 hardening campaign: nine further rounds (R19→R27) of dual-model (Sonnet + Opus) security/correctness auditing over the rc.3 tree, with adversarial triage and class-level fixes. No API changes vs rc.3.

The severity gate — 0 critical / 0 high / 0 medium-security / 0 medium-correctness — is met and has held flat for the final four rounds; remaining findings are documented low/medium-completeness items at the asymptote of the audit loop. The test suite grew from 267 (rc.2) to 1334 passing; fmt, build, clippy -D warnings, and test all green.

  • Circuit-breaker / streaming / FSM cluster — clean SSE stream-end no longer records a spurious breaker failure; breaker success is recorded synchronously before streaming; mid-stream error paths no longer double-record. Readiness checks (cell_ready_breaker/is_ready) are split from the probe-acquiring transition (cell_acquire_breaker) so candidate enumeration no longer steals probes or transitions lanes; a failed half-open probe releases its permit instead of benching a lane permanently.
  • Upstream Retry-After is extracted on the forward path and propagated through error normalization so the breaker cooldown floor is honored.
  • SSRF hardening — backslash-bypass and OTLP-redirect vectors closed; the OTLP exporter uses a no-redirect client. Removed a duplicate reqwest major as a side effect.
  • Same-protocol non-stream large-body token undercount — FirstByteBody now buffers and feeds the whole body once, so usage is no longer dropped past the per-chunk scan cap.
  • A long tail of medium/low conformance, governance, admin-validation, and protocol-translation findings across all six wire protocols (see the private audit residuals for the per-finding ledger).
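
As a rough illustration of the Retry-After item above, the sketch below treats the upstream value as a floor on the breaker cooldown. Both function names are hypothetical, and the sketch assumes only the delta-seconds form of the header is handled (HTTP-date values are ignored). Busbar's actual extraction and normalization path may differ.

```rust
use std::time::Duration;

/// Parse a delta-seconds Retry-After value. HTTP-date values are not
/// handled in this sketch and yield `None`.
fn parse_retry_after_secs(value: &str) -> Option<Duration> {
    value.trim().parse::<u64>().ok().map(Duration::from_secs)
}

/// Retry-After can only lengthen the cooldown: the result is the larger of
/// the breaker's own cooldown and the upstream-provided floor.
fn cooldown_with_floor(breaker_cooldown: Duration, retry_after: Option<Duration>) -> Duration {
    match retry_after {
        Some(floor) => breaker_cooldown.max(floor),
        None => breaker_cooldown,
    }
}
```

Using the maximum rather than replacing the cooldown means a short upstream hint never shortens a cooldown the breaker has already escalated.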

This is a hardening release: a multi-round security/correctness audit campaign over the rc.2 code, plus the universal-ingress feature. No API changes vs rc.2 beyond the new ingress routes.

  • Universal ingress — all six protocols are now first-class ingress. Previously clients could only speak Anthropic (/<...>/v1/messages) or OpenAI (/v1/chat/completions); now native Responses (/v1/responses), Cohere (/v2/chat), Gemini (/v1beta/models/{model}:generateContent / :streamGenerateContent), and Bedrock (/model/{modelId}/converse / /converse-stream) clients can point their SDK’s base URL at busbar unmodified. Each protocol has one ingress route; body-model protocols (openai, responses, cohere) take the model/pool from the request body, path-model protocols (anthropic, gemini, bedrock) from the URL. Errors are emitted in the caller’s native protocol shape, with multi-scheme auth and content-type/identity handling per protocol.
  • /metrics is no longer unconditionally open. It now goes through the same auth check as /stats (requires a valid client token in token mode, or a virtual key under governance) because the Prometheus exposition — lane/pool topology, per-protocol counters, error rates — is an information-disclosure surface. Only /healthz remains unconditionally open. In none/passthrough mode /metrics is still admitted unconditionally. This supersedes the 0.16.2 security-review note that described /metrics as intentionally open.
  • SSRF guard hardened against trailing-dot hosts. The webhook and OTLP endpoint validators stripped a trailing FQDN-root dot only inside one branch, so 127.0.0.1. / metadata.google.internal. slipped past the IP-literal and cloud-metadata checks and resolved to internal targets. The dot is now stripped before every check, matching the upstream-config SSRF guard.
  • Admin reserved-name collision now rejected for models too. A model named admin was reachable at /admin/v1/messages (the operator admin surface), making it unreachable to clients and bypassing per-model governance. Config validation now rejects it, symmetric with the pool/provider checks.
  • Anthropic egress no longer emits a dual-credential header. An ambiguous credential previously sent both x-api-key and authorization: Bearer — a request shape no native client produces. The wire path now resolves it to the single native header the auth mode implies.
  • Cohere streaming text no longer dropped. The content-delta reader could not decode the native object shape (delta.message.content = {type,text}) the writer emits, silently dropping streamed assistant text on the Cohere read/proxy path.
  • OpenAI include_usage streams. A usage: null non-final chunk no longer synthesizes a spurious mid-stream message_delta; and a trailing usage-only chunk no longer produces a message_delta after message_stop on non-Bedrock ingress.
  • Gemini safety-filtered responses. A finishReason: SAFETY candidate with no content field (a legitimate Gemini shape) is decoded normally instead of returning a spurious 500.
  • Bedrock conformance: cross-protocol degraded error relays now forward x-amzn-requestid / x-amzn-errortype; tool-call ids are remapped to the client’s native shape on the degraded path; prompt-cache token fields round-trip.
  • Responses non-streaming output items now carry the native id / status / annotations the streaming path emits.
  • Numerous lower-severity correctness/conformance fixes across the breaker cooldown jitter, SigV4 header canonicalization, health-probe Retry-After handling, and id synthesis (unbiased base62). Active health probes now send the same User-Agent / Accept as organic traffic. Admin key creation rejects negative budgets.
  • MSRV is now Rust 1.87 (declared via rust-version), reflecting use of u32::is_multiple_of.
  • Internal: the auth mode is now a single source of truth on the auth middleware (removed a denormalized copy on the app state).
  • ~30× faster cold start (≈206 ms → ≈6 ms). The Prometheus recorder is now installed on a background thread, so its one-time clock calibration (quanta’s TSC calibration, ~200 ms) no longer blocks the listener — busbar binds and serves (including /healthz) in single-digit milliseconds, the right behavior for a daemon/k8s readiness path. Trade-off: /metrics renders empty until the recorder finishes calibrating shortly after start, and the few requests in that window are not counted.
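
The trailing-dot SSRF fix above comes down to normalizing the host once, before any check runs. The following is a minimal sketch under assumptions: `normalize_host` and `is_blocked_host` are hypothetical names, and the blocked set shown (loopback, link-local, private ranges, two well-known names) is illustrative rather than Busbar's actual list.

```rust
use std::net::IpAddr;

/// Strip a single trailing FQDN-root dot and lowercase the host, so that
/// every later check sees the same canonical form.
fn normalize_host(host: &str) -> String {
    host.strip_suffix('.').unwrap_or(host).to_ascii_lowercase()
}

/// Illustrative guard: normalization happens first, then the IP-literal
/// check, then the name check.
fn is_blocked_host(host: &str) -> bool {
    let host = normalize_host(host);
    if let Ok(ip) = host.parse::<IpAddr>() {
        return match ip {
            IpAddr::V4(v4) => v4.is_loopback() || v4.is_link_local() || v4.is_private(),
            IpAddr::V6(v6) => v6.is_loopback(),
        };
    }
    host == "localhost" || host == "metadata.google.internal"
}
```

Without the normalization step, `127.0.0.1.` fails to parse as an IP literal and falls through to the name check, which is the bypass the entry describes.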

First release candidate for 1.0. Busbar is feature-complete and API-stable: six wire protocols with lossless cross-protocol translation, weighted SWRR pools with per-(pool,lane) circuit breaking and in-flight failover, governance (virtual keys / budgets / rate limits), and a security-hardened request path — all in one native binary. The remaining work before 1.0.0 is operational validation (extended soak/leak testing and a performance/SLO baseline), not features.

  • Release profile optimized for distribution. opt-level 3 + fat LTO + codegen-units = 1 + symbol stripping cut the release binary from ~12 MB to 7.4 MB with a faster hot path. panic stays unwind so a panic in one request task can’t abort the whole gateway.
  • README rewritten around the value proposition (SDK-swap hook, competitor comparison, Security and cross-protocol-translation sections, badges).
  • OpenAI→Anthropic translation no longer drops max_tokens. An OpenAI-format request that omits max_tokens (legal — the OpenAI server applies a default) was translated to the Anthropic Messages API without one, which hard-rejects it (400 max_tokens: Field required). So any OpenAI-compatible client relying on the server default 400’d on every call once pointed at an Anthropic-backed lane. busbar now injects a max_tokens at the cross-protocol translation boundary when the egress protocol requires it (Anthropic) and the source omitted it. A caller-supplied value is always preserved, and same-protocol passthrough is unaffected. Bedrock Converse defaults maxTokens server-side, so it is intentionally excluded (injecting would silently cap output).
  • default_max_tokens per-model config (optional). Sets the value injected for the case above; unset falls back to a conservative 4096. Validated > 0 at startup. Documented in config.yaml.
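
The injection rule in the two entries above can be summarized as a small decision function. This is a sketch only: the `Protocol` enum, the function name, and the constant name are hypothetical, and real requests carry far more state than a single optional field.

```rust
/// Hypothetical subset of wire protocols, for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Protocol {
    OpenAi,
    Anthropic,
    Bedrock,
}

/// Conservative fallback named in the changelog entry above.
const FALLBACK_MAX_TOKENS: u32 = 4096;

/// Decide the max_tokens value to send upstream.
/// - A caller-supplied value is always preserved.
/// - Same-protocol passthrough is left untouched.
/// - Only an Anthropic egress gets a value injected; Bedrock is excluded
///   because it applies its own server-side default.
fn egress_max_tokens(
    ingress: Protocol,
    egress: Protocol,
    caller_max_tokens: Option<u32>,
    default_max_tokens: Option<u32>,
) -> Option<u32> {
    if caller_max_tokens.is_some() {
        return caller_max_tokens;
    }
    if ingress != egress && egress == Protocol::Anthropic {
        return Some(default_max_tokens.unwrap_or(FALLBACK_MAX_TOKENS));
    }
    None
}
```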

Security hardening. The following vectors were reviewed and confirmed clean — SSRF on the routing paths (provider/model validated against config; upstream URL never caller-derived), token-compare timing (constant-time for client and admin tokens; virtual keys via SHA-256 + map), /metrics label cardinality (unknown models are rejected before any metric, so labels stay config-bounded), secret-in-logs (no keys/tokens/bodies logged), SQL injection (fully parameterized), and auth-bypass. Fixes below close the few hardening gaps that review surfaced.

  • Request body size limit. The HTTP router now caps request bodies at 32 MiB (DefaultBodyLimit) — previously unbounded beyond axum’s 2 MiB default toggling, so a multi-gigabyte body could be buffered and exhaust memory (notably under auth.mode=none).
  • Constant-time token compare hardened. constant_time_eq is now #[inline(never)] and runs its result through std::hint::black_box, so the optimizer can’t fold the accumulation loop into an early-exit branch and reintroduce a timing signal (no new dependency).
  • Documented the two to_vec re-serialization sites as the invariants they are (built from already-valid JSON), and corrected a stale UsageTap doc comment that referenced a nonexistent carry buffer.
  • Added an ad-hoc-route SSRF regression test (unknown provider/model → 404, mismatched provider → 400, both before any upstream call). 262 tests total.
  • Provider health: in config.yaml now takes effect. The deployment-side ProviderDeploy had no health field, so a health: block under a provider in config.yaml (exactly as the shipped example documents it) was silently dropped at parse time and resolve() only used the catalog’s providers.yaml health — meaning active/dead health probing never spawned for config-defined health. ProviderDeploy now carries health, and resolve() merges it deployment-wins-over-catalog (mirroring path/auth). + regression test.
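
For readers unfamiliar with the constant-time hardening mentioned above, here is a minimal sketch of the general technique using only the standard library. It is not Busbar's implementation. It assumes the token length is not secret, so a length mismatch returns early.

```rust
use std::hint::black_box;

/// Compare two byte strings without an early exit on the first mismatch.
/// `#[inline(never)]` and `black_box` discourage the optimizer from turning
/// the accumulation loop back into a short-circuiting comparison.
#[inline(never)]
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        // Assumption of this sketch: length is not treated as secret.
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        // OR together the XOR of every byte pair; any mismatch sets a bit.
        diff |= x ^ y;
    }
    black_box(diff) == 0
}
```

Note that `black_box` is documented as a best-effort hint, so a sketch like this reduces the risk of a timing signal rather than formally guaranteeing its absence.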

Second RC for final testing — fixes from the first 0.17.0 testing pass.

  • Dead-mode health probing now recovers soft-cooldown lanes. A sub-threshold transient leaves the breaker Closed but arms a cooldown; the prober gate only fired for fully-tripped (Open) cells, so a single 5xx benched a single-member route for the full ~30s cooldown with no active recovery. The gate is now “breaker-suppressed in any cell” (Open/HalfOpen or a pending cooldown), and a successful probe clears the soft cooldown too.
  • Cross-protocol reasoning is preserved (OpenAI → Anthropic). A model’s reasoning_content (chain-of-thought) now maps to a thinking block instead of being dropped — both non-streaming (a leading thinking block) and streaming (a thinking block at index 0, with text/tools shifted after it). Non-reasoning responses are unchanged.
  • --help / --version and startup errors no longer panic before argument handling: those flags print and exit without touching the filesystem, an unknown flag is a clean usage error, and every misconfiguration (missing/invalid providers.yaml or config.yaml, bad env interpolation, unknown provider/protocol, pool→unknown-model, invalid on_exhausted, bind failure) prints a clean [error] … instead of a backtrace.
  • +7 unit tests (now 261): soft-cooldown recovery, reasoning translation (stream + non-stream), malformed-Authorization safety, config parsing, JSON-scanner underflow safety, stable affinity hash.
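
The change to the dead-mode prober gate described above can be pictured as widening a predicate over a lane's breaker cells. The types and function names below are hypothetical, and the sketch assumes a soft cooldown is represented as a boolean flag on a Closed cell.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum BreakerState {
    Closed,
    HalfOpen,
    Open,
}

/// Hypothetical per-cell view: breaker state plus a pending soft cooldown.
#[derive(Debug, Clone, Copy)]
struct Cell {
    state: BreakerState,
    cooldown_pending: bool,
}

/// Old behavior: only fully tripped (Open) cells triggered a probe.
fn old_probe_gate(cells: &[Cell]) -> bool {
    cells.iter().any(|c| c.state == BreakerState::Open)
}

/// New behavior: probe when the lane is breaker-suppressed in any cell,
/// i.e. Open, HalfOpen, or Closed with a pending cooldown.
fn probe_gate(cells: &[Cell]) -> bool {
    cells
        .iter()
        .any(|c| c.state != BreakerState::Closed || c.cooldown_pending)
}
```

A Closed cell with a pending cooldown is exactly the case the old gate missed: the lane was benched, but nothing actively tried to recover it.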

Release candidate for final testing ahead of 1.0. Outcome of a systematic review of the full source for correctness, robustness, and security.

  • Panics removed on hostile input: a malformed Authorization header could panic on a UTF-8 boundary; a closing brace before an opening one in an upstream body could underflow the JSON brace scanner; an API key with a control character could panic the worker. All now fail cleanly.
  • Circuit-breaker error-rate trip now uses windowed errors vs windowed total (both from the sliding window) — a long-running lane no longer spuriously trips on clean recent traffic once old errors age out.
  • SWRR weight updates are serialized — concurrent selections could corrupt the algorithm’s invariant and bias distribution.
  • Cooldown jitter applies its sign (±) instead of only ever lengthening cooldowns.
  • Session affinity uses a stable hash, so sticky routing survives a restart (was a randomly seeded hasher).
  • Passthrough auth now forwards the caller’s bearer token (handlers previously dropped it, silently falling back to the lane’s static key).
  • Degraded routing (least-bad / fallback-pool) now applies cross-protocol translation, so it is correct when the chosen lane speaks a different protocol.
  • Anthropic tool role messages map to the user role (no nonexistent tool_use role → 422); Bedrock parse-error signal typo (ir-parse → ir_parse); token-count i64 saturation.
  • Per-key rate-limit map evicts stale windows (was an unbounded per-key memory leak).
  • The /admin usage requests count no longer double-counts non-streaming cross-protocol responses.
  • /stats inflight is derived from the semaphore (was always 0).
  • Logging: a stderr tracing subscriber is always installed (level from RUST_LOG); OTLP export composes on top when configured. Previously all spans/warnings were dropped unless OTLP was set. Operational warnings moved from eprintln! to structured tracing.
  • Quality: named the magic numbers/strings (auth modes, breaker states, failover/timeout/probe/rate-window/price/window-capacity defaults, Anthropic API version); the outcome window is a VecDeque (O(1) eviction); scrubbed internal references from comments; Cargo.toml reports the real version. One unconditional dead-code allow remains (a RAII guard).
  • Admin-token comparison is now constant-time. The /admin management API compared the configured admin token with ==, a timing side channel that could let an attacker recover the token byte-by-byte. It now uses the same constant-time comparison as client tokens.
  • Virtual-key generation fails closed. If the OS CSPRNG (getrandom) is unavailable, busbar now refuses to mint a key instead of falling back to a predictable, time-derived secret. (CSPRNG failure is near-impossible on supported platforms; the failure aborts only the key-mint request.)
  • Security review found no other issues: virtual keys are SHA-256 hashed and never stored/compared raw; the admin API is token-gated and disabled when no admin token is set; key listings never expose hashes; no secrets are logged; cross-protocol JSON parsing has no caller-triggered panics; ad-hoc routes only reach configured (provider, model) pairs (no SSRF). /healthz and /metrics are intentionally open (protect /metrics at the network layer).
    • Correction (superseded): the claim that /metrics is intentionally open no longer holds. /metrics now goes through the same auth check as any other route — only /healthz stays unauthenticated for liveness probes — though under none/passthrough mode the check still admits unconditionally. See the Unreleased Security entry above and src/auth.rs (auth_middleware). The original line is kept as-written to preserve the historical record.
  • error_map can now match a provider’s structured error type, not just its numeric code. Stage 1b checks raw.structured_type against error_map as a second data-driven signal (the explicit code still wins) — useful for providers that surface a typed error.type but no code. (Previously structured_type was extracted by every protocol but never consulted.)
  • /stats now reports each lane’s client_fault counter alongside ok/err.
  • Dead-code cleanup: removed vestigial scaffolding (SseCarryBuffer and its test, COOLDOWN_BASE_SECS, an unused FirstByteBody::usage and GovState::store accessor) and resolved nearly every #[allow(dead_code)] — the remaining suppressions are one RAII permit guard plus test-only API gated behind cfg(test) / cfg_attr(not(test)). No behavior change from this part.
  • Per-(pool, lane) circuit-breaker isolation. A lane shared by multiple pools now carries independent breaker state (Open/Closed/HalfOpen, streak, cooldown, error window, SWRR weight) per pool, so one pool’s traffic tripping a lane no longer benches it for every other pool. Direct/ad-hoc routes and /stats use a lane-default cell; named pools each get their own, created lazily and inheriting the lane’s current known health on first use. The breaker FSM is now written once over a BreakerCellAccess seam and run against either cell — no logic duplication. Lane-global concerns (the concurrency semaphore and the max_requests lifetime budget) remain shared across pools, since they cap the one upstream.
  • Active health probing now recovers a lane across every cell (all pools + default) on a successful probe, and gates dead-mode probing on “tripped in any cell” — a probe tests the shared upstream, so its result is lane-global.
  • This supersedes the 0.15.0 note that deferred per-(pool, lane) state.
  • Breaker recovery was broken — a tripped lane never came back. On cooldown expiry the lane went HalfOpen and admitted a single probe; the probe’s success reset the streak but never transitioned the breaker out of HalfOpen (closed_state was only ever called from tests), so probe_in_flight stayed set and every later usable() returned false. Any lane that ever tripped became permanently dead after one request. record_success now completes the recovery (→ Closed, cooldown cleared, probe released) when it sees a HalfOpen lane.
  • Active health checks are now live. A provider’s health: block has a mode: none (default — passive health only), dead (periodically re-probe only tripped lanes so a recovered upstream is picked back up promptly), or active (probe every lane so a silently-dead upstream trips before real traffic hits it). Probes are a one-token request built by the lane’s protocol writer (probe_body), so all six protocols work with no per-protocol code; interval_secs/timeout_secs are honored. One background task per probing lane; lanes with no key are skipped.
  • Per-pool circuit-breaker config is now live. A pool’s breaker: block (trip.mode error_rate|consecutive, trip.window_s/threshold/min_requests/n, base_cooldown_secs/max_cooldown_secs) is resolved at startup and drives the trip decision via should_trip — previously the block was parsed but ignored and the breaker used a hardcoded err >= 5 rule. Streak ownership moved to the record path (incremented once per failure, reset on success) so consecutive-mode trips and cooldown escalation are coherent. Example added to config.yaml (pool sensitive).
  • failover.exclusions are enforced — members named there are removed from a pool’s candidate set (never selected, primary or failover).
  • Pool affinity.header_name is honored — the session-pinning header is now configurable per pool (defaults to x-session-id).
  • Breaker state remains per-lane (not per-(pool,lane)). This is correct for the common case and for upstream-driven signals (a 401/429 is a property of the upstream, shared across pools). Full per-(pool,lane) state isolation — where one shared lane carries independent Open/Closed status per pool — was deferred: it would require threading a pool key through the StateStore trait and its 77 constructor sites, and only differs when one lane is shared by multiple pools with different breaker configs.
  • Cohere v2 protocol (/v2/chat) — the 6th wire protocol (Reader + Writer, request/response/streaming, bearer auth). System prompts are canonicalized into the IR so they survive cross-protocol translation.
  • Azure OpenAI auth adapter — a per-provider auth: api-key style that sends the api-key header instead of bearer (deployment + ?api-version= ride the existing path override). No new dependency; same sign_request seam as Bedrock SigV4. Template shipped in providers.yaml.
  • docs/roadmap.md — the protocols-not-providers thesis and auth-adapter design.
  • Cross-protocol pool responses now preserve the upstream model field (added to the IR), matching direct routes — a pool landing on a cross-protocol member no longer returns a model-less body.
  • Token accounting on the buffered cross-protocol (non-streaming) path: usage is now tapped and charged to the virtual key, so TPM limits enforce (previously per-key tokens stayed 0).
  • max_requests lifetime cap is now enforced — the success path records the lane success and decrements the budget (spend_budget previously never decremented), and the per-lane ok counter increments on success (was always 0; also fixed a latent double-count in record_success).
  • This changelog was previously stale; entries before 0.14.0 are not yet backfilled (tracked for the 1.0 documentation pass).
  • Project scaffolding for open-source release: README, CONTRIBUTING, SECURITY, issue/PR templates, and CI workflow.
  • Licensed the project under AGPL-3.0-or-later (previously MIT) — the AGPL’s network-use clause is the appropriate copyleft for a gateway run as a service.
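
The breaker-recovery fix described earlier in this list (a successful probe must move a HalfOpen lane back to Closed and release the probe slot) is easier to follow as a small state machine. This is a simplified sketch with hypothetical names. It omits cooldown timing, error windows, and per-pool cells.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Closed,
    Open,
    HalfOpen,
}

/// Simplified breaker: state, failure streak, and a single probe slot.
#[derive(Debug)]
struct Breaker {
    state: State,
    streak: u32,
    probe_in_flight: bool,
}

impl Breaker {
    /// Construct a breaker that has already tripped.
    fn tripped() -> Self {
        Breaker { state: State::Open, streak: 5, probe_in_flight: false }
    }

    /// On cooldown expiry an Open breaker becomes HalfOpen.
    fn cooldown_expired(&mut self) {
        if self.state == State::Open {
            self.state = State::HalfOpen;
        }
    }

    /// Closed admits everything, Open nothing, HalfOpen exactly one probe.
    fn try_acquire(&mut self) -> bool {
        match self.state {
            State::Closed => true,
            State::Open => false,
            State::HalfOpen => {
                if self.probe_in_flight {
                    false
                } else {
                    self.probe_in_flight = true;
                    true
                }
            }
        }
    }

    /// The fix: a success seen in HalfOpen completes the recovery instead of
    /// only resetting the streak.
    fn record_success(&mut self) {
        self.streak = 0;
        if self.state == State::HalfOpen {
            self.state = State::Closed;
            self.probe_in_flight = false;
        }
    }
}
```

If `record_success` only reset the streak, `probe_in_flight` would stay set and every later `try_acquire` in HalfOpen would return false, which matches the permanently dead lane the entry describes.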