Routing Policies
Every Busbar pool has a routing policy. The default — weighted smooth round-robin (SWRR) — costs nothing and is the right choice for most pools. When you need a different selection strategy, you swap in a policy with one field: route:. Everything else in Busbar — the circuit breaker, failover loop, concurrency semaphore, session affinity — is unchanged. The policy only determines the order in which healthy candidates are tried.
Routing is the first of Busbar’s Hooks — a programmable request path. At a small number of points in the request lifecycle, an operator-supplied policy inspects the request and returns a decision, via one of three transports (native, webhook, or Rhai script). Routing is hook #1: it decides selection order. The same transport machinery — timeout, fallback, transparency header — carries every hook, so a broken or slow policy never blocks or fails a request.
Cross-references: Configuration (full field reference) · Reliability & Failover (breaker, failover, and exhaustion behavior).
Table of contents
Section titled “Table of contents”- The model: policy as ranked preference
- How policies compose with the breaker and failover
- Native policies
- The routing signals
- External policies
- Configuration reference
- Full examples
- Observability
The model: policy as ranked preference
Section titled “The model: policy as ranked preference”A routing policy does one thing: given the current request and the set of healthy candidates, it returns an ordered preference list. Busbar’s existing failover loop walks that list — trying the first candidate, then the second if the first fails, and so on — using the circuit breaker at every step to skip any lane that is tripped, at capacity, or already tried this request.
The design consequence: a policy ranks; the breaker decides health. A policy cannot resurrect a tripped lane, and a policy that omits a healthy lane does not strand it — omitted candidates are appended to the end of the preference list, not excluded. A broken policy (timeout, error, empty response) falls back to weighted SWRR rather than failing the request.
Every pool has exactly one route: value. weighted is both the default and the zero-overhead case: pools without a route: field run today’s unchanged SWRR code, with no projection built and no policy object constructed. Adding a policy to a pool adds no overhead to pools that do not use one.
How policies compose with the breaker and failover
Section titled “How policies compose with the breaker and failover”The sequence for every request routed to a non-weighted pool:
- Policy runs once, before the failover loop. It receives the healthy candidate set (as a projected read-only view) and returns a ranked list of candidate indices.
- The failover loop walks the ranked list. It tries candidates in the policy’s preferred order, skipping any that are tripped, at capacity, or already tried (
request_ctx.excluded). - Candidates not in the ranked list are tried last. If the policy emits a subset of candidates, the omitted ones are appended after the ranked set in an unspecified order. They are reachable — a policy can never permanently exclude a healthy lane.
- On policy failure, SWRR takes over. A timeout, error, or
abstainresponse causes Busbar to fall back to the pool’son_errorbehavior (default: SWRR), as if no policy were configured.
This composition means a policy’s job is deliberately narrow. You declare a preference; Busbar’s existing reliability machinery handles the rest.
request → policy decides ranked order (once) → failover loop walks ranked order → breaker filter (unchanged: health, exclusion, half-open probe) → dispatch → first-byte boundaryNative policies
Section titled “Native policies”Native policies are compiled into Busbar and have no runtime dependencies. They are sync, never do I/O, and sort the candidate set by a single live signal. Configure them with route: native and a policy.name.
weighted (default)
Section titled “weighted (default)”policy.name: weighted
The default selection strategy when no route: is set. Uses Nginx-style smooth weighted round-robin (SWRR) across healthy members, proportional to each member’s weight field. Setting route: native and policy.name: weighted gives byte-identical behavior to omitting route: entirely — it always abstains, letting the inline SWRR path handle selection.
Use weighted explicitly only when you want to name the policy in config for documentation clarity. There is no behavioral difference from the absent-route: default.
cheapest
Section titled “cheapest”policy.name: cheapest
Prefers the member with the lowest operator-declared cost_per_mtok. Members without a cost_per_mtok value are demoted to the end of the preference list but are still reachable. If no candidate has a declared cost, the policy abstains and SWRR takes over.
Signal: cost_per_mtok on the pool member config. You declare the cost; Busbar ranks on it.
pools: cost-optimized: route: native policy: name: cheapest members: - target: claude-sonnet cost_per_mtok: 3.0 - target: gpt-4o cost_per_mtok: 5.0 - target: gpt-4o-mini cost_per_mtok: 0.15Traffic flows to gpt-4o-mini first, then gpt-4o, then claude-sonnet. If gpt-4o-mini is tripped, the breaker skips it and gpt-4o becomes the first attempt.
fastest
Section titled “fastest”policy.name: fastest
Prefers the member with the lowest measured round-trip latency, tracked as a rolling EWMA updated after each request. Members with no latency sample yet (new lanes, recently restarted) are demoted but reachable. If no candidate has latency data, the policy abstains.
Signal: rolling EWMA latency in milliseconds, accumulated from organic traffic. No configuration required.
This is a good choice when your members have meaningfully different tail latencies and you want Busbar to track and prefer the faster one automatically over time.
least_busy
Section titled “least_busy”policy.name: least_busy
Prefers the member with the most available concurrency headroom — the lane with the most free slots in its semaphore. Unlike fastest, least_busy always has data (concurrency is always known) and never abstains.
Signal: free concurrency permits on each lane’s semaphore at decision time.
Use this when your members have different max_concurrent limits or when you want to avoid piling requests onto an already-saturated backend before the breaker trips it.
policy.name: usage
Prefers the member with the most remaining rate-limit headroom (RPM/TPM headroom from the caller key’s governance counters). The rate_headroom signal is computed in src/forward.rs (rate_headroom_for_token) and passed to each Candidate before the routing decision. Candidates with no headroom signal (None, e.g. when governance is disabled or no rate limit is set) are demoted to last but remain reachable. Abstains only when every candidate lacks the signal (no rate limit in play), falling back to SWRR.
The routing signals
Section titled “The routing signals”Every policy — native and external — sees the same projection of each candidate:
| Signal | Field | Available now | Notes |
|---|---|---|---|
| Per-member cost | cost_per_mtok | Yes | Operator-declared in member config. None if not set. |
| Per-lane latency | latency_ms | Yes | Rolling EWMA in ms, updated per request. None until first request. |
| Live concurrency | available_concurrency | Yes | Free semaphore slots. Always populated. |
| Budget remaining | budget_remaining | Yes | Per-lane max_requests remaining. None = unlimited. |
| Rate headroom | — | Landing alongside | RPM/TPM headroom from upstream governance counters. Being wired by a parallel effort. |
External policies (webhook, script) also receive the request projection. The webhook wire format and the Rhai req map share most fields, with the script getting a few extras:
| Signal | Field | Webhook | Script (req) | Notes |
|---|---|---|---|---|
| Pool | pool | Yes | Yes | Pool name the request is being routed through. |
| Ingress protocol | ingress_protocol | Yes | Yes | One of: anthropic, openai, gemini, bedrock, cohere, responses. |
| Requested model | requested_model | — | Yes | Model the caller asked for; () if absent. Not in webhook payload. |
| Message count | message_count | Yes | Yes | Number of messages in the conversation. |
| Has tools | has_tools | Yes | Yes | Boolean; true when at least one tool is declared. |
| Tool count | tool_count | — | Yes | Number of tools declared. Not in webhook payload. |
| Request size | total_chars | Yes | Yes | Sum of all text chars across system + messages. v1 size signal — not a token count. Rule of thumb: ~4 chars/token. |
| System size | system_chars | — | Yes | System-prompt text chars only. Not in webhook payload. |
| Max tokens | max_tokens | Yes | Yes | Caller’s requested output token limit; null/() if unset. |
| Streaming | stream | Yes | Yes | Whether the request is a streaming call. |
Token counts are not available pre-dispatch. The upstream response carries token usage, but that comes after a lane is chosen. Use total_chars with the rule of thumb (~4 chars/token) for size-based routing decisions in webhook/script policies.
External policies
Section titled “External policies”External policies let you run routing logic outside Busbar — in any language, with access to any data Busbar cannot see. Busbar sends a lightweight request projection to your policy endpoint and receives a ranked candidate list back. The same timeout, fallback, and safety machinery applies as for native policies: a slow or broken external policy never fails a request.
webhook
Section titled “webhook”route: webhook
Busbar POSTs a JSON payload to your operator-supplied sidecar URL before each request’s failover loop. The sidecar returns a ranked list of candidate indices. The URL is operator-config-only — never derived from a request header or body.
Timeout and fallback. The request is bounded by policy.timeout_ms (default 150 ms). On timeout, non-200 response, malformed JSON, or an at-capacity in-flight queue, Busbar applies on_error (default: weighted — fall back to SWRR). A broken sidecar is indistinguishable from no policy. Up to 64 concurrent policy calls are in flight at once; additional calls abstain immediately rather than queue.
URL security. The sidecar URL allows loopback (127.0.0.1, localhost) — routing sidecars are commonly co-located processes. RFC-1918, link-local, CGNAT, and cloud metadata endpoints (169.254.169.254, metadata.google.internal, etc.) are blocked regardless.
Request payload (Busbar → sidecar):
{ "request": { "pool": "smart-webhook", "ingress_protocol": "anthropic", "message_count": 12, "has_tools": true, "total_chars": 41200, "max_tokens": 8192, "stream": true }, "candidates": [ { "idx": 0, "model": "claude-opus", "tier": "large", "cost_per_mtok": 15.0, "latency_ms": 320.5, "available_concurrency": 14, "budget_remaining": null }, { "idx": 1, "model": "claude-sonnet", "tier": "small", "cost_per_mtok": 3.0, "latency_ms": 95.2, "available_concurrency": 18, "budget_remaining": 5000 } ], "context": { "pool": "smart-webhook", "budget_remaining": null }}Field notes:
request.pool— the pool name the request is being routed through.request.ingress_protocol— one ofanthropic,openai,gemini,bedrock,cohere,responses.request.max_tokens—nullif the caller did not set an output-token limit.candidates[*].tier—nullif not set on the member config.candidates[*].cost_per_mtok—nullif not declared on the member.candidates[*].latency_ms—nulluntil the lane has served at least one request.candidates[*].budget_remaining—null= unlimited (max_requests: -1).context.budget_remaining— per-key governance budget,nullwhen governance is disabled or not yet plumbed (v1 default).
The payload contains only the request projection — no prompt text, no message bodies. Busbar never sends request content to any external sink.
Response payload (sidecar → Busbar):
Ranked preference — most preferred first:
{ "order": [1, 0] }Or abstain (fall back to on_error):
{ "abstain": true }Rules:
orderis the only ranking key. Unknownidxvalues are dropped; duplicates are deduplicated preserving first-seen order.- Omitted candidates are demoted, not excluded — the failover loop can still reach them after the ranked set is exhausted.
- An absent or empty
order(including a bare{}) is treated as abstain. abstain: trueexplicitly signals no preference; theorderfield is ignored if present alongside it.- Any non-2xx response, malformed JSON, or timeout applies
on_error(same as abstain for the defaulton_error: weighted).
Transparency. Every response with a non-default routing policy carries two headers: x-busbar-route-policy (the policy name) and x-busbar-route-target (the chosen lane model) — for example x-busbar-route-policy: webhook and x-busbar-route-target: claude-sonnet.
script (Rhai)
Section titled “script (Rhai)”route: script
Feature-flagged. The script transport is compiled in behind the
script-policycargo feature and is disabled in the default binary. Enable it only if you need in-process scripting and have reviewed the sandbox configuration.
An embedded Rhai script, compiled at startup and evaluated per request on a sandboxed thread pool. Rhai is a pure-Rust scripting language with no network or filesystem access by default, bounded by operation count and size limits.
Script environment. The script receives three injected variables:
-
req— a map with the request projection fields:pool(string),ingress_protocol(string),requested_model(string or())message_count(int),tool_count(int),has_tools(bool)total_chars(int),system_chars(int)max_tokens(int or()if unset),stream(bool)
-
candidates— an array of maps, one per candidate, each with:idx(int),model(string),provider(string)weight(int),context_max(int or()if unset)tier(string or()if unset)cost_per_mtok(float or()if unset),latency_ms(float or()until first request)available_concurrency(int),budget_remaining(int or()= unlimited)tags(array of strings)
-
ctx— a map with routing context:pool(string),budget_remaining(int or()when not plumbed)
Optional/absent values are () (Rhai unit), not null. Test with == () or default with ??.
Return value. Return an array of integer candidate idx values, most preferred first. Returning (), an empty array, or any non-array value is treated as abstain. Unknown or duplicate idx values are dropped. An all-unknown array collapses to abstain.
// Route large requests to a capable model, smaller ones to a cheaper one.// ~4 chars/token rule of thumb; 24000 chars ≈ 6k tokens.let big = req.total_chars > 24000 || req.max_tokens > 4096;let preferred = if big { candidates.filter(|c| c.tier == "large")} else { candidates.filter(|c| c.tier == "small")};preferred.map(|c| c.idx)Sandbox limits. The script engine is locked down with:
- Max operations: 250,000 (a runaway
while true {}terminates at the cap, not hangs) - Max call/expression depth: 32
- Max string size: 8 KB
- Max array length: 4,096 entries
- Max map entries: 1,024
- No module resolver —
importstatements always fail - No file, network, or process host functions registered
A script that exceeds any limit errors and the result is coerced via on_error. The script evaluates synchronously against a shared pre-compiled AST (compiled once at config load, not per request).
Scripts are operator-config only — never client-supplied.
Configuration reference
Section titled “Configuration reference”Pool-level fields
Section titled “Pool-level fields”| Field | Type | Default | Description |
|---|---|---|---|
route | string | weighted | Routing transport: weighted, native, webhook, or script. weighted (default / absent) runs SWRR with zero added cost. |
policy | object | none | Transport configuration block. Required for native, webhook, and script; inert for weighted. |
policy block fields
Section titled “policy block fields”| Field | Type | Default | Transport | Description |
|---|---|---|---|---|
name | string | none | native | Native policy name: weighted, cheapest, fastest, least_busy, or usage. Required when route: native. |
url | string | none | webhook | Operator sidecar URL. Loopback allowed; RFC-1918/CGNAT/metadata blocked. Required when route: webhook. |
timeout_ms | integer | 150 | webhook, script | Hard wall-clock deadline for the policy decision in milliseconds. On timeout, on_error applies. |
on_error | string | weighted | webhook, script | Fallback when the policy times out, errors, or abstains: weighted (SWRR), reject (503), or first (first member in config order). |
script | string | none | script | Inline Rhai source. Exactly one of script or script_file is required when route: script. |
script_file | string | none | script | Path to a Rhai script file. Alternative to inline script. |
Per-member routing metadata fields
Section titled “Per-member routing metadata fields”Added to each pool member. All optional; inert for pools with route: weighted.
| Field | Type | Default | Description |
|---|---|---|---|
tier | string | none | Operator-declared routing tier (e.g. "primary", "overflow", "large", "small"). Exposed to webhook and script policies as tier. |
cost_per_mtok | float | none | Operator-declared cost in currency units per million tokens. Drives cheapest and is exposed to webhook/script. |
tags | list | [] | Free-form labels (e.g. ["opus", "large-context"]). Exposed to webhook and script for tag-based selection. |
Startup validation
Section titled “Startup validation”| Rule | Condition |
|---|---|
route: native + no policy.name | Startup error |
route: native + unknown policy.name | Startup error (must be one of: weighted, cheapest, fastest, least_busy, usage) |
route: webhook + no policy.url | Startup error |
route: webhook + SSRF-blocked URL | Startup error (RFC-1918, CGNAT, link-local, metadata hosts blocked; loopback allowed) |
route: script + neither policy.script nor policy.script_file | Startup error |
route: script + both policy.script and policy.script_file | Startup error |
route: script without the script-policy feature flag | Startup error |
Full examples
Section titled “Full examples”Cost-optimized pool
Section titled “Cost-optimized pool”Route all traffic to the cheapest healthy member. Useful for background jobs or batch workloads where latency is not the primary concern.
pools: batch: route: native policy: name: cheapest failover: deadline_secs: 60 cap: 3 members: - target: gpt-4o-mini weight: 1 cost_per_mtok: 0.15 - target: claude-sonnet weight: 1 cost_per_mtok: 3.0 - target: gpt-4o weight: 1 cost_per_mtok: 5.0gpt-4o-mini is tried first. If it is tripped, the breaker skips it and claude-sonnet becomes the first attempt for this request. The failover loop and breaker are unchanged.
Latency-sensitive pool
Section titled “Latency-sensitive pool”Route to the fastest-responding member, measured over real traffic. New members start with no latency data and are tried last until they accumulate samples.
pools: realtime: route: native policy: name: fastest members: - target: claude-sonnet weight: 1 - target: gpt-4o weight: 1 - target: gemini-1.5-flash weight: 1Tier-based webhook policy
Section titled “Tier-based webhook policy”Route large requests to a capable model and smaller requests to a cheaper one, using a co-located HTTP sidecar.
pools: smart: route: webhook policy: url: "http://127.0.0.1:8731/route" timeout_ms: 150 on_error: weighted # broken sidecar falls back to SWRR, never fails failover: deadline_secs: 60 cap: 3 members: - target: claude-opus weight: 1 tier: large cost_per_mtok: 15.0 tags: ["opus"] - target: claude-sonnet weight: 1 tier: small cost_per_mtok: 3.0 tags: ["sonnet"]Your sidecar receives the request projection (including total_chars, max_tokens, and each candidate’s tier and cost_per_mtok) and returns {"order": [0, 1]} or {"order": [1, 0]} depending on request size.
Rhai script: route by request size
Section titled “Rhai script: route by request size”The same tier-based logic as above, without a sidecar, using an inline Rhai script.
pools: smart-script: route: script policy: on_error: weighted script: | let big = req.total_chars > 24000 || req.max_tokens > 4096; let preferred = if big { candidates.filter(|c| c.tier == "large") } else { candidates.filter(|c| c.tier == "small") }; preferred.map(|c| c.idx) members: - target: claude-opus tier: large tags: ["opus"] - target: claude-sonnet tier: small tags: ["sonnet"]The
script-policycargo feature must be enabled at compile time. The script runs sandboxed: no I/O, no network, operation-count bounded.
Observability
Section titled “Observability”Response headers. Every request with a non-default routing policy emits two headers (src/forward.rs constants HDR_ROUTE_POLICY and HDR_ROUTE_TARGET):
x-busbar-route-policy: <policy>— the policy name that made the decision (e.g.webhook,native:cheapest)x-busbar-route-target: <chosen-lane-model>— the model of the chosen lane (e.g.claude-sonnet,gpt-4o-mini)
| Example headers | Meaning |
|---|---|
x-busbar-route-policy: webhook / x-busbar-route-target: claude-sonnet | Webhook policy chose claude-sonnet |
x-busbar-route-policy: native:cheapest / x-busbar-route-target: gpt-4o-mini | Native cheapest policy chose gpt-4o-mini |
Prometheus metrics:
| Metric | Labels | Description |
|---|---|---|
busbar_route_decisions_total | pool, policy, outcome | Count of routing decisions. outcome is one of prefer, abstain, timeout, error, reject. |
busbar_route_decision_seconds | pool, policy | Histogram of policy decision latency (webhook/script only). |