Halvr
cost routing
Cut your LLM bill in half.
Drop in a new base URL. Halvr caches repeated context, routes to the cheapest capable model, and exposes where every token dollar goes.
Avg paid customer savings: 54.2%
Live operator panel
Routing decisions in motion
Saved this week
$12,482
Cache hit rate
67%
Avg latency
234ms
Requests routed
148,201
Integrates cleanly with the AI stack teams already run
Average 54% cost reduction across paid customers
How it works
A proxy layer built to squeeze waste out of inference spend.
Halvr keeps the migration small and the operational visibility deep.
Swap your base URL
Keep your provider SDK. Change the upstream endpoint to api.halvr.io and keep shipping.
Halvr intercepts every request
We hash context, check Redis, and route to the cheapest capable model before the request ever reaches a provider.
Your dashboard shows the delta
Track savings, latency, hit rate, provider mix, and request-level decisions without stitching together logs.
One line migration
Keep your SDK, auth, and request shape.
// before
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// after
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
baseURL: "https://api.halvr.io/v1"
});
Savings calculator
Model the spend reduction before you touch production traffic.
Use rough request volume and context size to estimate what routing and caching can remove from your monthly bill.
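The estimate behind the calculator can be approximated with simple arithmetic. The formula and every input below are assumptions for illustration, not Halvr's actual pricing model: cache hits are treated as free, and misses are assumed to be routed to cheaper models at an average discount.

```javascript
// Rough monthly savings estimate. All rates are illustrative assumptions.
function estimateSavings({ requestsPerMonth, avgTokensPerRequest, pricePerMTok, cacheHitRate, routingDiscount }) {
  // Baseline: what the same traffic costs going straight to the provider.
  const baseline = (requestsPerMonth * avgTokensPerRequest / 1e6) * pricePerMTok;
  // With caching and routing: only cache misses are billed, at a discounted blended rate.
  const withHalvr = baseline * (1 - cacheHitRate) * (1 - routingDiscount);
  return { baseline, withHalvr, savings: baseline - withHalvr };
}
```

For example, 1M requests/month at 2,000 tokens each and $5 per 1M tokens is a $10,000 baseline; a 60% cache hit rate and a 30% routing discount on misses would bring that to $2,800.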
Current spend / mo
With Halvr / mo
Savings / mo
Breakdown
Pricing
Start small. Then move routing policy and billing control into production.
All plans include a 14-day trial and the same core cost instrumentation.
Free
For small projects validating whether model routing changes the margin profile.
Pro
For product teams with live AI traffic that need savings, routing control, and provider coverage.
Scale
For larger platforms that need SLAs, custom routing policy, and high-volume support.