halvr

Halvr

cost routing

One line of code. Lower spend.

Cut your LLM bill in half.

Drop in a new base URL. Halvr caches repeated context, routes to the cheapest capable model, and exposes where every token dollar goes.

Tokens optimized today48,291,827

Avg paid customer savings: 54.2%

Live operator panel

Routing decisions in motion

stream: true
tok_29A cost↓ cache↑ route→ save
openai gpt-4o-mini hit 67% 234ms
claude-haiku selected spend -54%

Saved this week

$12,482

Cache hit rate

67%

Avg latency

234ms

Requests routed

148,201

Integrates cleanly with the AI stack teams already run

Average 54% cost reduction across paid customers

LangChain
LlamaIndex
Vercel AI SDK
OpenAI SDK
DSPy

How it works

A proxy layer built to squeeze waste out of inference spend.

Halvr keeps the migration small and the operational visibility deep.

01

Swap your base URL

Keep your provider SDK. Change the upstream endpoint to api.halvr.io and keep shipping.

02

Halvr intercepts every request

We hash context, check Redis, and route to the cheapest capable model before traffic leaves your app.

03

Your dashboard shows the delta

Track savings, latency, hit rate, provider mix, and request-level decisions without stitching together logs.

One line migration

Keep your SDK, auth, and request shape.

diff
// before
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// after
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://api.halvr.io/v1"
});

Savings calculator

Model the spend reduction before you touch production traffic.

Use rough request volume and context size to estimate what routing and caching can remove from your monthly bill.

250,000
6,500

Current spend / mo

$292,500.00

With Halvr / mo

$134,550.00

Savings / mo

$157,950.00

Breakdown

Cache savings$90,675.00
Routing savings$67,275.00
Model selection$23,400.00

Pricing

Start small. Then move routing policy and billing control into production.

All plans include a 14-day trial and the same core cost instrumentation.

Free

$0/ month

For small projects validating whether model routing changes the margin profile.

100K tokens optimized / mo
1 project
Core dashboard

Pro

$299/ month

For product teams with live AI traffic that need savings, routing control, and provider coverage.

50M tokens optimized / mo
5 projects
All providers
Routing controls

Scale

$999/ month

For larger platforms that need SLAs, custom routing policy, and high-volume support.

Unlimited optimized tokens
Priority routing
SLA
Usage-based billing