Halvr
cost routing
Cut your LLM bill in half.
Drop in a new base URL. Halvr caches repeated context, routes to the cheapest capable model, and exposes where every token dollar goes.
Avg paid customer savings: 54.2%
Live operator panel
Routing decisions in motion
Saved this week
$12,482
Cache hit rate
67%
Avg latency
234ms
Requests routed
148,201
Integrates cleanly with the AI stack teams already run
Average 54% cost reduction across paid customers
How it works
A proxy layer built to squeeze waste out of inference spend.
Halvr keeps the migration small and the operational visibility deep.
Swap your base URL
Keep your provider SDK. Change the upstream endpoint to api.halvr.io and keep shipping.
Halvr intercepts every request
We hash context, check Redis, and route to the cheapest capable model before the request ever reaches a provider.
Your dashboard shows the delta
Track savings, latency, hit rate, provider mix, and request-level decisions without stitching together logs.
One line migration
Keep your SDK, auth, and request shape.
// before
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// after
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
baseURL: "https://api.halvr.io/v1"
});
Savings calculator
Model the spend reduction before you touch production traffic.
Use rough request volume and context size to estimate what routing and caching can remove from your monthly bill.
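The estimate behind the calculator can be approximated with simple arithmetic. The formula and every input below are assumptions for illustration, not Halvr's actual pricing model: cache hits are treated as free, and misses are assumed to be routed to cheaper models at an average discount.

```javascript
// Rough monthly savings estimate. All rates are illustrative assumptions.
function estimateSavings({ requestsPerMonth, avgTokensPerRequest, pricePerMTok, cacheHitRate, routingDiscount }) {
  // Baseline: what the same traffic costs going straight to the provider.
  const baseline = (requestsPerMonth * avgTokensPerRequest / 1e6) * pricePerMTok;
  // With caching and routing: only cache misses are billed, at a discounted blended rate.
  const withHalvr = baseline * (1 - cacheHitRate) * (1 - routingDiscount);
  return { baseline, withHalvr, savings: baseline - withHalvr };
}
```

For example, 1M requests/month at 2,000 tokens each and $5 per 1M tokens is a $10,000 baseline; a 60% cache hit rate and a 30% routing discount on misses would bring that to $2,800.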
Current spend / mo
With Halvr / mo
Savings / mo
Breakdown
Pricing
Start small. Then move routing policy and billing control into production.
All plans include a 14-day trial and the same core cost instrumentation.
Free
For small projects validating whether model routing changes the margin profile.
Pro
For product teams with live AI traffic that need savings, routing control, and provider coverage.
Scale
For larger platforms that need SLAs, custom routing policy, and high-volume support.