One key. Every model.
Drop ModelMeld between your coding tools and your LLM backends. Cheap requests run on a fast local model; hard ones go to the frontier APIs you already pay for. Your keys stay on your machine.
Three steps to a smarter, cheaper inference path.
Point your tool at localhost
Claude Code, Cursor, Aider, anything OpenAI-compatible — base URL stays the same. No code changes.
OPENAI_BASE_URL=http://localhost:8000/v1
ModelMeld routes each request
Simple prompts (autocomplete, type hints, docstring adds) go to a fast local model. Hard prompts (debugging, architecture) go to the frontier API you already pay for.
route: scout → qwen-7b (local) route: scout → claude-sonnet-4-6
Right answer, less money
Same model name on the wire. Same answer quality on the prompts that matter. A line item that's a fraction of pure-frontier.
$ /usr/bin/finops yesterday cloud spend $0.27 local spend $0.00
Your frontier API keys are stored locally and never leave your machine. The hosted local model is the only path that touches our infrastructure.
Real numbers from real coding-tool traffic.
Benchmarks run on the open-source gateway. No CRMs, no logos, no fluff — when a number lands here it has a reproducible test behind it.
Switch from local to frontier mid-conversation, no context loss. Validated end-to-end.
Speaks Anthropic Messages for Claude Code and OpenAI Chat Completions for Cursor, Aider, Continue, Cline. No client code changes.
Routing validated across Python, JavaScript, TypeScript, Go, Rust, and Java dev-tool prompts. The scout works on prompt shape, not language.
Built into the product.
Your API keys never leave your machine.
The local installer holds them; routing decisions happen in your process. The frontier provider sees a call from your machine, not ours.
Open-source core.
ModelMeld is AGPL-3.0 licensed. No lock-in, fork-friendly, contribute-friendly — calling the gateway over HTTP from your tools doesn't make them AGPL. The whole routing engine is public.
We store the minimum we can.
- Always: request metadata (model, latency, tokens) plus a SHA-256 prompt hash — never the prompt body itself.
- Only when you enable them: completion cache and tiered memory persist prompt-equivalent content — cache for repeats, memory for cross-model continuity. Both are off by default in OSS and Hosted. Tenant-scoped when on, you control retention.
Works with the tools you already use.
Two native API surfaces — Anthropic Messages and OpenAI Chat Completions — so Claude Code, Cursor, Aider, Continue, Cline, and any OpenAI or Anthropic SDK drop in unchanged. No SDK swaps, no client code rewrites.
Try it. Twenty bucks gets you a long way.
Self-host the gateway in five minutes, or skip the GPU and use our hosted pods. Either way you can be making routed requests in the time it takes to read this paragraph.