Open-source AI gateway

One gateway. Every coding tool.

Drop ModelMeld between your coding tools and your LLM backends. Route simple requests to fast OSS models, hard ones to your existing ChatGPT subscription or the frontier API keys you already pay for. Predictable cost ceilings, no key custody.

Get started View on GitHub

AGPL-3.0 core OpenAI-compatible No key custody

How it works

Three steps to multi-model routing across your dev tools.

Point your tool at localhost

Claude Code, Cursor, Aider, anything OpenAI-compatible — base URL stays the same. No code changes.

OPENAI_BASE_URL=http://localhost:8080/v1

ModelMeld routes each request

Simple prompts (autocomplete, type hints, docstring adds) route to OSS-tier models. Hard prompts (debugging, architecture) route to your existing ChatGPT subscription via OAuth passthrough, or to a frontier API key you already pay for.

route: scout → qwen3-coder-next
route: scout → claude-sonnet-4-6 (your Anthropic key)

Auditable routing on every response

Same model name on the wire. Every response carries x-modelmeld-* headers showing which model actually served the request, which tier it ran on, and the task category the scout matched.

x-modelmeld-routed-model: qwen3-coder-flash
x-modelmeld-tier: oss
x-modelmeld-task-category: coding

Self-host the gateway and your frontier keys never leave your machine. Use Hosted with BYOK and your keys transit per-request only — used to make the upstream call, then forgotten. Never stored at rest, never logged.

What's validated

Validated end-to-end on real coding-tool traffic.

Built and tested against the dev tools you actually use.

Prompt cache hits work end-to-end

Anthropic `cache_control` breakpoints forwarded verbatim via native-shape passthrough on /v1/messages. We don't strip them — cached prompts hit cached, the way Anthropic intended.

Drop-in for the tools you use

Speaks Anthropic Messages for Claude Code and OpenAI Chat Completions for Cursor, Aider, Continue, Cline. No client code changes.

Tool-capable routing

When your request carries tools, the scout filters out models that can't reliably do function-calling — your agents don't get broken responses from a smaller model picked by mistake.

Deterministic, auditable routing

Every response carries headers showing the routed model, tier, task category, and quality threshold. Same request + same registry = same model. Audit log captures every choice.

Pricing

Capability-based pricing. OSS rates by default.

Frontier-bound requests use your own subscription or API key directly — no markup, no custody. OSS-tier requests route to compute-cost models when you self-host, or to our Hosted fleet when it opens. Self-hosting is always free.

OSS

FreeAGPL-3.0

Self-host ModelMeld. Bring your own frontier keys and your own GPU. Everything works on a snapshot of the routing data; the live curated feed is the upgrade.

Self-host ModelMeld
Bring your own frontier API keys
Bring your own local GPU
Bundled seed routing data (stale but functional)

See details

Invite-only beta

Hosted

Coming soon

Pay-as-you-go across our routed model catalog. No DevOps, no GPU procurement, no key custody. Multi-model routing happens server-side; you point your dev tool at one base URL and we pick the right model per request. Currently in invite-only beta — the OSS engine is the path while we open it up.

Multi-model routing across our hosted catalog
BYOK passthrough — frontier keys transit per-request, never persisted
Subscription passthrough — point your ChatGPT OAuth at us
Pro routing feed bundled while you're an active customer

See details

Pro

Coming soon

Live curated routing feed for self-hosters who don't need our hosted compute. Routes against fresh benchmark data weekly. The OSS engine ships with a frozen snapshot today; Pro unlocks the auto-refreshed feed when it launches.

Continuously curated routing feed
Signed JSON delivery (Ed25519)
Falls back to bundled snapshot on any failure
Same OSS engine — self-host on your hardware

See details

Enterprise

Custom

Your hardware, your cloud, or hosted by us — pricing flexes to the deployment. SSO + RBAC, audit logs, custom SLAs, direct engineering support.

Deploy on your hardware, your cloud, or ours
SSO (OIDC) + RBAC
SOC 2-grade immutable audit log
Custom SLAs

See details

Full pricing breakdown

Trust

Built into the product.

Your frontier keys, your custody.

Self-host the OSS gateway and your keys never leave your machine — routing happens in your process. The same holds for subscription passthrough: your ChatGPT OAuth bearer transits per-request only, used to make the upstream call, then forgotten. Hosted with BYOK works the same way via the x-modelmeld-byok-* header. Never stored at rest, never logged. No custody, no per-request markup.

Open-source core.

ModelMeld is AGPL-3.0 licensed. No lock-in, fork-friendly, contribute-friendly — calling the gateway over HTTP from your tools doesn't make them AGPL. The whole routing engine is public.

We store the minimum we can.

Always: request metadata (model, latency, tokens) plus a SHA-256 prompt hash — never the prompt body itself.
Only when you enable them: completion cache and tiered memory persist prompt-equivalent content — cache for repeats, memory for cross-model continuity. Both are off by default in OSS and Hosted. Tenant-scoped when on; retention follows the policy in /privacy.

Works with the tools you already use.

Two native API surfaces — Anthropic Messages and OpenAI Chat Completions — so Claude Code, Cursor, Aider, Continue, Cline, and any OpenAI or Anthropic SDK drop in unchanged. No SDK swaps, no client code rewrites.

The OSS engine works today.

Self-host on your own GPU and route from the moment pip install finishes. Bring your own frontier API keys, or point your ChatGPT subscription at the gateway. Walk through it step-by-step on the Get Started page.

Get started View on GitHub