Access models from Claude, GPT, Gemini, Grok, Groq, Cerebras, and more through a single OpenAI-compatible endpoint. Response caching, smart routing, usage analytics, and rate limiting -- all built in. No markup on any model except Mulu's own.
Stop juggling API keys across Anthropic, OpenAI, Google, xAI, Groq, Cerebras, and more. One endpoint, one key, one bill.
Claude Opus & Sonnet, GPT-5.4, Gemini 3.1 Pro, Grok 4.2, Llama 4, Qwen, Kimi K2.5, MiniMax M2.7, plus image & video models. All accessible through a single OpenAI-compatible endpoint.
Identical prompts return cached responses instantly instead of hitting the model again. Slash your costs on repetitive queries without changing a single line of code.
Per-user, per-IP, or per-key rate limits. Prevent abuse, stay within provider quotas, and guarantee service for your real users -- all configured automatically.
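To make the rate-limiting idea concrete, here is a minimal token-bucket sketch of the standard technique -- not the gateway's actual implementation, and the burst/refill numbers are illustrative placeholders:

```javascript
// Token bucket: each key gets a burst capacity that refills over time.
// (Illustrative sketch only -- not Mulu's internal implementation.)
class TokenBucket {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSecond = refillPerSecond;
    this.lastRefill = Date.now();
  }

  allow() {
    const elapsed = (Date.now() - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillPerSecond,
    );
    this.lastRefill = Date.now();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // request allowed
    }
    return false; // request rejected with 429
  }
}

// One bucket per API key: burst of 5 requests, refilling 1 per second.
const buckets = new Map();
function checkLimit(apiKey) {
  if (!buckets.has(apiKey)) buckets.set(apiKey, new TokenBucket(5, 1));
  return buckets.get(apiKey).allow();
}
```

The same shape works per-user or per-IP by changing what string keys the `buckets` map.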
Track token usage, costs, latency, and error rates per model and per user. See which models your app uses most and where you can optimise spending.
Already using the OpenAI SDK? Just swap your base URL and API key. The Mulu Gateway speaks the exact same API format -- chat completions, streaming, function calling, vision, and tool use all work out of the box. Switch between Claude, GPT, Gemini, and Grok with a single parameter change.
When the same prompt hits the gateway twice, the cached response is returned instantly -- zero tokens consumed, near-zero latency. Perfect for classification tasks, embeddings, and repeated queries.
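The mechanics behind a cache like this can be sketched in a few lines: derive a key from the model and messages, and return the stored response on an exact match. This is a hypothetical illustration of the concept, not the gateway's actual code:

```javascript
// Sketch of exact-match response caching (illustrative only).
const cache = new Map();

// Identical (model, messages) pairs serialize to the same key.
// A production gateway would hash this instead of storing raw JSON keys.
function cacheKey(model, messages) {
  return JSON.stringify({ model, messages });
}

function cachedCompletion(model, messages, callModel) {
  const key = cacheKey(model, messages);
  if (cache.has(key)) {
    return { ...cache.get(key), cached: true }; // no tokens billed
  }
  const response = callModel(model, messages); // upstream provider call
  cache.set(key, response);
  return { ...response, cached: false };
}
```

The second identical request never reaches the provider, which is why repeated classification prompts become free.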
A full analytics dashboard shows token usage, costs, and performance across every model. See which models your app relies on and set spend limits before you get a surprise bill.
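Spend tracking of this kind boils down to multiplying token counts by per-model rates and checking the running total against a limit. A minimal sketch -- the prices below are placeholders, not real provider rates:

```javascript
// Placeholder per-million-token prices (USD) -- not real provider rates.
const PRICE_PER_MTOK = {
  'claude-sonnet-4-6': { input: 3.0, output: 15.0 },
  'gpt-5.4': { input: 2.0, output: 10.0 },
};

const spend = new Map(); // running cost per model

function recordUsage(model, usage) {
  const price = PRICE_PER_MTOK[model];
  const cost =
    (usage.prompt_tokens / 1e6) * price.input +
    (usage.completion_tokens / 1e6) * price.output;
  spend.set(model, (spend.get(model) ?? 0) + cost);
  return cost;
}

function totalSpend(limitUsd) {
  const total = [...spend.values()].reduce((a, b) => a + b, 0);
  if (total > limitUsd) throw new Error(`Spend limit exceeded: $${total.toFixed(4)}`);
  return total;
}
```

The `usage` object mirrors the token counts already returned in OpenAI-format chat completion responses.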
Claude, GPT, Gemini, Grok, Llama, Groq, Cerebras, Qwen, Kimi, MiniMax, and more -- all routed through a single edge-deployed proxy.
Claude Opus 4.6 (1M ctx), Claude Sonnet 4.6 (1M ctx), Claude Haiku 4.5 -- full thinking & vision support.
GPT-5.4 (1M ctx), GPT-5.3 Codex (400K ctx), GPT Image 1 Mini, Sora 2 & Sora 2 Pro -- text, image, and video generation.
Gemini 3 Flash (1M ctx), Gemini 3.1 Pro (1M ctx) with Deep Think, plus Nano Banana 2 & Nano Banana Pro for image generation.
Grok 4.2 (2M ctx) with reasoning toggle and Grok 4.2 Agents for multi-agent workloads.
Llama 4 Scout (10M ctx) and Llama 4 Maverick (1M ctx) -- natively multimodal, open-weight MoE models.
Qwen 3.5 Plus, Qwen3-235B, Qwen3-Coder-480B, Llama 4 Scout (10M ctx), Llama 4 Maverick, Kimi K2.5, MiniMax M2.7, MiMo v2 Pro, Mulu Agent 1 Flash, Mulu Agent 1 Pro, and more via OpenRouter.
Response caching on Cloudflare Workers. Identical prompts return cached results instantly with zero token cost.
Top up your wallet and pay only for what you use. Track spending in real-time across every model and provider.
Already using the OpenAI SDK? Just change the base URL. Switch models with one parameter.
import OpenAI from 'openai';

// Point to Mulu Gateway instead of OpenAI directly
const client = new OpenAI({
  baseURL: 'https://gateway.mulucode.dev/v1',
  apiKey: 'mulu_sk_...', // One key for all models
});

// Use Claude
const claude = await client.chat.completions.create({
  model: 'claude-sonnet-4-6', // Or claude-opus-4-6, claude-haiku-4-5
  messages: [{ role: 'user', content: 'Explain caching' }],
});

// Switch to GPT with one line change
const gpt = await client.chat.completions.create({
  model: 'gpt-5.4', // Or gemini-3.1-pro, grok-4.2, qwen-3.5-plus
  messages: [{ role: 'user', content: 'Explain caching' }],
  cache: true, // Enable response caching (free, saves tokens)
});
No markup on any model except Mulu's own. No per-request fees. The gateway is included free with your $1/mo Mulu Code subscription.
Claude, GPT, Gemini, Grok, and more -- billed at exact provider pricing. Mulu models have a small markup. Caching, analytics, and routing included free.