
One API Key for
50+ AI Models

Access models from Claude, GPT, Gemini, Grok, Groq, Cerebras, and more through a single OpenAI-compatible endpoint. Response caching, smart routing, usage analytics, and rate limiting -- all built in. No markup on any model except Mulu's own.

Everything you need, nothing you don't

Stop juggling API keys across Anthropic, OpenAI, Google, xAI, Groq, Cerebras, and more. One endpoint, one key, one bill.

50+ Models, One Key

Claude Opus & Sonnet, GPT-5.4, Gemini 3.1 Pro, Grok 4.2, Llama 4, Qwen, Kimi K2.5, MiniMax M2.7, plus image & video models. All accessible through a single OpenAI-compatible endpoint.

Response Caching

Identical prompts return cached responses instantly instead of hitting the model again. Slash your costs on repetitive queries without changing a single line of code.

Smart Rate Limiting

Per-user, per-IP, or per-key rate limits. Prevent abuse, stay within provider quotas, and guarantee service for your real users -- all configured automatically.
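Conceptually, per-key limiting works like a token bucket: each key can burst up to a capacity, then is throttled to a sustained rate. This is an illustrative sketch of that idea, not the gateway's internal code:

```javascript
// Illustrative token-bucket limiter: each key may burst up to `capacity`
// requests and is then refilled at `refillPerSec` requests per second.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSec = refillPerSec;
    this.last = Date.now();
  }
  allow(now = Date.now()) {
    const elapsed = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // over the limit -- the gateway would return HTTP 429 here
  }
}

const bucket = new TokenBucket(3, 1); // burst of 3, 1 req/sec sustained
console.log([1, 2, 3, 4].map(() => bucket.allow())); // → [ true, true, true, false ]
```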

Usage Analytics

Track token usage, costs, latency, and error rates per model and per user. See which models your app uses most and where you can optimise spending.

OpenAI-compatible, drop-in ready

Already using the OpenAI SDK? Just swap your base URL and API key. The Mulu Gateway speaks the exact same API format -- chat completions, streaming, function calling, vision, and tool use all work out of the box. Switch between Claude, GPT, Gemini, and Grok with a single parameter change.

  • Chat completions with streaming, function calling, and tool use
  • Vision & multimodal support across all capable models
  • Automatic retries with exponential backoff on provider errors
  • Fallback routing to backup models when a provider goes down
gateway-demo.js
const res = await client.chat.completions.create({
  // Swap the model string to switch providers:
  // 'claude-sonnet-4-6' (Anthropic), 'gpt-5.4' (OpenAI),
  // 'gemini-3.1-pro' (Google), 'grok-4.2' (xAI)
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: prompt }],
});
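The automatic retries mentioned above follow a standard exponential-backoff pattern. A minimal sketch of that behaviour (the gateway does this server-side; `withRetries` is a hypothetical helper, not part of any SDK):

```javascript
// Retry a failing async call with exponential backoff: wait baseMs, then
// 2x, 4x, ... between attempts, and rethrow once retries are exhausted.
async function withRetries(fn, { retries = 3, baseMs = 200 } = {}) {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      const delayMs = baseMs * 2 ** attempt; // 200ms, 400ms, 800ms, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      attempt += 1;
    }
  }
}
```

Usage would look like `withRetries(() => client.chat.completions.create({ model, messages }))`.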

Response caching that pays for itself

When the same prompt hits the gateway twice, the cached response is returned instantly -- zero tokens consumed, no model round-trip. Perfect for classification tasks, embeddings, and repeated queries.

  • Semantic caching for near-identical prompts
  • Configurable TTL -- cache for seconds, minutes, or hours
  • Cache hit rates visible in your analytics dashboard
  • Per-route cache rules -- cache some endpoints, skip others
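For exact-match caching, the core idea is simple: identical model + messages hash to the same key, and entries expire after their TTL. A conceptual sketch (not the edge implementation, which also supports semantic matching):

```javascript
// Exact-match response cache: key = SHA-256 of (model, messages),
// entries carry an expiry timestamp derived from the configured TTL.
import { createHash } from 'node:crypto';

const cache = new Map();

function cacheKey(model, messages) {
  return createHash('sha256')
    .update(JSON.stringify({ model, messages }))
    .digest('hex');
}

function getCached(model, messages, now = Date.now()) {
  const entry = cache.get(cacheKey(model, messages));
  if (!entry || entry.expires < now) return undefined; // miss or expired
  return entry.response;
}

function putCached(model, messages, response, ttlMs, now = Date.now()) {
  cache.set(cacheKey(model, messages), { response, expires: now + ttlMs });
}
```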
Cache Performance (last 7 days)
87% Hit Rate
2.4M Tokens Saved
~$8 Cost Saved
Cache hits: 14,280 -- Misses: 2,120

Know exactly where your tokens go

A full analytics dashboard shows token usage, costs, and performance across every model. See which models your app relies on and set spend limits before you get a surprise bill.

  • Per-model and per-user token usage breakdown
  • Cost tracking with daily, weekly, and monthly views
  • Latency percentiles (p50/p95/p99) per provider
  • Configurable spend alerts and hard caps
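The latency percentiles shown in the dashboard can be understood as nearest-rank percentiles over recorded request latencies. A sketch of the computation:

```javascript
// Nearest-rank percentile: sort the samples, take the value at
// rank ceil(p/100 * n). This is how p50/p95/p99 figures are commonly derived.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latenciesMs = [120, 95, 410, 88, 130, 101, 990, 115, 140, 97];
console.log(percentile(latenciesMs, 50)); // → 115 (median)
console.log(percentile(latenciesMs, 95)); // → 990 (tail latency)
```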
Token Usage by Model
Claude Sonnet 4.6 -- 482K tokens
GPT-5.4 -- 310K tokens
Gemini 3.1 Pro -- 128K tokens
Grok 4.2 -- 64K tokens

Every major provider, one gateway

Claude, GPT, Gemini, Grok, Llama, Groq, Cerebras, Qwen, Kimi, MiniMax, and more -- all routed through a single edge-deployed proxy.

Anthropic

Claude Opus 4.6 (1M ctx), Claude Sonnet 4.6 (1M ctx), Claude Haiku 4.5 -- full thinking & vision support.

OpenAI

GPT-5.4 (1M ctx), GPT-5.3 Codex (400K ctx), GPT Image 1 Mini, Sora 2 & Sora 2 Pro -- text, image, and video generation.

Google

Gemini 3 Flash (1M ctx), Gemini 3.1 Pro (1M ctx) with Deep Think, plus Nano Banana 2 & Nano Banana Pro for image generation.

xAI

Grok 4.2 (2M ctx) with reasoning toggle and Grok 4.2 Agents for multi-agent workloads.

Meta

Llama 4 Scout (10M ctx) and Llama 4 Maverick (1M ctx) -- natively multimodal, open-weight MoE models.

Open Models

Qwen 3.5 Plus, Qwen3-235B, Qwen3-Coder-480B, Llama 4 Scout (10M ctx), Llama 4 Maverick, Kimi K2.5, MiniMax M2.7, MiMo v2 Pro, Mulu Agent 1 Flash, Mulu Agent 1 Pro, and more via OpenRouter.

Edge Cache Layer

Response caching on Cloudflare Workers. Identical prompts return cached results instantly with zero token cost.

Wallet & Credit Balance

Top up your wallet and pay only for what you use. Track spending in real-time across every model and provider.

Wallet Balance
$--
Top up anytime
This Month
$--
Across all models
Free Tokens Left
1M
Included monthly

Built for production

Every feature you need to ship AI-powered apps confidently at scale.

Streaming (SSE)

Full Server-Sent Events streaming support across all models. Get tokens as they're generated -- no buffering, no delays.
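On the wire, streamed completions arrive as SSE `data:` lines, each carrying a JSON chunk with a content delta, terminated by `data: [DONE]`. The OpenAI SDK parses this for you; the sketch below just shows the format being handled:

```javascript
// Parse a buffer of SSE lines into the assembled completion text.
// Each frame is `data: <json>`; the stream ends with `data: [DONE]`.
function parseSSEChunks(raw) {
  const deltas = [];
  for (const line of raw.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const delta = JSON.parse(payload).choices[0]?.delta?.content;
    if (delta) deltas.push(delta);
  }
  return deltas.join('');
}
```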

API Key Management

Create, revoke, and rotate API keys from your dashboard. Set per-key spend limits and model restrictions for fine-grained control.

Request Logging

Every request is logged with latency, token count, model, and status. Search, filter, and export logs for debugging and compliance.

Edge Deployed

Runs on Cloudflare Workers at 300+ edge locations worldwide. Sub-50ms overhead regardless of where your users are.

Spend Alerts

Set daily, weekly, or monthly spend caps. Get notified before you hit limits so you're never caught off guard.

Model Fallback Chains

Define fallback sequences: if Claude is down, auto-route to GPT. If GPT fails, try Gemini. Zero downtime for your users.
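A fallback chain boils down to trying each model in order until one succeeds. The gateway does this server-side; this illustrative helper (`withFallback` is hypothetical, not an SDK function) shows the equivalent logic:

```javascript
// Try each model in sequence; return the first successful result,
// or rethrow the last error if every provider fails.
async function withFallback(models, call) {
  let lastErr;
  for (const model of models) {
    try {
      return await call(model);
    } catch (err) {
      lastErr = err; // provider down or erroring -- try the next model
    }
  }
  throw lastErr;
}
```

Usage would look like `withFallback(['claude-sonnet-4-6', 'gpt-5.4', 'gemini-3.1-pro'], (model) => client.chat.completions.create({ model, messages }))`.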

Drop-in compatible

Already using the OpenAI SDK? Just change the base URL. Switch models with one parameter.

app.js
import OpenAI from 'openai';

// Point to Mulu Gateway instead of OpenAI directly
const client = new OpenAI({
  baseURL: 'https://gateway.mulucode.dev/v1',
  apiKey:  'mulu_sk_...',      // One key for all models
});

// Use Claude
const claude = await client.chat.completions.create({
  model: 'claude-sonnet-4-6',  // Or claude-opus-4-6, claude-haiku-4-5
  messages: [{ role: 'user', content: 'Explain caching' }],
});

// Switch to GPT with one line change
const gpt = await client.chat.completions.create({
  model: 'gpt-5.4',            // Or gemini-3.1-pro, grok-4.2, qwen-3.5-plus
  messages: [{ role: 'user', content: 'Explain caching' }],
  cache: true,                 // Enable response caching (free, saves tokens)
});

Transparent, simple pricing

No markup on any model except Mulu's own. No per-request fees. The gateway is included free with your $1/mo Mulu Code subscription.

No markup except Mulu models

Claude, GPT, Gemini, Grok, and more -- billed at exact provider pricing. Mulu models have a small markup. Caching, analytics, and routing included free.

  • 50+ models across all major providers -- text, image, and video
  • Response caching (saves you money)
  • Rate limiting & spend caps
  • Usage analytics & cost tracking
  • Automatic retries & fallbacks
  • No per-request fees, ever
Get Mulu Code

One key. Every model. Ship faster.

Stop wasting time managing API keys, tracking usage across providers, and building retry logic. The Mulu Gateway handles all of it -- so you can focus on building.