
One API Key for
50+ AI Models

Access models from Claude, GPT, Gemini, Grok, Groq, Cerebras, and more through a single OpenAI-compatible endpoint. Response caching, smart routing, usage analytics, and rate limiting -- all built in. No markup on any model except Mulu's own.

Everything you need, nothing you don't

Stop juggling API keys across Anthropic, OpenAI, Google, xAI, Groq, Cerebras, and more. One endpoint, one key, one bill.

50+ Models, One Key

Claude Opus & Sonnet, GPT-5.4, Gemini 3.1 Pro, Grok 4.2, Llama 4, Qwen, Kimi K2.5, MiniMax M2.7, plus image & video models. All accessible through a single OpenAI-compatible endpoint.

Response Caching

Identical prompts return cached responses instantly instead of hitting the model again. Slash your costs on repetitive queries without changing a single line of code.

Smart Rate Limiting

Per-user, per-IP, or per-key rate limits. Prevent abuse, stay within provider quotas, and guarantee service for your real users -- all configured automatically.
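Conceptually, per-key limiting works like a token bucket: each key can burst up to a capacity, then is throttled to a sustained rate. This is an illustrative sketch of that idea, not the gateway's internal code:

```javascript
// Illustrative token-bucket limiter: each key may burst up to `capacity`
// requests and is then refilled at `refillPerSec` requests per second.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSec = refillPerSec;
    this.last = Date.now();
  }
  allow(now = Date.now()) {
    const elapsed = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // over the limit -- the gateway would return HTTP 429 here
  }
}

const bucket = new TokenBucket(3, 1); // burst of 3, 1 req/sec sustained
console.log([1, 2, 3, 4].map(() => bucket.allow())); // → [ true, true, true, false ]
```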

Usage Analytics

Track token usage, costs, latency, and error rates per model and per user. See which models your app uses most and where you can optimise spending.

OpenAI-compatible, drop-in ready

Already using the OpenAI SDK? Just swap your base URL and API key. The Mulu Gateway speaks the exact same API format -- chat completions, streaming, function calling, vision, and tool use all work out of the box. Switch between Claude, GPT, Gemini, and Grok with a single parameter change.

  • Chat completions with streaming, function calling, and tool use
  • Vision & multimodal support across all capable models
  • Automatic retries with exponential backoff on provider errors
  • Fallback routing to backup models when a provider goes down
gateway-demo.js
const res = await client.chat.completions.create({
  // Swap the model string to switch providers:
  // 'claude-sonnet-4-6' (Anthropic), 'gpt-5.4' (OpenAI),
  // 'gemini-3.1-pro' (Google), 'grok-4.2' (xAI)
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: prompt }],
});
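The automatic retries mentioned above follow a standard exponential-backoff pattern. A minimal sketch of that behaviour (the gateway does this server-side; `withRetries` is a hypothetical helper, not part of any SDK):

```javascript
// Retry a failing async call with exponential backoff: wait baseMs, then
// 2x, 4x, ... between attempts, and rethrow once retries are exhausted.
async function withRetries(fn, { retries = 3, baseMs = 200 } = {}) {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      const delayMs = baseMs * 2 ** attempt; // 200ms, 400ms, 800ms, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      attempt += 1;
    }
  }
}
```

Usage would look like `withRetries(() => client.chat.completions.create({ model, messages }))`.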

Response caching that pays for itself

When the same prompt hits the gateway twice, the cached response is returned instantly -- zero tokens consumed, no model round-trip. Perfect for classification tasks, embeddings, and repeated queries.

  • Semantic caching for near-identical prompts
  • Configurable TTL -- cache for seconds, minutes, or hours
  • Cache hit rates visible in your analytics dashboard
  • Per-route cache rules -- cache some endpoints, skip others
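For exact-match caching, the core idea is simple: identical model + messages hash to the same key, and entries expire after their TTL. A conceptual sketch (not the edge implementation, which also supports semantic matching):

```javascript
// Exact-match response cache: key = SHA-256 of (model, messages),
// entries carry an expiry timestamp derived from the configured TTL.
import { createHash } from 'node:crypto';

const cache = new Map();

function cacheKey(model, messages) {
  return createHash('sha256')
    .update(JSON.stringify({ model, messages }))
    .digest('hex');
}

function getCached(model, messages, now = Date.now()) {
  const entry = cache.get(cacheKey(model, messages));
  if (!entry || entry.expires < now) return undefined; // miss or expired
  return entry.response;
}

function putCached(model, messages, response, ttlMs, now = Date.now()) {
  cache.set(cacheKey(model, messages), { response, expires: now + ttlMs });
}
```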
Cache Performance (last 7 days)
87% Hit Rate
2.4M Tokens Saved
~$8 Cost Saved
Cache hits: 14,280 -- Misses: 2,120

Know exactly where your tokens go

A full analytics dashboard shows token usage, costs, and performance across every model. See which models your app relies on and set spend limits before you get a surprise bill.

  • Per-model and per-user token usage breakdown
  • Cost tracking with daily, weekly, and monthly views
  • Latency percentiles (p50/p95/p99) per provider
  • Configurable spend alerts and hard caps
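The latency percentiles shown in the dashboard can be understood as nearest-rank percentiles over recorded request latencies. A sketch of the computation:

```javascript
// Nearest-rank percentile: sort the samples, take the value at
// rank ceil(p/100 * n). This is how p50/p95/p99 figures are commonly derived.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latenciesMs = [120, 95, 410, 88, 130, 101, 990, 115, 140, 97];
console.log(percentile(latenciesMs, 50)); // → 115 (median)
console.log(percentile(latenciesMs, 95)); // → 990 (tail latency)
```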
Token Usage by Model
Claude Sonnet 4.6 -- 482K tokens
GPT-5.4 -- 310K tokens
Gemini 3.1 Pro -- 128K tokens
Grok 4.2 -- 64K tokens

Every major provider, one gateway

Claude, GPT, Gemini, Grok, Llama, Groq, Cerebras, Qwen, Kimi, MiniMax, and more -- all routed through a single edge-deployed proxy.

Anthropic

Claude Opus 4.6 (1M ctx), Claude Sonnet 4.6 (1M ctx), Claude Haiku 4.5 -- full thinking & vision support.

OpenAI

GPT-5.4 (1M ctx), GPT-5.3 Codex (400K ctx), GPT Image 1 Mini, Sora 2 & Sora 2 Pro -- text, image, and video generation.

Google

Gemini 3 Flash (1M ctx), Gemini 3.1 Pro (1M ctx) with Deep Think, plus Nano Banana 2 & Nano Banana Pro for image generation.

xAI

Grok 4.2 (2M ctx) with reasoning toggle and Grok 4.2 Agents for multi-agent workloads.

Meta

Llama 4 Scout (10M ctx) and Llama 4 Maverick (1M ctx) -- natively multimodal, open-weight MoE models.

Open Models

Qwen 3.5 Plus, Qwen3-235B, Qwen3-Coder-480B, Llama 4 Scout (10M ctx), Llama 4 Maverick, Kimi K2.5, MiniMax M2.7, MiMo v2 Pro, Mulu Agent 1 Flash, Mulu Agent 1 Pro, and more via OpenRouter.

Edge Cache Layer

Response caching on Cloudflare Workers. Identical prompts return cached results instantly with zero token cost.

Wallet & Credit Balance

Top up your wallet and pay only for what you use. Track spending in real-time across every model and provider.

Wallet Balance
$--
Top up anytime
This Month
$--
Across all models
Free Tokens Left
1M
Included monthly

Built for production

Every feature you need to ship AI-powered apps confidently at scale.

Streaming (SSE)

Full Server-Sent Events streaming support across all models. Get tokens as they're generated -- no buffering, no delays.
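On the wire, streamed completions arrive as SSE `data:` lines, each carrying a JSON chunk with a content delta, terminated by `data: [DONE]`. The OpenAI SDK parses this for you; the sketch below just shows the format being handled:

```javascript
// Parse a buffer of SSE lines into the assembled completion text.
// Each frame is `data: <json>`; the stream ends with `data: [DONE]`.
function parseSSEChunks(raw) {
  const deltas = [];
  for (const line of raw.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const delta = JSON.parse(payload).choices[0]?.delta?.content;
    if (delta) deltas.push(delta);
  }
  return deltas.join('');
}
```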

API Key Management

Create, revoke, and rotate API keys from your dashboard. Set per-key spend limits and model restrictions for fine-grained control.

Request Logging

Every request is logged with latency, token count, model, and status. Search, filter, and export logs for debugging and compliance.

Edge Deployed

Runs on Cloudflare Workers at 300+ edge locations worldwide. Sub-50ms overhead regardless of where your users are.

Spend Alerts

Set daily, weekly, or monthly spend caps. Get notified before you hit limits so you're never caught off guard.

Model Fallback Chains

Define fallback sequences: if Claude is down, auto-route to GPT. If GPT fails, try Gemini. Zero downtime for your users.
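A fallback chain boils down to trying each model in order until one succeeds. The gateway does this server-side; this illustrative helper (`withFallback` is hypothetical, not an SDK function) shows the equivalent logic:

```javascript
// Try each model in sequence; return the first successful result,
// or rethrow the last error if every provider fails.
async function withFallback(models, call) {
  let lastErr;
  for (const model of models) {
    try {
      return await call(model);
    } catch (err) {
      lastErr = err; // provider down or erroring -- try the next model
    }
  }
  throw lastErr;
}
```

Usage would look like `withFallback(['claude-sonnet-4-6', 'gpt-5.4', 'gemini-3.1-pro'], (model) => client.chat.completions.create({ model, messages }))`.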

Drop-in compatible

Already using the OpenAI SDK? Just change the base URL. Switch models with one parameter.

app.js
import OpenAI from 'openai';

// Point to Mulu Gateway instead of OpenAI directly
const client = new OpenAI({
  baseURL: 'https://gateway.mulucode.dev/v1',
  apiKey:  'mulu_sk_...',      // One key for all models
});

// Use Claude
const claude = await client.chat.completions.create({
  model: 'claude-sonnet-4-6',  // Or claude-opus-4-6, claude-haiku-4-5
  messages: [{ role: 'user', content: 'Explain caching' }],
});

// Switch to GPT with one line change
const gpt = await client.chat.completions.create({
  model: 'gpt-5.4',            // Or gemini-3.1-pro, grok-4.2, qwen-3.5-plus
  messages: [{ role: 'user', content: 'Explain caching' }],
  cache: true,                 // Enable response caching (free, saves tokens)
});

Transparent, simple pricing

No markup on any model except Mulu's own. No per-request fees. The gateway is included free with your $1/mo Mulu Code subscription.

No markup except Mulu models

Claude, GPT, Gemini, Grok, and more -- billed at exact provider pricing. Mulu models have a small markup. Caching, analytics, and routing included free.

  • 50+ models across all major providers -- text, image, and video
  • Response caching (saves you money)
  • Rate limiting & spend caps
  • Usage analytics & cost tracking
  • Automatic retries & fallbacks
  • No per-request fees, ever
Get Mulu Code

One key. Every model. Ship faster.

Stop wasting time managing API keys, tracking usage across providers, and building retry logic. The Mulu Gateway handles all of it -- so you can focus on building.