Model Routing

Specify fallback models for reliability, or use auto-routing to let the router pick the best model for your prompt and requirements.

What is Model Routing?

Model routing allows you to specify multiple models that can handle your request. If the primary model is unavailable, overloaded, or fails, OpenRouter automatically tries the next model in your list. This ensures high availability and reliability.

❌ Single Model

If your chosen model is down or overloaded, your request fails

✅ Model Routing

Automatic fallback to alternative models ensures high availability
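Without routing you would have to implement the retry loop yourself. As a rough sketch of what the router does on your behalf (the `callModel` function here is a hypothetical stand-in for a single-model API call that throws on failure):

```typescript
// Hypothetical stand-in for a single-model API call that may fail.
type CallModel = (model: string) => Promise<string>;

// Conceptually what model routing does: try each model in order,
// returning the first success and throwing only if every model fails.
async function withFallback(models: string[], call: CallModel): Promise<string> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await call(model);
    } catch (err) {
      lastError = err; // remember the failure, try the next model
    }
  }
  throw lastError;
}
```

With model routing, this loop runs server-side, so a single request covers the whole list.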

Fallback Models

Specify a list of models to try in order. OpenRouter will attempt each model until one succeeds.

fallback-models.ts
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://www.superagentstack.com/api/v1',
  apiKey: process.env.OPENROUTER_KEY,
  defaultHeaders: { 'superAgentKey': process.env.SUPER_AGENT_KEY },
});

const response = await client.chat.completions.create({
  model: 'openai/gpt-4o',  // Primary model
  messages: [{
    role: 'user',
    content: 'Explain quantum computing in simple terms'
  }],
  // @ts-expect-error - OpenRouter extension
  models: [
    'anthropic/claude-3.5-sonnet',  // First fallback
    'google/gemini-2.0-flash',      // Second fallback
    'meta-llama/llama-3.1-70b'      // Third fallback
  ]
});

// Check which model was actually used
console.log(`Response from: ${response.model}`);
console.log(response.choices[0].message.content);

Auto Router

Let OpenRouter automatically select the best model for your prompt based on performance, availability, and cost considerations.

auto-router.ts
const response = await client.chat.completions.create({
  model: 'openrouter/auto',  // Use auto-routing
  messages: [{
    role: 'user',
    content: 'Write a creative story about a time-traveling scientist'
  }]
});

// OpenRouter selects the best model for creative writing
console.log(`Auto-selected model: ${response.model}`);
console.log(response.choices[0].message.content);

// Different prompts may route to different models
const codeResponse = await client.chat.completions.create({
  model: 'openrouter/auto',
  messages: [{
    role: 'user',
    content: 'Write a Python function to calculate fibonacci numbers'
  }]
});

console.log(`Code task routed to: ${codeResponse.model}`);

Routing Strategies

Cost-Optimized Routing

Start with cheaper models and fall back to more expensive ones only if they fail.

typescript
const response = await client.chat.completions.create({
  model: 'google/gemini-2.0-flash',  // Cheapest first
  messages: [{ role: 'user', content: 'Summarize this article...' }],
  // @ts-expect-error - OpenRouter extension
  models: [
    'openai/gpt-4o-mini',           // Still affordable
    'anthropic/claude-3.5-sonnet',  // More expensive
    'openai/gpt-4o'                 // Most expensive fallback
  ]
});

Performance-First Routing

Prioritize the best models with cheaper alternatives as fallbacks.

typescript
const response = await client.chat.completions.create({
  model: 'openai/gpt-4o',  // Best performance first
  messages: [{ role: 'user', content: 'Complex reasoning task...' }],
  // @ts-expect-error - OpenRouter extension
  models: [
    'anthropic/claude-3.5-sonnet',  // High quality fallback
    'google/gemini-2.0-flash',      // Good quality, faster
    'openai/gpt-4o-mini'            // Budget fallback
  ]
});

Provider Diversity

Use models from different providers to avoid single-provider outages.

typescript
const response = await client.chat.completions.create({
  model: 'openai/gpt-4o',  // OpenAI
  messages: [{ role: 'user', content: 'General query...' }],
  // @ts-expect-error - OpenRouter extension
  models: [
    'anthropic/claude-3.5-sonnet',  // Anthropic
    'google/gemini-2.0-flash',      // Google
    'meta-llama/llama-3.1-70b',     // Meta
    'mistralai/mistral-large'       // Mistral
  ]
});

Response Metadata

When model routing is used, the response includes metadata about which model was actually used.

routing-metadata.ts
const response = await client.chat.completions.create({
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
  // @ts-expect-error - OpenRouter extension
  models: ['anthropic/claude-3.5-sonnet', 'google/gemini-2.0-flash']
});

// The model field shows which model actually responded
console.log('Actual model used:', response.model);

// Routing details live in a gateway-specific extension field,
// so cast past the OpenAI SDK types to reach it
const routing = (response as any)._metadata?.openRouter?.modelRouting;
if (routing) {
  console.log('Requested models:', routing.requested);
  console.log('Actual model:', routing.actual);

  if (routing.actual !== routing.requested[0]) {
    console.log('Fallback was used!');
  }
}

curl Example

bash
curl -X POST https://www.superagentstack.com/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_KEY" \
  -H "superAgentKey: $SUPER_AGENT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "models": [
      "anthropic/claude-3.5-sonnet",
      "google/gemini-2.0-flash"
    ]
  }'

Best Practices

Choose Compatible Models

Ensure all models in your routing list support the features you're using (structured outputs, tool calling, etc.).
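One way to enforce this is to filter your routing list against a hand-maintained capability map before sending the request. The map below is illustrative only (the values are assumptions, not authoritative model data):

```typescript
// Hand-maintained capability map: does each model support tool calling?
// Illustrative values only; verify against current model documentation.
const supportsTools: Record<string, boolean> = {
  'openai/gpt-4o': true,
  'anthropic/claude-3.5-sonnet': true,
  'google/gemini-2.0-flash': true,
  'meta-llama/llama-3.1-70b': false,
};

// Keep only the fallbacks that support the feature the request relies on.
function compatibleFallbacks(models: string[]): string[] {
  return models.filter((m) => supportsTools[m] === true);
}
```

Passing the filtered list as `models` avoids a fallback silently dropping tool calls mid-conversation.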

Consider Model Differences

Different models may produce different response styles. Test your fallback models to ensure consistent quality.

Monitor Usage

Check the response metadata to understand which models are being used and optimize your routing strategy.
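A lightweight way to do this is to tally `response.model` after each request and compute how often your primary model was bypassed. This is a minimal client-side sketch, not a built-in feature:

```typescript
// Tally which model actually served each request, so you can see
// how often fallbacks fire and tune the routing list accordingly.
const modelCounts = new Map<string, number>();

function recordModel(model: string): void {
  modelCounts.set(model, (modelCounts.get(model) ?? 0) + 1);
}

// After each request: recordModel(response.model);

// Fraction of requests that were NOT served by the primary model.
function fallbackRate(primary: string): number {
  let total = 0;
  let primaryCount = 0;
  for (const [model, n] of modelCounts) {
    total += n;
    if (model === primary) primaryCount += n;
  }
  return total === 0 ? 0 : 1 - primaryCount / total;
}
```

A consistently high fallback rate suggests the primary model is a poor fit (overloaded or frequently unavailable) and the list should be reordered.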

Next Steps