Pass `fallback_models` to any `Agent` or `Team`. If the primary model fails after exhausting its retries, each fallback is tried in order until one succeeds.
```python
from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_models=[Claude(id="claude-sonnet-4-20250514")],
)
```
If `gpt-4o` fails after exhausting its own retries, Claude is tried automatically.
Model strings work too:
```python
from agno.agent import Agent

agent = Agent(
    model="openai:gpt-4o",
    fallback_models=["anthropic:claude-sonnet-4-20250514"],
)
```
## Usage with Teams
Fallback models apply to the team leader’s model calls. Member agents keep their own models and are not affected by the leader’s fallback config.
```python
from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIChat
from agno.team import Team

researcher = Agent(
    name="Researcher",
    role="You research topics and provide detailed findings.",
    model=OpenAIChat(id="gpt-4o-mini"),
)

writer = Agent(
    name="Writer",
    role="You write clear, concise summaries from research findings.",
    model=OpenAIChat(id="gpt-4o-mini"),
)

team = Team(
    name="Research Team",
    model=OpenAIChat(id="gpt-4o"),
    fallback_models=[Claude(id="claude-sonnet-4-20250514")],
    members=[researcher, writer],
    markdown=True,
)
```
## Error-Specific Fallbacks
`FallbackConfig` lets you route different error types to different fallback models. Instead of a flat list, you specify which models to try for rate limits, context-window overflows, and general errors separately.
```python
from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.fallback import FallbackConfig
from agno.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_config=FallbackConfig(
        # On rate-limit (429/529) errors
        on_rate_limit=[
            OpenAIChat(id="gpt-4o-mini"),
            Claude(id="claude-sonnet-4-20250514"),
        ],
        # On context-window-exceeded errors
        on_context_overflow=[
            Claude(id="claude-sonnet-4-20250514"),
        ],
        # General fallback for any other retryable error
        on_error=[
            Claude(id="claude-sonnet-4-20250514"),
        ],
    ),
)
```
### Error routing
When the primary model fails, the error is classified and routed to the matching fallback list:
| Error Type | Fallback List | Example |
|---|---|---|
| Rate limit (429/529) | `on_rate_limit` | Provider throttling, Anthropic overloaded |
| Context window exceeded | `on_context_overflow` | Input too long for the model's context window |
| Other retryable errors | `on_error` | Server errors (5xx), network failures |
If a specific list (like `on_rate_limit`) is empty, `on_error` is used as a catch-all.
Non-retryable client errors like 400, 401, 403, 404, and 422 are not caught by fallback. These indicate configuration problems (bad API key, invalid request) that need to be fixed rather than masked by switching models.
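The routing rules above can be sketched in plain Python. This is a simplified illustration of the decision logic, not Agno's internal code; the exception class names here are assumptions made for the example.

```python
# Simplified sketch of error-type routing for fallback selection.
# Illustrative only; these exception classes are not Agno's.

class RateLimitError(Exception):
    """Provider returned 429/529."""

class ContextOverflowError(Exception):
    """Input exceeded the model's context window."""

class ClientError(Exception):
    """Non-retryable 4xx error (bad key, invalid request)."""

def select_fallbacks(error, config):
    """Pick the fallback list matching the error type.

    config has keys on_rate_limit, on_context_overflow, on_error.
    Empty specific lists fall through to on_error; non-retryable
    client errors get no fallback at all.
    """
    if isinstance(error, ClientError):
        return []  # 400/401/403/404/422: fix the config, don't mask it
    if isinstance(error, RateLimitError) and config["on_rate_limit"]:
        return config["on_rate_limit"]
    if isinstance(error, ContextOverflowError) and config["on_context_overflow"]:
        return config["on_context_overflow"]
    return config["on_error"]  # catch-all for other retryable errors

config = {
    "on_rate_limit": ["gpt-4o-mini", "claude-sonnet-4"],
    "on_context_overflow": [],  # empty -> falls through to on_error
    "on_error": ["claude-sonnet-4"],
}

print(select_fallbacks(RateLimitError(), config))        # specific list
print(select_fallbacks(ContextOverflowError(), config))  # catch-all (list empty)
print(select_fallbacks(ClientError(), config))           # no fallback
```

Note how the context-overflow case falls through to `on_error` because its list is empty, while the client error short-circuits with no fallback at all.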
## Fallback Callback
Use the `callback` parameter to get notified whenever a fallback model is activated. This is useful for logging, metrics, or alerting.
```python
from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.fallback import FallbackConfig
from agno.models.openai import OpenAIChat

def on_fallback(primary_model_id: str, fallback_model_id: str, error: Exception) -> None:
    print(f"[fallback] {primary_model_id} -> {fallback_model_id} (reason: {error})")

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_config=FallbackConfig(
        on_error=[Claude(id="claude-sonnet-4-20250514")],
        callback=on_fallback,
    ),
)
```
The callback fires after the fallback model succeeds. For streaming calls, it fires after the full stream completes.
## Retry vs. Fallback
Retry and fallback are separate layers. Retry happens inside each model. Fallback only triggers after the primary model’s retry loop is fully exhausted.
```
Primary model
└── _invoke_with_retry()          # retries N times (per model config)

On failure
└── classify error type
    └── select matching fallback list
        └── try each fallback in order
            └── fallback._invoke_with_retry()   # each fallback retries independently
```
Each model controls its own retry behavior:
```python
from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o", retries=3, exponential_backoff=True),
    fallback_models=[
        Claude(id="claude-sonnet-4-20250514", retries=2),
    ],
)
```
The primary model retries 3 times with exponential backoff. Only after all 3 attempts fail does the fallback kick in, and it gets 2 retries of its own.
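The layering can be illustrated with a minimal sketch in plain Python. This is an assumed structure for illustration, not the library's internals: each model must exhaust its own retry budget before the next one is tried.

```python
# Minimal sketch of the retry-then-fallback layering.
# Illustrative only; not Agno's actual implementation.

def invoke_with_retry(call, retries):
    """Try one model up to `retries` times; re-raise the last error."""
    last_error = None
    for _ in range(retries):
        try:
            return call()
        except Exception as e:
            last_error = e
    raise last_error

def run_with_fallbacks(models):
    """models: list of (name, call, retries) tuples, primary first.

    Each model gets its full retry budget before the next is tried.
    """
    last_error = None
    for name, call, retries in models:
        try:
            return name, invoke_with_retry(call, retries)
        except Exception as e:
            last_error = e  # this model's retries exhausted; try the next
    raise last_error

calls = {"count": 0}

def flaky_primary():
    calls["count"] += 1
    raise RuntimeError("503 from provider")

def healthy_fallback():
    return "ok"

winner, result = run_with_fallbacks([
    ("gpt-4o", flaky_primary, 3),     # retried 3 times, all fail
    ("claude", healthy_fallback, 2),  # succeeds on its first attempt
])
print(winner, result, calls["count"])  # claude ok 3
```

The counter confirms the primary was attempted exactly three times before control ever reached the fallback.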
## Streaming
Fallback works with streaming responses. If the primary model fails mid-stream, the fallback model takes over and the response content is reset so the consumer receives a clean response from the fallback model only.
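The reset behavior can be sketched with generators. This is a simplified illustration of the idea; Agno's actual streaming internals may differ.

```python
# Sketch of mid-stream failure with content reset: partial output from
# the failed primary stream is discarded, so the final response contains
# only the fallback model's content.

def failing_stream():
    yield "partial "
    raise RuntimeError("connection dropped mid-stream")

def fallback_stream():
    yield "clean "
    yield "response"

def run_stream(primary, fallback):
    """Accumulate a streamed response, resetting on mid-stream failure."""
    content = []
    try:
        for chunk in primary():
            content.append(chunk)
    except Exception:
        content.clear()  # discard partial output from the failed stream
        for chunk in fallback():
            content.append(chunk)
    return "".join(content)

print(run_stream(failing_stream, fallback_stream))  # clean response
```

The `"partial "` chunk never appears in the result: clearing the buffer before switching streams is what gives the consumer a clean response.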
## Parameters

Available on both `Agent` and `Team`:
| Parameter | Type | Description |
|---|---|---|
| `fallback_models` | `List[Model \| str]` | Models tried in order on any failure. Shorthand for `FallbackConfig(on_error=...)`. |
| `fallback_config` | `FallbackConfig` | Error-specific routing. Takes precedence over `fallback_models` if both are set. |
### FallbackConfig

| Field | Type | Description |
|---|---|---|
| `on_error` | `List[Model \| str]` | General fallback for any retryable error. |
| `on_rate_limit` | `List[Model \| str]` | Fallback for rate-limit (429/529) errors. Falls back to `on_error` if empty. |
| `on_context_overflow` | `List[Model \| str]` | Fallback for context-window-exceeded errors. Falls back to `on_error` if empty. |
| `callback` | `Callable[[str, str, Exception], None]` | Called when a fallback model is activated. Receives `(primary_model_id, fallback_model_id, error)`. |
## Developer Resources