## Flow Steps
### 1. Client

The developer sends a chat completion request to the proxy, listing fallback models in the body.

📤 `POST /chat/completions` with `Content-Type: application/json`:

```json
{
  "model": "gpt-4",
  "messages": [{"role": "user", "content": "Hello"}],
  "fallbacks": ["claude-2"]
}
```
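The request body above can be assembled with a small helper. This is only a sketch of the client side: the helper name is hypothetical, and `fallbacks` is the proxy's extension to the standard OpenAI request body, not an OpenAI parameter.

```python
import json

def build_chat_request(model, user_message, fallbacks=None):
    """Build the JSON body for POST /chat/completions (hypothetical helper)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    if fallbacks:
        # Proxy-specific extension: models to try if the primary call fails.
        body["fallbacks"] = fallbacks
    return body

payload = json.dumps(build_chat_request("gpt-4", "Hello", fallbacks=["claude-2"]))
```

The resulting JSON matches the request shown above and can be POSTed to the proxy with any HTTP client.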
### 2. Proxy

The proxy validates the API key and checks rate limits, starting with a user-record lookup.

📤

```sql
SELECT * FROM litellm_usertable WHERE user_id = 'api_key_hash'
```
### 3. Cache

The cache is consulted for the key's rate-limit counter.

📤 `GET rate_limit:api_key_hash`
### 4. Core

The core maps the model name and attempts the OpenAI call.

📤 `POST https://api.openai.com/v1/chat/completions`:

```json
{
  "model": "gpt-4",
  "messages": [{"role": "user", "content": "Hello"}]
}
```
### 5. OpenAI

OpenAI returns an API error (rate limit exceeded).

📥 `429 Too Many Requests`:

```json
{
  "error": {
    "code": "rate_limit_exceeded"
  }
}
```
### 6. Core

The core catches the error and triggers the fallback to the Claude model.

📤 `POST https://api.anthropic.com/v1/messages`:

```json
{
  "model": "claude-2",
  "messages": [{"role": "user", "content": "Hello"}]
}
```
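Steps 4 through 6 boil down to a try-the-next-model loop. A minimal sketch with the provider call injected as a callable; the error type and call signature are assumptions for illustration, not LiteLLM's actual internals:

```python
class RateLimitError(Exception):
    """Stand-in for a provider's 429 response."""

def complete_with_fallbacks(request, call_provider):
    """Try the primary model, then each fallback in order.

    call_provider(model, messages) returns a response dict or raises
    RateLimitError, standing in for the upstream HTTP call.
    """
    models = [request["model"]] + request.get("fallbacks", [])
    last_err = None
    for model in models:
        try:
            resp = call_provider(model, request["messages"])
            resp["fallback_used"] = model != request["model"]
            return resp
        except RateLimitError as err:
            last_err = err  # remember the failure, try the next model
    raise last_err
```

With the request from step 1, a 429 from `gpt-4` makes the loop move on to `claude-2`, mirroring steps 5 and 6.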
### 7. Anthropic

Anthropic returns a successful completion.

📥 `200 OK`:

```json
{
  "id": "msg_123",
  "content": [{"text": "Hello! How can I help you?"}]
}
```
### 8. Logger

The logger records the completion along with fallback metadata.

📤

```sql
INSERT INTO litellm_logtable (request_id, model, fallback_used, tokens_used)
VALUES ('req_123', 'claude-2', true, 12)
```
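The row written in step 8 can be derived from the fallback outcome. A sketch using the column names from the `INSERT` above; the helper itself is hypothetical:

```python
def build_log_row(request_id, requested_model, served_model, tokens_used):
    """Build the row written to litellm_logtable in step 8."""
    return {
        "request_id": request_id,
        "model": served_model,
        # fallback_used is true whenever the served model differs
        # from the one the client originally requested.
        "fallback_used": served_model != requested_model,
        "tokens_used": tokens_used,
    }
```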
### 9. Client

The client receives the response in the unified OpenAI format.

📥 `200 OK`:

```json
{
  "id": "chatcmpl-123",
  "model": "claude-2",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you?"
    }
  }]
}
```
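The unification in step 9 means mapping Anthropic's message shape onto OpenAI's chat-completion shape. A minimal sketch of that translation, with field choices inferred from the two payloads above (the id rewrite in particular is a hypothetical mapping, not the proxy's documented behavior):

```python
def anthropic_to_openai(anthropic_resp, model):
    """Map an Anthropic Messages response to OpenAI chat-completion format."""
    # Anthropic returns content as a list of blocks; concatenate the text ones.
    text = "".join(block.get("text", "") for block in anthropic_resp.get("content", []))
    return {
        # Hypothetical id mapping, chosen to match the example payloads.
        "id": anthropic_resp["id"].replace("msg_", "chatcmpl-"),
        "model": model,
        "choices": [{
            "message": {"role": "assistant", "content": text}
        }],
    }
```

Feeding it the step 7 response yields a body shaped like the step 9 response, so the client never has to know which provider answered.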