
BerriAI/litellm

Architecture Visualization


Flow Steps

1. Client: The developer sends a chat completion request to the proxy.

   📤 POST /chat/completions
      Content-Type: application/json

      {
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello"}],
        "fallbacks": ["claude-2"]
      }
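For reference, a minimal client-side sketch of this call in Python with the requests library. The proxy URL and API key are placeholders for your own deployment; the body matches the payload above.

    import requests

    # Placeholder proxy endpoint and key; substitute your deployment's values.
    PROXY_URL = "http://localhost:4000/chat/completions"
    API_KEY = "sk-example"

    response = requests.post(
        PROXY_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": "gpt-4",
            "messages": [{"role": "user", "content": "Hello"}],
            # Models to try if the primary call fails (see steps 5-6).
            "fallbacks": ["claude-2"],
        },
        timeout=30,
    )
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])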
2. Proxy: Validates the API key and checks rate limits.

   📤 SELECT * FROM litellm_usertable WHERE user_id = 'api_key_hash'
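A sketch of what that lookup could look like, not LiteLLM's actual implementation: the incoming key is hashed and the hash is looked up in the user table. The sha256 choice and the sqlite3 connection are assumptions for illustration.

    import hashlib
    import sqlite3

    def validate_api_key(conn: sqlite3.Connection, api_key: str):
        """Return the user row for this key, or None if the key is unknown."""
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()  # assumed scheme
        row = conn.execute(
            "SELECT * FROM litellm_usertable WHERE user_id = ?",
            (key_hash,),
        ).fetchone()
        return row  # None means the proxy rejects the request with a 401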
3. Cache: Checks the rate-limit counter.

   📤 GET rate_limit:api_key_hash
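Assuming the cache is Redis, a common fixed-window counter sketch with redis-py; the key format mirrors the GET above, while the limit and window values are illustrative.

    import redis

    r = redis.Redis()  # assumes a local Redis instance

    def within_rate_limit(key_hash: str, limit: int = 100, window_s: int = 60) -> bool:
        """Fixed-window counter: increment, start a window on first hit, compare."""
        key = f"rate_limit:{key_hash}"
        count = r.incr(key)
        if count == 1:
            r.expire(key, window_s)  # first request opens a fresh window
        return count <= limit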
4. Core: Maps the model name and attempts the OpenAI call.

   📤 POST https://api.openai.com/v1/chat/completions

      {
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello"}]
      }
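The routing step can be pictured as an alias table that maps the requested model name to a provider endpoint, then forwards the OpenAI-shaped payload. The table contents here are invented; a real proxy loads its deployments from config.

    import requests

    # Illustrative alias table, not the proxy's real routing config.
    MODEL_MAP = {
        "gpt-4": "https://api.openai.com/v1/chat/completions",
    }

    def forward(model: str, messages: list, api_key: str) -> requests.Response:
        """Send the request to whichever provider endpoint the model maps to."""
        return requests.post(
            MODEL_MAP[model],
            headers={"Authorization": f"Bearer {api_key}"},
            json={"model": model, "messages": messages},
            timeout=30,
        )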
5. OpenAI: Returns an API error (rate limit exceeded).

   📥 429 Too Many Requests

      { "error": { "code": "rate_limit_exceeded" } }
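One way to surface that 429 so the router can react, sketched against the forward() helper above; the RateLimitError type is invented here for illustration.

    import requests

    class RateLimitError(Exception):
        """Raised when the upstream provider answers 429."""

    def check_upstream(resp: requests.Response) -> dict:
        if resp.status_code == 429:
            # Retryable error: signal the router to try a fallback model.
            raise RateLimitError(resp.json().get("error", {}).get("code"))
        resp.raise_for_status()
        return resp.json()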
6. Core: Triggers fallback to the Claude model.

   📤 POST https://api.anthropic.com/v1/messages

      {
        "model": "claude-2",
        "messages": [{"role": "user", "content": "Hello"}]
      }
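The fallback itself amounts to a loop over the primary model plus the request's fallbacks list. A sketch assuming the hypothetical forward() and check_upstream() helpers above, and glossing over the provider-specific request shape (the Anthropic payload in step 6 differs from OpenAI's):

    def complete_with_fallbacks(model: str, messages: list,
                                fallbacks: list, keys: dict):
        """Try each candidate model in order; return the first success."""
        errors = []
        for candidate in [model, *fallbacks]:
            try:
                resp = forward(candidate, messages, keys[candidate])
                return candidate, check_upstream(resp)
            except RateLimitError as err:
                errors.append(err)  # rate-limited: move on to the next model
        raise RuntimeError(f"all models exhausted: {errors}")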
7. Anthropic: Returns a successful completion.

   📥 200 OK

      {
        "id": "msg_123",
        "content": [{"text": "Hello! How can I help you?"}]
      }
8. Logger: Logs the completion with fallback metadata.

   📤 INSERT INTO litellm_logtable (request_id, model, fallback_used, tokens_used)
      VALUES ('req_123', 'claude-2', true, 12)
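A minimal sketch of that write, using sqlite3 as a stand-in for the proxy's database; the column list mirrors the INSERT above.

    import sqlite3

    def log_completion(conn: sqlite3.Connection, request_id: str,
                       model: str, fallback_used: bool, tokens_used: int) -> None:
        """Record which model served the request and whether a fallback fired."""
        conn.execute(
            "INSERT INTO litellm_logtable "
            "(request_id, model, fallback_used, tokens_used) VALUES (?, ?, ?, ?)",
            (request_id, model, fallback_used, tokens_used),
        )
        conn.commit()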
9. Client: Receives the response in unified OpenAI format.

   📥 200 OK

      {
        "id": "chatcmpl-123",
        "model": "claude-2",
        "choices": [{
          "message": {
            "role": "assistant",
            "content": "Hello! How can I help you?"
          }
        }]
      }
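The unification step can be sketched as a field mapping from the Anthropic Messages payload (step 7) onto the OpenAI chat-completion schema; this is a simplified assumption, not LiteLLM's exact transformer.

    def anthropic_to_openai(anthropic_resp: dict, model: str) -> dict:
        """Map an Anthropic Messages response onto the OpenAI response shape."""
        text = "".join(block["text"] for block in anthropic_resp["content"])
        return {
            "id": "chatcmpl-" + anthropic_resp["id"],
            "object": "chat.completion",
            "model": model,
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }],
        }

Run on the step 7 payload with model="claude-2", this yields the shape shown in step 9 (the id prefixing here is illustrative).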