Providers¶
The provider layer is how SynthOrg reaches every LLM -- cloud APIs, OpenRouter, Ollama, LM Studio, vLLM, or any custom endpoint -- through a single unified interface. It handles authentication, model discovery, cost metering, health probing, and runtime hot-reload without restarting the engine.
Provider Abstraction¶
The framework provides a unified interface for all LLM interactions. The provider layer
abstracts away vendor differences, exposing a single completion() method regardless of
whether the backend is a cloud API, OpenRouter, Ollama, or a custom endpoint.
Unified Model Interface: completion(messages, tools, config) -> resp
| Cloud API Adapter | OpenRouter Adapter | Ollama Adapter | Custom Adapter | |
|---|---|---|---|---|
| Method | Direct API call | 400+ LLMs via OR | Local LLMs, self-host | Any API |
Provider Configuration¶
Provider Configuration (YAML)
Model IDs, pricing, and provider examples below are illustrative. Actual models, costs, and provider availability are determined during implementation and loaded dynamically from provider APIs where possible.
providers:
example-provider:
litellm_provider: "anthropic" # LiteLLM routing identifier (optional, defaults to provider name)
family: "example-family" # cross-validation grouping (optional)
auth_type: api_key # api_key | oauth | custom_header | subscription | none
api_key: "${PROVIDER_API_KEY}"
# subscription_token: "..." # subscription token (subscription auth only; passed to LiteLLM as api_key; sensitive -- use env vars or secret management)
# tos_accepted_at: "..." # timestamp when subscription ToS was accepted
models: # example entries -- real list loaded from provider
- id: "example-large-001"
alias: "large"
cost_per_1k_input: 0.015 # illustrative, verify at implementation time
cost_per_1k_output: 0.075
max_context: 200000
estimated_latency_ms: 1500 # optional, used by fastest strategy
- id: "example-medium-001"
alias: "medium"
cost_per_1k_input: 0.003
cost_per_1k_output: 0.015
max_context: 200000
estimated_latency_ms: 500
- id: "example-small-001"
alias: "small"
cost_per_1k_input: 0.0008
cost_per_1k_output: 0.004
max_context: 200000
estimated_latency_ms: 200
openrouter:
auth_type: api_key # api_key | oauth | custom_header | subscription | none
api_key: "${OPENROUTER_API_KEY}"
base_url: "https://openrouter.ai/api/v1"
models: # example entries
- id: "vendor-a/model-medium"
alias: "or-medium"
- id: "vendor-b/model-pro"
alias: "or-pro"
- id: "vendor-c/model-reasoning"
alias: "or-reasoning"
ollama:
auth_type: none
base_url: "http://localhost:11434"
models: # example entries
- id: "llama3.3:70b"
alias: "local-llama"
cost_per_1k_input: 0.0 # free, local
cost_per_1k_output: 0.0
- id: "qwen2.5-coder:32b"
alias: "local-coder"
cost_per_1k_input: 0.0
cost_per_1k_output: 0.0
LiteLLM Integration¶
The framework uses LiteLLM as the provider abstraction layer:
- Unified API across 100+ providers
- Built-in cost tracking
- Automatic retries and fallbacks
- Load balancing across providers
- Chat completions-compatible interface (all providers normalized)
- Model database:
litellm.model_costprovides pricing and context window data for all known models. Used at provider creation to dynamically populate model lists with up-to-date metadata. Provider-specific version filters (for example, a newer generation filter applied per provider) exclude older generations. Deduplicates dated model variants (e.g. prefersexample-large-002overexample-large-002-20260205). Falls back to presetdefault_modelswhen no models are found in the database.
Provider Management¶
Providers can be managed at runtime through the API without restarting:
- CRUD:
POST /api/v1/providers(create),PUT /api/v1/providers/{name}(update),DELETE /api/v1/providers/{name}(delete) - Connection test:
POST /api/v1/providers/{name}/test-- sends a minimal probe and reports latency - Model discovery:
POST /api/v1/providers/{name}/discover-models - Queries the provider endpoint for available models (Ollama
/api/tags, standard/models) and updates the provider config. - Accepts an optional
preset_hintquery parameter (?preset_hint={preset_name}) that guides endpoint selection (Ollama vs standard API path). Thepreset_hintis no longer used for SSRF trust decisions. - Auto-triggered on preset creation for no-auth providers with empty model lists.
- SSRF trust is determined by a dynamic
host:portallowlist (ProviderDiscoveryPolicy), seeded from presetcandidate_urlsat startup and auto-updated on provider create/update/delete. Trusted URLs bypass SSRF validation; untrusted URLs go through full private-IP/DNS-rebinding checks. Bypasses are logged at WARNING level (PROVIDER_DISCOVERY_SSRF_BYPASSED). - Discovery allowlist:
GET /api/v1/providers/discovery-policy(read),POST /api/v1/providers/discovery-policy/entries(add entry),POST /api/v1/providers/discovery-policy/remove-entry(remove entry) -- manage the dynamic SSRF allowlist of trustedhost:portpairs for provider discovery. Persisted in the settings system (DB > env > YAML > code). - Presets:
GET /api/v1/providers/presetslists built-in cloud and local provider templates (11 presets: Anthropic, OpenAI, Google AI, Mistral, Groq, DeepSeek, Azure OpenAI, Ollama, LM Studio, vLLM, OpenRouter);POST /api/v1/providers/from-presetcreates from a template. Each preset declaressupported_auth_types(e.g.["api_key"],["none"],["api_key", "subscription"]) which the UI uses to present the available authentication options during provider creation. Presets also declarerequires_base_url(e.g.truefor Azure, Ollama, LM Studio, vLLM) which the UI uses to conditionally require a base URL. Presets also declaresupports_model_pull,supports_model_delete,supports_model_config(local model management capability flags used by the UI to gate management controls). - Preset auto-probe:
POST /api/v1/providers/probe-preset-- for presets withcandidate_urls(local providers: Ollama and LM Studio), probes each URL in priority order (host.docker.internal, Docker bridge IP,localhost) with a 5-second timeout. Returns the first reachable URL and discovered model count. Used by the setup wizard to auto-detect local providers running on the host machine. SSRF validation is intentionally skipped because only hardcoded preset URLs are probed, never user input. Note: vLLM'scandidate_urlsis intentionally empty (users deploy vLLM at arbitrary endpoints), so it cannot be auto-probed and requires manual URL configuration. - Hot-reload: On mutation,
ProviderManagementServicerebuildsProviderRegistry+ModelRouterand atomically swaps them inAppState-- no downtime - Auth types:
api_key(default),subscription(token-based auth for provider subscription plans, passed to LiteLLM asapi_key, requires ToS acceptance),oauth(stores credentials, MVP uses pre-fetched token),custom_header,none(local providers) - Routing key: Optional
litellm_providerfield decouples the provider display name from LiteLLM routing (e.g. a provider named "my-claude" can route toanthropicvialitellm_provider: anthropic). Falls back to provider name when unset. - Credential safety: Secrets are Fernet-encrypted at rest via the
providers.configssensitive setting; API responses useProviderResponseDTO that strips all secrets and provideshas_api_key/has_oauth_credentials/has_custom_header/has_subscription_tokenboolean indicators - Health:
GET /api/v1/providers/{name}/health-- returns health status (up/degraded/down/unknown derived from 24h call count and error rate; unknown when no calls recorded), average response time, error rate percentage, call count, total tokens, and total cost. In-memory tracking viaProviderHealthTracker(concurrency-safe, append-only with periodic pruning). Token/cost totals are enriched fromCostTrackerat query time - Health probing:
ProviderHealthProberbackground service pings providers withbase_url(local/self-hosted) every 30 minutes using lightweight HTTP requests (no model loading). Ollama: pings root URL; standard providers:GET /models. Skips providers with recent real API traffic. Results are recorded inProviderHealthTracker. Cloud providers withoutbase_urlrely on real call outcomes for health status - Model capabilities:
GET /api/v1/providers/{name}/modelsreturnsProviderModelResponseDTOs enriched with runtime capability flags (supports_tools,supports_vision,supports_streaming) from the driver layer'sModelCapabilities. Falls back to defaults when driver is unavailable - Local model management: Providers with
supports_model_pull/supports_model_delete/supports_model_configcapability flags expose model lifecycle operations.POST /api/v1/providers/{name}/models/pullstreams download progress via SSE (Ollama/api/pull).DELETE /api/v1/providers/{name}/models/{model_id}removes models.PUT /api/v1/providers/{name}/models/{model_id}/configsets per-model launch parameters (LocalModelParams:num_ctx,num_gpu_layers,num_threads,num_batch,repeat_penalty). Currently implemented for Ollama; LM Studio support deferred (unstable API).
Model Routing Strategy¶
Model routing determines which LLM handles a given request. Six strategies are available, selectable via configuration:
| Strategy | Behavior |
|---|---|
manual |
Resolve an explicit model override; fails if not set |
role_based |
Match agent seniority level to routing rules, then catalog default |
cost_aware |
Match task-type rules, then pick cheapest model within budget |
cheapest |
Alias for cost_aware |
fastest |
Match task-type rules, then pick fastest model (by estimated_latency_ms) within budget; falls back to cheapest when no latency data is available |
smart |
Priority cascade: override > task-type > role > seniority > cheapest > fallback chain |
routing:
strategy: "smart" # smart, cheapest, fastest, role_based, cost_aware, manual
rules:
- role_level: "C-Suite"
preferred_model: "large"
fallback: "medium"
- role_level: "Senior"
preferred_model: "medium"
fallback: "small"
- role_level: "Junior"
preferred_model: "small"
fallback: "local-coder"
- task_type: "code_review"
preferred_model: "medium"
- task_type: "documentation"
preferred_model: "small"
- task_type: "architecture"
preferred_model: "large"
fallback_chain:
- "example-provider"
- "openrouter"
- "ollama"
Multi-Provider Model Resolution¶
When multiple providers register the same model ID or alias, the ModelResolver
stores all variants as a candidate tuple rather than raising a collision error.
At resolution time, a ModelCandidateSelector picks the best candidate from the
tuple.
Two built-in selectors are provided:
| Selector | Behavior |
|---|---|
QuotaAwareSelector (default) |
Prefer providers with available quota, then cheapest among those; falls back to cheapest overall when all providers are exhausted |
CheapestSelector |
Always pick the cheapest candidate by total cost per 1k tokens, ignoring quota state |
The selector is injected into ModelResolver (and transitively into ModelRouter)
at construction time. QuotaAwareSelector is constructed with a snapshot from
QuotaTracker.peek_quota_available(), which returns a synchronous dict[str, bool]
of per-provider quota availability.
All routing strategies (smart, cost_aware, fastest, etc.) and the fallback chain
automatically use the injected selector when resolving model references, so multi-provider
selection is transparent to the strategy layer.
See Also¶
- Budget & Cost Management -- token metering, cost tracking, CFO optimization, quota degradation
- Tools -- tool categories, sandboxing, MCP integration
- Design Overview -- full index