Skip to content

Providers

The provider layer is how SynthOrg reaches every LLM -- cloud APIs, OpenRouter, Ollama, LM Studio, vLLM, or any custom endpoint -- through a single unified interface. It handles authentication, model discovery, cost metering, health probing, and runtime hot-reload without restarting the engine.


Provider Abstraction

The framework provides a unified interface for all LLM interactions. The provider layer abstracts away vendor differences, exposing a single completion() method regardless of whether the backend is a cloud API, OpenRouter, Ollama, or a custom endpoint.

Unified Model Interface: completion(messages, tools, config) -> resp

Cloud API Adapter OpenRouter Adapter Ollama Adapter Custom Adapter
Method Direct API call 400+ LLMs via OR Local LLMs, self-host Any API

Provider Configuration

Provider Configuration (YAML)

Model IDs, pricing, and provider examples below are illustrative. Actual models, costs, and provider availability are determined during implementation and loaded dynamically from provider APIs where possible.

providers:
  example-provider:
    litellm_provider: "anthropic"  # LiteLLM routing identifier (optional, defaults to provider name)
    family: "example-family"       # cross-validation grouping (optional)
    auth_type: api_key             # api_key | oauth | custom_header | subscription | none
    api_key: "${PROVIDER_API_KEY}"
    # subscription_token: "..."    # subscription token (subscription auth only; passed to LiteLLM as api_key; sensitive -- use env vars or secret management)
    # tos_accepted_at: "..."       # timestamp when subscription ToS was accepted
    models:                        # example entries -- real list loaded from provider
      - id: "example-large-001"
        alias: "large"
        cost_per_1k_input: 0.015   # illustrative, verify at implementation time
        cost_per_1k_output: 0.075
        max_context: 200000
        estimated_latency_ms: 1500 # optional, used by fastest strategy
      - id: "example-medium-001"
        alias: "medium"
        cost_per_1k_input: 0.003
        cost_per_1k_output: 0.015
        max_context: 200000
        estimated_latency_ms: 500
      - id: "example-small-001"
        alias: "small"
        cost_per_1k_input: 0.0008
        cost_per_1k_output: 0.004
        max_context: 200000
        estimated_latency_ms: 200

  openrouter:
    auth_type: api_key           # api_key | oauth | custom_header | subscription | none
    api_key: "${OPENROUTER_API_KEY}"
    base_url: "https://openrouter.ai/api/v1"
    models:                        # example entries
      - id: "vendor-a/model-medium"
        alias: "or-medium"
      - id: "vendor-b/model-pro"
        alias: "or-pro"
      - id: "vendor-c/model-reasoning"
        alias: "or-reasoning"

  ollama:
    auth_type: none
    base_url: "http://localhost:11434"
    models:                        # example entries
      - id: "llama3.3:70b"
        alias: "local-llama"
        cost_per_1k_input: 0.0    # free, local
        cost_per_1k_output: 0.0
      - id: "qwen2.5-coder:32b"
        alias: "local-coder"
        cost_per_1k_input: 0.0
        cost_per_1k_output: 0.0

LiteLLM Integration

The framework uses LiteLLM as the provider abstraction layer:

  • Unified API across 100+ providers
  • Built-in cost tracking
  • Automatic retries and fallbacks
  • Load balancing across providers
  • Chat completions-compatible interface (all providers normalized)
  • Model database: litellm.model_cost provides pricing and context window data for all known models. Used at provider creation to dynamically populate model lists with up-to-date metadata. Provider-specific version filters (for example, a newer generation filter applied per provider) exclude older generations. Deduplicates dated model variants (e.g. prefers example-large-002 over example-large-002-20260205). Falls back to preset default_models when no models are found in the database.

Provider Management

Providers can be managed at runtime through the API without restarting:

  • CRUD: POST /api/v1/providers (create), PUT /api/v1/providers/{name} (update), DELETE /api/v1/providers/{name} (delete)
  • Connection test: POST /api/v1/providers/{name}/test -- sends a minimal probe and reports latency
  • Model discovery: POST /api/v1/providers/{name}/discover-models
  • Queries the provider endpoint for available models (Ollama /api/tags, standard /models) and updates the provider config.
  • Accepts an optional preset_hint query parameter (?preset_hint={preset_name}) that guides endpoint selection (Ollama vs standard API path). The preset_hint is no longer used for SSRF trust decisions.
  • Auto-triggered on preset creation for no-auth providers with empty model lists.
  • SSRF trust is determined by a dynamic host:port allowlist (ProviderDiscoveryPolicy), seeded from preset candidate_urls at startup and auto-updated on provider create/update/delete. Trusted URLs bypass SSRF validation; untrusted URLs go through full private-IP/DNS-rebinding checks. Bypasses are logged at WARNING level (PROVIDER_DISCOVERY_SSRF_BYPASSED).
  • Discovery allowlist: GET /api/v1/providers/discovery-policy (read), POST /api/v1/providers/discovery-policy/entries (add entry), POST /api/v1/providers/discovery-policy/remove-entry (remove entry) -- manage the dynamic SSRF allowlist of trusted host:port pairs for provider discovery. Persisted in the settings system (DB > env > YAML > code).
  • Presets: GET /api/v1/providers/presets lists built-in cloud and local provider templates (11 presets: Anthropic, OpenAI, Google AI, Mistral, Groq, DeepSeek, Azure OpenAI, Ollama, LM Studio, vLLM, OpenRouter); POST /api/v1/providers/from-preset creates from a template. Each preset declares supported_auth_types (e.g. ["api_key"], ["none"], ["api_key", "subscription"]) which the UI uses to present the available authentication options during provider creation. Presets also declare requires_base_url (e.g. true for Azure, Ollama, LM Studio, vLLM) which the UI uses to conditionally require a base URL. Presets also declare supports_model_pull, supports_model_delete, supports_model_config (local model management capability flags used by the UI to gate management controls).
  • Preset auto-probe: POST /api/v1/providers/probe-preset -- for presets with candidate_urls (local providers: Ollama and LM Studio), probes each URL in priority order (host.docker.internal, Docker bridge IP, localhost) with a 5-second timeout. Returns the first reachable URL and discovered model count. Used by the setup wizard to auto-detect local providers running on the host machine. SSRF validation is intentionally skipped because only hardcoded preset URLs are probed, never user input. Note: vLLM's candidate_urls is intentionally empty (users deploy vLLM at arbitrary endpoints), so it cannot be auto-probed and requires manual URL configuration.
  • Hot-reload: On mutation, ProviderManagementService rebuilds ProviderRegistry + ModelRouter and atomically swaps them in AppState -- no downtime
  • Auth types: api_key (default), subscription (token-based auth for provider subscription plans, passed to LiteLLM as api_key, requires ToS acceptance), oauth (stores credentials, MVP uses pre-fetched token), custom_header, none (local providers)
  • Routing key: Optional litellm_provider field decouples the provider display name from LiteLLM routing (e.g. a provider named "my-claude" can route to anthropic via litellm_provider: anthropic). Falls back to provider name when unset.
  • Credential safety: Secrets are Fernet-encrypted at rest via the providers.configs sensitive setting; API responses use ProviderResponse DTO that strips all secrets and provides has_api_key/has_oauth_credentials/has_custom_header/has_subscription_token boolean indicators
  • Health: GET /api/v1/providers/{name}/health -- returns health status (up/degraded/down/unknown derived from 24h call count and error rate; unknown when no calls recorded), average response time, error rate percentage, call count, total tokens, and total cost. In-memory tracking via ProviderHealthTracker (concurrency-safe, append-only with periodic pruning). Token/cost totals are enriched from CostTracker at query time
  • Health probing: ProviderHealthProber background service pings providers with base_url (local/self-hosted) every 30 minutes using lightweight HTTP requests (no model loading). Ollama: pings root URL; standard providers: GET /models. Skips providers with recent real API traffic. Results are recorded in ProviderHealthTracker. Cloud providers without base_url rely on real call outcomes for health status
  • Model capabilities: GET /api/v1/providers/{name}/models returns ProviderModelResponse DTOs enriched with runtime capability flags (supports_tools, supports_vision, supports_streaming) from the driver layer's ModelCapabilities. Falls back to defaults when driver is unavailable
  • Local model management: Providers with supports_model_pull/supports_model_delete/supports_model_config capability flags expose model lifecycle operations. POST /api/v1/providers/{name}/models/pull streams download progress via SSE (Ollama /api/pull). DELETE /api/v1/providers/{name}/models/{model_id} removes models. PUT /api/v1/providers/{name}/models/{model_id}/config sets per-model launch parameters (LocalModelParams: num_ctx, num_gpu_layers, num_threads, num_batch, repeat_penalty). Currently implemented for Ollama; LM Studio support deferred (unstable API).

Model Routing Strategy

Model routing determines which LLM handles a given request. Six strategies are available, selectable via configuration:

Strategy Behavior
manual Resolve an explicit model override; fails if not set
role_based Match agent seniority level to routing rules, then catalog default
cost_aware Match task-type rules, then pick cheapest model within budget
cheapest Alias for cost_aware
fastest Match task-type rules, then pick fastest model (by estimated_latency_ms) within budget; falls back to cheapest when no latency data is available
smart Priority cascade: override > task-type > role > seniority > cheapest > fallback chain
routing:
  strategy: "smart"              # smart, cheapest, fastest, role_based, cost_aware, manual
  rules:
    - role_level: "C-Suite"
      preferred_model: "large"
      fallback: "medium"
    - role_level: "Senior"
      preferred_model: "medium"
      fallback: "small"
    - role_level: "Junior"
      preferred_model: "small"
      fallback: "local-coder"
    - task_type: "code_review"
      preferred_model: "medium"
    - task_type: "documentation"
      preferred_model: "small"
    - task_type: "architecture"
      preferred_model: "large"
  fallback_chain:
    - "example-provider"
    - "openrouter"
    - "ollama"

Multi-Provider Model Resolution

When multiple providers register the same model ID or alias, the ModelResolver stores all variants as a candidate tuple rather than raising a collision error. At resolution time, a ModelCandidateSelector picks the best candidate from the tuple.

Two built-in selectors are provided:

Selector Behavior
QuotaAwareSelector (default) Prefer providers with available quota, then cheapest among those; falls back to cheapest overall when all providers are exhausted
CheapestSelector Always pick the cheapest candidate by total cost per 1k tokens, ignoring quota state

The selector is injected into ModelResolver (and transitively into ModelRouter) at construction time. QuotaAwareSelector is constructed with a snapshot from QuotaTracker.peek_quota_available(), which returns a synchronous dict[str, bool] of per-provider quota availability.

All routing strategies (smart, cost_aware, fastest, etc.) and the fallback chain automatically use the injected selector when resolving model references, so multi-provider selection is transparent to the strategy layer.


See Also