Providers¶
LLM provider abstraction: protocol, base class, drivers, capabilities, routing, and resilience.
Protocol¶
protocol
¶
Typed protocol for completion providers.
The engine and tests type-hint against CompletionProvider for loose
coupling. Concrete adapters and test doubles satisfy it structurally.
CompletionProvider
¶
Bases: Protocol
Structural interface every LLM provider adapter must satisfy.
Defines four async methods: complete for non-streaming chat
completion, stream for streaming completion,
get_model_capabilities for a single-model capability lookup, and
batch_get_capabilities for many-model capability lookup with
per-model graceful degradation.
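A minimal sketch of what "satisfy it structurally" means in practice. The `Message`, `Response`, `CompletionLike`, and `EchoProvider` names are hypothetical stand-ins rather than the real synthorg types, and only `complete` is modelled.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Message:  # hypothetical stand-in for ChatMessage
    role: str
    content: str


@dataclass(frozen=True)
class Response:  # hypothetical stand-in for CompletionResponse
    content: str
    model: str


@runtime_checkable
class CompletionLike(Protocol):  # reduced, illustrative version of the protocol
    async def complete(self, messages: list[Message], model: str) -> Response: ...


class EchoProvider:
    """Test double: no inheritance, it only matches the method shape."""

    async def complete(self, messages: list[Message], model: str) -> Response:
        return Response(content=messages[-1].content, model=model)


async def ask(provider: CompletionLike, text: str) -> Response:
    # Callers type-hint against the protocol, not a concrete adapter.
    return await provider.complete([Message("user", text)], "example-medium-001")
```

Because the check is structural, a test double like this needs no import from the provider package beyond the types it exchanges.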
complete
async
¶
Execute a non-streaming chat completion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| messages | list[ChatMessage] | Conversation history. | required |
| model | str | Model identifier to use. | required |
| tools | list[ToolDefinition] \| None | Available tools for function calling. | None |
| config | CompletionConfig \| None | Optional completion parameters. | None |
Returns:
| Type | Description |
|---|---|
| CompletionResponse | The full completion response. |
Source code in src/synthorg/providers/protocol.py
stream
async
¶
Execute a streaming chat completion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| messages | list[ChatMessage] | Conversation history. | required |
| model | str | Model identifier to use. | required |
| tools | list[ToolDefinition] \| None | Available tools for function calling. | None |
| config | CompletionConfig \| None | Optional completion parameters. | None |
Returns:
| Type | Description |
|---|---|
| AsyncIterator[StreamChunk] | Async iterator of stream chunks. |
Source code in src/synthorg/providers/protocol.py
get_model_capabilities
async
¶
Return capability metadata for the given model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | Model identifier. | required |
Returns:
| Type | Description |
|---|---|
| ModelCapabilities | Static capability and cost information. |
batch_get_capabilities
async
¶
Return capability metadata for many models in one call.
Failures degrade per-model: models whose lookup fails surface as
None entries so callers preserve graceful per-model fallback.
The returned mapping keys are exactly the input models tuple.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| models | tuple[str, ...] | Tuple of model identifiers to look up. | required |
Returns:
| Type | Description |
|---|---|
| Mapping[str, ModelCapabilities \| None] | Mapping from model id to capabilities (or |
Source code in src/synthorg/providers/protocol.py
Base Provider¶
base
¶
Abstract base class for completion providers.
Concrete adapters subclass BaseCompletionProvider and implement
the _do_* hooks. The base class handles input validation,
automatic retry, rate limiting, and provides a cost-computation helper.
BaseCompletionProvider
¶
Bases: ABC
Shared base for all completion provider adapters.
Subclasses implement three hooks:
- _do_complete -- raw non-streaming call
- _do_stream -- raw streaming call
- _do_get_model_capabilities -- capability lookup
The public methods validate inputs before delegating to hooks.
When a retry_handler and/or rate_limiter are provided,
calls are automatically wrapped with retry and rate-limiting logic.
A static compute_cost helper is available for subclasses to
build TokenUsage records from raw token counts.
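The split between public methods and `_do_*` hooks can be sketched as a template method. This is an illustration under stated assumptions: `SketchProvider`, `InvalidRequest`, and the `max_attempts` counter are hypothetical, the retry loop is a crude stand-in for a real retry handler, and rate limiting is left out.

```python
import asyncio
from abc import ABC, abstractmethod


class InvalidRequest(ValueError):
    """Hypothetical stand-in for InvalidRequestError."""


class SketchProvider(ABC):
    """Public method validates, then delegates to a subclass hook."""

    def __init__(self, max_attempts: int = 1) -> None:
        self._max_attempts = max_attempts  # crude stand-in for a retry handler

    async def complete(self, messages: list[str], model: str) -> str:
        if not messages:
            raise InvalidRequest("messages must not be empty")
        if not model.strip():
            raise InvalidRequest("model must not be blank")
        for attempt in range(1, self._max_attempts + 1):
            try:
                return await self._do_complete(messages, model)
            except ConnectionError:
                if attempt == self._max_attempts:
                    raise  # attempts exhausted, surface the last error
        raise RuntimeError("max_attempts must be >= 1")

    @abstractmethod
    async def _do_complete(self, messages: list[str], model: str) -> str:
        """Raw provider call; subclasses implement only this."""


class FlakyProvider(SketchProvider):
    """Fails once with a transient error, then succeeds."""

    def __init__(self) -> None:
        super().__init__(max_attempts=2)
        self.calls = 0

    async def _do_complete(self, messages: list[str], model: str) -> str:
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("transient")
        return f"{model}: {messages[-1]}"
```

Validation runs before the hook, so an empty message list never reaches the driver or consumes a retry attempt.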
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| retry_handler | RetryHandler \| None | Optional retry handler for transient errors. | None |
| rate_limiter | RateLimiter \| None | Optional client-side rate limiter. | None |
| clock | Clock \| None | Optional injectable :class: | None |
Source code in src/synthorg/providers/base.py
complete
async
¶
Validate inputs, delegate to _do_complete.
Applies rate limiting and retry automatically when configured.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| messages | list[ChatMessage] | Conversation history. | required |
| model | str | Model identifier to use. | required |
| tools | list[ToolDefinition] \| None | Available tools for function calling. | None |
| config | CompletionConfig \| None | Optional completion parameters. | None |
Returns:
| Type | Description |
|---|---|
| CompletionResponse | The completion response. |
Raises:
| Type | Description |
|---|---|
| InvalidRequestError | If messages are empty or model is blank. |
| RetryExhaustedError | If all retries are exhausted. |
Source code in src/synthorg/providers/base.py
stream
async
¶
Validate inputs, delegate to _do_stream.
Only the initial connection setup is retried; mid-stream errors are not retried.
Note
Unlike complete, stream does not fire the
cost-recording chokepoint. Streaming responses surface
usage as a terminal StreamEventType.USAGE chunk, so the
recording logic would have to consume the iterator to
extract token counts, conflating cost recording with the
stream-consumption contract. Until streaming becomes a
mainstream LLM call path in this codebase, callers using
stream() are responsible for emitting their own
CostRecord from the final usage chunk. No call site in
the current diff uses stream() for paid LLM work.
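A caller that streams therefore has to pick the usage out of the final chunk itself. The sketch below shows one way to do that; `Usage`, `Chunk`, `fake_stream`, and the `"content"` / `"usage"` event names are hypothetical stand-ins for the real `TokenUsage`, `StreamChunk`, and `StreamEventType` values.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass(frozen=True)
class Usage:  # hypothetical stand-in for TokenUsage
    input_tokens: int
    output_tokens: int
    cost: float


@dataclass(frozen=True)
class Chunk:  # hypothetical stand-in for StreamChunk
    event_type: str  # "content" or "usage" in this sketch
    content: Optional[str] = None
    usage: Optional[Usage] = None


async def fake_stream() -> AsyncIterator[Chunk]:
    """Simulated provider stream ending in a usage chunk."""
    yield Chunk("content", content="Hel")
    yield Chunk("content", content="lo")
    yield Chunk("usage", usage=Usage(12, 2, 0.0004))


async def consume(stream: AsyncIterator[Chunk]) -> tuple[str, Optional[Usage]]:
    parts: list[str] = []
    usage: Optional[Usage] = None
    async for chunk in stream:
        if chunk.event_type == "content" and chunk.content:
            parts.append(chunk.content)
        elif chunk.event_type == "usage":
            usage = chunk.usage  # the caller would build its cost record from this
    return "".join(parts), usage
```

If the stream ends without a usage chunk, `consume` returns `None` for the usage, and the caller has to decide how to account for that call.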
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| messages | list[ChatMessage] | Conversation history. | required |
| model | str | Model identifier to use. | required |
| tools | list[ToolDefinition] \| None | Available tools for function calling. | None |
| config | CompletionConfig \| None | Optional completion parameters. | None |
Returns:
| Type | Description |
|---|---|
| AsyncIterator[StreamChunk] | Async iterator of stream chunks. |
Raises:
| Type | Description |
|---|---|
| InvalidRequestError | If messages are empty or model is blank. |
| RetryExhaustedError | If all retries are exhausted. |
Source code in src/synthorg/providers/base.py
get_model_capabilities
async
¶
Validate model identifier, delegate to _do_get_model_capabilities.
Capability lookups go through the same retry handler and rate
limiter as complete() / stream() so the contract "all
provider calls go through BaseCompletionProvider" stays honest
for any future driver whose _do_get_model_capabilities
does network I/O. Same budget as completions: capability
lookups consume a rate-limiter slot and are retried on
retryable errors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | str | Model identifier. | required |
Returns:
| Type | Description |
|---|---|
| ModelCapabilities | Static capability and cost information. |
Raises:
| Type | Description |
|---|---|
| InvalidRequestError | If model is blank. |
| RetryExhaustedError | If all retries are exhausted. |
Source code in src/synthorg/providers/base.py
batch_get_capabilities
async
¶
Fan out capability lookups across many models in parallel.
The default implementation runs get_model_capabilities per
model concurrently via asyncio.TaskGroup.
Per-model classification errors (model-not-found, validation,
non-retryable provider errors) degrade to None entries so a
single bad model id does not poison the whole batch.
RetryExhaustedError propagates: retry exhaustion is a
signal that the provider is unhealthy, not a per-model
classification issue. Surfacing it lets the caller decide
whether to fail the whole list-models request or retry the
batch later. MemoryError and RecursionError also
propagate unchanged.
Subclasses that expose a cheaper bulk source (e.g. a static preset catalog) should override this to avoid the per-model round trip.
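The degradation rules can be sketched as follows. To stay runnable on Python versions without `asyncio.TaskGroup`, this sketch swaps in `asyncio.gather`, so it is not the base-class implementation; `RetryExhausted`, `CATALOG`, and `lookup` are hypothetical stand-ins.

```python
import asyncio
from typing import Mapping, Optional


class RetryExhausted(Exception):
    """Hypothetical stand-in for RetryExhaustedError."""


CATALOG = {"small": 8_000, "medium": 32_000}  # made-up model -> context size


async def lookup(model: str) -> int:
    if model == "down":
        raise RetryExhausted(model)  # simulates an unhealthy provider
    return CATALOG[model]  # KeyError plays the role of model-not-found


async def lookup_or_none(model: str) -> Optional[int]:
    try:
        return await lookup(model)
    except (MemoryError, RecursionError, RetryExhausted):
        raise  # provider-health and interpreter-level failures propagate
    except Exception:
        return None  # a bad model id degrades to None for that entry only


async def batch_lookup(models: tuple[str, ...]) -> Mapping[str, Optional[int]]:
    # gather preserves input order, so keys line up with the input tuple.
    results = await asyncio.gather(*(lookup_or_none(m) for m in models))
    return dict(zip(models, results))
```

The per-model wrapper is what keeps one unknown id from failing the batch, while the re-raise branch keeps retry exhaustion visible to the caller.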
Source code in src/synthorg/providers/base.py
compute_cost
staticmethod
¶
Build a TokenUsage from raw token counts and per-1k rates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_tokens | int | Number of input tokens (must be >= 0). | required |
| output_tokens | int | Number of output tokens (must be >= 0). | required |
| cost_per_1k_input | float | Cost per 1,000 input tokens in the configured currency (finite and >= 0). | required |
| cost_per_1k_output | float | Cost per 1,000 output tokens in the configured currency (finite and >= 0). | required |
Returns:
| Type | Description |
|---|---|
| TokenUsage | Populated |
Raises:
| Type | Description |
|---|---|
| InvalidRequestError | If any parameter is negative or non-finite. |
Source code in src/synthorg/providers/base.py
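The arithmetic implied by "per-1k rates" can be sketched as below. This assumes the cost is the plain sum of the two per-1k terms with no rounding; the function name is hypothetical, it returns a bare float rather than a TokenUsage, and it raises ValueError where the real helper raises InvalidRequestError.

```python
import math


def compute_cost_sketch(
    input_tokens: int,
    output_tokens: int,
    cost_per_1k_input: float,
    cost_per_1k_output: float,
) -> float:
    """Return the estimated cost; raises ValueError on invalid inputs."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be >= 0")
    for rate in (cost_per_1k_input, cost_per_1k_output):
        if not math.isfinite(rate) or rate < 0:
            raise ValueError("rates must be finite and >= 0")
    # Rates are quoted per 1,000 tokens, so scale the counts down first.
    input_cost = (input_tokens / 1000) * cost_per_1k_input
    output_cost = (output_tokens / 1000) * cost_per_1k_output
    return input_cost + output_cost
```

For example, 2,000 input tokens at 0.5 per 1k plus 500 output tokens at 2.0 per 1k comes to 1.0 + 1.0 = 2.0 in the configured currency.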
Models¶
models
¶
Provider-layer domain models for chat completion requests and responses.
ZERO_TOKEN_USAGE
module-attribute
¶
ZERO_TOKEN_USAGE = TokenUsage(input_tokens=0, output_tokens=0, cost=0.0)
Additive identity for TokenUsage.
TokenUsage
pydantic-model
¶
Bases: BaseModel
Token counts and cost for a single completion call.
This is the lightweight provider-layer record. The budget layer's
synthorg.budget.CostRecord adds agent/task context around it.
Attributes:
| Name | Type | Description |
|---|---|---|
| input_tokens | int | Number of input (prompt) tokens. |
| output_tokens | int | Number of output (completion) tokens. |
| total_tokens | int | Sum of input and output tokens (computed). |
| cost | float | Estimated cost in the configured currency for this call. |
Config:
- frozen: True
- allow_inf_nan: False
Fields:
- input_tokens (int)
- output_tokens (int)
- cost (float)
ToolDefinition
pydantic-model
¶
Bases: BaseModel
Schema for a tool the model can invoke.
Uses raw JSON Schema for parameters_schema because every LLM
provider consumes it natively.
Note
The parameters_schema dict is shallowly frozen by Pydantic's
frozen=True -- field reassignment is prevented but nested
contents can still be mutated in place. BaseTool.to_definition()
provides a deep-copied schema, and ToolInvoker deep-copies
arguments at the execution boundary, so no additional caller-side
copying is needed for standard tool/provider workflows. Direct
consumers outside these paths should deep-copy if they intend to
modify the schema. See the tech stack page (docs/architecture/tech-stack.md).
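The difference between blocked reassignment and allowed nested mutation can be demonstrated with a stdlib frozen dataclass, which is shallowly frozen in a comparable way. `FrozenTool` is a hypothetical stand-in, not the Pydantic model, and the exception type differs from what Pydantic raises.

```python
import copy
from dataclasses import FrozenInstanceError, dataclass
from typing import Any


@dataclass(frozen=True)
class FrozenTool:  # stdlib stand-in, not the real ToolDefinition
    name: str
    parameters_schema: dict[str, Any]


def demo() -> tuple[bool, bool, bool]:
    schema = {"type": "object", "properties": {"q": {"type": "string"}}}
    tool = FrozenTool("search", schema)

    try:
        tool.name = "other"  # field reassignment is blocked
        reassign_blocked = False
    except FrozenInstanceError:
        reassign_blocked = True

    # Nested contents are still mutable in place, and the caller's dict is shared.
    tool.parameters_schema["properties"]["q"]["type"] = "integer"
    nested_mutated = schema["properties"]["q"]["type"] == "integer"

    # A deep copy can be modified without touching the frozen instance.
    safe = copy.deepcopy(tool.parameters_schema)
    safe["properties"]["q"]["type"] = "boolean"
    original_untouched = tool.parameters_schema["properties"]["q"]["type"] == "integer"
    return reassign_blocked, nested_mutated, original_untouched
```

This is why consumers outside the standard tool and provider paths are advised to deep-copy before editing a schema.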
Attributes:
| Name | Type | Description |
|---|---|---|
| name | NotBlankStr | Tool name. |
| description | str | Human-readable description of the tool. |
| parameters_schema | dict[str, Any] | JSON Schema dict describing the tool parameters. |
Config:
- frozen: True
- allow_inf_nan: False
- extra: forbid
Fields:
- name (NotBlankStr)
- description (str)
- parameters_schema (dict[str, Any])
- l1_metadata (ToolL1Metadata | None)
- l2_body (ToolL2Body | None)
- l3_resources (tuple[ToolL3Resource, ...])
Validators:
- _validate_l1_name_matches
ToolCall
pydantic-model
¶
Bases: BaseModel
A tool invocation requested by the model.
Note
The arguments dict is shallowly frozen by Pydantic's
frozen=True -- field reassignment is prevented but nested
contents can still be mutated in place. The ToolInvoker
deep-copies arguments before passing them to tool
implementations. See the tech stack page (docs/architecture/tech-stack.md).
Attributes:
| Name | Type | Description |
|---|---|---|
| id | NotBlankStr | Provider-assigned tool call identifier. |
| name | NotBlankStr | Name of the tool to invoke. |
| arguments | dict[str, Any] | Parsed arguments dict. |
Config:
- frozen: True
- allow_inf_nan: False
- extra: forbid
Fields:
- id (NotBlankStr)
- name (NotBlankStr)
- arguments (dict[str, Any])
ToolResult
pydantic-model
¶
Bases: BaseModel
Result of executing a tool call, sent back to the model.
Attributes:
| Name | Type | Description |
|---|---|---|
| tool_call_id | NotBlankStr | The |
| content | str | String content returned by the tool. |
| is_error | bool | Whether the tool execution failed. |
| is_timeout | bool | Whether the tool execution timed out specifically (a stricter form of |
Config:
- frozen: True
- allow_inf_nan: False
- extra: forbid
Fields:
- tool_call_id (NotBlankStr)
- content (str)
- is_error (bool)
- is_timeout (bool)
Validators:
- _validate_timeout_implies_error
ChatMessage
pydantic-model
¶
Bases: BaseModel
A single message in a chat completion conversation.
Attributes:
| Name | Type | Description |
|---|---|---|
| role | MessageRole | Message role (system, user, assistant, tool). |
| content | str \| None | Text content of the message. |
| tool_calls | tuple[ToolCall, ...] | Tool calls requested by the assistant (assistant only). |
| tool_result | ToolResult \| None | Result of a tool execution (tool role only). |
Config:
- frozen: True
- allow_inf_nan: False
- extra: forbid
Fields:
- role (MessageRole)
- content (str | None)
- tool_calls (tuple[ToolCall, ...])
- tool_result (ToolResult | None)
Validators:
- _validate_role_constraints
CompletionConfig
pydantic-model
¶
Bases: BaseModel
Optional parameters for a completion request.
All fields are optional -- the provider fills in defaults.
Attributes:
| Name | Type | Description |
|---|---|---|
| temperature | float \| None | Sampling temperature (0.0-2.0). Actual valid range may vary by provider. |
| max_tokens | int \| None | Maximum tokens to generate. |
| stop_sequences | tuple[str, ...] | Sequences that stop generation. |
| top_p | float | Nucleus sampling threshold. |
| timeout | float \| None | Request timeout in seconds. |
Config:
- frozen: True
- allow_inf_nan: False
- extra: forbid
Fields:
- temperature (float | None)
- max_tokens (int | None)
- stop_sequences (tuple[str, ...])
- top_p (float)
- timeout (float | None)
top_p
pydantic-field
¶
Nucleus-sampling threshold. Defaults to 1.0 (full distribution, no truncation) so every completion call has an explicit deterministic value without each site having to repeat it. Override when the prompt class needs a custom value alongside temperature.
CompletionResponse
pydantic-model
¶
Bases: BaseModel
Result of a non-streaming completion call.
Attributes:
| Name | Type | Description |
|---|---|---|
| content | str \| None | Generated text content (may be |
| tool_calls | tuple[ToolCall, ...] | Tool calls the model wants to execute. |
| finish_reason | FinishReason | Why the model stopped generating. |
| usage | TokenUsage | Token usage and cost breakdown. |
| model | NotBlankStr | Model identifier that served the request. |
| provider_request_id | str \| None | Provider-assigned request ID for debugging. |
| provider_metadata | dict[str, object] | Provider metadata injected by the base class ( |
Config:
- frozen: True
- allow_inf_nan: False
- extra: forbid
Fields:
- content (str | None)
- tool_calls (tuple[ToolCall, ...])
- finish_reason (FinishReason)
- usage (TokenUsage)
- model (NotBlankStr)
- provider_request_id (str | None)
- provider_metadata (dict[str, object])
Validators:
- _validate_has_output
provider_metadata
pydantic-field
¶
Provider metadata injected by the base class (synthorg* keys).
StreamChunk
pydantic-model
¶
Bases: BaseModel
A single chunk from a streaming completion response.
The event_type discriminator determines which optional fields are
populated.
Attributes:
| Name | Type | Description |
|---|---|---|
| event_type | StreamEventType | Type of stream event. |
| content | str \| None | Text delta (for |
| tool_call_delta | ToolCall \| None | Tool call received during streaming (for |
| usage | TokenUsage \| None | Final token usage (for |
| error_message | str \| None | Error description (for |
Config:
- frozen: True
- allow_inf_nan: False
- extra: forbid
Fields:
- event_type (StreamEventType)
- content (str | None)
- tool_call_delta (ToolCall | None)
- usage (TokenUsage | None)
- error_message (str | None)
Validators:
- _validate_event_fields
add_token_usage
¶
Create a new TokenUsage with summed token counts and cost.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | TokenUsage | First usage record. | required |
| b | TokenUsage | Second usage record. | required |
Returns:
| Type | Description |
|---|---|
TokenUsage
|
New |
TokenUsage
|
( |
Source code in src/synthorg/providers/models.py
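Having an additive identity makes it straightforward to fold any number of usage records, including none, into a single total. The sketch below uses a frozen dataclass named `Usage` as a hypothetical stand-in for TokenUsage; `ZERO`, `add_usage`, and `total` are illustrative names.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Usage:  # hypothetical stand-in for TokenUsage
    input_tokens: int
    output_tokens: int
    cost: float

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


ZERO = Usage(0, 0, 0.0)  # plays the role of ZERO_TOKEN_USAGE


def add_usage(a: Usage, b: Usage) -> Usage:
    """Return a new record; neither input is modified."""
    return Usage(
        a.input_tokens + b.input_tokens,
        a.output_tokens + b.output_tokens,
        a.cost + b.cost,
    )


def total(usages: list[Usage]) -> Usage:
    # Starting from the identity means an empty list yields ZERO, not an error.
    return reduce(add_usage, usages, ZERO)
```

Summing many float costs accumulates rounding error, so a caller that needs exact totals may prefer a decimal type; that choice is outside what the source describes.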
Enums¶
enums
¶
Provider-layer enumerations.
Errors¶
errors
¶
Provider error hierarchy.
Every provider error carries an is_retryable flag so retry logic
can decide whether to attempt again without inspecting concrete
exception types.
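A retry loop built on that flag might look like the sketch below. The `Sketch*` exception classes, `call_with_retry`, and `make_flaky` are hypothetical, and the zero-length sleep stands in for whatever backoff a real handler applies.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class SketchProviderError(Exception):  # hypothetical stand-in for ProviderError
    is_retryable = False


class SketchRateLimit(SketchProviderError):
    is_retryable = True


class SketchAuth(SketchProviderError):
    is_retryable = False


async def call_with_retry(fn: Callable[[], Awaitable[T]], max_attempts: int = 3) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn()
        except SketchProviderError as exc:
            # Only the flag is consulted, never the concrete type.
            if not exc.is_retryable or attempt == max_attempts:
                raise
            await asyncio.sleep(0)  # a real handler would back off here
    raise RuntimeError("max_attempts must be >= 1")


def make_flaky(failures: int, error: SketchProviderError):
    """Build a coroutine function that raises `error` for the first `failures` calls."""
    calls = {"n": 0}

    async def fn() -> str:
        calls["n"] += 1
        if calls["n"] <= failures:
            raise error
        return "ok"

    return fn, calls
```

A new error type only needs to set the flag to take part in retries; the loop itself does not change.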
ProviderErrorLabel
module-attribute
¶
ProviderErrorLabel = Literal[
"rate_limit",
"timeout",
"connection",
"internal",
"invalid_request",
"auth",
"content_filter",
"not_found",
"other",
]
Bounded Prometheus label value returned by classify_provider_error.
Kept in lockstep with
synthorg.observability.prometheus_labels.VALID_PROVIDER_ERROR_CLASSES
by the record helper; updating either requires updating both.
ProviderLifecycleConflictError
¶
Bases: ConflictError
Raised when ProviderHealthProber.start() is called after a timed-out stop.
Mirrors BackupUnrestartableError -- a stuck drain leaves
the prober's loop alive on the original instance, so the canonical
lifecycle pattern marks the prober unrestartable rather than
layering a second loop on top of an orphan task.
Source code in src/synthorg/core/domain_errors.py
ProviderError
¶
Bases: DomainError
Base exception for all provider-layer errors.
Attributes:
| Name | Type | Description |
|---|---|---|
| message | | Human-readable error description. |
| context | MappingProxyType[str, Any] | Immutable metadata about the error (provider, model, etc.). |
| is_retryable | bool | Whether the caller should retry the request. |
Class Attributes
status_code: HTTP 502 Bad Gateway (upstream failure).
error_code: RFC 9457 error code; subclasses override.
error_category: PROVIDER_ERROR.
retryable: Alias of is_retryable for the exception handler.
default_message: Generic message safe for 5xx scrubbing.
Note
When converted to string, sensitive context keys (api_key, token, secret, password, authorization) are automatically redacted regardless of casing.
Initialize a provider error.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| message | str | Human-readable error description. | required |
| context | dict[str, Any] \| None | Arbitrary metadata about the error. Stored as an immutable mapping; defaults to empty if not provided. | None |
Source code in src/synthorg/providers/errors.py
__str__
¶
Format error with optional context metadata.
Sensitive keys (api_key, token, etc.) are redacted to prevent accidental secret leakage in logs and tracebacks.
Source code in src/synthorg/providers/errors.py
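The redaction behaviour can be sketched as below. The output format, the `***` placeholder, and the exact-match (rather than substring) key comparison are all assumptions; `SketchError` is a hypothetical stand-in for ProviderError.

```python
from types import MappingProxyType
from typing import Any, Mapping, Optional

SENSITIVE = frozenset({"api_key", "token", "secret", "password", "authorization"})


class SketchError(Exception):  # hypothetical stand-in for ProviderError
    def __init__(self, message: str, context: Optional[dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.message = message
        # Copy, then wrap read-only so later caller mutations cannot leak in.
        self.context: Mapping[str, Any] = MappingProxyType(dict(context or {}))

    def __str__(self) -> str:
        if not self.context:
            return self.message
        parts = []
        for key, value in self.context.items():
            # Case-insensitive match on the key name; the value is never printed.
            shown = "***" if key.lower() in SENSITIVE else repr(value)
            parts.append(f"{key}={shown}")
        return f"{self.message} ({', '.join(parts)})"
```

Redacting in `__str__` covers logs and tracebacks, but code that reads `context` directly still sees the raw values.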
AuthenticationError
¶
Bases: ProviderError
Invalid or missing API credentials.
Source code in src/synthorg/providers/errors.py
RateLimitError
¶
Bases: ProviderError
Provider rate limit exceeded.
Initialize a rate limit error.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| message | str | Human-readable error description. | required |
| retry_after | float \| None | Seconds to wait before retrying, if provided by the provider. | None |
| context | dict[str, Any] \| None | Arbitrary metadata about the error. | None |
Source code in src/synthorg/providers/errors.py
ModelNotFoundError
¶
Bases: ProviderError
Requested model does not exist or is not available.
Source code in src/synthorg/providers/errors.py
InvalidRequestError
¶
Bases: ProviderError
Malformed request (bad parameters, too many tokens, etc.).
The HTTP status stays at 422 (the provider rejected the request
as invalid), but the RFC 9457 error_category is
PROVIDER_ERROR to match the 7xxx error_code prefix --
the underlying signal originates from the upstream provider, not
from local boundary validation. DomainError.__init_subclass__
enforces the prefix-vs-category alignment at class-definition
time.
Source code in src/synthorg/providers/errors.py
ContentFilterError
¶
Bases: ProviderError
Request or response blocked by the provider's content filter.
Same prefix-vs-category alignment fix as InvalidRequestError:
the 7xxx error_code keeps its semantic PROVIDER_ERROR
category; HTTP status stays 422.
Source code in src/synthorg/providers/errors.py
ProviderTimeoutError
¶
Bases: ProviderError
Request timed out waiting for provider response.
Source code in src/synthorg/providers/errors.py
ProviderConnectionError
¶
Bases: ProviderError
Network-level failure connecting to the provider.
Source code in src/synthorg/providers/errors.py
ProviderInternalError
¶
Bases: ProviderError
Provider returned a server-side error (5xx).
Source code in src/synthorg/providers/errors.py
DriverNotRegisteredError
¶
Bases: ProviderError
Requested provider driver is not registered in the registry.
Source code in src/synthorg/providers/errors.py
DriverAlreadyRegisteredError
¶
Bases: ProviderError
A driver with this name is already registered.
Reserved for future use if the registry gains mutable operations (add/remove after construction). Not currently raised.
Source code in src/synthorg/providers/errors.py
DriverFactoryNotFoundError
¶
Bases: ProviderError
No factory found for the requested driver type string.
Source code in src/synthorg/providers/errors.py
ProviderAlreadyExistsError
¶
Bases: ProviderError
A provider with this name already exists.
409 Conflict: provider name uniqueness violation, not a 502
upstream failure. Override the parent's 502 default so the
domain handler maps directly without a controller-level catch +
re-raise as ConflictError.
Source code in src/synthorg/providers/errors.py
ProviderNotFoundError
¶
Bases: ProviderError
A provider with this name does not exist.
404 Not Found: provider does not exist locally, not a 502 upstream failure. Override the parent's 502 default so the domain handler maps directly.
Source code in src/synthorg/providers/errors.py
ProviderModelNotFoundError
¶
Bases: ProviderError
A model identifier does not exist on the provider.
404 Not Found: distinct from ProviderNotFoundError (the whole
provider is missing) and from ProviderValidationError (the
request shape is wrong). Lets the API controller route missing-
model errors to HTTP 404 without parsing free-form validation
text.
Source code in src/synthorg/providers/errors.py
ProviderValidationError
¶
Bases: ProviderError
Provider configuration failed validation.
422 Unprocessable Entity: input shape is wrong, not a 502 upstream failure. Override the parent's 502 default so the domain handler maps directly.
Source code in src/synthorg/providers/errors.py
classify_provider_error
¶
Classify exc into one of nine bounded Prometheus label values.
Falls back to "other" for any exception not in the direct
canonical map, which guarantees the label set in
VALID_PROVIDER_ERROR_CLASSES stays finite even as driver
implementations add new error types.
Uses a direct-type lookup first (cheapest), then falls back to
isinstance for the hierarchy so subclasses of the canonical
provider-error types are bucketed with their parents. Any
ProviderError subclass that is not in the direct map (e.g.
DriverNotRegisteredError, ProviderValidationError) and
unknown (non-ProviderError) exception types both resolve to
"other"; the Prometheus label set therefore stays bounded
regardless of what the provider driver raises.
Returns:
| Type | Description |
|---|---|
ProviderErrorLabel
|
One of the :data: |
ProviderErrorLabel
|
return type gives static guarantees to callers (e.g. the |
ProviderErrorLabel
|
Prometheus collector's |
ProviderErrorLabel
|
allowlisted labels flow through. |
Source code in src/synthorg/providers/errors.py
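The two-stage lookup described above (exact type first, then an isinstance walk, then "other") can be sketched like this. The `P*` exception classes and the two-entry `_DIRECT` map are hypothetical and much smaller than the real canonical map.

```python
class PError(Exception):
    """Hypothetical stand-in for ProviderError."""


class PRateLimit(PError):
    pass


class PTimeout(PError):
    pass


class PSlowTimeout(PTimeout):
    """Subclass of a mapped type."""


class PDriverMissing(PError):
    """Provider error deliberately absent from the map."""


_DIRECT = {PRateLimit: "rate_limit", PTimeout: "timeout"}


def classify(exc: BaseException) -> str:
    label = _DIRECT.get(type(exc))  # cheapest path: exact-type hit
    if label is not None:
        return label
    for cls, candidate in _DIRECT.items():
        if isinstance(exc, cls):  # bucket subclasses with their parent
            return candidate
    return "other"  # keeps the label set bounded
```

Because every miss collapses to "other", a driver can introduce new exception types without widening the set of metric labels.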
Capabilities¶
capabilities
¶
Model capability descriptors for provider routing decisions.
ModelCapabilities
pydantic-model
¶
Bases: BaseModel
Static capability and cost metadata for a single LLM model.
Used by the routing layer to decide which model handles a request based on required features (tools, vision, streaming) and cost.
Attributes:
| Name | Type | Description |
|---|---|---|
| model_id | NotBlankStr | Provider model identifier (e.g. |
| provider | NotBlankStr | Provider name (e.g. |
| max_context_tokens | int | Maximum context window size in tokens. |
| max_output_tokens | int | Maximum output tokens per request. |
| supports_tools | bool | Whether the model supports tool/function calling. |
| supports_vision | bool | Whether the model accepts image inputs. |
| supports_streaming | bool | Whether the model supports streaming responses. |
| supports_streaming_tool_calls | bool | Whether tool calls can be streamed. |
| supports_system_messages | bool | Whether system messages are accepted. |
| cost_per_1k_input | float | Cost per 1 000 input tokens, denominated in the currency declared by the source that populated this capability record -- either the provider preset (whose prices the operator is responsible for keeping aligned with |
| cost_per_1k_output | float | Cost per 1 000 output tokens, same currency semantics as |
Config:
- frozen: True
- allow_inf_nan: False
- extra: forbid
Fields:
- model_id (NotBlankStr)
- provider (NotBlankStr)
- max_context_tokens (int)
- max_output_tokens (int)
- supports_tools (bool)
- supports_vision (bool)
- supports_streaming (bool)
- supports_streaming_tool_calls (bool)
- supports_system_messages (bool)
- cost_per_1k_input (float)
- cost_per_1k_output (float)
Validators:
- _validate_cross_field_constraints
supports_streaming_tool_calls
pydantic-field
¶
Supports streaming tool calls
cost_per_1k_input
pydantic-field
¶
Cost per 1k input tokens in the pricing currency declared by the capability source (provider preset or upstream model database); operators must keep preset pricing aligned with budget.currency
cost_per_1k_output
pydantic-field
¶
Cost per 1k output tokens; same currency semantics as cost_per_1k_input
Registry¶
registry
¶
Provider registry -- the Employment Agency.
Maps provider names to concrete BaseCompletionProvider driver
instances. Built from config via from_config, which reads each
provider's driver field to select the appropriate factory.
ProviderRegistry
¶
Immutable registry of named provider drivers.
Use from_config to build a registry from a config dict, or
construct directly with a pre-built mapping.
Examples:
Build from config::
    registry = ProviderRegistry.from_config(
        root_config.providers,
    )
    driver = registry.get("example-provider")
    response = await driver.complete(messages, "medium")
Check membership::
    if "example-provider" in registry:
        ...
Initialize with a name -> driver mapping.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| drivers | dict[str, BaseCompletionProvider] | Mutable dict of provider name to driver instance. The registry takes ownership and freezes a copy. | required |
| cassette_session | CassetteSession \| None | The shared cassette session when the cassette seam is active, else | None |
Source code in src/synthorg/providers/registry.py
get
¶
Look up a driver by provider name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Provider name (e.g. | required |
Returns:
| Type | Description |
|---|---|
| BaseCompletionProvider | The registered driver instance. |
Raises:
| Type | Description |
|---|---|
| DriverNotRegisteredError | If no driver is registered. |
Source code in src/synthorg/providers/registry.py
list_providers
¶
__contains__
¶
__len__
¶
from_config
classmethod
¶
Build a registry from a provider config dict.
For each provider, reads the driver field to select a
factory. The factory is called with
(provider_name, config) to produce a driver instance.
When cassette is active every driver is wrapped in a
CassetteCompletionProvider sharing one session -- the
single provider-layer chokepoint, so no consumer (engine,
coordinator, judge, runtime builder) can bypass record/replay.
In replay mode the inner driver is not built at all: no
factory is called, so a pure replay run constructs no real
provider.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| providers | Mapping[str, ProviderConfig] | Provider config dict (key = provider name). | required |
| factory_overrides | dict[str, object] \| None | Optional driver-type -> factory mapping for testing or native SDK swaps. | None |
| cassette | CassetteConfig \| None | Cassette configuration; | None |
Returns:
| Type | Description |
|---|---|
| Self | A new |
Raises:
| Type | Description |
|---|---|
| DriverFactoryNotFoundError | If a provider's |
Source code in src/synthorg/providers/registry.py
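The factory-selection step can be sketched as below, leaving the cassette wrapping out entirely. Provider configs are plain dicts here, and `build_registry`, `fake_factory`, and `FactoryMissing` are hypothetical names; the read-only `MappingProxyType` stands in for whatever the real registry uses to freeze its copy.

```python
from types import MappingProxyType
from typing import Callable, Mapping


class FactoryMissing(LookupError):
    """Hypothetical stand-in for DriverFactoryNotFoundError."""


def fake_factory(name: str, config: dict) -> str:
    """Toy factory: a real one would return a driver instance."""
    return f"driver:{name}"


def build_registry(
    providers: Mapping[str, dict],
    factories: Mapping[str, Callable[[str, dict], object]],
) -> Mapping[str, object]:
    drivers = {}
    for name, config in providers.items():
        driver_type = config["driver"]  # selects which factory to call
        factory = factories.get(driver_type)
        if factory is None:
            raise FactoryMissing(driver_type)
        drivers[name] = factory(name, config)  # called with (provider_name, config)
    return MappingProxyType(drivers)  # read-only view over the built mapping
```

Passing the factory mapping in as an argument mirrors the role of `factory_overrides`: tests can substitute a fake factory without touching the drivers.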
LiteLLM Driver¶
litellm_driver
¶
LiteLLM-backed completion driver.
Wraps litellm.acompletion behind the BaseCompletionProvider
contract, mapping between domain models and LiteLLM's chat-completion
API.
LiteLLMDriver
¶
Bases: BaseCompletionProvider
Completion driver backed by LiteLLM.
Uses litellm.acompletion for both streaming and non-streaming
calls. Model identifiers are prefixed with the LiteLLM routing key
(litellm_provider if set, otherwise the provider name -- e.g.
example-provider/example-medium-001) so LiteLLM routes to the
correct backend.
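The prefixing rule reads roughly as follows. `routed_model_id` is a hypothetical helper name, and whether the real driver guards against an already-prefixed model id is not stated in the source.

```python
from typing import Optional


def routed_model_id(
    provider_name: str,
    model: str,
    litellm_provider: Optional[str] = None,
) -> str:
    """Prefix the model id with the routing key LiteLLM should use."""
    # litellm_provider wins when set; otherwise fall back to the provider name.
    prefix = litellm_provider or provider_name
    return f"{prefix}/{model}"
```

With only a provider name, `routed_model_id("example-provider", "example-medium-001")` yields `example-provider/example-medium-001`, matching the example in the description above.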
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| provider_name | str | Provider key from config (e.g. | required |
| config | ProviderConfig | Provider configuration including API key, base URL, and model definitions. | required |
Raises:
| Type | Description |
|---|---|
| ProviderError | All LiteLLM exceptions are mapped to the |
Source code in src/synthorg/providers/drivers/litellm_driver.py
batch_get_capabilities
async
¶
Resolve capabilities for many models in a single tight loop.
Overrides the base implementation: each capability is built
from the static preset catalog plus a LiteLLM model-info
lookup, all of which is synchronous and in-process. Per-model
failures (unknown ids, validation errors) collapse to None
entries; MemoryError and RecursionError propagate.
Source code in src/synthorg/providers/drivers/litellm_driver.py
Routing¶
models
¶
Domain models for the routing engine.
ResolvedModel
pydantic-model
¶
Bases: BaseModel
A fully resolved model reference.
Attributes:
| Name | Type | Description |
|---|---|---|
| provider_name | NotBlankStr | Provider that owns this model (e.g. |
| model_id | NotBlankStr | Concrete model identifier (e.g. |
| alias | NotBlankStr \| None | Short alias used in routing rules, if any. |
| cost_per_1k_input | float | Cost per 1,000 input tokens in the configured currency. |
| cost_per_1k_output | float | Cost per 1,000 output tokens in the configured currency. |
| max_context | int | Maximum context window size in tokens. |
| estimated_latency_ms | int \| None | Estimated median latency in milliseconds. |
Config:
- frozen: True
- allow_inf_nan: False
- extra: forbid
Fields:
- provider_name (NotBlankStr)
- model_id (NotBlankStr)
- alias (NotBlankStr | None)
- cost_per_1k_input (float)
- cost_per_1k_output (float)
- max_context (int)
- estimated_latency_ms (int | None)
RoutingRequest
pydantic-model
¶
Bases: BaseModel
Inputs to a routing decision.
Not all fields are used by every strategy:
- ManualStrategy requires model_override.
- RoleBasedStrategy requires agent_level.
- CostAwareStrategy uses task_type and remaining_budget.
- FastestStrategy uses task_type and remaining_budget.
- SmartStrategy uses all fields in priority order.
Attributes:
| Name | Type | Description |
|---|---|---|
agent_level |
SeniorityLevel | None
|
Seniority level of the requesting agent. |
task_type |
NotBlankStr | None
|
Task type label (e.g. |
model_override |
NotBlankStr | None
|
Explicit model reference for manual routing. |
remaining_budget |
float | None
|
Per-request cost ceiling. Compared against
each model's |
Config:
- frozen: True
- allow_inf_nan: False
- extra: forbid
Fields:
- agent_level (SeniorityLevel | None)
- task_type (NotBlankStr | None)
- model_override (NotBlankStr | None)
- remaining_budget (float | None)
remaining_budget
pydantic-field
¶
Per-request cost ceiling in the configured currency, compared against model total_cost_per_1k. Not a total session budget.
RoutingDecision
pydantic-model
¶
Bases: BaseModel
Output of a routing decision.
Attributes:
| Name | Type | Description |
|---|---|---|
resolved_model |
ResolvedModel
|
The chosen model. |
strategy_used |
NotBlankStr
|
Name of the strategy that produced this decision. |
reason |
NotBlankStr
|
Human-readable explanation. |
fallbacks_tried |
tuple[str, ...]
|
Model refs that were tried before the final choice. |
Config:
- frozen: True
- allow_inf_nan: False
- extra: forbid
Fields:
- resolved_model (ResolvedModel)
- strategy_used (NotBlankStr)
- reason (NotBlankStr)
- fallbacks_tried (tuple[str, ...])
router
¶
Model router -- main entry point for routing decisions.
Constructed from RoutingConfig and a provider config dict.
Delegates to strategy implementations.
ModelRouter
¶
Route requests to the appropriate LLM model.
Examples:
Build from config::
    router = ModelRouter(
        routing_config=root_config.routing,
        providers=root_config.providers,
    )
    decision = router.route(
        RoutingRequest(agent_level=SeniorityLevel.SENIOR),
    )
Initialize the router.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
routing_config
|
RoutingConfig
|
Routing configuration (strategy, rules, fallback). |
required |
providers
|
dict[str, ProviderConfig]
|
Provider configurations keyed by provider name. |
required |
selector
|
ModelCandidateSelector | None
|
Optional candidate selector for multi-provider
model resolution. Defaults to |
None
|
Raises:
| Type | Description |
|---|---|
UnknownRoutingStrategyError
|
If the configured strategy is not recognized. |
Source code in src/synthorg/providers/routing/router.py
route
¶
Route a request to a model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
request
|
RoutingRequest
|
Routing inputs. |
required |
Returns:
| Type | Description |
|---|---|
RoutingDecision
|
A routing decision with the chosen model. |
Raises:
| Type | Description |
|---|---|
ModelResolutionError
|
If a required model cannot be found. |
NoAvailableModelError
|
If all candidates are exhausted. |
Source code in src/synthorg/providers/routing/router.py
strategies
¶
Routing strategies -- stateless implementations of RoutingStrategy.
Each strategy selects a model given a RoutingRequest, a
RoutingConfig, and a ModelResolver. Strategies are stateless
singletons registered in a module-level mapping.
STRATEGY_MAP
module-attribute
¶
STRATEGY_MAP = MappingProxyType(
{
STRATEGY_NAME_MANUAL: _MANUAL,
STRATEGY_NAME_ROLE_BASED: _ROLE_BASED,
STRATEGY_NAME_COST_AWARE: _COST_AWARE,
STRATEGY_NAME_FASTEST: _FASTEST,
STRATEGY_NAME_SMART: _SMART,
STRATEGY_NAME_CHEAPEST: _COST_AWARE,
}
)
Maps config strategy names to singleton instances.
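The registry pattern shown in STRATEGY_MAP can be illustrated with a small standalone sketch. The string keys and the `_CostAware` class below are placeholders; the actual values of the STRATEGY_NAME_* constants are not given in this section.

```python
from types import MappingProxyType


class _CostAware:
    """Placeholder for a stateless strategy implementation."""


_COST_AWARE = _CostAware()

# Two names share one singleton, mirroring how the cost-aware and
# cheapest entries both point at _COST_AWARE in the mapping above.
REGISTRY = MappingProxyType(
    {
        "cost_aware": _COST_AWARE,  # assumed key, for illustration only
        "cheapest": _COST_AWARE,  # alias of the same instance
    }
)

try:
    REGISTRY["custom"] = object()  # type: ignore[index]
except TypeError:
    read_only = True  # mappingproxy rejects item assignment
else:
    read_only = False
```

Wrapping the dict in MappingProxyType gives callers a read-only view, so the module-level registry cannot be mutated by accident at runtime.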
RoutingStrategy
¶
Bases: Protocol
Protocol for model routing strategies.
select
¶
Select a model for the given request.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
request
|
RoutingRequest
|
Routing inputs (agent level, task type, etc.). |
required |
config
|
RoutingConfig
|
Routing configuration (rules, fallback chain). |
required |
resolver
|
ModelResolver
|
Model resolver for alias/ID lookup. |
required |
Returns:
| Type | Description |
|---|---|
RoutingDecision
|
A routing decision with the chosen model. |
Raises:
| Type | Description |
|---|---|
ModelResolutionError
|
If the requested model cannot be found. |
NoAvailableModelError
|
If all candidates are exhausted. |
Source code in src/synthorg/providers/routing/strategies.py
ManualStrategy
¶
Resolve an explicit model override.
Requires request.model_override to be set.
select
¶
Select the explicitly requested model.
Raises:
| Type | Description |
|---|---|
ModelResolutionError
|
If |
Source code in src/synthorg/providers/routing/strategies.py
RoleBasedStrategy
¶
Select model based on agent seniority level.
Matches the first routing rule where rule.role_level equals
request.agent_level. If no rule matches, uses the seniority
catalog's typical_model_tier as a fallback lookup.
select
¶
Select model based on role level.
Raises:
| Type | Description |
|---|---|
ModelResolutionError
|
If no agent_level is set. |
NoAvailableModelError
|
If all candidates are exhausted. |
Source code in src/synthorg/providers/routing/strategies.py
CostAwareStrategy
¶
Select the cheapest model, optionally respecting a budget.
Matches task_type rules first, then falls back to the cheapest
model from the resolver.
select
¶
Select the cheapest available model.
Raises:
| Type | Description |
|---|---|
NoAvailableModelError
|
If no models are registered. |
Source code in src/synthorg/providers/routing/strategies.py
FastestStrategy
¶
Select the fastest model, optionally respecting a budget.
Matches task_type rules first, then falls back to the fastest
model from the resolver. When no models have latency data,
delegates to cheapest (lower-cost models are typically smaller
and faster, making cost a reasonable proxy).
select
¶
Select the fastest available model.
Raises:
| Type | Description |
|---|---|
NoAvailableModelError
|
If no models are registered. |
Source code in src/synthorg/providers/routing/strategies.py
SmartStrategy
¶
Combined strategy with priority-based signal merging.
Priority order: model_override > task_type rules > role_level rules > seniority default > cheapest available (budget-aware) > global fallback_chain > exhausted.
select
¶
Select a model using all available signals.
Raises:
| Type | Description |
|---|---|
NoAvailableModelError
|
If all candidates are exhausted. |
Source code in src/synthorg/providers/routing/strategies.py
Resilience¶
retry
¶
Retry handler with exponential backoff and jitter.
RetryResult
dataclass
¶
Bases: Generic[T]
Immutable result of a retry-wrapped execution.
Returned by RetryHandler.execute so callers get
per-invocation retry metadata without shared mutable state.
Attributes:
| Name | Type | Description |
|---|---|---|
value |
T
|
The return value of the wrapped callable. |
attempt_count |
int
|
Number of attempts made (1 = no retry). |
retry_reason |
str | None
|
Exception type name if a retry occurred. |
RetryHandler
¶
Wraps async callables with retry logic.
Retries transient errors (is_retryable=True) using exponential
backoff with optional jitter. Non-retryable errors raise immediately.
After exhausting max_retries, raises RetryExhaustedError.
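The retry loop described above can be sketched in a few lines. Everything here is illustrative: `TransientError`, `ExhaustedError`, `backoff_delay`, and the parameter names are hypothetical, the jitter variant (a uniform draw between zero and the computed delay) is one common choice rather than the library's documented one, and counting max_retries as retries after the first attempt is an assumption.

```python
import asyncio
import random


class TransientError(Exception):
    """Hypothetical retryable error."""

    is_retryable = True


class ExhaustedError(Exception):
    """Hypothetical stand-in for RetryExhaustedError."""


def backoff_delay(retry_index, base=0.01, factor=2.0, cap=1.0, jitter=True):
    """Exponential delay capped at `cap`, optionally randomized."""
    delay = min(cap, base * factor**retry_index)
    return random.uniform(0.0, delay) if jitter else delay


async def execute(func, max_retries=3):
    """Call func, retrying retryable errors; return (value, attempt_count)."""
    attempt_count = 0
    while True:
        attempt_count += 1
        try:
            return await func(), attempt_count
        except Exception as exc:
            if not getattr(exc, "is_retryable", False):
                raise  # non-retryable errors surface immediately
            if attempt_count > max_retries:
                raise ExhaustedError(str(exc)) from exc
            await asyncio.sleep(backoff_delay(attempt_count - 1))


calls = {"n": 0}


async def flaky():
    """Fail twice with a retryable error, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("temporary failure")
    return "ok"


result = asyncio.run(execute(flaky))
```

Returning the attempt count alongside the value mirrors the idea behind RetryResult: each call reports its own retry metadata instead of updating shared state on the handler.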
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
RetryConfig
|
Retry configuration. |
required |
Source code in src/synthorg/providers/resilience/retry.py
execute
async
¶
Execute func with retry on transient errors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
func
|
Callable[[], Coroutine[object, object, T]]
|
Zero-argument async callable to execute. |
required |
Returns:
| Type | Description |
|---|---|
RetryResult[T]
|
A RetryResult[T] carrying the return value and per-invocation retry metadata. |
Raises:
| Type | Description |
|---|---|
RetryExhaustedError
|
If all retries are exhausted. |
ProviderError
|
If the error is non-retryable. |
Source code in src/synthorg/providers/resilience/retry.py
rate_limiter
¶
Client-side rate limiter with RPM and concurrency controls.
RateLimiter
¶
Client-side rate limiter with RPM tracking and concurrency control.
Uses a sliding window for RPM tracking and an asyncio semaphore for
concurrency limiting. Supports pause-until from provider
retry_after hints.
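The sliding-window part of this design can be sketched on its own. `SlidingWindow` and `try_acquire` are hypothetical names, the clock is passed in as a plain number to keep the example deterministic, and the concurrency semaphore and pause handling are deliberately left out.

```python
from collections import deque


class SlidingWindow:
    """Track request timestamps and admit at most N per window."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.stamps: deque[float] = deque()

    def try_acquire(self, now: float) -> bool:
        """Admit the request if fewer than N fall inside the window."""
        # Drop timestamps that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.window_seconds:
            self.stamps.popleft()
        if len(self.stamps) < self.max_requests:
            self.stamps.append(now)
            return True
        return False


window = SlidingWindow(max_requests=2)
outcomes = [
    window.try_acquire(0.0),  # admitted
    window.try_acquire(1.0),  # admitted
    window.try_acquire(2.0),  # rejected: two requests in the last 60 s
    window.try_acquire(60.0),  # admitted: the request at 0.0 has expired
]
```

Unlike a fixed per-minute counter, the window slides with each request, so a burst at the end of one minute cannot be followed immediately by a second burst at the start of the next.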
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
RateLimiterConfig
|
Rate limiter configuration. |
required |
provider_name
|
str
|
Provider name for logging context. |
required |
clock
|
Clock | None
|
Time source for RPM-window timestamps and pause-until
tracking. Defaults to |
None
|
Source code in src/synthorg/providers/resilience/rate_limiter.py
acquire
async
¶
Wait for an available slot.
Blocks until both the RPM window and concurrency semaphore allow a new request. Also respects any active pause.
Source code in src/synthorg/providers/resilience/rate_limiter.py
release
¶
pause
¶
Block new requests for the given number of seconds.
Called when a RateLimitError with retry_after is received.
Multiple calls take the latest pause-until if it extends further.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
seconds
|
float
|
Duration to pause in seconds. Must be finite and non-negative. |
required |
Raises:
| Type | Description |
|---|---|
ValueError
|
If seconds is negative or not finite. |
Source code in src/synthorg/providers/resilience/rate_limiter.py
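The pause semantics above (latest deadline wins only if it extends further, with finite non-negative input) can be sketched as follows. `PauseGate` and its explicit `now` argument are assumptions for the example; the real limiter takes its time from an injected clock.

```python
import math


class PauseGate:
    """Hypothetical holder for a pause-until deadline."""

    def __init__(self) -> None:
        self.pause_until = 0.0

    def pause(self, seconds: float, now: float) -> None:
        """Extend the deadline; never shorten an existing pause."""
        if not math.isfinite(seconds) or seconds < 0:
            raise ValueError("seconds must be finite and non-negative")
        self.pause_until = max(self.pause_until, now + seconds)

    def is_paused(self, now: float) -> bool:
        return now < self.pause_until


gate = PauseGate()
gate.pause(10.0, now=0.0)  # deadline becomes 10.0
gate.pause(2.0, now=1.0)  # would end at 3.0, so the deadline stays 10.0
after_short = gate.pause_until
gate.pause(20.0, now=5.0)  # extends the deadline to 25.0
after_long = gate.pause_until
```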
errors
¶
Resilience-specific error types.
RetryExhaustedError
¶
Bases: ProviderError
All retry attempts exhausted for a retryable error.
Raised by RetryHandler when max_retries is reached.
The engine layer catches this to trigger fallback chains.
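How a caller might react to exhausted retries is sketched below. The `Sketch*` classes, `call_with_fallback`, and the model names are invented for the illustration; the engine's real fallback logic is not shown in this section.

```python
import asyncio


class SketchProviderError(Exception):
    """Hypothetical stand-in for ProviderError."""


class SketchRetryExhausted(SketchProviderError):
    """Hypothetical stand-in for RetryExhaustedError."""

    def __init__(self, original_error: SketchProviderError) -> None:
        super().__init__(f"retries exhausted: {original_error}")
        self.original_error = original_error


async def call_with_fallback(models, call):
    """Try each model in order, moving on when retries are exhausted."""
    last_error = None
    for model in models:
        try:
            return await call(model)
        except SketchRetryExhausted as exc:
            last_error = exc  # remember the failure and try the next model
    if last_error is None:
        raise LookupError("fallback chain is empty")
    raise last_error


async def fake_call(model):
    """Pretend the primary model keeps failing with a retryable error."""
    if model == "primary-model":
        raise SketchRetryExhausted(SketchProviderError("rate limited"))
    return f"response from {model}"


answer = asyncio.run(
    call_with_fallback(["primary-model", "backup-model"], fake_call)
)
```

Keeping the last retryable error on the exception lets the caller log or inspect the underlying failure even after moving on to another model.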
Attributes:
| Name | Type | Description |
|---|---|---|
original_error |
The last retryable error that was raised. |
Initialize with the original error that exhausted retries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
original_error
|
ProviderError
|
The last retryable |
required |