Scoring hyperparameters

This page tracks the operator-tunable scoring weights and thresholds across the SynthOrg engine. Each entry lists the current default, the setting that controls it, and a rationale for the value. Where the rationale reads "audit-set placeholder", the value carries no validated empirical derivation; it ships as a starting point and is subject to revision once evaluation infrastructure for the relevant scoring path is in place.

This document is tracking-only. Hyperparameter improvements come from manual bug fixes, architectural changes, and prompt engineering, not from auto-tuning sweeps. See issue #1739 for the convention rule that lifted these values from bare numeric literals into settings.

Routing scorer (AgentTaskScorer)

The agent-task routing scorer assigns a 0-1 fitness score to every candidate agent for a given subtask. With the tag bonus, the weights and bonuses sum to 1.1; the caller clamps the final score at 1.0.

| Setting | Default | Controls |
| --- | --- | --- |
| `engine.routing.weight_primary_skill` | 0.4 | Weight on primary-skill overlap component. |
| `engine.routing.weight_secondary_skill` | 0.2 | Weight on secondary-skill overlap (excluding primary matches). |
| `engine.routing.weight_tag_match_bonus` | 0.1 | Bonus when every required tag is covered by matched skills. |
| `engine.routing.weight_role_match_bonus` | 0.2 | Bonus on role-name match (case-insensitive). |
| `engine.routing.weight_seniority_alignment_bonus` | 0.2 | Bonus on seniority/complexity alignment. |
| `engine.routing.min_score` | 0.1 | Minimum viable candidate score; candidates below this are filtered out before ranking. |

Rationale. Audit-set placeholders calibrated so primary-skill overlap dominates (0.4) while role match and seniority alignment each independently contribute up to 0.2. The tag bonus at 0.1 keeps tag matching a tiebreaker rather than a primary axis. min_score=0.1 filters out candidates with no matching component at all: any single bonus, the smallest being the 0.1 tag bonus, is enough to survive the filter. No empirical derivation; revisit when routing-decision telemetry is in place.

Model matcher (match_model)

Selects the best provider-model fit for a tier-bound ModelRequirement. Three score components: tier base + headroom + priority alignment.

| Setting | Default | Controls |
| --- | --- | --- |
| `engine.matcher.tier_base_score` | 0.5 | Floor when a model satisfies the tier. |
| `engine.matcher.headroom_max_bonus` | 0.25 | Max bonus when context window comfortably exceeds the requirement. |
| `engine.matcher.priority_max_bonus` | 0.25 | Max bonus from priority-axis ranking (cost/quality/speed). |
| `engine.matcher.headroom_ratio_cap` | 2.0 | Maximum context-headroom multiple credited. |
| `engine.matcher.balanced_partial_credit` | 0.125 | Bonus for the balanced-priority "no preference" fallback. |

Rationale. Audit-set placeholders chosen so tier match alone gives 0.5, headroom adds up to 0.25, and priority alignment adds up to 0.25. The 2.0 ratio cap means a model with twice the requested context gets the full headroom bonus; beyond that, additional headroom earns no further credit. Balanced partial credit at 0.125 is half of priority_max_bonus. No empirical derivation; revisit alongside matcher-quality telemetry.
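A minimal sketch of how the three components could combine. The `match_score` function and the linear headroom interpolation between 1x and the 2.0x cap are assumptions; the constants come from the table above.

```python
# Hypothetical sketch of the match_model scoring components.
# Constants mirror the engine.matcher.* defaults; the linear ramp from
# 1x context (no headroom credit) to the 2x cap (full credit) is an
# assumption about how the bonus is interpolated.
TIER_BASE = 0.5       # engine.matcher.tier_base_score
HEADROOM_MAX = 0.25   # engine.matcher.headroom_max_bonus
PRIORITY_MAX = 0.25   # engine.matcher.priority_max_bonus
RATIO_CAP = 2.0       # engine.matcher.headroom_ratio_cap


def match_score(ctx_window: int, ctx_required: int, priority_alignment: float) -> float:
    """priority_alignment is a 0-1 rank fit on the cost/quality/speed axis;
    a balanced 'no preference' profile would pass 0.5, yielding the
    0.125 balanced_partial_credit bonus."""
    ratio = min(ctx_window / ctx_required, RATIO_CAP)
    headroom = HEADROOM_MAX * max(ratio - 1.0, 0.0) / (RATIO_CAP - 1.0)
    return TIER_BASE + headroom + PRIORITY_MAX * priority_alignment
```

Under this sketch, a model with exactly the required context and no priority fit scores the 0.5 floor; twice the context plus a perfect priority match reaches 1.0.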

Heuristic quality grader (HeuristicRubricGrader)

Rule-based fallback grader used when no LLM grader is configured. Grades probes by checking whether each criterion's source text appears in the artifact payload (case-insensitive).

| Setting | Default | Controls |
| --- | --- | --- |
| `engine.quality.heuristic.pass_threshold` | 0.5 | Probe-pass-ratio threshold; a ratio at or above this value earns the PASS verdict. |
| `engine.quality.heuristic.pass_grade` | 0.8 | Per-criterion grade on pass. |
| `engine.quality.heuristic.fail_grade` | 0.3 | Per-criterion grade on fail. |
| `engine.quality.heuristic.confidence_ceiling` | 0.9 | Maximum reported confidence. |
| `engine.quality.heuristic.confidence_bias` | 0.1 | Additive bias on derived confidence (prevents 0%). |

Rationale. Audit-set placeholders. Pass threshold of 0.5 means "at least half the probes match". Pass/fail grades of 0.8/0.3 give a clean PASS-vs-FAIL split that downstream consumers can threshold against. Confidence ceiling 0.9 acknowledges the heuristic is deterministic but not authoritative; bias 0.1 ensures every grade reports at least some confidence. Revisit when LLM-graded evaluations create comparison ground-truth.
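The grading path can be illustrated as follows. This is a sketch under stated assumptions: the `grade` function and its return shape are hypothetical, and deriving confidence from the probe-pass ratio (then applying bias and ceiling) is an assumption about what "derived confidence" means here.

```python
# Hypothetical sketch of HeuristicRubricGrader behaviour.
# Constants mirror the engine.quality.heuristic.* defaults; deriving
# confidence from the pass ratio is an assumption.
PASS_THRESHOLD = 0.5  # engine.quality.heuristic.pass_threshold
PASS_GRADE = 0.8      # engine.quality.heuristic.pass_grade
FAIL_GRADE = 0.3      # engine.quality.heuristic.fail_grade
CONF_CEILING = 0.9    # engine.quality.heuristic.confidence_ceiling
CONF_BIAS = 0.1       # engine.quality.heuristic.confidence_bias


def grade(payload: str, criteria: list[str]) -> tuple[str, list[float], float]:
    """Check each criterion's source text against the artifact payload
    (case-insensitive substring match) and derive verdict + confidence."""
    payload_l = payload.lower()
    hits = [c.lower() in payload_l for c in criteria]
    grades = [PASS_GRADE if h else FAIL_GRADE for h in hits]
    ratio = sum(hits) / len(hits) if hits else 0.0
    verdict = "PASS" if ratio >= PASS_THRESHOLD else "FAIL"
    confidence = min(ratio + CONF_BIAS, CONF_CEILING)  # bias prevents 0%
    return verdict, grades, confidence
```

Even a fully matching artifact tops out at 0.9 confidence, and a fully non-matching one still reports 0.1 rather than zero.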

Default client feedback (_build_default_client)

Synthetic feedback profile attached to default AIClient instances. The strictness multiplier scales a profile's strictness_level onto the 0-1 acceptance curve.

| Setting | Default | Controls |
| --- | --- | --- |
| `client.scored_feedback.passing_score` | 0.5 | Default passing-score threshold. |
| `client.scored_feedback.strictness_multiplier` | 2.0 | Multiplier on profile strictness for acceptance sensitivity. |
| `client.scored_feedback.strictness_floor` | 0.1 | Floor on the multiplier (keeps strictness=0 from disabling feedback). |

Rationale. Audit-set placeholders. Passing score of 0.5 sits at the midpoint, treating exactly-half-correct interactions as the boundary. Strictness multiplier of 2.0 means a profile with strictness_level=0.5 produces an effective multiplier of 1.0 (the "neutral" point); strictness_level=1.0 doubles sensitivity. The 0.1 floor keeps the multiplier non-zero so feedback weighting never collapses entirely. No empirical derivation; revisit when client simulation calibration data is available.
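The multiplier arithmetic described above can be sketched directly; the `effective_multiplier` function name is an assumption, but the numbers follow the rationale.

```python
# Hypothetical sketch of the strictness scaling in _build_default_client.
# Constants mirror the client.scored_feedback.* defaults.
PASSING_SCORE = 0.5      # client.scored_feedback.passing_score
STRICTNESS_MULT = 2.0    # client.scored_feedback.strictness_multiplier
STRICTNESS_FLOOR = 0.1   # client.scored_feedback.strictness_floor


def effective_multiplier(strictness_level: float) -> float:
    """Map a profile's 0-1 strictness_level onto the acceptance curve.

    strictness_level=0.5 is the neutral point (multiplier 1.0);
    1.0 doubles sensitivity; the floor keeps 0.0 from disabling feedback.
    """
    return max(strictness_level * STRICTNESS_MULT, STRICTNESS_FLOOR)
```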

Limit / timeout settings (cluster #28)

Operator-tunable concurrency, retry, and shutdown budgets. The consumer modules (rate-limit stores, CAS retry handler, bus bridge, dispatcher, lifecycle stop sequence, fine-tune chunker) keep module-level constants that mirror these defaults so a service constructed without an explicit settings handle still observes the documented behaviour. Each fallback constant carries a `# lint-allow: magic-numbers -- bootstrap fallback for <yaml_path>` marker pointing at the canonical setting.

| Setting | Default | Controls |
| --- | --- | --- |
| `api.rate_limit.gc_every_n_acquires` | 1024 | Sliding-window limiter: acquires between cold-bucket GC sweeps. |
| `api.rate_limit.gc_min_horizon_seconds` | 60 | Sliding-window limiter: floor on the cold-bucket eviction horizon. |
| `api.rate_limit.inflight_gc_every_n_acquires` | 1024 | Inflight (per-op concurrency) limiter: acquires between GC sweeps. |
| `api.rate_limit.inflight_min_retry_after_seconds` | 1 | Inflight limiter: minimum Retry-After value emitted on 429. |
| `coordination.cas.max_attempts` | 2 | CAS attempt budget for optimistic-concurrency mutations (counts the first call). |
| `communication.bus_bridge.max_consecutive_errors` | 30 | Bus bridge poll-loop error budget before back-off escalates. |
| `communication.bus_bridge.drain_timeout_seconds` | 10.0 | Bus bridge stop() drain hard deadline. |
| `workers.dispatcher.publish_max_attempts` | 3 | Task-claim publish retry budget. |
| `workers.dispatcher.publish_backoff_base_seconds` | 0.1 | Dispatcher exponential-backoff base. |
| `workers.dispatcher.publish_backoff_cap_seconds` | 1.0 | Dispatcher backoff per-attempt ceiling. |
| `memory.fine_tune.vram_batch_table` | [[40,128],[16,64],[8,32]] | VRAM-to-batch-size mapping for embedding fine-tune preflight. |
| `memory.fine_tune.chunk_size` | 512 | Word-chunk size for synthetic-data generation. |
| `communication.loop_prevention_window_seconds` | 60.0 | Per-pair delegation rate-limit window. |
| `api.lifecycle.task_engine_shutdown_seconds` | 8.0 | Lifecycle stop step deadline (task engine). |
| `api.lifecycle.meeting_scheduler_shutdown_seconds` | 2.0 | Lifecycle stop step deadline (meeting scheduler). |
| `api.lifecycle.performance_tracker_shutdown_seconds` | 2.0 | Lifecycle stop step deadline (performance tracker). |
| `api.lifecycle.backup_shutdown_seconds` | 5.0 | Lifecycle stop step deadline (backup service). |
| `api.lifecycle.settings_dispatcher_shutdown_seconds` | 2.0 | Lifecycle stop step deadline (settings dispatcher). |
| `api.lifecycle.bridge_shutdown_seconds` | 2.0 | Lifecycle stop step deadline (bus / webhook bridge, per bridge). |
| `api.lifecycle.distributed_queue_shutdown_seconds` | 3.0 | Lifecycle stop step deadline (JetStream distributed queue). |
| `api.lifecycle.message_bus_shutdown_seconds` | 3.0 | Lifecycle stop step deadline (in-process message bus). |
| `api.lifecycle.persistence_shutdown_seconds` | 5.0 | Lifecycle stop step deadline (persistence backend drain + checkpoint). |
| `api.lifecycle.approval_timeout_shutdown_seconds` | 1.0 | Lifecycle stop step deadline (approval timeout scheduler). |
| `api.lifecycle.drain_timeout_seconds` | 40.0 | Outer asyncio.wait_for budget around the cumulative stop sequence. |

Rationale. Audit-set placeholders. Rate-limiter GC at 1024 acquires balances bookkeeping latency against stale-bucket retention under typical request volumes; the 60s horizon matches the common sliding-window default. CAS attempt budget of 2 (one retry) keeps mutation contention bounded without amplifying load on a hot row. Dispatcher 3 attempts with 0.1s base / 1.0s cap absorbs a transient NATS reconnect without pushing publish latency into the multi-second tail. VRAM table (40GB->128, 16GB->64, 8GB->32) is a conventional GPU-memory-to-batch heuristic for transformer fine-tunes; revisit when distinct GPU profiles surface. Lifecycle stage budgets sum to ~33s under the 40s drain ceiling, leaving ~7s of headroom so the outer hard deadline does not pre-empt normally-progressing stages yet still bounds stuck-stage tail latency; persistence and task-engine claim the largest slices because they own connection-pool drains and in-flight task finalisation respectively. Revisit when telemetry shows real-world shutdown timing distributions.