CLAUDE.md Reference: Infrequently Needed Sections

Moved from root CLAUDE.md to reduce per-message token cost. Read on demand.

Documentation

Docs: docs/ (Markdown, built with Zensical, config: mkdocs.yml). Also embedded in the web Docker image at /docs/ via a docs-builder stage
Design spec: docs/design/ (50 pages), Architecture: docs/architecture/, Roadmap: docs/roadmap/
Security: docs/security.md, Licensing: docs/licensing.md, Reference: docs/reference/
REST API reference: docs/openapi/index.md (landing page) + docs/openapi/reference.html (Scalar viewer) + docs/openapi/openapi.json (schema). The viewer and schema are generated by scripts/export_openapi.py and written as static siblings of the landing page so Zensical copies them through on build. Sitemap policy: scripts/patch_sitemap.py includes reference.html (a real landing page that should be discoverable) but excludes openapi.json (Google never renders raw JSON in search results, and including it created permanent "Discovered, currently not indexed" noise in Search Console).
Comparison data: data/competitors.yaml, the shared YAML source for docs/reference/comparison.md (generated by scripts/generate_comparison.py) and site/src/pages/compare.astro
Library reference: docs/api/ (auto-generated via mkdocstrings + Griffe, AST-based)
Scripts: scripts/: CI/build utilities and development-time validation hooks (relaxed ruff rules: print and deferred imports allowed). Validation hooks include: check_push_rebased.sh (blocks push if behind main), check_bash_no_write.sh (blocks file writes via Bash), check_git_c_cwd.sh (blocks unnecessary git -C), check_web_design_system.py (validates design tokens on web file edits). CI scripts include: evaluate-scan.sh (DRY Trivy JSON result evaluation), cis-scan.sh (CIS Docker Benchmark wrapper), report-image-size.sh (image size reporting to step summary)
Landing page: site/ (Astro + React islands via @astrojs/react). Includes /get/ CLI install page, /compare/ framework comparison, contact form, interactive dashboard preview, SEO
Deps: docs group in pyproject.toml (zensical, mkdocstrings[python], griffe-pydantic)
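The sitemap policy for the OpenAPI pages (keep reference.html, drop openapi.json) can be sketched as below. This is a minimal illustration under stated assumptions, not the contents of scripts/patch_sitemap.py: the function name filter_sitemap and the EXCLUDED_SUFFIXES tuple are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical exclusion list; the real script may express its policy differently.
EXCLUDED_SUFFIXES = ("/openapi/openapi.json",)


def filter_sitemap(xml_text: str) -> str:
    """Return the sitemap XML with every excluded <url> entry removed."""
    # Keep the sitemap namespace as the default namespace when serialising.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(xml_text)
    for url in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        if loc.endswith(EXCLUDED_SUFFIXES):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")
```

Matching on a path suffix keeps the rule independent of the site's base URL.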

Docker

# Build and run (from repo root)
cp docker/.env.example docker/.env        # configure env vars
docker compose -f docker/compose.yml build
docker compose -f docker/compose.yml up -d
docker compose -f docker/compose.yml down   # stop and remove the containers when finished

# Verify
curl http://localhost:3001/api/v1/readyz   # backend (direct)
curl http://localhost:3000/api/v1/readyz   # backend (via web proxy)

Images: backend (Wolfi apko-composed distroless, non-root), web (Caddy pure-apko, SPA + API proxy + embedded docs), sandbox (Python + Node.js Wolfi, non-root)
Config: all Docker files in docker/: Dockerfiles, compose, .env.example. Single root .dockerignore (all images build with context: .)
Verification: CLI verifies cosign signatures + SLSA provenance at pull time; bypass with --skip-verify
Tags: version from pyproject.toml, semver, SHA, plus dev tags (v0.8.4-dev.3, dev rolling) for dev channel builds
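The two readyz checks in the Verify step can be wrapped in a small polling helper for scripted use. This is a sketch only: wait_ready and http_status are hypothetical names, and treating HTTP 200 as "ready" is an assumption about the endpoint rather than documented behaviour.

```python
import time
import urllib.error
import urllib.request
from collections.abc import Callable, Iterable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code
    except (urllib.error.URLError, OSError):
        return 0


def wait_ready(
    urls: Iterable[str],
    attempts: int = 30,
    delay: float = 1.0,
    probe: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll every URL until all answer 200 (assumed to mean ready) or attempts run out."""
    pending = set(urls)
    for _ in range(attempts):
        pending = {url for url in pending if probe(url) != 200}
        if not pending:
            return True
        sleep(delay)
    return False
```

Usage against the ports above would be wait_ready(["http://localhost:3001/api/v1/readyz", "http://localhost:3000/api/v1/readyz"]).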

Package Structure

src/synthorg/
  api/            # Litestar REST + WebSocket API, RFC 9457 errors, setup wizard, personality presets, auth/ (role-based access control, HttpOnly cookie sessions, CSRF double-submit, lockout / refresh-token / session repositories under persistence/{sqlite,postgres}/, concurrent session enforcement, user presence, OrgRole enum for org config permissions), guards (HumanRole-based + OrgRole-based with department scoping via require_org_mutation), user management (CRUD + org-role grant/revoke), dto_org (request DTOs for company/department/agent mutations), dto_workflow (request/response DTOs for workflow definition and execution operations), services/org_mutations (read-modify-write config mutation service), auto-wiring, lifecycle (auto-promote first owner), bootstrap (agent registry init from config), template packs (list + live-apply), memory admin (fine-tuning pipeline with orchestrator, checkpoint management, preflight checks, run history, embedder queries), optimistic concurrency (ETag/If-Match), TLS config, tiered rate limiting (unauth by IP, auth by user ID), rate_limits/policies (RATE_LIMIT_POLICIES canonical registry of (max_requests, window_seconds) defaults per operation id + per_op_rate_limit_from_policy helper; operator overrides flow through PerOpRateLimitConfig.overrides separately), a2a/ (the inbound A2A JSON-RPC gateway controller + well-known Agent Card controller; sits outside controllers/ because it maps domain errors to JSON-RPC error envelopes, not the RFC 9457 envelope the REST controllers raise), workflows (visual workflow definition CRUD, validation, YAML export, blueprint listing, blueprint instantiation, version history, diff, rollback), workflow executions (activate, list, get, cancel), ceremony policy (project + per-department query/override, resolved policy with field origins), quality overrides (per-agent quality score override CRUD), reports (on-demand report generation, period listing), notification_dispatcher (fan-out notification sink), training (training plan CRUD, execution, preview, overrides)
  backup/         # Backup/restore orchestrator, scheduler, retention, handlers/
  budget/         # Cost tracking, budget enforcement, quota degradation (including synchronous peek for routing-time selector hints), CFO optimization, trend analysis, budget forecasting, configurable currency formatting, risk budget (cumulative risk-unit tracking, risk scoring integration, risk check, risk records), automated reporting (periodic comprehensive reports, spending/performance/task-completion/risk-trends templates, report scheduling config), coordination metrics (9 empirical metrics: efficiency, overhead, error amplification, message density, redundancy, Amdahl ceiling, straggler gap, token/speedup ratio, message overhead), project cost aggregates (durable per-project lifetime cost totals surviving retention pruning)
  cli/            # Python CLI module (superseded by top-level cli/ Go binary)
  client/         # Client simulation: ai_client, human_client, hybrid_client, pool, adapters, runner, continuous, store, simulation_state, config, models, protocols, feedback/ (binary, scored, criteria_check, adversarial), generators/ (template, llm, dataset, procedural, hybrid), report/ (detailed, summary, metrics_only, json_export)
  a2a/            # Optional A2A external federation domain (JSON-RPC 2.0): agent_card (safe-subset projection), client (outbound federation), models (A2A protocol types), task_mapper + message_mapper (bidirectional mapping), config, security (peer validation, payload limits), peer_registry, push_verifier (HMAC-SHA256), connection_types/ (a2a_peer registration). The inbound Litestar controllers live under api/a2a/.
  communication/  # Message bus, dispatcher, channels, delegation, conflict resolution, meeting/, event_stream/ (AG-UI SSE hub, event projector, interrupt/resume protocol, evidence package re-exports)
  config/         # YAML company config loading and validation
  core/           # Shared domain models, base classes, resilience config, immutable (deep_copy_mapping, freeze_recursive for frozen Pydantic field protection), tool_disclosure (ToolL1Metadata, ToolL2Body, ToolL3Resource), tool_constraints (ToolSubConstraints, the five dimension enums, get_sub_constraints; placed in core so core.agent need not import the tools hub), delegation_types (DelegationRequest/DelegationResult/DelegationRecord value objects; placed in core so engine.classification need not import the communication hub, breaking an engine<->communication cold-import cycle), scheduler (AsyncCycleScheduler base + MIN_INTERVAL_SECONDS for periodic background schedulers), url_redaction (redact_url + REDACTED_QUERY/USERINFO_MASK/QueryPolicy, the single URL-credential redactor for providers/communication/git)
  execution/      # Light leaf for execution-trace types, engine-free so non-engine consumers (e.g. budget.coordination_collector) can import them cold: turn (TurnRecord, NodeType, BehaviorTag), efficiency (EfficiencyRatios, IdealTrajectoryBaseline, compute_efficiency_ratios), view (ExecutionResultView runtime-checkable protocol), parked_context (ParkedContext, the serialised parked-agent snapshot the persistence/worker/API layers name without pulling engine). See ADR-0012.
  engine/         # Orchestration, execution loops, task engine (observer registration, background observer dispatch), coordination, checkpoint recovery, structured failure diagnosis (FailureCategory, infer_failure_category, RecoveryResult failure_context/criteria_failed/stagnation_evidence), approval/review gates (no-self-review enforcement via SelfReviewError, immutable DecisionRecord drop-box), stagnation detection, context budget, compaction, hybrid loop, prompt profiles (tier-based prompt adaptation, personality trimming via max_personality_tokens), procedural memory integration (failure-driven), post_execution/ (extracted memory hooks -- distillation capture, procedural memory pipeline, evolution trigger), evolution/ (pluggable trigger/proposer/guard/adapter pipeline, EvolutionService orchestrator, EvolutionConfig with safe defaults, triggers/ (batched/inflection/per-task/composite), proposers/ (separate-analyzer/self-report/composite), adapters/ (identity/strategy-selection/prompt-template), guards/ (rate-limit/review-gate/rollback/shadow-evaluation (with shadow_protocol.py protocols + shadow_providers.py Configured/RecentHistory strategies)/approve-all (no-op fallback when every guard is disabled)/composite)), identity/ (diff utilities, store/ (IdentityVersionStore protocol, append-only + copy-on-write implementations with rollback)), workspace/ (git worktree isolation, merge orchestration, semantic conflict detection), quality/ (step-level quality signal classifier, accuracy-effort ratio, StepQualityClassifier protocol), health/ (two-layer health monitoring pipeline, HealthJudge + TriageFilter, EscalationTicket, NotificationSink wiring), trajectory/ (best-of-K trajectory scoring, TrajectoryScorer, budget guard, TrajectoryConfig), intake/ (IntakeEngine lifecycle walker, strategies/ (DirectIntake pass-through, AgentIntake LLM-driven triage)), review/ (ReviewPipeline chain walker, stages/ (InternalReviewStage, ClientReviewStage)), completion_oracle/ (the build/test/review completion oracle: classifier + evaluator for the deterministic build/test verdict, gate/runner/reviewer_identity/prompt/tools for the agent-session peer reviewer, in-memory + dual-backend report archive; on by default, fails CLOSED to escalation), workflow/ (Kanban board, Agile sprints, WIP limits, sprint lifecycle, velocity tracking, ceremony scheduling, strategy migration, strategies/ (pluggable scheduling strategies), velocity_calculators/ (pluggable velocity calculators), definition (visual workflow graph model, node/edge types, validation, YAML export), blueprint_loader (starter blueprint loading), blueprint_models (blueprint data models), blueprints/ (5 YAML starter templates), diff (version diff computation), version (version snapshot model), execution (workflow activation service, execution models, condition evaluator (compound AND/OR/NOT), graph utilities, execution_observer (TaskEngine bridge for lifecycle transitions), execution_activation_helpers (graph walking, conditional processing, task config parsing), execution_lifecycle (execution transitions, status management, task-event handling), subworkflow_registry (subworkflow publishing, version resolution, parent references))), strategy/ (trendslop mitigation: strategic lenses, constitutional principles, confidence calibration, cost tier resolution, lens_assignment (LensAssigner protocol, DiversityMaximizingAssigner round-robin), consensus (ConsensusVelocityDetector, ConsensusAction), premortem (PremortemExecutor protocol, DefaultPremortemExecutor, FailureMode, PremortemOutput)), delegation/ (blocking sub-agent delegation: SubAgentRunner protocol + InProcessSubAgentRunner reusing AgentEngine.run inline for a child Task, SubAgentDelegationSpec/Result, depth + cycle guard bounding recursion)
  hr/             # Hiring, firing, onboarding, agent registry (evolve_identity for evolution-approved changes), performance tracking (InflectionSink protocol, PerformanceInflection events for trend direction changes), activity timeline, activity event types, cost event redaction, career history, evaluation/ (five-pillar evaluation framework, pluggable pillar scoring strategies, EvaluationConfig), quality scoring (layered composite: CI signal + LLM judge + human override, QualityOverrideStore), scaling/ (dynamic company scaling: ScalingService orchestrator with runtime strategy enable/disable and priority reordering, domain models (ScalingSignal/ScalingContext/ScalingDecision/ScalingActionRecord), enums (ScalingActionType/ScalingOutcome/ScalingStrategyName), error types, ScalingContextBuilder (signal aggregation with graceful degradation), pluggable ScalingStrategy/ScalingSignalSource/ScalingTrigger/ScalingGuard protocols, strategies/ (WorkloadAutoScale, BudgetCap, SkillGap, PerformancePruning with evolution deferral), signals/ (workload, budget, skill, performance read-only adapters), triggers/ (BatchedScalingTrigger with overlap protection, SignalThresholdTrigger with crossing detection, CompositeScalingTrigger), guards/ (ConflictResolver with MappingProxyType-wrapped priority, CooldownGuard, RateLimitGuard with batch-aware enforcement, ApprovalGateGuard, CompositeScalingGuard with public get_guards()), config (per-strategy + trigger + guard), factory), training/ (pluggable training pipeline: TrainingService orchestrator, TrainingPlan/TrainingResult models, factory, config, selectors/ (role_top_performers, department_diversity, user_curated, composite), extractors/ (procedural, semantic, tool_patterns), curateurs/ (relevance, llm_curated), guards/ (sanitization, volume_cap, review_gate), onboarding_integration)
  integrations/   # External-service integrations: connections/ (typed connection catalog with encrypted credentials, health checks, rate limits), webhooks, oauth/ (OAuthTokenManager, PKCE), health/ (per-type health checks), tunnel/ (multi-provider public-URL tunnel: TunnelManager facade, cloudflare/ngrok/devtunnels adapters, shared binary auto-download, TunnelService MCP facade), mcp_services (client/artifact/ontology/catalog/oauth facades), rate-limit coordinator
  notifications/  # NotificationSink protocol, NotificationDispatcher fan-out, Notification model (category taxonomy: approval/budget/security/system/agent/health + severity taxonomy), adapters/ (console, ntfy, slack, email), config
  ontology/       # Semantic ontology subsystem: @ontology_entity decorator, OntologyBackend protocol, SQLiteOntologyBackend, OntologyService (bootstrap + CRUD), OntologyConfig (6 sub-configs), EntityDefinition/EntityField/EntityRelation models, versioning integration, drift detection types, error hierarchy, observability events
  memory/         # Pluggable MemoryBackend, retrieval pipeline (hybrid dense+BM25 sparse with RRF fusion, MMR diversity re-ranking via apply_diversity_penalty with pre-computed bigram cache), tool-based injection strategy with iterative Search-and-Ask reformulation loop (fail-safe reformulator/sufficiency_checker), ToolRegistry memory tool wrappers (SearchMemoryTool, RecallMemoryTool), fail-closed memory filter, agentic query reformulation, org memory, backends/ (sqlvector durable dense+lexical over the operational DB via pgvector/sqlite-vec, composite namespace-based routing, inmemory session-scoped and discouraged, EmbeddingCostConfig embedding cost tracking), consolidation/ (CompositeConsolidationStrategy composing pluggable ConsolidationOps: ConcatenationOp, SingleModeOp, DensityRoutingOp density-aware, LLMSynthesisOp with parallel TaskGroup per-category processing + trajectory-context injection from distillation entries, LLMConsolidationConfig, DistillationRequest capture helper tagged "distillation" EPISODIC, retention, archival), embedding/ (LMEB-ranked model selection, embedder config resolution, fine-tuning pipeline with orchestrator, cancellation, checkpoint management), procedural/ (failure-driven auto-generation, proposer LLM pipeline, SKILL.md materialization, ProceduralMemoryConfig, capture/ (failure/success/hybrid capture strategies), pruning/ (TTL/Pareto/hybrid pruning strategies), propagation/ (none/role-scoped/department-scoped cross-agent propagation))
  persistence/    # Pluggable PersistenceBackend, SQLite + Postgres backends, settings + user + artifact + project + preset + workflow definition + workflow execution + workflow version + agent identity versions + fine-tune + decision record (append-only audit drop-box) + completion oracle report (append-only peer-review verdict archive, twin of decision record) + risk override + SSRF violation + project cost aggregate + training plan + training result repositories, artifact content storage (pluggable ArtifactStorageBackend, filesystem impl), migrations.py + migration_helpers.py (yoyo-migrations runner coroutines and URL/discovery/result-dataclass helpers, in-process), sqlite/revisions/ + postgres/revisions/ (revision .sql files), optional TimescaleDB hypertable support for append-only time-series tables
  versioning/     # Generic versioning infrastructure: VersionSnapshot[T] model, VersioningService[T] (content-addressable deduplication via SHA-256 hash, INSERT OR IGNORE concurrent-write safety), compute_content_hash
  telemetry/      # Opt-in product telemetry (disabled by default): TelemetryReporter protocol, TelemetryEvent model, PrivacyScrubber (allowlist + forbidden pattern validation), TelemetryCollector (heartbeat scheduling, deployment ID persistence, environment resolution chain), host_info (Docker daemon `/info` enrichment for startup events via aiodocker), reporters/ (LogfireReporter, NoopReporter), TelemetryConfig
  observability/  # Structured logging, correlation tracking, redaction, third-party logger taming, log shipping (syslog, HTTP), compressed archival, events/
  providers/      # LLM provider abstraction, presets, model auto-discovery, capabilities, runtime CRUD (management/), local model management (pull/delete/config via LocalModelManager protocol), provider families, discovery SSRF allowlist, health tracking, active health probing, native image generation (generate_image capability via ImageGenerationMixin + ImageGenerationProvider protocol, image_models.py domain models, drivers/litellm_image.py hosted + drivers/scripted_image.py offline, supports_image_generation flag, cost_per_image billing), defaults_config (ProviderModelDefaults: last-resort metadata fallbacks when LiteLLM exposes no per-model data, e.g. fallback_max_output_tokens), routing/ (strategy-based model routing, multi-provider resolution with ModelCandidateSelector protocol, QuotaAwareSelector, CheapestSelector)
  settings/       # Runtime-editable settings (DB > env > code), Fernet encryption, ConfigResolver, bootstrap_resolver (pre-init env > default), definitions/, subscribers/ (SecuritySubscriber for discovery allowlist hot-reload)
  security/       # Rule engine, audit log, output scanner, autonomy levels, timeout policies, LLM fallback evaluator, custom policy rules, risk scoring (pluggable RiskScorer protocol, multi-dimensional RiskScore, DefaultRiskScorer), enforcement modes (active/shadow/disabled via SecurityEnforcementMode), risk override (SecOps risk tier reclassification via RiskTierOverride + SecOpsRiskClassifier), SSRF violation tracking (SsrfViolation model, pending/allowed/denied status for self-healing discovery allowlist)
  templates/      # Pre-built company templates (inheritance tree), template merge engine, personality presets, preset discovery/CRUD service, model requirements, tier-to-model matching, locale-aware name generation, workflow config rendering, pack_loader (additive team packs), packs/ (built-in pack YAMLs), uses_packs composition
  workers/        # Agent worker pool, task dispatcher, runtime service builder + engine assembly (_engine_assembly.py wires the tool registry; _image_provider_wiring.py resolves the boot design image provider from the design settings, off by default; _completion_oracle_runtime.py resolves the completion-oracle config + reviewer tier and re-attaches the oracle gates on boot and hot-reload)
  meta/           # Self-improvement meta-loop: signal aggregation (7 domains), rule engine (10 built-in rules + custom declarative rules via dashboard), improvement strategies (config/architecture/prompt tuning), proposal guards (scope/rollback/rate-limit/approval), rollout (before-after/canary, tiered regression detection), appliers (config/prompt/architecture/code each expose dry_run() validation via shared appliers/_validation.py helpers: parse_dotted_path, apply_diff_to_dict, validate_payload_keys, format_validation_errors), Chief of Staff role. Custom rule authoring: DeclarativeRule, CustomRuleDefinition model, METRIC_REGISTRY (25 metrics), CustomRuleRepository protocol + SQLite impl, CustomRuleController (CRUD + preview). Unified MCP API server: 245 tools across 22 domains with capability-based scoping (registry, scoper, invoker, tool builders, domain defs, handlers). Service orchestrator, factory, config
  tools/          # Tool registry, built-in tools, git SSRF prevention, MCP bridge, sandbox factory (gVisor default overrides via merge_gvisor_defaults), invocation tracking (invoker opens a cost_recording_scope around tools that declare a cost_scope_category, e.g. image generation), network_validator (shared SSRF), sub_constraint_enforcer (granular enforcement of core.tool_constraints), disclosure_config (ToolDisclosureConfig), disclosure_metrics (ToolDisclosureMetrics), discovery (ListToolsTool, LoadToolTool, LoadToolResourceTool, ToolDisclosureManager, DeferredDisclosureManager), web/ (HTTP requests, HTML parsing, web search), database/ (SQL query, schema inspection), terminal/ (sandboxed shell commands), design/ (image generation via the ImageProvider seam, default-shipped ProviderImageProvider adapter routing through the provider layer, durable path-traversal-guarded DesignAssetStore, diagram DSL generation, asset management), communication/ (SMTP email sending, notification dispatch via NotificationDispatcherProtocol, Jinja2 template formatting, delegate_and_await blocking sub-agent delegation gated on a wired SubAgentRunner), analytics/ (data aggregation via AnalyticsProvider protocol, report generation, metric collection via MetricSink protocol), sandbox/ (4-domain SandboxPolicy model (filesystem/network/process/inference), SandboxRuntimeResolver (gVisor probe + per-category runtime resolution with fallback), SandboxCredentialManager (env var credential stripping))

web/src/          # React 19 dashboard (see web/CLAUDE.md for full structure)
cli/              # Go CLI binary (see cli/CLAUDE.md for full structure)
site/             # Astro landing page (synthorg.io), React islands for interactive sections
data/             # Shared data files (competitors.yaml for comparison page)
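The memory/ entry above mentions hybrid dense+BM25 retrieval with RRF fusion. For orientation, a generic reciprocal rank fusion looks roughly like the sketch below. It is not the project's retrieval pipeline: rrf_fuse is a hypothetical name, and k=60 is the constant commonly used for RRF, assumed here rather than taken from the codebase.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document, with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Because only ranks are used, the dense and sparse scores never need to be normalised against each other.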

Releasing

Automated by Release Please: every push to main creates/updates a release PR with changelog
Version bumping: always-bump-patch strategy; every release bumps patch (e.g. 0.5.3 -> 0.5.4), regardless of commit type. auto-rollover.yml detects when the last stable patch meets the __synthorg_rollover_at_patch threshold in .github/release-please-config.json (default 9) and creates an empty Release-As: 0.(X+1).0 commit to preserve the 0.X.9 -> 0.(X+1).0 pattern automatically.
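The combined effect of always-bump-patch and the rollover threshold can be sketched as a single function. In the real pipeline the two halves are separate (Release Please bumps the patch, auto-rollover.yml lands the Release-As commit); next_version is a hypothetical helper, and reading "meets the threshold" as patch >= threshold is an assumption.

```python
def next_version(current: str, rollover_at_patch: int = 9) -> str:
    """Return the version the next release would carry under the patch-rollover scheme."""
    major, minor, patch = (int(part) for part in current.split("."))
    if patch >= rollover_at_patch:
        # Rollover: X.Y.9 is followed by X.(Y+1).0 instead of X.Y.10.
        return f"{major}.{minor + 1}.0"
    # Default: every release bumps the patch, regardless of commit type.
    return f"{major}.{minor}.{patch + 1}"
```

For example, next_version("0.5.3") gives "0.5.4" and next_version("0.5.9") gives "0.6.0".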
Release-As trailer: for exception bumps (1.0 graduation, explicit version jumps), land a Release-As: X.Y.Z trailer in a commit on main. Two valid routes: (a) final paragraph of a feature-PR body that will be squash-merged (squash copies the trailer into the main-branch commit message, where auto-rollover.yml and Release Please both pick it up); (b) trigger Actions -> Graduate -> Run workflow with target_version + reason. The Graduate workflow mints a synthorg-repo-bot App installation token and creates a signed empty commit on main via the Git Data API, landing a Release-As: X.Y.Z trailer that both RP and auto-rollover pick up. Downgrades and same-version graduations are hard-blocked by the workflow's validation step; fix forward with a higher target instead. The prior "add Release-As: to the RP release PR body" route is deliberately unsupported: that edit never becomes a commit on main until the RP PR merges, so auto-rollover.yml can race ahead and push a conflicting trailer before RP reacts.
Signed commits: every CI-generated commit on main is produced via the GitHub API under the synthorg-repo-bot App installation token, verifying as {verified: true, reason: "valid"}. main enforces required_signatures via the protect-main ruleset, so an unsigned commit would be rejected outright. One deliberate exception: the BSL Change Date update on the Release Please PR branch (release.yml "Update BSL Change Date" step) commits via GITHUB_TOKEN rather than the App token. The commit lands on the RP PR branch (not main), so the recursion-suppression penalty of GITHUB_TOKEN does not apply, and GitHub's ambient token still produces a signed commit attributed to github-actions[bot] which satisfies branch protection via the eventual squash-merge.
Release flow: merge release PR -> draft Release + tag -> Docker + CLI workflows build, smoke-test the artifacts at build time (smoke-test-backend-image against the just-built image; smoke-test-cli-binary against the just-built binary), and attach assets to the draft -> finalize-release.yml posts a finalize-release commit status, assembles the Verification section, and publishes the draft. On stable releases, superseded dev pre-releases + tags (those whose base version is at or below the published stable) are then deleted; dev builds targeting a higher, not-yet-released version are preserved. Smoke tests run at build time (not at finalise) so a broken artifact fails the originating PR with a red ❌ on the commit row, not the finalise step after a tag has already been cut.
Dev channel: every push to main (except Release Please bumps) creates a dev pre-release (e.g. v0.8.4-dev.3) via dev-release.yml. Users opt in with synthorg config set channel dev. Dev releases flow through the same Docker + CLI pipelines as stable releases. When a stable release is published, dev releases and tags whose base version is at or below it are deleted; dev builds targeting a higher, not-yet-released version are preserved (a main push can mint the next version's dev.1 while the previous stable is still finalising). If a dev release is swept while its docker.yml run is still in flight, that run's update-release step skips gracefully (warns, exits 0) rather than failing.
Nightly verification: deliberately none. The build-time pipeline (docker.yml + cli.yml + finalize-release.yml) is the source of truth for release-body structure, asset signing, and SBOM attachment. App-token signing is a property of the GitHub API auth path (POST /git/commits under an installation token returns a GitHub-signed commit unconditionally), not of any code we own; a misconfigured secret or revoked installation would also fail the next real release, so a nightly canary mostly catches its own implementation drift. Earlier release-pipeline-health.yml and test-signing.yml workflows were removed for that reason.
Pre-1.0 -> post-1.0 transition: when v1.0.0 ships, always-bump-patch stays in place (the SynthOrg release cadence favours conservative patch bumps). What flips is bump-minor-pre-major: true in the RP config; after 1.0 this flag is dropped so BREAKING CHANGE: footers start producing major bumps again (1.x.y -> 2.0.0). Release-As: trailers keep working unchanged. auto-rollover.yml also keeps working unchanged; patch-rollover is version-independent, and rollover at 1.x.9 -> 1.(x+1).0 continues to use the same mechanism. A follow-up PR will flip the config flag when v1.0.0 lands.
Config: .github/release-please-config.json, .github/.release-please-manifest.json (do not edit manually)
Changelog: .github/CHANGELOG.md (auto-generated, do not edit)
Version locations: pyproject.toml ([tool.commitizen].version), src/synthorg/__init__.py (__version__)

CI

Path filtering: dorny/paths-filter; jobs only run when their domain is affected. CLI has its own workflow (cli.yml).
Jobs: lint (ruff) + type-check (mypy) + test-unit (matrix sharded via pytest-split, balanced from .test_durations.unit; shard count in .github/workflows/ci.yml matrix.shard) + test-integration (matrix sharded via pytest-split, balanced from .test_durations.integration, backed by services: postgres instead of testcontainers; conftest detects SYNTHORG_TEST_POSTGRES_HOST/PORT/USER/PASSWORD/DB and yields a connection-info proxy directly) + test-e2e (single shard, same service container) + test-conformance-sqlite (SQLite-only -k "not postgres" slice of the conformance suite). All four arms set COVERAGE_CORE=sysmon for the lower-overhead coverage.py tracing backend (line + branch parity since coverage 7.7). Each shard collects coverage; test-coverage-aggregate combines them, asserts every shard contributed, and enforces the coverage gate via coverage report --fail-under=$(...) driven by [tool.coverage.report] fail_under in pyproject.toml before a single best-effort Codecov upload. Plus python-audit (pip-audit), dockerfile-lint (hadolint), dashboard (lint/type-check/test under the active-handle gate/build/storybook-build/audit), export-openapi (runs scripts/export_openapi.py once and shares the artifact with the dashboard arm), and .github/actions/install-postgres-18-client (shared composite for PGDG postgresql-client-18 install with SHA-256-pinned signing key). All run in parallel -> ci-pass gate.
Merge queue (unavailable on this repo): GitHub gates the merge queue to org-owned public repos and Enterprise Cloud, so a repo owned by a personal user account cannot enable it -- both the ruleset REST API and the Settings -> Rules UI reject a merge_queue rule, and .github/branch_protection.yml must NOT declare one or branch-protection-audit reports permanent drift. The deadlock it would have dissolved is real: the four required contexts (CI Pass, CodSpeed Python Pass, CodSpeed Web Pass, Lighthouse Pass) are matched by context name on a commit SHA, a check run only exists once its workflow dispatches and runs on that SHA, and GitHub does not re-deliver a dropped pull_request dispatch, so a single missed fan-out leaves a required context stuck at "Expected" and hard-blocks the PR (re-push / re-run / admin bypass remain the workaround). The merge_group workflow plumbing below is kept in place but dormant: it does nothing without a queue, and is ready to enforce the moment the repo moves to an organisation, at which point the merge_queue rule is re-added to the spec and required checks are evaluated on the queue-owned merge_group ref at merge time instead of the fragile per-head dispatch. Contract for any future required-check workflow: add merge_group to its on: triggers and make its dorny/paths-filter changes job merge-group-correct (base: ${{ github.event.merge_group.base_sha }} + ref: ${{ github.event.merge_group.head_sha }} over a fetch-depth: "0" checkout; both expressions are empty/falsy on push / pull_request, preserving default detection there) so the required context reports on the queue ref. A sub-job that cannot run in the queue (its inputs have no merge-group equivalent) follows the lighthouse-site pattern: omit merge_group from that job's own if: guard, and have the workflow's pass-aggregator treat its skipped result as a pass (test result != failure && result != cancelled, never result == success). 
Skip-as-pass masking pitfall: a job is also reported skipped when one of its own needs: dependencies failed, so a blind result != failure test on a leaf job can mask an upstream failure as a pass. Two invariants keep the pattern safe and both are mandatory for any aggregator that adopts it: the pass-aggregator must needs: every sub-job directly (never a single leaf of a needs: chain, which would only ever observe the leaf's own skipped), and the skip-as-pass leniency must be gated on the path-filter's intent rather than the raw result (only treat skipped as a pass when the changes job's output says the sub-job was deliberately filtered out, e.g. lighthouse-pass checks each sub-job's result only when its *_CHANGED flag is true). An aggregator that depends on all sub-jobs and gates leniency on filter intent cannot be fooled by a dependency-induced skip; one that does neither can. lighthouse-site is exactly the cannot-run-in-the-queue case described above: it audits the per-PR pr-<n>.synthorg-pr-preview.pages.dev Cloudflare preview, which has no merge-group equivalent, so the PR-head run covers it. Conversely, do NOT add merge_group as a bypass to a path-gated BUILD job's if: (e.g. ci.yml's dashboard-build): the dashboard flag is already computed correctly on merge_group via the paths-filter base/ref, so the flag is the right gate and a merge_group bypass would run the build even on irrelevant queue entries. Required-status-check contexts are also pinned to the GitHub Actions app via integration_id in the ruleset. The ruleset is mirrored in branch_protection.yml and checked by branch-protection-audit (scripts/audit_branch_protection.sh); any ruleset edit is a Settings -> Rules action that must be reflected there.
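The two invariants of the skip-as-pass pattern can be modelled in a few lines. This is a sketch of the decision logic only, with hypothetical names; the real aggregators are workflow steps, and the merge-group carve-out for lighthouse-site is not modelled here.

```python
def sub_job_passes(result: str, changed: bool) -> bool:
    """Decide whether one sub-job counts as a pass.

    result is the job result as GitHub reports it ("success", "failure",
    "cancelled" or "skipped"); changed is the path filter's flag for that job.
    """
    if result in ("failure", "cancelled"):
        return False
    if result == "skipped":
        # Lenient only when the filter deliberately left the job out; a skip
        # on a job that should have run points at a failed dependency.
        return not changed
    return result == "success"


def aggregate(jobs: dict[str, tuple[str, bool]]) -> bool:
    """Pass only if every sub-job passes; the aggregator sees all of them directly."""
    return all(sub_job_passes(result, changed) for result, changed in jobs.values())
```

Feeding the aggregator every sub-job, rather than the last job in a chain, is what lets an upstream failure show up as a failure instead of a downstream skip.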
Pages: pages.yml: version extraction from pyproject.toml, OpenAPI export, comparison page generation, Astro + Zensical docs build, GitHub Pages deploy on push to main
PR Preview: pages-preview.yml: Cloudflare Pages deploy per PR (pr-<number>.synthorg-pr-preview.pages.dev), cleanup on PR close
Docker: docker.yml: build + Trivy scan + CIS benchmark run on every PR; push to GHCR + cosign sign + SLSA L3 provenance gated by the image-push deployment environment (branch policy main,v*). Build and publish are split into separate jobs per image (build-X + build-X-publish); only the publish half carries packages: write / id-token: write / attestations: write. Shared logic lives in composite actions (build-scan-image, publish-image). PR builds are amd64-only on the docker buildx driver; non-PR builds add arm64 and switch to docker-container, so most of build-scan-image is gated off pull_request and only ever executes after merge. Both the binfmt and BuildKit images for that path come through the docker-pull-resilient ladder and are consumed from its local tag, never re-resolved: docker/setup-qemu-action cannot take a local tag (it always pulls its image: input), so the arm64 handlers are registered by running the image directly. The docker-container driver also mirrors docker.io through mirror.gcr.io, covering the refs BuildKit resolves for itself (the # syntax= frontend and builder-stage bases) outside any docker pull the ladder wraps; the GHCR-hosted BASE_IMAGE final stage bypasses it. Digest pins are what make that mirror safe, enforced by invariant 8 of check_ci_workflow_resilience.py. CVE triage: .github/.trivyignore.yaml
GHCR Cleanup: ghcr-cleanup.yml: standalone workflow (schedule weekly + workflow_dispatch) that prunes dev and non-release container versions from GHCR across all ten packages, complementing finalize-release.yml's GitHub-Releases sweep. A config job resolves the effective dry-run once (a manual dispatch honours its checkbox; the weekly run deletes only when GHCR_CLEANUP_ENABLED=true, else stays dry-run), since the schedule trigger carries no inputs. Three passes per package: keep the newest 5 dev builds (X.Y.Z-dev.N), reap PR/scan sha-*/scan-* images older than 7 days, and reap orphaned cosign / attestation referrers. Release tags (X.Y.Z, X.Y, latest) and their signatures, attestations, and multi-arch children are always protected by an anchored alternation exclude-tags regex (with use-regex, comma-separated patterns are NOT split, so each exclude-tags / delete-tags value must be a single regex using (a|b) alternation) plus validate: true. A keep-* tag is the operator escape hatch for a version the packages API refuses to delete (the action treats that 400 as fatal, so it reds the leg every week forever); it moves the version out of the untagged pass and under exclude-tags, and only works while every child manifest still exists, since re-tagging re-pushes the index. Fully decoupled from the release path -- a prune blip never touches a release run. Uses the optional GHCR_CLEANUP_TOKEN PAT (write:packages + delete:packages), falling back to GITHUB_TOKEN. Each pass runs dataaxiom/ghcr-cleanup-action (SHA-pinned, on the Actions allowlist) directly; the prune is idempotent, so a spurious GHCR 401 Bad credentials under concurrent delete load (which the action treats as fatal yet is not an auth failure) fails only that leg and self-heals on the next weekly run.
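The single-regex requirement for exclude-tags can be sketched as follows. The pattern below is a hypothetical stand-in, not the value used in ghcr-cleanup.yml, and Python's re module is used only to illustrate why the alternation has to be anchored; the action's own regex engine may differ in detail.

```python
import re

# Hypothetical exclude-tags value: ONE anchored regex using (a|b)
# alternation, covering X.Y.Z, X.Y, latest, and keep-* tags.
PROTECTED = re.compile(r"^(\d+\.\d+\.\d+|\d+\.\d+|latest|keep-.+)$")

# The same alternation without anchors, for comparison.
UNANCHORED = re.compile(r"\d+\.\d+\.\d+|\d+\.\d+|latest")


def is_protected(tag: str) -> bool:
    """True when the whole tag matches the protected-release pattern."""
    return PROTECTED.fullmatch(tag) is not None


tags = [
    "0.8.4", "0.8", "latest", "keep-0.7.1-dev.2",
    "0.8.5-dev.3", "sha-abc1234", "scan-42",
]
protected = [t for t in tags if is_protected(t)]
print(protected)  # ['0.8.4', '0.8', 'latest', 'keep-0.7.1-dev.2']

# Without anchors a dev tag matches on its X.Y.Z prefix and would be
# protected by mistake under search semantics.
print(UNANCHORED.search("0.8.5-dev.3") is not None)  # True
```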
CLI: cli.yml: Go lint/test/build (cross-compile) + govulncheck + fuzz. GoReleaser release on v* tags with cosign signing + SLSA provenance, gated by the release-tags deployment environment (v*-only, no privileged secrets; keeps RELEASE_PLEASE_TOKEN out of the tag path). The release job's gh release upload/download/edit + body-read calls go through .github/scripts/gh_with_retry.sh and the checksums.txt keyless signing through .github/scripts/cosign_sign_with_retry.sh sign-blob; the four attest-build-provenance steps and the SBOM install steps ride bounded continue-on-error retry ladders so a transient Rekor/Sigstore timeout does not fail a release
Renovate: weekly dependency updates via Mend GitHub App. 3 domain groups (Python, Web, Infrastructure), no auto-merge. The Infrastructure group spans Go modules, Dockerfile + docker-compose images, GitHub Actions SHAs, and every custom-regex pin (binary-tool versions like Trivy / Gitleaks / D2 / apko, container-image regexes for state.go / compose.yml / busybox / testcontainers, digest-pinned images pulled at CI job runtime under .github/ and scripts/ (postgres / caddy / moby/buildkit / tonistiigi/binfmt, each anchored on a # renovate: datasource=docker marker), action version: inputs like golangci-lint / GoReleaser, go install URLs like govulncheck). Config: renovate.json. Use /review-dep-pr before merging
Security scanning: gitleaks (push/PR + weekly), zizmor (workflow analysis), OSSF Scorecard (weekly), Socket.dev (PR supply chain), ZAP DAST (weekly + manual, rules: .github/zap-rules.tsv)
Coverage: Codecov (best-effort, CI not gated on availability)
Dependency review: dependency-review.yml: license allow-list (permissive + weak-copyleft), per-package GPL exemptions for dev-only tool deps (golangci-lint), PR comment summaries
CLA: cla.yml: two jobs splitting read and write. cla-check (pull_request_target) runs self-contained bash + gh api against .github/cla-signatures.json on the cla-signatures branch, with a gh_api_retry helper that does bounded exponential-with-cap retry on transient EPIPE / 5xx (8 attempts, ~10-min budget under a 12-min job timeout) and fails fast on definitive 4xx. It uses a marker comment for idempotent PR comment updates (PATCH if the marker comment exists, POST on first transition). cla-sign (issue_comment matching the sign-text body) records the signature via the Git Data API under the synthorg-repo-bot App token. Bot allowlist (dependabot[bot], renovate[bot], synthorg-repo-bot[bot], github-actions[bot]) skips the CLA on both jobs.
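The retry shape described for gh_api_retry (bounded attempts, exponential delay with a cap, fast-fail on definitive errors) might look like the sketch below. It is written in Python for illustration only; the real helper is bash, and the base delay of 5 s, the cap of 120 s, and the two exception classes are assumptions chosen so that the total sleep stays inside a ~10-minute budget.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stands in for EPIPE / 5xx responses (retryable)."""


class DefinitiveError(Exception):
    """Stands in for definitive 4xx responses (never retried)."""


def backoff_delays(attempts: int, base: float, cap: float) -> list[float]:
    """Delays slept between attempts: base * 2**i, never above cap."""
    return [min(cap, base * 2**i) for i in range(attempts - 1)]


def call_with_retry(
    call: Callable[[], T],
    attempts: int = 8,
    base: float = 5.0,
    cap: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying only TransientError up to `attempts` times."""
    delays = backoff_delays(attempts, base, cap)
    for i in range(attempts):
        try:
            return call()
        except TransientError:
            if i == attempts - 1:
                raise  # budget exhausted: surface the last transient error
            sleep(delays[i])
    raise AssertionError("unreachable when attempts >= 1")


# DefinitiveError is not caught above, so it propagates on the first attempt.
print(backoff_delays(8, 5.0, 120.0))
# [5.0, 10.0, 20.0, 40.0, 80.0, 120.0, 120.0]
```

With these assumed values the seven sleeps add up to 395 seconds, which leaves headroom for the requests themselves.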
Release: release.yml: Release Please creates draft release PR. Mints a synthorg-repo-bot App installation token via the release-runner-setup composite action (secrets documented in docs/reference/github-environments.md). Gated by the release deployment environment. Includes a Highlights step that calls Mistral (mistral-large-latest via the free Experiment tier, MISTRAL_API_KEY secret) to prepend a three-section summary to the release PR body, wrapped in ... markers. mistral-large-latest has a 128k input context (the largest changelog digest is ~7k tokens) and a ~1B-tokens/month free quota, far above the one-call-per-release workload; a built-in GitHub Models action does not fit because its 8k input cap is smaller than a large-minor changelog digest. The step is best-effort (continue-on-error, curl-level retry on 429/5xx): any Mistral failure logs a ::warning:: and skips the Highlights block rather than failing the release. Total bullet count is dynamic (1-15) scaled to the changelog volume and distributed across three fixed headers: What you'll notice (user-facing fixes + UX / behaviour changes), What's new (newly-introduced capabilities and extensions), Under the hood (maintenance, deps, refactors, included only when notable). Empty sections are omitted. Opt out per-release by adding a No-Highlights: trailer (case-insensitive, anywhere on its own line) to the Release Please PR body before the workflow runs. finalize-release.yml then promotes the same marker block from the merged release-please PR body into the published release body (release-please builds release notes from CHANGELOG.md only, so without this promotion the Highlights block would stay stranded on the PR; see "Finalise Release" below). 
The CLI consumes the same Highlights block during synthorg update on stable channels: it walks every release in (installed, target] oldest-to-newest in batches of 3 and renders the styled summary by default, with c toggling between the AI summary and the Release Please commit-based changelog. Releases without a Highlights block (pre-rollout or No-Highlights: opt-out) fall back to the commit view automatically. Dev pre-releases have no Highlights block by design, so the CLI walk renders a single combined commit list via the GitHub compare API instead. Walk is gated to interactive TTY runs; --quiet / --json / --yes / non-TTY contexts skip the walk and print the terse "Update available" notice + release-notes URL. The LICENSE / PR-body / head-SHA reads and the four required-status POSTs are wrapped by .github/scripts/gh_with_retry.sh (retry transient 401/5xx, fast-fail definitive 4xx); timeout-minutes: 15 bounds the stacked retry ladders so a black-holed connection cannot hold the release-please concurrency group.
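The release walk over (installed, target] can be sketched as below. The CLI itself is written in Go; this Python version only illustrates the interval and batching logic, and the helper names and version list are hypothetical. It assumes plain stable vX.Y.Z tags.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn 'v0.8.4' into (0, 8, 4); assumes a stable X.Y.Z tag."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def releases_to_walk(all_versions: list[str], installed: str, target: str) -> list[str]:
    """Releases in the half-open interval (installed, target], oldest first."""
    low, high = parse(installed), parse(target)
    picked = [v for v in all_versions if low < parse(v) <= high]
    return sorted(picked, key=parse)


def batches(items: list[str], size: int = 3) -> list[list[str]]:
    """Split the walk into pages of `size` releases."""
    return [items[i:i + size] for i in range(0, len(items), size)]


published = ["v0.8.6", "v0.8.2", "v0.8.4", "v0.8.3", "v0.8.5", "v0.8.1"]
walk = releases_to_walk(published, installed="v0.8.2", target="v0.8.6")
print(batches(walk))
# [['v0.8.3', 'v0.8.4', 'v0.8.5'], ['v0.8.6']]
```

The installed version is excluded and the target is included, so the user sees only releases they have not yet run.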
Auto Rollover: auto-rollover.yml: detects when the last stable tag's patch meets the __synthorg_rollover_at_patch threshold in .github/release-please-config.json (default 9), creates an empty commit on a versioned rollover branch (chore/auto-rollover-v<next>), and opens a PR whose body carries the Release-As: 0.(minor+1).0 trailer so the squash-merge lands it on main and Release Please targets the minor bump. Four skip guards: (1) Release Please release commits and its own prior rollover commits (matched on subject prefix); (2) a history-independent check (gh pr list) that the rollover PR for this exact version branch is already merged or open (skips MERGED / OPEN, but not CLOSED-without-merge, which never took effect); (3) any Release-As: trailer already in the last-stable..HEAD range, evaluated fail-closed so a range that cannot be computed (incomplete fetch) skips the run rather than rolling over; (4) any open Release Please release PR whose body already queues a Release-As: trailer. Gated by the release deployment environment. The empty commit and the rollover branch ref are created via the Git Data API (POST /git/commits + POST /git/refs, force-PATCH if the branch ref already exists) under the App installation token, so the squash-merge onto main ships a verified signature (required by main's signed-commits rule) and triggers downstream Release + Dev Release workflows. The dedup-read gh pr list guards are wrapped by the shared .github/scripts/gh_with_retry.sh helper (bounded exponential retry on transient 401/5xx, fast-fail on definitive 4xx, exit 75 on exhaustion which here means fail-closed skip); timeout-minutes: 8 accommodates the helper's ~1m45s worst-case ladder. The Git Data API writes stay un-retried so a real write failure pages.
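The rollover decision and skip guard (2) reduce to a small amount of arithmetic. The sketch below is illustrative Python, not the workflow's shell; the function names are hypothetical, and reading "meets the threshold" as patch >= threshold is an assumption.

```python
from typing import Optional


def rollover_target(last_stable: str, threshold: int = 9) -> Optional[str]:
    """Next minor version if the patch has reached the threshold, else None."""
    major, minor, patch = (int(p) for p in last_stable.lstrip("v").split("."))
    if patch < threshold:
        return None
    # For 0.x tags this yields 0.(minor+1).0.
    return f"{major}.{minor + 1}.0"


def rollover_plan(last_stable: str, threshold: int = 9) -> Optional[dict[str, str]]:
    """Branch name and PR-body trailer for a due rollover."""
    target = rollover_target(last_stable, threshold)
    if target is None:
        return None
    return {
        "branch": f"chore/auto-rollover-v{target}",
        "trailer": f"Release-As: {target}",
    }


def already_handled(pr_state: Optional[str]) -> bool:
    """Guard (2): MERGED / OPEN skip; CLOSED-without-merge never took effect."""
    return pr_state in {"MERGED", "OPEN"}


print(rollover_plan("v0.8.9"))
# {'branch': 'chore/auto-rollover-v0.9.0', 'trailer': 'Release-As: 0.9.0'}
print(rollover_plan("v0.8.4"))  # None
```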
Graduate: graduate.yml: workflow_dispatch one-click Release-As: trailer for target versions that skip the normal patch cadence (1.0 graduation, explicit minor jumps). Inputs: target_version + reason. Validates target is strictly above last stable (hard-blocks downgrades). Creates a signed empty commit on main with the trailer via the Git Data API under the App installation token. Gated by the release deployment environment. The parent-tree and verification reads go through .github/scripts/gh_with_retry.sh; the commit POST + ref PATCH stay un-retried so a write failure on this manual, human-watched graduation pages.
Dev Release: dev-release.yml: creates semver dev tags (e.g. v0.8.4-dev.3) and draft pre-releases on every push to main (skips Release Please version-bump commits). Tags trigger existing Docker + CLI workflows for full build/scan/sign pipeline. Gated by the release deployment environment. Uses the release-runner-setup composite for token mint. Pre-release body is built locally via git log -1 on the head SHA and gh release create --notes-file: title $DEV_TAG (e.g. v0.8.4-dev.5), then a Dev build #N toward vX.Y.Z line, **Commit:** <short SHA>, **Subject:** <commit subject>, the **Full pipeline:** disclaimer, and the channel opt-in tip. Only the short SHA and the commit subject are written into the notes file -- the full commit body (squash-merge PR descriptions of hundreds of lines, nested markdown, tables) is deliberately omitted because it renders poorly on the release page and buries what changed. Variables go through printf '%s' placeholders (the --notes-file route avoids command substitution that bare --notes "..." would suffer if a commit subject contained backticks or $(...)). Failure path: if gh release create returns non-zero (transient API error, 5xx, rate limit), the workflow exits 1 with the orphan tag preserved -- deleting the tag would race the downstream tags: v*-listening workflows that the tag-create push already triggered (cli.yml, docker.yml), 404'ing their actions/checkout step. The orphan tag is later garbage-collected by the same workflow's incremental sweep (keeps 5 most recent dev pre-releases) and by finalize-release.yml's stable-release sweep. End-of-job regression guard Verify minted tag survived the run always re-resolves refs/tags/$DEV_TAG (via if: always() so the guard runs on failure paths where tag loss is most likely) and exits 1 if absent, routing through the existing report-failure job into the dev-release regression tracking issue. 
Workflow-tag-lifecycle pre-push gate (scripts/check_workflow_tag_lifecycle.py) statically prevents any future workflow from re-introducing the create-then-conditionally-delete shape. The end-of-run tag-survival check reads through .github/scripts/gh_with_retry.sh so a transient 401 cannot fire a false "tag deleted" alarm (a real 404 still fast-fails and trips the guard). GHCR image pruning runs as the standalone weekly ghcr-cleanup.yml (see "GHCR Cleanup" above), decoupled from the release path.
Finalise Release: finalize-release.yml: assembles the release body and publishes the draft once both Docker + CLI workflows succeed for the tag. Body assembly: prepends the AI Highlights block (stable releases only) extracted from the merged release-please PR body via the head_sha → pulls association, then re-applies the Verification section from the per-image marker comments. The strip step that prevents finalise re-runs from doubling sections gates EVERY marker-pair deletion on both START and END being present in the body; sed '/START/,/END/d' is greedy to EOF without an END, which would tank the entire CHANGELOG-derived body if a contributor's commit subject (now propagated verbatim into dev release bodies via dev-release.yml) happened to contain a literal opening marker. The gate applies to HIGHLIGHTS and to all five CLI_* / CONTAINER_* verification-data marker pairs. The FINALIZE_VERIFICATION marker is intentionally greedy-to-EOF: everything after it IS the verification section, rebuilt fresh on each finalise run. Posts a finalize-release commit status (pending at start, success / failure at finish) so workflow_run-triggered failures surface as a red ❌ on the commit row instead of disappearing into the Actions tab. Gated by the release deployment environment. Immutable releases enabled. Handles both stable and dev releases.
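The gate on marker-pair deletion can be illustrated with a small function. This is a Python sketch working on substrings rather than the workflow's line-based sed, and the marker strings EXAMPLE_START / EXAMPLE_END are hypothetical placeholders, not the real marker names.

```python
def strip_block(body: str, start: str, end: str) -> str:
    """Delete start..end (inclusive) only when BOTH markers are present.

    An unpaired opening marker leaves the body untouched, which avoids the
    delete-to-end-of-file behaviour of an ungated range delete.
    """
    start_at = body.find(start)
    if start_at == -1:
        return body
    end_at = body.find(end, start_at + len(start))
    if end_at == -1:
        return body  # START without END: refuse to delete anything
    return body[:start_at] + body[end_at + len(end):]


START, END = "<!-- EXAMPLE_START -->", "<!-- EXAMPLE_END -->"

paired = f"intro\n{START}\nold section\n{END}\nchangelog\n"
print(repr(strip_block(paired, START, END)))
# 'intro\n\nchangelog\n'

# A stray opening marker (e.g. inside a commit subject) changes nothing.
stray = f"intro\nfix: handle {START} in titles\nchangelog\n"
print(strip_block(stray, START, END) == stray)  # True
```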
Stable-release dev-cleanup deletes every dev release + every orphan dev tag matching vX.Y.Z-dev.N whose base version is at or below the published stable (future-version dev builds are skipped via a sort -V semver compare, so a next-version dev.1 minted during the previous stable's finalise window is not swept out from under its in-flight docker.yml run) -- the inner gh api calls are explicitly capture-and-checked (NOT mapfile < <(...), which silently treats inner-process failures as empty input) and per-tag gh release delete / gh api -X DELETE failures accumulate into a final exit-on-failure check so partial-cleanup is loudly diagnosed. The Highlights propagation path that fetches the release-please PR body splits the gh pr view call into capture + classify so an auth / rate-limit failure surfaces a ::warning:: distinct from "PR was deleted" (legitimate skip with ::notice::). Artifact smoke testing happens at BUILD time in cli.yml and docker.yml via the smoke-test-cli-binary and smoke-test-backend-image composite actions; the finalise step does not re-test (Docker images are content-addressed and CLI archives are SHA-256-verified by the cosign-signed checksums.txt).
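The dev-cleanup selection rule (delete dev tags whose base version is at or below the published stable, keep future-version dev builds) can be sketched as below. The workflow uses a sort -V compare in shell; this Python version substitutes an integer-tuple comparison as a stand-in, and the function name and tag list are hypothetical.

```python
import re

# vX.Y.Z-dev.N, anchored so stable tags and other refs never match.
DEV_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)-dev\.(\d+)$")


def dev_tags_to_delete(tags: list[str], stable: str) -> list[str]:
    """Dev tags whose base X.Y.Z is at or below the published stable."""
    stable_key = tuple(int(p) for p in stable.lstrip("v").split("."))
    doomed = []
    for tag in tags:
        match = DEV_TAG.match(tag)
        if match is None:
            continue  # not a dev tag: never touched by this sweep
        base = tuple(int(g) for g in match.groups()[:3])
        if base <= stable_key:
            doomed.append(tag)
    return doomed


tags = ["v0.8.4-dev.1", "v0.8.4-dev.2", "v0.8.5-dev.1", "v0.8.3-dev.7", "v0.8.4"]
print(dev_tags_to_delete(tags, stable="v0.8.4"))
# ['v0.8.4-dev.1', 'v0.8.4-dev.2', 'v0.8.3-dev.7']
```

v0.8.5-dev.1 survives because its base version is above the published stable, which is the case the source describes for a next-version dev build minted during the finalise window.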
Main Red Alert: main-red-alert.yml: workflow_run-triggered tracker for a red main. Watches CI, Docker, and CLI -- the workflows whose failure means main is genuinely broken -- and opens a pinned CI health: <workflow> is failing on main issue via post-tracking-issue on the first failure or timed_out push run, closing it automatically on the next green one. Advisory / noise-prone workflows (CodSpeed, Lighthouse, Scorecard) are deliberately out of scope: a tracker that cries wolf gets ignored. One tracker per watched workflow, since the title is the dedup key, and a concurrency group per workflow name so two completions cannot race the same body edit. The dangerous-triggers zizmor finding is suppressed with justification: workflow_run resolves github.sha to the DEFAULT branch, so the checkout is trusted main code rather than the triggering ref, no code from the triggering run executes, every payload field is read through env:, and fork PRs are excluded by the head_repository + event == 'push' guard. Being workflow_run, only the default-branch copy ever runs, so a change here takes effect after merge, never on the PR making it.
CI failure-surfacing policy: every CI workflow must surface its outcome somewhere visible. Non-schedule failure paths (push / pull_request / workflow_run / release / dispatch) post a commit status or PR check; schedule failure paths open or update a tracking GitHub Issue labelled automation:ci-health. A commit status is necessary but not sufficient on main: nobody watches the Actions tab per push, so a red main can persist across further merges undetected, which is likeliest for a workflow whose failing half is gated off pull_request and therefore passes every PR. main-red-alert.yml therefore adds the issue lane to the push-to-main path as well, for the workflows whose failure means main is genuinely broken. Schedule-triggered workflows have no commit context to attach to, hence the issue lane; manual workflow_dispatch runs surface failures in the run UI directly so they do not open issues. One deliberate carve-out: ghcr-cleanup.yml (weekly, dry-run by default) opens NO tracking issue -- the prune is idempotent housekeeping, so a transient GHCR 401 fails only that leg and self-heals on the next weekly run; the worst case is a week of stale dev/PR versions lingering, not worth a tracker. The shared composite is .github/actions/post-tracking-issue; it dedupes by title across all states (open + closed), so a regression that reappears reopens the same tracker rather than creating a duplicate; consumers that auto-close on success (e.g. ci-preflight.yml) should also unpin in the close path so a closed-and-resolved issue does not stay in the pinned row. Workflows currently using this pattern: apko-lock.yml, ci-preflight.yml, dast.yml, python-audit.yml, evals.yml, scorecard.yml, secret-scan.yml, release.yml, auto-rollover.yml, finalize-release.yml, lychee-external.yml, main-red-alert.yml (see "Main Red Alert" above), and dev-release.yml (push to main has no PR-check context, so its report-failure job surfaces the dev-release regression title via the issue lane).
Pinned tracking-issue label: automation:ci-health. Success events (stable release published, dev pre-release cut, auto-rollover success) deliberately do NOT generate notifications; the GitHub Releases tab and commit row already surface those, and posting them would just spam the tracker.
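The dedupe-by-title behaviour of the tracking-issue lane can be modelled in memory. This is a Python sketch under stated assumptions: the Issue class and both functions are hypothetical, no GitHub API is called, and re-pinning on reopen is an assumption rather than documented behaviour of post-tracking-issue.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    title: str
    state: str = "open"
    pinned: bool = True


def report_failure(issues: list[Issue], title: str) -> Issue:
    """Reopen the tracker with this title if it exists in ANY state,
    otherwise create it. The title is the dedup key."""
    for issue in issues:
        if issue.title == title:
            issue.state = "open"
            issue.pinned = True  # assumption: a reopened tracker is re-pinned
            return issue
    issue = Issue(title)
    issues.append(issue)
    return issue


def report_success(issues: list[Issue], title: str) -> None:
    """Close the open tracker and unpin it in the same path."""
    for issue in issues:
        if issue.title == title and issue.state == "open":
            issue.state = "closed"
            issue.pinned = False


tracker: list[Issue] = []
title = "CI health: CI is failing on main"
report_failure(tracker, title)   # first red run opens the tracker
report_success(tracker, title)   # next green run closes and unpins it
report_failure(tracker, title)   # regression reopens the same issue
print(len(tracker), tracker[0].state)  # 1 open
```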
SBOM Diff: sbom-diff.yml: inform-only sticky PR comment on Release Please release PRs. Added / removed components + license category counts from the head backend SBOM vs last stable. dependency-review.yml remains the license gate; this comment is advisory.

Dependencies¶

Pinned: all versions use == in pyproject.toml
Groups: test (pytest + plugins, hypothesis), dev (includes test + ruff, mypy, pre-commit, commitizen, pip-audit)
Required: pgvector + sqlite-vec (dense vector search for agent memory, inside the existing databases), mmh3 (murmurhash3 for BM25 token hashing in hybrid search), cryptography (Fernet encryption for sensitive settings at rest), faker (multi-locale agent name generation for templates and setup wizard), httpx (async HTTP client for web tools)
Install: uv sync installs everything (dev group is default)
Web dashboard: Node.js 22+, TypeScript 6.0+, dependencies in web/package.json (React 19, react-router, shadcn/ui, Base UI, Tailwind CSS 4, Zustand, @xyflow/react, @dagrejs/dagre, d3-force, @dnd-kit, Recharts, Motion, cmdk-base, js-yaml, Axios, Lucide React, @fontsource-variable/geist, @fontsource-variable/geist-mono, @fontsource-variable/jetbrains-mono, @fontsource-variable/inter, @fontsource/ibm-plex-mono, @fontsource/ibm-plex-sans, CodeMirror 6, Storybook 10, MSW, msw-storybook-addon, Vitest, @vitest/coverage-v8, @testing-library/react, fast-check, ESLint, @eslint-react/eslint-plugin, eslint-plugin-security, Playwright, @lhci/cli, rollup-plugin-visualizer, cross-env)
CLI: Go 1.26+, dependencies in cli/go.mod (Cobra, charm.land/bubbletea/v2, charm.land/bubbles/v2, charm.land/huh/v2, charm.land/lipgloss/v2, sigstore-go, go-containerregistry, go-tuf, klauspost/compress for Snappy decompression of attestation bundle_url payloads)
Landing page: dependencies in site/package.json (Astro 7, @astrojs/react 6, @astrojs/sitemap, React 19, Tailwind CSS 4, js-yaml, Vitest 4)

Property-based Testing (Hypothesis): Deep Dive¶

The short rule in CLAUDE.md: Python uses Hypothesis; profiles live in tests/conftest.py; CI runs deterministic 10-example sweeps; failing examples are real bugs.

Profiles¶

Configured in tests/conftest.py, selected via HYPOTHESIS_PROFILE env var:

ci: deterministic, max_examples=10 + derandomize=True. Fixed seed per test, same inputs every run (no flakes).
dev: 1000 examples.
fuzz: 10,000 examples, no deadline. For dedicated fuzzing sessions.
extreme: 500,000 examples, no deadline. Overnight deep fuzzing.

.hypothesis/ is gitignored. Failing examples persist to ~/.synthorg/hypothesis-examples/ (write-only shared DB, survives worktree deletion) via _WriteOnlyDatabase in tests/conftest.py.
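A minimal sketch of how the four profiles could be registered with Hypothesis's settings API. The values come from the list above; falling back to ci when HYPOTHESIS_PROFILE is unset is an assumption, and the actual tests/conftest.py may differ (it also wires in _WriteOnlyDatabase, which is not shown here).

```python
import os

from hypothesis import settings

# Deterministic CI sweep: same inputs on every run.
settings.register_profile("ci", max_examples=10, derandomize=True)

# Quick local run.
settings.register_profile("dev", max_examples=1000)

# Dedicated and overnight fuzzing: no per-example deadline.
settings.register_profile("fuzz", max_examples=10_000, deadline=None)
settings.register_profile("extreme", max_examples=500_000, deadline=None)

# Assumption: default to the CI profile when the variable is not set.
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "ci"))
```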

Running locally¶

Quick (1000 examples): HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n 8 -k properties
Deep (10,000 examples, all @given tests): HYPOTHESIS_PROFILE=fuzz uv run python -m pytest tests/ -m unit -n 8 --timeout=0
--timeout=0 disables the 30s per-test limit that would kill long-running property tests.
-k properties is intentionally omitted to cover all 51 files with @given, not just the 22 *_properties.py files.

When Hypothesis finds a failure¶

It is a real bug. The shrunk example is saved to ~/.synthorg/hypothesis-examples/ for analysis but is not replayed automatically (that would block all test runs).

Do NOT just rerun and move on. Read the failing example from the output, fix the underlying bug, and add an explicit @example(...) decorator to the test so the case is permanently covered in CI.
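Pinning a shrunk failure with @example might look like this. The function under test (normalize_name) and the pinned input are hypothetical; only the decorator usage is the point of the sketch.

```python
from hypothesis import example, given, strategies as st


def normalize_name(raw: str) -> str:
    """Hypothetical function under test: collapse runs of whitespace."""
    return " ".join(raw.split())


@given(st.text())
@example("a\t\tb")  # hypothetical shrunk input, pinned after the fix
def test_normalize_is_idempotent(raw: str) -> None:
    once = normalize_name(raw)
    # Normalising an already-normalised name must not change it.
    assert normalize_name(once) == once
```

Explicit examples run on every invocation regardless of the active profile, so the pinned case stays covered even in the 10-example CI sweep.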

Cross-language equivalents¶

React: fast-check (fc.assert + fc.property)
Go: native testing.F fuzz functions (Fuzz*)