CLAUDE.md Reference: Infrequently Needed Sections

Moved from root CLAUDE.md to reduce per-message token cost. Read on demand.

Documentation

  • Docs: docs/ (Markdown, built with Zensical, config: mkdocs.yml). Also embedded in the web Docker image at /docs/ via a docs-builder stage
  • Design spec: docs/design/ (12 pages), Architecture: docs/architecture/, Roadmap: docs/roadmap/
  • Security: docs/security.md, Licensing: docs/licensing.md, Reference: docs/reference/
  • REST API reference: docs/openapi/index.md (landing page) + docs/openapi/reference.html (Scalar viewer) + docs/openapi/openapi.json (schema). The viewer and schema are generated by scripts/export_openapi.py and written as static siblings of the landing page so Zensical copies them through on build. Sitemap policy: scripts/patch_sitemap.py includes reference.html (a real landing page that should be discoverable) but excludes openapi.json (Google never renders raw JSON in search results, and including it created permanent "Discovered, currently not indexed" noise in Search Console).
  • Comparison data: data/competitors.yaml, the shared YAML source for docs/reference/comparison.md (generated by scripts/generate_comparison.py) and site/src/pages/compare.astro
  • Library reference: docs/api/ (auto-generated via mkdocstrings + Griffe, AST-based)
  • Scripts: scripts/: CI/build utilities and development-time validation hooks (relaxed ruff rules: print and deferred imports allowed). Validation hooks include: check_push_rebased.sh (blocks push if behind main), check_bash_no_write.sh (blocks file writes via Bash), check_git_c_cwd.sh (blocks unnecessary git -C), check_web_design_system.py (validates design tokens on web file edits). CI scripts include: evaluate-scan.sh (DRY Trivy JSON result evaluation), cis-scan.sh (CIS Docker Benchmark wrapper), report-image-size.sh (image size reporting to step summary)
  • Landing page: site/ (Astro + React islands via @astrojs/react). Includes /get/ CLI install page, /compare/ framework comparison, contact form, interactive dashboard preview, SEO
  • Deps: docs group in pyproject.toml (zensical, mkdocstrings[python], griffe-pydantic)
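
The sitemap policy described for the REST API reference (keep reference.html, drop openapi.json) can be sketched as a small URL filter. This is a minimal illustration, not the contents of scripts/patch_sitemap.py: the function name filter_sitemap_urls, the suffix-based rule, and the example.invalid URLs are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical rule: URLs whose path ends with one of these suffixes are dropped.
EXCLUDED_SUFFIXES = ("/openapi/openapi.json",)


def filter_sitemap_urls(urls):
    """Return the sitemap URLs with raw schema files removed, order preserved."""
    kept = []
    for url in urls:
        path = urlparse(url).path
        if path.endswith(EXCLUDED_SUFFIXES):
            continue  # raw JSON is not a page a search engine will render
        kept.append(url)
    return kept


# Placeholder URLs; the real site layout may differ.
urls = [
    "https://example.invalid/docs/openapi/",
    "https://example.invalid/docs/openapi/reference.html",
    "https://example.invalid/docs/openapi/openapi.json",
]
print(filter_sitemap_urls(urls))
```

Filtering on the URL path rather than the full string keeps the rule independent of the host name and of any query string.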

Docker

# Build and run (from repo root)
cp docker/.env.example docker/.env        # configure env vars
docker compose -f docker/compose.yml build
docker compose -f docker/compose.yml up -d
docker compose -f docker/compose.yml down

# Verify
curl http://localhost:3001/api/v1/readyz   # backend (direct)
curl http://localhost:3000/api/v1/readyz   # backend (via web proxy)

  • Images: backend (Wolfi apko-composed distroless, non-root), web (Caddy pure-apko, SPA + API proxy + embedded docs), sandbox (Python + Node.js Wolfi, non-root)
  • Config: all Docker files in docker/: Dockerfiles, compose, .env.example. Single root .dockerignore (all images build with context: .)
  • Verification: CLI verifies cosign signatures + SLSA provenance at pull time; bypass with --skip-verify
  • Tags: version from pyproject.toml, semver, SHA, plus dev tags (v0.8.4-dev.3, dev rolling) for dev channel builds
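
The two curl checks in the Verify step can be wrapped in a small polling helper for scripted use, since the containers may need a few seconds after `up -d` before they answer. The sketch below is an assumption-laden illustration: only the two readyz URLs come from this document, while wait_ready, http_ok, and the attempt and delay values are hypothetical.

```python
import http.client
import time
import urllib.request

READY_URLS = (
    "http://localhost:3001/api/v1/readyz",  # backend (direct)
    "http://localhost:3000/api/v1/readyz",  # backend (via web proxy)
)


def http_ok(url, timeout=2.0):
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError are OSError subclasses: refused, timed out, non-2xx.
        return False


def wait_ready(urls=READY_URLS, attempts=30, delay=1.0, probe=http_ok, sleep=time.sleep):
    """Poll until every URL is ready; return False once attempts run out."""
    pending = list(urls)
    for attempt in range(attempts):
        # Keep only the endpoints that are still not answering.
        pending = [u for u in pending if not probe(u)]
        if not pending:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_ready()` with no arguments polls both endpoints for roughly half a minute. The probe and sleep parameters exist only so the loop can be exercised without a running stack.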

Package Structure

src/synthorg/
  api/            # Litestar REST + WebSocket API, RFC 9457 errors, setup wizard, personality presets, auth/ (role-based access control, HttpOnly cookie sessions, CSRF double-submit, lockout / refresh-token / session repositories under persistence/{sqlite,postgres}/, concurrent session enforcement, user presence, OrgRole enum for org config permissions), guards (HumanRole-based + OrgRole-based with department scoping via require_org_mutation), user management (CRUD + org-role grant/revoke), dto_org (request DTOs for company/department/agent mutations), dto_workflow (request/response DTOs for workflow definition and execution operations), services/org_mutations (read-modify-write config mutation service), auto-wiring, lifecycle (auto-promote first owner), bootstrap (agent registry init from config), template packs (list + live-apply), memory admin (fine-tuning pipeline with orchestrator, checkpoint management, preflight checks, run history, embedder queries), optimistic concurrency (ETag/If-Match), TLS config, tiered rate limiting (unauth by IP, auth by user ID), rate_limits/policies (RATE_LIMIT_POLICIES canonical registry of (max_requests, window_seconds) defaults per operation id + per_op_rate_limit_from_policy helper; operator overrides flow through PerOpRateLimitConfig.overrides separately), workflows (visual workflow definition CRUD, validation, YAML export, blueprint listing, blueprint instantiation, version history, diff, rollback), workflow executions (activate, list, get, cancel), ceremony policy (project + per-department query/override, resolved policy with field origins), quality overrides (per-agent quality score override CRUD), reports (on-demand report generation, period listing), notification_dispatcher (fan-out notification sink), training (training plan CRUD, execution, preview, overrides)
  backup/         # Backup/restore orchestrator, scheduler, retention, handlers/
  budget/         # Cost tracking, budget enforcement, quota degradation (including synchronous peek for routing-time selector hints), CFO optimization, trend analysis, budget forecasting, configurable currency formatting, risk budget (cumulative risk-unit tracking, risk scoring integration, risk check, risk records), automated reporting (periodic comprehensive reports, spending/performance/task-completion/risk-trends templates, report scheduling config), coordination metrics (9 empirical metrics: efficiency, overhead, error amplification, message density, redundancy, Amdahl ceiling, straggler gap, token/speedup ratio, message overhead), project cost aggregates (durable per-project lifetime cost totals surviving retention pruning)
  cli/            # Python CLI module (superseded by top-level cli/ Go binary)
  client/         # Client simulation: ai_client, human_client, hybrid_client, pool, adapters, runner, continuous, store, simulation_state, config, models, protocols, feedback/ (binary, scored, criteria_check, adversarial), generators/ (template, llm, dataset, procedural, hybrid), report/ (detailed, summary, metrics_only, json_export)
  a2a/            # Optional A2A external gateway (JSON-RPC 2.0 federation), agent_card (safe-subset projection), client (outbound federation), gateway (inbound JSON-RPC dispatcher), models (A2A protocol types), task_mapper + message_mapper (bidirectional mapping), config, security (peer validation, payload limits), well_known (Agent Card discovery), peer_registry, push_verifier (HMAC-SHA256), connection_types/ (a2a_peer registration)
  communication/  # Message bus, dispatcher, channels, delegation, conflict resolution, meeting/, event_stream/ (AG-UI SSE hub, event projector, interrupt/resume protocol, evidence package re-exports)
  config/         # YAML company config loading and validation
  core/           # Shared domain models, base classes, resilience config, immutable (deep_copy_mapping, freeze_recursive for frozen Pydantic field protection), tool_disclosure (ToolL1Metadata, ToolL2Body, ToolL3Resource), tool_constraints (ToolSubConstraints, the five dimension enums, get_sub_constraints; placed in core so core.agent need not import the tools hub)
  execution/      # Light leaf for execution-trace types, engine-free so non-engine consumers (e.g. budget.coordination_collector) can import them cold: turn (TurnRecord, NodeType, BehaviorTag), efficiency (EfficiencyRatios, IdealTrajectoryBaseline, compute_efficiency_ratios), view (ExecutionResultView runtime-checkable protocol), parked_context (ParkedContext, the serialised parked-agent snapshot the persistence/worker/API layers name without pulling engine). See ADR-0012.
  engine/         # Orchestration, execution loops, task engine (observer registration, background observer dispatch), coordination, checkpoint recovery, structured failure diagnosis (FailureCategory, infer_failure_category, RecoveryResult failure_context/criteria_failed/stagnation_evidence), approval/review gates (no-self-review enforcement via SelfReviewError, immutable DecisionRecord drop-box), stagnation detection, context budget, compaction, hybrid loop, prompt profiles (tier-based prompt adaptation, personality trimming via max_personality_tokens), procedural memory integration (failure-driven), post_execution/ (extracted memory hooks -- distillation capture, procedural memory pipeline, evolution trigger), evolution/ (pluggable trigger/proposer/guard/adapter pipeline, EvolutionService orchestrator, EvolutionConfig with safe defaults, triggers/ (batched/inflection/per-task/composite), proposers/ (separate-analyzer/self-report/composite), adapters/ (identity/strategy-selection/prompt-template), guards/ (rate-limit/review-gate/rollback/shadow-evaluation (with shadow_protocol.py protocols + shadow_providers.py Configured/RecentHistory strategies)/approve-all (no-op fallback when every guard is disabled)/composite)), identity/ (diff utilities, store/ (IdentityVersionStore protocol, append-only + copy-on-write implementations with rollback)), workspace/ (git worktree isolation, merge orchestration, semantic conflict detection), quality/ (step-level quality signal classifier, accuracy-effort ratio, StepQualityClassifier protocol), health/ (two-layer health monitoring pipeline, HealthJudge + TriageFilter, EscalationTicket, NotificationSink wiring), trajectory/ (best-of-K trajectory scoring, TrajectoryScorer, budget guard, TrajectoryConfig), intake/ (IntakeEngine lifecycle walker, strategies/ (DirectIntake pass-through, AgentIntake LLM-driven triage)), review/ (ReviewPipeline chain walker, stages/ (InternalReviewStage, ClientReviewStage)), workflow/ (Kanban board, Agile sprints, WIP limits, sprint lifecycle, velocity tracking, ceremony scheduling, strategy migration, strategies/ (pluggable scheduling strategies), velocity_calculators/ (pluggable velocity calculators), definition (visual workflow graph model, node/edge types, validation, YAML export), blueprint_loader (starter blueprint loading), blueprint_models (blueprint data models), blueprints/ (5 YAML starter templates), diff (version diff computation), version (version snapshot model), execution (workflow activation service, execution models, condition evaluator (compound AND/OR/NOT), graph utilities, execution_observer (TaskEngine bridge for lifecycle transitions), execution_activation_helpers (graph walking, conditional processing, task config parsing), execution_lifecycle (execution transitions, status management, task-event handling), subworkflow_registry (subworkflow publishing, version resolution, parent references))), strategy/ (trendslop mitigation: strategic lenses, constitutional principles, confidence calibration, cost tier resolution, lens_assignment (LensAssigner protocol, DiversityMaximizingAssigner round-robin), consensus (ConsensusVelocityDetector, ConsensusAction), premortem (PremortemExecutor protocol, DefaultPremortemExecutor, FailureMode, PremortemOutput))
  hr/             # Hiring, firing, onboarding, agent registry (evolve_identity for evolution-approved changes), performance tracking (InflectionSink protocol, PerformanceInflection events for trend direction changes), activity timeline, activity event types, cost event redaction, career history, promotion/demotion, evaluation/ (five-pillar evaluation framework, pluggable pillar scoring strategies, EvaluationConfig), quality scoring (layered composite: CI signal + LLM judge + human override, QualityOverrideStore), scaling/ (dynamic company scaling: ScalingService orchestrator with runtime strategy enable/disable and priority reordering, domain models (ScalingSignal/ScalingContext/ScalingDecision/ScalingActionRecord), enums (ScalingActionType/ScalingOutcome/ScalingStrategyName), error types, ScalingContextBuilder (signal aggregation with graceful degradation), pluggable ScalingStrategy/ScalingSignalSource/ScalingTrigger/ScalingGuard protocols, strategies/ (WorkloadAutoScale, BudgetCap, SkillGap, PerformancePruning with evolution deferral), signals/ (workload, budget, skill, performance read-only adapters), triggers/ (BatchedScalingTrigger with overlap protection, SignalThresholdTrigger with crossing detection, CompositeScalingTrigger), guards/ (ConflictResolver with MappingProxyType-wrapped priority, CooldownGuard, RateLimitGuard with batch-aware enforcement, ApprovalGateGuard, CompositeScalingGuard with public get_guards()), config (per-strategy + trigger + guard + default_hire_level), factory), training/ (pluggable training pipeline: TrainingService orchestrator, TrainingPlan/TrainingResult models, factory, config, selectors/ (role_top_performers, department_diversity, user_curated, composite), extractors/ (procedural, semantic, tool_patterns), curateurs/ (relevance, llm_curated), guards/ (sanitization, volume_cap, review_gate), onboarding_integration)
  notifications/  # NotificationSink protocol, NotificationDispatcher fan-out, Notification model (category taxonomy: approval/budget/security/system/agent/health + severity taxonomy), adapters/ (console, ntfy, slack, email), config
  ontology/       # Semantic ontology subsystem: @ontology_entity decorator, OntologyBackend protocol, SQLiteOntologyBackend, OntologyService (bootstrap + CRUD), OntologyConfig (6 sub-configs), EntityDefinition/EntityField/EntityRelation models, versioning integration, drift detection types, error hierarchy, observability events
  memory/         # Pluggable MemoryBackend, retrieval pipeline (hybrid dense+BM25 sparse with RRF fusion, MMR diversity re-ranking via apply_diversity_penalty with pre-computed bigram cache), tool-based injection strategy with iterative Search-and-Ask reformulation loop (fail-safe reformulator/sufficiency_checker), ToolRegistry memory tool wrappers (SearchMemoryTool, RecallMemoryTool), fail-closed memory filter, agentic query reformulation, org memory, backends/ (composite namespace-based routing, inmemory session-scoped, mem0 Qdrant+SQLite, EmbeddingCostConfig embedding cost tracking), consolidation/ (SimpleConsolidationStrategy, DualModeConsolidationStrategy density-aware, LLMConsolidationStrategy with parallel TaskGroup per-category processing + trajectory-context injection from distillation entries, LLMConsolidationConfig, DistillationRequest capture helper tagged "distillation" EPISODIC, retention, archival), embedding/ (LMEB-ranked model selection, embedder config resolution, fine-tuning pipeline with orchestrator, cancellation, checkpoint management), procedural/ (failure-driven auto-generation, proposer LLM pipeline, SKILL.md materialization, ProceduralMemoryConfig, capture/ (failure/success/hybrid capture strategies), pruning/ (TTL/Pareto/hybrid pruning strategies), propagation/ (none/role-scoped/department-scoped cross-agent propagation))
  persistence/    # Pluggable PersistenceBackend, SQLite + Postgres backends, settings + user + artifact + project + preset + workflow definition + workflow execution + workflow version + agent identity versions + fine-tune + decision record (append-only audit drop-box) + risk override + SSRF violation + project cost aggregate + training plan + training result repositories, artifact content storage (pluggable ArtifactStorageBackend, filesystem impl), migrations.py + migration_helpers.py (yoyo-migrations runner coroutines and URL/discovery/result-dataclass helpers, in-process), sqlite/revisions/ + postgres/revisions/ (revision .sql files), optional TimescaleDB hypertable support for append-only time-series tables
  versioning/     # Generic versioning infrastructure: VersionSnapshot[T] model, VersioningService[T] (content-addressable deduplication via SHA-256 hash, INSERT OR IGNORE concurrent-write safety), compute_content_hash
  telemetry/      # Opt-in product telemetry (disabled by default): TelemetryReporter protocol, TelemetryEvent model, PrivacyScrubber (allowlist + forbidden pattern validation), TelemetryCollector (heartbeat scheduling, deployment ID persistence, environment resolution chain), host_info (Docker daemon `/info` enrichment for startup events via aiodocker), reporters/ (LogfireReporter, NoopReporter), TelemetryConfig
  observability/  # Structured logging, correlation tracking, redaction, third-party logger taming, log shipping (syslog, HTTP), compressed archival, events/
  providers/      # LLM provider abstraction, presets, model auto-discovery, capabilities, runtime CRUD (management/), local model management (pull/delete/config via LocalModelManager protocol), provider families, discovery SSRF allowlist, health tracking, active health probing, defaults_config (ProviderModelDefaults: last-resort metadata fallbacks when LiteLLM exposes no per-model data, e.g. fallback_max_output_tokens), routing/ (strategy-based model routing, multi-provider resolution with ModelCandidateSelector protocol, QuotaAwareSelector, CheapestSelector)
  settings/       # Runtime-editable settings (DB > env > code), Fernet encryption, ConfigResolver, bootstrap_resolver (pre-init env > default), definitions/, subscribers/ (SecuritySubscriber for discovery allowlist hot-reload)
  security/       # Rule engine, audit log, output scanner, progressive trust, autonomy levels, timeout policies, LLM fallback evaluator, custom policy rules, risk scoring (pluggable RiskScorer protocol, multi-dimensional RiskScore, DefaultRiskScorer), enforcement modes (active/shadow/disabled via SecurityEnforcementMode), risk override (SecOps risk tier reclassification via RiskTierOverride + SecOpsRiskClassifier), SSRF violation tracking (SsrfViolation model, pending/allowed/denied status for self-healing discovery allowlist)
  templates/      # Pre-built company templates (inheritance tree), template merge engine, personality presets, preset discovery/CRUD service, model requirements, tier-to-model matching, locale-aware name generation, workflow config rendering, pack_loader (additive team packs), packs/ (built-in pack YAMLs), uses_packs composition
  meta/           # Self-improvement meta-loop: signal aggregation (7 domains), rule engine (9 built-in rules + custom declarative rules via dashboard), improvement strategies (config/architecture/prompt tuning), proposal guards (scope/rollback/rate-limit/approval), rollout (before-after/canary, tiered regression detection), appliers (config/prompt/architecture/code each expose dry_run() validation via shared appliers/_validation.py helpers: parse_dotted_path, apply_diff_to_dict, validate_payload_keys, format_validation_errors), Chief of Staff role. Custom rule authoring: DeclarativeRule, CustomRuleDefinition model, METRIC_REGISTRY (25 metrics), CustomRuleRepository protocol + SQLite impl, CustomRuleController (CRUD + preview). Unified MCP API server: 240+ tools across 21 domains with capability-based scoping (registry, scoper, invoker, tool builders, domain defs, handlers). Service orchestrator, factory, config
  tools/          # Tool registry, built-in tools, git SSRF prevention, MCP bridge, sandbox factory (gVisor default overrides via merge_gvisor_defaults), invocation tracking, network_validator (shared SSRF), sub_constraint_enforcer (granular enforcement of core.tool_constraints), disclosure_config (ToolDisclosureConfig), disclosure_metrics (ToolDisclosureMetrics), discovery (ListToolsTool, LoadToolTool, LoadToolResourceTool, ToolDisclosureManager, DeferredDisclosureManager), web/ (HTTP requests, HTML parsing, web search), database/ (SQL query, schema inspection), terminal/ (sandboxed shell commands), design/ (image generation via ImageProvider protocol, diagram DSL generation, asset management), communication/ (SMTP email sending, notification dispatch via NotificationDispatcherProtocol, Jinja2 template formatting), analytics/ (data aggregation via AnalyticsProvider protocol, report generation, metric collection via MetricSink protocol), sandbox/ (4-domain SandboxPolicy model (filesystem/network/process/inference), SandboxRuntimeResolver (gVisor probe + per-category runtime resolution with fallback), SandboxCredentialManager (env var credential stripping))

web/src/          # React 19 dashboard (see web/CLAUDE.md for full structure)
cli/              # Go CLI binary (see cli/CLAUDE.md for full structure)
site/             # Astro landing page (synthorg.io), React islands for interactive sections
data/             # Shared data files (competitors.yaml for comparison page)
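
The memory/ entry above mentions a hybrid dense + BM25 retrieval pipeline fused with RRF (Reciprocal Rank Fusion). As a rough illustration of that technique only, and not of the code in src/synthorg/memory/, the sketch below fuses two ranked lists of ids. The function name rrf_fuse, the memory ids, and the smoothing constant k=60 (a commonly used default) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over the lists that contain it,
    with ranks starting at 1. Ties keep first-seen order.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # sorted() is stable, so equal scores stay in insertion (first-seen) order.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense_hits = ["m1", "m2", "m3"]   # hypothetical dense-retriever ranking
sparse_hits = ["m3", "m1", "m4"]  # hypothetical BM25 ranking
print(rrf_fuse([dense_hits, sparse_hits]))  # ['m1', 'm3', 'm2', 'm4']
```

Because RRF uses only rank positions, the dense and sparse scores never need to be normalised onto a common scale before fusion.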

Releasing

  • Automated by Release Please: every push to main creates/updates a release PR with changelog
  • Version bumping: always-bump-patch strategy; every release bumps the patch version (e.g. 0.5.3 -> 0.5.4), regardless of commit type. auto-rollover.yml detects when the last stable release's patch number meets the __synthorg_rollover_at_patch threshold in .github/release-please-config.json (default 9) and creates an empty Release-As: 0.(X+1).0 commit to preserve the 0.X.9 -> 0.(X+1).0 pattern automatically.
  • Release-As trailer: for exception bumps (1.0 graduation, explicit version jumps), land a Release-As: X.Y.Z trailer in a commit on main. Two valid routes: (a) final paragraph of a feature-PR body that will be squash-merged (squash copies the trailer into the main-branch commit message, where auto-rollover.yml and Release Please both pick it up); (b) trigger Actions -> Graduate -> Run workflow with target_version + reason. The Graduate workflow mints a synthorg-repo-bot App installation token and creates a signed empty commit on main via the Git Data API, landing a Release-As: X.Y.Z trailer that both RP and auto-rollover pick up. Downgrades and same-version graduations are hard-blocked by the workflow's validation step; fix forward with a higher target instead. The prior "add Release-As: to the RP release PR body" route is deliberately unsupported: that edit never becomes a commit on main until the RP PR merges, so auto-rollover.yml can race ahead and push a conflicting trailer before RP reacts.
  • Signed commits: every CI-generated commit on main is produced via the GitHub API under the synthorg-repo-bot App installation token, verifying as {verified: true, reason: "valid"}. main enforces required_signatures via the protect-main ruleset, so an unsigned commit would be rejected outright. One deliberate exception: the BSL Change Date update on the Release Please PR branch (release.yml "Update BSL Change Date" step) commits via GITHUB_TOKEN rather than the App token. The commit lands on the RP PR branch (not main), so the recursion-suppression penalty of GITHUB_TOKEN does not apply, and GitHub's ambient token still produces a signed commit attributed to github-actions[bot] which satisfies branch protection via the eventual squash-merge.
  • Release flow: merge release PR -> draft Release + tag -> Docker + CLI workflows build, smoke-test the artifacts at build time (smoke-test-backend-image against the just-built image; smoke-test-cli-binary against the just-built binary), and attach assets to the draft -> finalize-release.yml posts a finalize-release commit status, assembles the Verification section, and publishes the draft. On stable releases, superseded dev pre-releases + tags (those whose base version is at or below the published stable) are then deleted; dev builds targeting a higher, not-yet-released version are preserved. Smoke tests run at build time (not at finalise) so a broken artifact fails the originating PR with a red ❌ on the commit row, not the finalise step after a tag has already been cut.
  • Dev channel: every push to main (except Release Please bumps) creates a dev pre-release (e.g. v0.8.4-dev.3) via dev-release.yml. Users opt in with synthorg config set channel dev. Dev releases flow through the same Docker + CLI pipelines as stable releases. When a stable release is published, dev releases and tags whose base version is at or below it are deleted; dev builds targeting a higher, not-yet-released version are preserved (a main push can mint the next version's dev.1 while the previous stable is still finalising). If a dev release is swept while its docker.yml run is still in flight, that run's update-release step skips gracefully (warns, exits 0) rather than failing.
  • Nightly verification: deliberately none. The build-time pipeline (docker.yml + cli.yml + finalize-release.yml) is the source of truth for release-body structure, asset signing, and SBOM attachment. App-token signing is a property of the GitHub API auth path (POST /git/commits under an installation token returns a GitHub-signed commit unconditionally), not of any code we own; a misconfigured secret or revoked installation would also fail the next real release, so a nightly canary mostly catches its own implementation drift. Earlier release-pipeline-health.yml and test-signing.yml workflows were removed for that reason.
  • Pre-1.0 -> post-1.0 transition: when v1.0.0 ships, always-bump-patch stays in place (the SynthOrg release cadence favours conservative patch bumps). What flips is bump-minor-pre-major: true in the RP config; after 1.0 this flag is dropped so BREAKING CHANGE: footers start producing major bumps again (1.x.y -> 2.0.0). Release-As: trailers keep working unchanged. auto-rollover.yml also keeps working unchanged; patch-rollover is version-independent, and rollover at 1.x.9 -> 1.(x+1).0 continues to use the same mechanism. A follow-up PR will flip the config flag when v1.0.0 lands.
  • Config: .github/release-please-config.json, .github/.release-please-manifest.json (do not edit manually)
  • Changelog: .github/CHANGELOG.md (auto-generated, do not edit)
  • Version locations: pyproject.toml ([tool.commitizen].version), src/synthorg/__init__.py (__version__)
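
The version arithmetic described in this section (patch bump on every release, rollover to the next minor at patch 9, and sweeping dev pre-releases whose base version is at or below the published stable) can be modelled in a few lines. This sketch shows only the net effect. The real behaviour is produced by Release Please, auto-rollover.yml, and the release workflows, and the function names plus the reading of "meets the threshold" as "greater than or equal" are assumptions.

```python
import re

ROLLOVER_AT_PATCH = 9  # documented default of __synthorg_rollover_at_patch

# Dev tags look like v0.8.4-dev.3 in the examples above.
_DEV_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)-dev\.(\d+)$")


def next_version(last_stable, rollover_at=ROLLOVER_AT_PATCH):
    """Return the next stable version under always-bump-patch plus rollover."""
    major, minor, patch = (int(part) for part in last_stable.split("."))
    if patch >= rollover_at:  # assumption: "meets" means at or above
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


def is_superseded_dev(dev_tag, stable):
    """True if the dev tag's base version is at or below the published stable."""
    match = _DEV_TAG.match(dev_tag)
    if match is None:
        raise ValueError(f"not a dev tag: {dev_tag!r}")
    base = tuple(int(group) for group in match.groups()[:3])
    published = tuple(int(part) for part in stable.split("."))
    return base <= published  # tuple comparison orders versions numerically


print(next_version("0.5.3"))                       # 0.5.4
print(next_version("0.5.9"))                       # 0.6.0
print(is_superseded_dev("v0.8.4-dev.3", "0.8.4"))  # True
print(is_superseded_dev("v0.8.5-dev.1", "0.8.4"))  # False
```

Comparing integer tuples instead of strings avoids the usual trap where "0.10.0" sorts before "0.9.0".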

CI

  • Path filtering: dorny/paths-filter; jobs only run when their domain is affected. CLI has its own workflow (cli.yml).
  • Jobs: lint (ruff) + type-check (mypy) + test-unit (matrix sharded via pytest-split, balanced from .test_durations.unit; shard count in .github/workflows/ci.yml matrix.shard) + test-integration (matrix sharded via pytest-split, balanced from .test_durations.integration, backed by services: postgres instead of testcontainers; conftest detects SYNTHORG_TEST_POSTGRES_HOST/PORT/USER/PASSWORD/DB and yields a connection-info proxy directly) + test-e2e (single shard, same service container) + test-conformance-sqlite (SQLite-only -k "not postgres" slice of the conformance suite). All four arms set COVERAGE_CORE=sysmon for the lower-overhead coverage.py tracing backend (line + branch parity since coverage 7.7). Each shard collects coverage; test-coverage-aggregate combines them, asserts every shard contributed, and enforces the coverage gate via coverage report --fail-under=$(...) driven by [tool.coverage.report] fail_under in pyproject.toml before a single best-effort Codecov upload. Plus python-audit (pip-audit), dockerfile-lint (hadolint), dashboard (lint/type-check/test under the active-handle gate/build/storybook-build/audit), export-openapi (runs scripts/export_openapi.py once and shares the artifact with the dashboard arm), and .github/actions/install-postgres-18-client (shared composite for PGDG postgresql-client-18 install with SHA-256-pinned signing key). All run in parallel -> ci-pass gate.
  • Pages: pages.yml: version extraction from pyproject.toml, OpenAPI export, comparison page generation, Astro + Zensical docs build, GitHub Pages deploy on push to main
  • PR Preview: pages-preview.yml: Cloudflare Pages deploy per PR (pr-<number>.synthorg-pr-preview.pages.dev), cleanup on PR close
  • Docker: docker.yml: build + Trivy scan + CIS benchmark run on every PR; push to GHCR + cosign sign + SLSA L3 provenance gated by the image-push deployment environment (branch policy main,v*). Build and publish are split into separate jobs per image (build-X + build-X-publish); only the publish half carries packages: write / id-token: write / attestations: write. Shared logic lives in composite actions (build-scan-image, publish-image). CVE triage: .github/.trivyignore.yaml
  • CLI: cli.yml: Go lint/test/build (cross-compile) + govulncheck + fuzz. GoReleaser release on v* tags with cosign signing + SLSA provenance, gated by the release-tags deployment environment (v*-only, no privileged secrets; keeps RELEASE_PLEASE_TOKEN out of the tag path). The release job's gh release upload/download/edit + body-read calls go through .github/scripts/gh_with_retry.sh and the checksums.txt keyless signing through .github/scripts/cosign_sign_with_retry.sh sign-blob; the four attest-build-provenance steps and the SBOM install steps ride bounded continue-on-error retry ladders so a transient Rekor/Sigstore timeout does not fail a release
  • Renovate: daily dependency updates via Mend GitHub App. 3 domain groups (Python, Web, Infrastructure), no auto-merge. The Infrastructure group spans Go modules, Dockerfile + docker-compose images, GitHub Actions SHAs, and every custom-regex pin (binary-tool versions like Trivy / Gitleaks / D2 / apko, container-image regexes for state.go / compose.yml / busybox / testcontainers, action version: inputs like golangci-lint / GoReleaser, go install URLs like govulncheck). Config: renovate.json. Use /review-dep-pr before merging
  • Security scanning: gitleaks (push/PR + weekly), zizmor (workflow analysis), OSSF Scorecard (weekly), Socket.dev (PR supply chain), ZAP DAST (weekly + manual, rules: .github/zap-rules.tsv)
  • Coverage: Codecov (best-effort, CI not gated on availability)
  • Dependency review: dependency-review.yml: license allow-list (permissive + weak-copyleft), per-package GPL exemptions for dev-only tool deps (golangci-lint), PR comment summaries
  • CLA: cla.yml: two jobs splitting read and write. cla-check (pull_request_target) runs self-contained bash + gh api against .github/cla-signatures.json on the cla-signatures branch, with a gh_api_retry helper that does bounded exponential-with-cap retry on transient EPIPE / 5xx (8 attempts, ~10-min budget under a 12-min job timeout) and fails fast on definitive 4xx. It uses the <!-- synthorg-cla-check --> marker for idempotent PR comment updates (PATCH if the marker comment exists, POST on first transition). cla-sign (issue_comment matching the sign-text body) records the signature via the Git Data API under the synthorg-repo-bot App token. Bot allowlist (dependabot[bot], renovate[bot], synthorg-repo-bot[bot], github-actions[bot]) skips the CLA on both jobs.
  • Release: release.yml: Release Please creates draft release PR. Mints a synthorg-repo-bot App installation token via the release-runner-setup composite action (secrets documented in docs/reference/github-environments.md). Gated by the release deployment environment. Includes a Highlights step that calls GitHub Models (openai/gpt-4.1-mini via actions/ai-inference, Copilot Pro quota, no new secret) to prepend a three-section summary to the release PR body, wrapped in <!-- HIGHLIGHTS_START -->...<!-- HIGHLIGHTS_END --> markers. Total bullet count is dynamic (1-15) scaled to the changelog volume and distributed across three fixed headers: What you'll notice (user-facing fixes + UX / behaviour changes), What's new (newly-introduced capabilities and extensions), Under the hood (maintenance, deps, refactors, included only when notable). Empty sections are omitted. Opt out per-release by adding a No-Highlights: trailer (case-insensitive, anywhere on its own line) to the Release Please PR body before the workflow runs. finalize-release.yml then promotes the same marker block from the merged release-please PR body into the published release body (release-please builds release notes from CHANGELOG.md only, so without this promotion the Highlights block would stay stranded on the PR; see "Finalise Release" below). The CLI consumes the same Highlights block during synthorg update on stable channels: it walks every release in (installed, target] oldest-to-newest in batches of 3 and renders the styled summary by default, with c toggling between the AI summary and the Release Please commit-based changelog. Releases without a Highlights block (pre-rollout or No-Highlights: opt-out) fall back to the commit view automatically. Dev pre-releases have no Highlights block by design, so the CLI walk renders a single combined commit list via the GitHub compare API instead. 
The CLI walk is gated to interactive TTY runs; --quiet / --json / --yes / non-TTY contexts skip the walk and print the terse "Update available" notice + release-notes URL. Within the workflow, the LICENSE / PR-body / head-SHA reads and the four required-status POSTs are wrapped by .github/scripts/gh_with_retry.sh (retry transient 401/5xx, fast-fail definitive 4xx); timeout-minutes: 15 bounds the stacked retry ladders so a black-holed connection cannot hold the release-please concurrency group.
  • Auto Rollover: auto-rollover.yml: detects when the last stable tag's patch meets the __synthorg_rollover_at_patch threshold in .github/release-please-config.json (default 9), creates an empty commit on a versioned rollover branch (chore/auto-rollover-v<next>), and opens a PR whose body carries the Release-As: 0.(minor+1).0 trailer so the squash-merge lands it on main and Release Please targets the minor bump. Four skip guards: (1) Release Please release commits and its own prior rollover commits (matched on subject prefix); (2) a history-independent check (gh pr list) that the rollover PR for this exact version branch is already merged or open (skips MERGED / OPEN, but not CLOSED-without-merge, which never took effect); (3) any Release-As: trailer already in the last-stable..HEAD range, evaluated fail-closed so a range that cannot be computed (incomplete fetch) skips the run rather than rolling over; (4) any open Release Please release PR whose body already queues a Release-As: trailer. Gated by the release deployment environment. The empty commit and the rollover branch ref are created via the Git Data API (POST /git/commits + POST /git/refs, force-PATCH if the branch ref already exists) under the App installation token, so the squash-merge onto main ships a verified signature (required by main's signed-commits rule) and triggers downstream Release + Dev Release workflows. The dedup-read gh pr list guards are wrapped by the shared .github/scripts/gh_with_retry.sh helper (bounded exponential retry on transient 401/5xx, fast-fail on definitive 4xx, exit 75 on exhaustion which here means fail-closed skip); timeout-minutes: 8 accommodates the helper's ~1m45s worst-case ladder. The Git Data API writes stay un-retried so a real write failure pages.
  • Graduate: graduate.yml: workflow_dispatch one-click Release-As: trailer for target versions that skip the normal patch cadence (1.0 graduation, explicit minor jumps). Inputs: target_version + reason. Validates target is strictly above last stable (hard-blocks downgrades). Creates a signed empty commit on main with the trailer via the Git Data API under the App installation token. Gated by the release deployment environment. The parent-tree and verification reads go through .github/scripts/gh_with_retry.sh; the commit POST + ref PATCH stay un-retried so a write failure on this manual, human-watched graduation pages.
  • Dev Release: dev-release.yml: creates semver dev tags (e.g. v0.8.4-dev.3) and draft pre-releases on every push to main (skips Release Please version-bump commits). Tags trigger existing Docker + CLI workflows for full build/scan/sign pipeline. Gated by the release deployment environment. Uses the release-runner-setup composite for token mint. Pre-release body is built locally via git log -1 on the head SHA and gh release create --notes-file: title $DEV_TAG (e.g. v0.8.4-dev.5), then a Dev build #N toward vX.Y.Z line, **Commit:** <short SHA>, **Subject:** <commit subject>, the **Full pipeline:** disclaimer, and the channel opt-in tip. Only the short SHA and the commit subject are written into the notes file -- the full commit body (squash-merge PR descriptions of hundreds of lines, nested markdown, tables) is deliberately omitted because it renders poorly on the release page and buries what changed. Variables go through printf '%s' placeholders (the --notes-file route avoids command substitution that bare --notes "..." would suffer if a commit subject contained backticks or $(...)). Failure path: if gh release create returns non-zero (transient API error, 5xx, rate limit), the workflow exits 1 with the orphan tag preserved -- deleting the tag would race the downstream tags: v*-listening workflows that the tag-create push already triggered (cli.yml, docker.yml), 404'ing their actions/checkout step. The orphan tag is later garbage-collected by the same workflow's incremental sweep (keeps 5 most recent dev pre-releases) and by finalize-release.yml's stable-release sweep. The end-of-job regression guard step, "Verify minted tag survived the run", always re-resolves refs/tags/$DEV_TAG (via if: always() so the guard runs on failure paths where tag loss is most likely) and exits 1 if the tag is absent, routing through the existing report-failure job into the dev-release regression tracking issue. 
The workflow-tag-lifecycle pre-push gate (scripts/check_workflow_tag_lifecycle.py) statically prevents any future workflow from re-introducing the create-then-conditionally-delete shape. The end-of-run tag-survival check reads through .github/scripts/gh_with_retry.sh so a transient 401 cannot fire a false "tag deleted" alarm (a real 404 still fast-fails and trips the guard).
  • Finalise Release: finalize-release.yml: assembles the release body and publishes the draft once both Docker + CLI workflows succeed for the tag. Body assembly: prepends the AI Highlights block (stable releases only) extracted from the merged release-please PR body via the head_sha → pulls association, then re-applies the Verification section from the per-image marker comments (<!-- CLI_VERIFICATION_DATA -->, <!-- CONTAINER_VERIFICATION_DATA -->, etc.). The strip step that prevents finalise re-runs from doubling sections gates EVERY marker-pair deletion on both START and END being present in the body; sed '/START/,/END/d' is greedy to EOF without an END, which would tank the entire CHANGELOG-derived body if a contributor's commit subject (now propagated verbatim into dev release bodies via dev-release.yml) happened to contain a literal opening marker. The gate applies to HIGHLIGHTS and to all five CLI_* / CONTAINER_* verification-data marker pairs. The FINALIZE_VERIFICATION marker is intentionally greedy-to-EOF: everything after it IS the verification section, rebuilt fresh on each finalise run. Posts a finalize-release commit status (pending at start, success / failure at finish) so workflow_run-triggered failures surface as a red ❌ on the commit row instead of disappearing into the Actions tab. Gated by the release deployment environment. Immutable releases enabled. Handles both stable and dev releases. 
Stable-release dev-cleanup deletes every dev release + every orphan dev tag matching vX.Y.Z-dev.N whose base version is at or below the published stable (future-version dev builds are skipped via a sort -V semver compare, so a next-version dev.1 minted during the previous stable's finalise window is not swept out from under its in-flight docker.yml run) -- the inner gh api calls are explicitly capture-and-checked (NOT mapfile < <(...), which silently treats inner-process failures as empty input) and per-tag gh release delete / gh api -X DELETE failures accumulate into a final exit-on-failure check so partial-cleanup is loudly diagnosed. The Highlights propagation path that fetches the release-please PR body splits the gh pr view call into capture + classify so an auth / rate-limit failure surfaces a ::warning:: distinct from "PR was deleted" (legitimate skip with ::notice::). Artifact smoke testing happens at BUILD time in cli.yml and docker.yml via the smoke-test-cli-binary and smoke-test-backend-image composite actions; the finalise step does not re-test (Docker images are content-addressed and CLI archives are SHA-256-verified by the cosign-signed checksums.txt).
  • CI failure-surfacing policy: every CI workflow must surface its outcome somewhere visible. Non-schedule failure paths (push / pull_request / workflow_run / release / dispatch) post a commit status or PR check; schedule failure paths open or update a tracking GitHub Issue labelled automation:ci-health. Schedule-triggered workflows have no commit context to attach to, hence the issue lane; manual workflow_dispatch runs surface failures in the run UI directly so they do not open issues. The shared composite is .github/actions/post-tracking-issue; it dedupes by title across all states (open + closed), so a regression that reappears reopens the same tracker rather than creating a duplicate; consumers that auto-close on success (e.g. ci-preflight.yml) should also unpin in the close path so a closed-and-resolved issue does not stay in the pinned row. Workflows currently using this pattern: apko-lock.yml, ci-preflight.yml, dast.yml, python-audit.yml, evals.yml, scorecard.yml, secret-scan.yml. Pinned tracking-issue label: automation:ci-health. Success events (stable release published, dev pre-release cut, auto-rollover success) deliberately do NOT generate notifications; the GitHub Releases tab and commit row already surface those, and posting them would just spam the tracker.
  • SBOM Diff: sbom-diff.yml: inform-only sticky PR comment on Release Please release PRs. Reports added / removed components and license category counts from the head backend SBOM compared with the last stable release. dependency-review.yml remains the license gate; this comment is advisory.
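
The marker-pair gate described under Finalise Release can be illustrated in Python. This is a minimal sketch, not the workflow's actual shell code: delete_range mimics the line-range behaviour of sed '/START/,/END/d', and strip_block is a hypothetical helper name that applies the both-markers-present gate before deleting anything.

```python
START = "<!-- HIGHLIGHTS_START -->"
END = "<!-- HIGHLIGHTS_END -->"


def delete_range(lines: list[str], start: str, end: str) -> list[str]:
    """Drop every line from a START match through the next END match.

    Like the sed range form, an unmatched START keeps deleting until the
    end of the input.
    """
    kept: list[str] = []
    inside = False
    for line in lines:
        if inside:
            if end in line:
                inside = False  # the END line itself is deleted too
            continue
        if start in line:
            inside = True  # the START line itself is deleted
            continue
        kept.append(line)
    return kept


def strip_block(body: str, start: str, end: str) -> str:
    """Hypothetical gated strip: only delete when both markers are present."""
    if start not in body or end not in body:
        return body  # gate: leave the body untouched
    return "\n".join(delete_range(body.split("\n"), start, end))
```

With a well-formed block, strip_block removes only the marked section. With a stray opening marker (for example, one quoted in a commit subject), the ungated delete_range would discard everything from that line onward, while strip_block returns the body unchanged.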

Dependencies

  • Pinned: all versions use == in pyproject.toml
  • Groups: test (pytest + plugins, hypothesis), dev (includes test + ruff, mypy, pre-commit, commitizen, pip-audit)
  • Required: mem0ai (Mem0 memory backend, the default backend), mmh3 (murmurhash3 for BM25 sparse vector encoding in hybrid search), cryptography (Fernet encryption for sensitive settings at rest), faker (multi-locale agent name generation for templates and setup wizard), httpx (async HTTP client for web tools)
  • Install: uv sync installs everything (dev group is default)
  • Web dashboard: Node.js 22+, TypeScript 6.0+, dependencies in web/package.json (React 19, react-router, shadcn/ui, Base UI, Tailwind CSS 4, Zustand, @tanstack/react-query, @xyflow/react, @dagrejs/dagre, d3-force, @dnd-kit, Recharts, Motion, cmdk-base, js-yaml, Axios, Lucide React, @fontsource-variable/geist, @fontsource-variable/geist-mono, @fontsource-variable/jetbrains-mono, @fontsource-variable/inter, @fontsource/ibm-plex-mono, @fontsource/ibm-plex-sans, CodeMirror 6, Storybook 10, MSW, msw-storybook-addon, Vitest, @vitest/coverage-v8, @testing-library/react, fast-check, ESLint, @eslint-react/eslint-plugin, eslint-plugin-security, Playwright, @lhci/cli, rollup-plugin-visualizer, cross-env)
  • CLI: Go 1.26+, dependencies in cli/go.mod (Cobra, charm.land/huh/v2, charm.land/lipgloss/v2, sigstore-go, go-containerregistry, go-tuf)
  • Landing page: dependencies in site/package.json (Astro 6, @astrojs/react, React 19, Tailwind CSS 4, js-yaml)

Property-based Testing (Hypothesis): Deep Dive

The short rule in CLAUDE.md: Python uses Hypothesis; profiles live in tests/conftest.py; CI runs deterministic 10-example sweeps; failing examples are real bugs.

Profiles

Configured in tests/conftest.py, selected via HYPOTHESIS_PROFILE env var:

  • ci: deterministic, max_examples=10 + derandomize=True. Fixed seed per test, same inputs every run (no flakes).
  • dev: 1000 examples.
  • fuzz: 10,000 examples, no deadline. For dedicated fuzzing sessions.
  • extreme: 500,000 examples, no deadline. Overnight deep fuzzing.

.hypothesis/ is gitignored. Failing examples persist to ~/.synthorg/hypothesis-examples/ (write-only shared DB, survives worktree deletion) via _WriteOnlyDatabase in tests/conftest.py.

Running locally

  • Quick (1000 examples): HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n 8 -k properties
  • Deep (10,000 examples, all @given tests): HYPOTHESIS_PROFILE=fuzz uv run python -m pytest tests/ -m unit -n 8 --timeout=0
  • In the Deep command, --timeout=0 disables the 30s per-test limit that would otherwise kill long-running property tests.
  • The Deep command intentionally omits -k properties so that it covers all 46 files containing @given, not just the 12 *_properties.py files.

When Hypothesis finds a failure

It is a real bug. The shrunk example is saved to ~/.synthorg/hypothesis-examples/ for analysis but is not replayed automatically (that would block all test runs).

Do NOT just rerun and move on. Read the failing example from the output, fix the underlying bug, and add an explicit @example(...) decorator to the test so the case is permanently covered in CI.
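
Pinning a shrunk failure might look like the sketch below. The function normalize_name, the test name, and the pinned input are hypothetical and exist only to show where @example goes; substitute the real failing input reported by Hypothesis.

```python
from hypothesis import example, given, strategies as st


def normalize_name(raw: str) -> str:
    """Hypothetical function under test: collapse whitespace runs and trim."""
    return " ".join(raw.split())


@given(st.text())
@example("\u00a0 padded\tname ")  # hypothetical shrunk failure, now always run
def test_normalize_is_idempotent(raw: str) -> None:
    once = normalize_name(raw)
    # Normalising an already-normalised name must not change it.
    assert normalize_name(once) == once
```

Explicit examples run on every invocation regardless of profile, so the pinned case stays covered even under the 10-example ci sweep.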

Cross-language equivalents

  • React: fast-check (fc.assert + fc.property)
  • Go: native testing.F fuzz functions (Fuzz*)