HR & Agent Lifecycle

This page covers the operational lifecycle of every agent in a synthetic organisation, from hiring through performance tracking, evolution, and offboarding. The HR subsystem is how SynthOrg simulates a workforce: closed-loop hiring when new skills are needed, performance-driven pruning when agents fail to deliver, and pluggable evolution for agents that need to adapt their identity.

See Agents for the identity layer (personality, skills, tool namespaces, identity versioning).

Authority: role + reporting graph

Authority is not a scalar rank. It derives from an agent's role and its position in the organisation's reporting graph. Each Role declares an optional reports_to (the role name of its supervisor); the CEO role sits at the root with reports_to = None.

core/authority.py computes authority from that graph:

role_depth(role): distance from the CEO root (CEO is 0, its reports 1, and so on).
reporting_chain(role): the ordered chain of supervisors up to the root.
outranks(a, b) / compare_authority(a, b): respectively, whether role a is a (transitive) superior of role b, and a sign comparison of the two roles' reporting depths.

Consumers that need "who is more senior" (conflict resolution, owner selection, department-head detection) compare reporting depth via these helpers rather than reading a per-agent level. A role's model tier is a separate, independent axis driven by the work's capability demand (see Providers), not by org position.
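The helpers above can be sketched over a plain mapping from role name to its reports_to value. This is a minimal illustration, not the code in core/authority.py: the signatures, the cycle check, and the sign convention for compare_authority (positive when a sits closer to the root) are assumptions.

```python
from typing import Optional

# Hypothetical reporting graph: role name -> reports_to (None marks the CEO root).
ReportingGraph = dict[str, Optional[str]]


def reporting_chain(role: str, graph: ReportingGraph) -> list[str]:
    """Ordered supervisors, from the role's direct manager up to the root."""
    chain: list[str] = []
    seen = {role}
    current = graph[role]
    while current is not None:
        if current in seen:
            raise ValueError(f"reporting cycle detected at {current!r}")
        seen.add(current)
        chain.append(current)
        current = graph[current]
    return chain


def role_depth(role: str, graph: ReportingGraph) -> int:
    """Distance from the root: CEO is 0, its direct reports 1, and so on."""
    return len(reporting_chain(role, graph))


def outranks(a: str, b: str, graph: ReportingGraph) -> bool:
    """True when a is a (transitive) superior of b."""
    return a in reporting_chain(b, graph)


def compare_authority(a: str, b: str, graph: ReportingGraph) -> int:
    """Assumed convention: 1 if a is shallower (more senior), -1 if deeper, 0 if tied."""
    da, db = role_depth(a, graph), role_depth(b, graph)
    return (da < db) - (da > db)


graph: ReportingGraph = {
    "CEO": None,
    "CTO": "CEO",
    "CFO": "CEO",
    "Backend Developer": "CTO",
}
print(reporting_chain("Backend Developer", graph))  # ['CTO', 'CEO']
print(compare_authority("CTO", "CFO", graph))       # 0
```

Note that the two helpers answer different questions: CFO and CTO tie on depth, yet only CTO outranks a Backend Developer who reports to it.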

Role Catalog

The role catalog is extensible; users can add custom roles via config. The built-in catalog covers common organisational roles:

C-Suite / Executive

CEO: Overall strategy, final decision authority, cross-department coordination
CTO: Technical vision, architecture decisions, technology choices
CFO: Budget management, cost optimisation, resource allocation
COO: Operations, process optimisation, workflow management
CPO: Product strategy, roadmap, feature prioritisation

Product & Design

Product Manager: Requirements, user stories, prioritisation, stakeholder communication
UX Designer: User research, wireframes, user flows, usability
UI Designer: Visual design, component design, design systems
UX Researcher: User interviews, analytics, A/B test design
Technical Writer: Documentation, API docs, user guides

Engineering

Software Architect: System design, technology decisions, patterns
Frontend Developer (Junior/Mid/Senior): UI implementation, components, state management
Backend Developer (Junior/Mid/Senior): APIs, business logic, databases
Full-Stack Developer (Junior/Mid/Senior): End-to-end implementation
DevOps/SRE Engineer: Infrastructure, CI/CD, monitoring, deployment
Database Engineer: Schema design, query optimisation, migrations
Security Engineer: Security audits, vulnerability assessment, secure coding

Quality Assurance

QA Lead: Test strategy, quality gates, release readiness
QA Engineer: Test plans, manual testing, bug reporting
Automation Engineer: Test frameworks, CI integration, E2E tests
Performance Engineer: Load testing, profiling, optimisation
Red Team: Adversarial review of high-stakes deliverables (boot-instantiated)
Completion Reviewer: Independent peer review at the completion oracle (boot-instantiated)

Data & Analytics

Data Analyst: Metrics, dashboards, business intelligence
Data Engineer: Pipelines, ETL, data infrastructure
ML Engineer: Model training, inference, MLOps

Operations & Support

Project Manager: Timelines, dependencies, risk management, status tracking
Scrum Master: Agile ceremonies, impediment removal, team health
HR Manager: Hiring recommendations, team composition, performance tracking
Security Operations: Request validation, safety checks, approval workflows

Creative & Marketing

Content Writer: Blog posts, marketing copy, social media
Brand Strategist: Messaging, positioning, competitive analysis
Growth Marketer: Campaigns, analytics, conversion optimisation

Dynamic Roles

Users can define custom roles via config:

custom_roles:
  - name: "Blockchain Developer"
    department: "Engineering"
    skills: ["solidity", "web3", "smart-contracts"]
    system_prompt_template: "blockchain_dev.md"
    reports_to: "CTO"
    suggested_model: "large"

Hiring Process

The HR system manages the agent workforce dynamically:

HR agent (or human) identifies a skill gap or workload issue
HR generates candidate cards based on team needs:
- What skills are underrepresented?
- What role (and where in the reporting graph) is needed?
- What personality would complement the team?
- What model/provider fits the budget?
Candidate cards are presented for approval (to CEO or human)
Approved candidates are instantiated and onboarded
Onboarding includes company context, a project briefing, team introductions, and knowledge transfer from senior agents (training mode)

Training Mode

Training mode is a pluggable knowledge-transfer pipeline that seeds newly hired agents with curated senior experience at onboarding time. It runs as the LEARNED_FROM_SENIORS onboarding step.

Pipeline:

Source selection: select senior agents as knowledge sources (pluggable: role top performers, department diversity sampling, user-curated list, or composite)
Extraction: extract procedural memories, semantic knowledge, and tool usage patterns from source agents in parallel
Curation: reduce candidates to a ranked subset (pluggable: relevance score or LLM-curated)
Guard chain: sanitization (mandatory, non-bypassable), volume caps (per-content-type hard limits), review gate (human approval via ApprovalStore)
Storage: seed approved items into the new agent's memory backend with training tags

Per-hire customisation:

override_sources: explicit agent IDs bypassing the selector
content_types: enable/disable specific extractors
custom_caps: override default volume caps per content type
skip_training: bypass the step entirely

Safe defaults: RoleTopPerformers (top 3), RelevanceScoreCuration, all guards enabled, and human review required. The step is idempotent by plan ID, so re-running the same plan does not seed duplicate items.
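The capping and idempotency stages of the pipeline can be sketched as follows. This is an illustration under stated assumptions: TrainingItem, TrainingStepSketch, the cap values, and the rule that content types without a cap are dropped are all hypothetical, and the sanitisation and review-gate guards are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical content types and cap values; the real caps are configurable.
DEFAULT_CAPS: dict[str, int] = {"procedural": 2, "semantic": 3, "tool_usage": 1}


@dataclass(frozen=True)
class TrainingItem:
    content_type: str  # e.g. "procedural", "semantic", "tool_usage"
    text: str
    score: float       # curation score; higher ranks first


def apply_volume_caps(items: list[TrainingItem], caps: dict[str, int]) -> list[TrainingItem]:
    """Keep the highest-scoring items of each content type, up to that type's hard cap."""
    kept: list[TrainingItem] = []
    counts: dict[str, int] = {}
    for item in sorted(items, key=lambda i: i.score, reverse=True):
        used = counts.get(item.content_type, 0)
        if used < caps.get(item.content_type, 0):  # types without a cap are dropped
            counts[item.content_type] = used + 1
            kept.append(item)
    return kept


@dataclass
class TrainingStepSketch:
    """Runs the capping stage once per plan ID; a repeated plan seeds nothing."""
    seeded_plans: set[str] = field(default_factory=set)

    def run(
        self,
        plan_id: str,
        items: list[TrainingItem],
        custom_caps: Optional[dict[str, int]] = None,
        skip_training: bool = False,
    ) -> list[TrainingItem]:
        if skip_training or plan_id in self.seeded_plans:
            return []
        caps = {**DEFAULT_CAPS, **(custom_caps or {})}  # per-hire overrides win
        kept = apply_volume_caps(items, caps)
        self.seeded_plans.add(plan_id)
        return kept
```

Keying idempotency on the plan ID rather than on item content means a retried onboarding step is a no-op even if curation would now rank items differently.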

Design decisions (Decision Log D8)

D8.1: Source. Templates + LLM customisation. Templates (reusing the existing template system) cover common roles; an LLM generates configs for novel roles that no template covers. The approval gate catches invalid or unsuitable configs before instantiation.
D8.2: Persistence. Runtime state lives in an operational store accessed via PersistenceBackend. YAML remains the bootstrap seed, but the operational store wins for runtime state, which enables rehiring and an auditable history.
D8.3: Hot-plug. Agents are hot-pluggable at runtime via a dedicated company/registry service (not AgentEngine, which remains the per-agent task runner). The registry is thread-safe and is wired into the message bus, tools, and budget.

Pruning

The pruning service automates performance-driven agent removal with mandatory human approval.

PruningPolicy protocol with two implementations:
ThresholdPruningPolicy: prunes agents whose quality and collaboration scores are both below their thresholds for N or more consecutive windows (7d/30d/90d).
TrendPruningPolicy: prunes agents with declining Theil-Sen trend across all three windows.
PruningService runs as a periodic background task, evaluates all active agents, and creates CRITICAL-risk approval items for eligible candidates.
On human approval, the service delegates to OffboardingService with FiringReason.PERFORMANCE.
Approval deduplication prevents multiple pending approvals per agent.
Transient offboarding failures are retried on subsequent cycles.

Module: src/synthorg/hr/pruning/ (models, policy, service).
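A minimal sketch of the two eligibility rules, assuming each agent's data is summarised as one hypothetical WindowView per window in 7d, 30d, 90d order and that "consecutive" counts adjacent entries in that order. Threshold values are placeholders; in the service, an eligible agent only becomes an approval item, never an immediate removal.

```python
from dataclasses import dataclass
from enum import Enum


class Trend(Enum):
    IMPROVING = "improving"
    STABLE = "stable"
    DECLINING = "declining"


@dataclass(frozen=True)
class WindowView:
    """Hypothetical per-window snapshot, ordered 7d, 30d, 90d."""
    window: str
    quality: float
    collaboration: float
    trend: Trend


def threshold_eligible(
    windows: list[WindowView],
    quality_min: float,
    collaboration_min: float,
    min_consecutive: int,
) -> bool:
    """Both scores below threshold in at least `min_consecutive` consecutive windows."""
    run = best = 0
    for w in windows:
        if w.quality < quality_min and w.collaboration < collaboration_min:
            run += 1
            best = max(best, run)
        else:
            run = 0  # one healthy window breaks the streak
    return best >= min_consecutive


def trend_eligible(windows: list[WindowView]) -> bool:
    """Every window must report a declining trend (an empty list never qualifies)."""
    return bool(windows) and all(w.trend is Trend.DECLINING for w in windows)
```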

Dynamic Scaling

The scaling service connects workload, budget, skill-coverage, and performance signals to the existing hiring and pruning pipelines, closing the loop between them. It evaluates four pluggable strategies in parallel, filters their decisions through a guard chain, and produces approved scaling actions.

Architecture

Orchestrated by ScalingService in hr/scaling/service.py.

Strategies

Strategy	Signals	Actions	Default
WorkloadAutoScale	avg utilisation, queue depth	HIRE when > 85% sustained, PRUNE when < 30% sustained	Enabled
BudgetCap	burn rate %, alert level	PRUNE when > 90% safety margin, HOLD to block hires	Enabled
SkillGap	coverage ratio, missing skills	HIRE with specific skill profile	Disabled (LLM cost)
PerformancePruning	quality/collaboration trends	PRUNE via existing PruningPolicy	Enabled

Each strategy supports a headless (rule-based) path and an optional agent-delegated path (the agent_delegate config field). Agent delegation exists only as a protocol stub and is not implemented, so the headless path is always used.

PerformancePruningStrategy coordinates with the evolution system: when defer_during_evolution is True (default), agents with recent evolution adaptations are skipped.
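The WorkloadAutoScale rule from the table can be sketched as a threshold check over a sliding window of utilisation samples. The sketch assumes "sustained" means every one of the last sustain_samples observations is past the threshold; sustain_samples and the class name are hypothetical, and the queue-depth signal is left out.

```python
from collections import deque
from enum import Enum


class Action(Enum):
    HIRE = "hire"
    PRUNE = "prune"
    NONE = "none"


class WorkloadAutoScaleSketch:
    """Headless rule: act only when utilisation stays past a threshold for the whole window."""

    def __init__(self, hire_threshold: float = 0.85, prune_threshold: float = 0.30,
                 sustain_samples: int = 3) -> None:
        self.hire_threshold = hire_threshold
        self.prune_threshold = prune_threshold
        self.samples: deque[float] = deque(maxlen=sustain_samples)

    def observe(self, avg_utilisation: float) -> Action:
        self.samples.append(avg_utilisation)  # oldest sample drops off automatically
        if len(self.samples) < self.samples.maxlen:
            return Action.NONE  # not enough history to call anything sustained
        if all(u > self.hire_threshold for u in self.samples):
            return Action.HIRE
        if all(u < self.prune_threshold for u in self.samples):
            return Action.PRUNE
        return Action.NONE
```

Requiring the whole window to agree is a simple form of hysteresis: a single spike or lull cannot trigger a hire or prune on its own.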

Guard Chain

All decisions flow through guards sequentially before execution:

ConflictResolver: priority-ordered resolution. Default: BudgetCap (0) > PerformancePruning (1) > SkillGap (2) > Workload (3). HOLD from BudgetCap blocks HIRE from lower-priority strategies.
CooldownGuard: cooldown per (action type, target) pair (default 1 hour).
RateLimitGuard: global daily caps (default 3 hires, 1 prune per day).
ApprovalGateGuard: routes decisions through ApprovalStore as ApprovalItem entries for human approval.
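The conflict, cooldown, and rate-limit stages can be sketched as follows, using the default priorities and limits from the text. Decision, the check/record split, and the UTC-day bucketing for daily caps are assumptions for illustration; the approval gate is represented only by the returned list.

```python
from dataclasses import dataclass
from typing import Optional

# Default priorities from the text: lower number = higher priority.
PRIORITY = {"BudgetCap": 0, "PerformancePruning": 1, "SkillGap": 2, "Workload": 3}


@dataclass(frozen=True)
class Decision:
    strategy: str  # key into PRIORITY
    action: str    # "HIRE", "PRUNE", or "HOLD"
    target: str    # hypothetical: a role name or agent id


def resolve_conflicts(decisions: list[Decision]) -> list[Decision]:
    """Priority-ordered pass: a HOLD blocks HIRE decisions from lower-priority strategies."""
    kept: list[Decision] = []
    hold_priority: Optional[int] = None
    for d in sorted(decisions, key=lambda d: PRIORITY[d.strategy]):
        p = PRIORITY[d.strategy]
        if d.action == "HOLD":
            if hold_priority is None:
                hold_priority = p
            continue  # a HOLD only blocks; it is not executed itself
        if d.action == "HIRE" and hold_priority is not None and p > hold_priority:
            continue
        kept.append(d)
    return kept


class CooldownGuard:
    """Blocks a repeat of the same (action, target) within the cooldown."""

    def __init__(self, cooldown_seconds: float = 3600.0) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last: dict[tuple[str, str], float] = {}

    def check(self, d: Decision, now: float) -> bool:
        last = self._last.get((d.action, d.target))
        return last is None or now - last >= self.cooldown_seconds

    def record(self, d: Decision, now: float) -> None:
        self._last[(d.action, d.target)] = now


class RateLimitGuard:
    """Global daily caps per action type, bucketed by UTC day of a POSIX timestamp."""

    def __init__(self, max_per_day: dict[str, int]) -> None:
        self.max_per_day = max_per_day
        self._counts: dict[tuple[int, str], int] = {}

    def _key(self, d: Decision, now: float) -> tuple[int, str]:
        return (int(now // 86400), d.action)

    def check(self, d: Decision, now: float) -> bool:
        return self._counts.get(self._key(d, now), 0) < self.max_per_day.get(d.action, 0)

    def record(self, d: Decision, now: float) -> None:
        key = self._key(d, now)
        self._counts[key] = self._counts.get(key, 0) + 1


def run_guards(decisions: list[Decision], cooldown: CooldownGuard,
               rate_limit: RateLimitGuard, now: float) -> list[Decision]:
    """Decisions that pass every guard; state is only recorded when all guards agree."""
    passed: list[Decision] = []
    for d in resolve_conflicts(decisions):
        if cooldown.check(d, now) and rate_limit.check(d, now):
            cooldown.record(d, now)
            rate_limit.record(d, now)
            passed.append(d)
    return passed
```

Splitting check from record keeps a decision rejected by a later guard from consuming cooldown or quota in an earlier one.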

Configuration

scaling:
  enabled: true
  workload:
    enabled: true
    hire_threshold: 0.85
    prune_threshold: 0.30
  budget_cap:
    enabled: true
    safety_margin: 0.90
    headroom_fraction: 0.60
  skill_gap:
    enabled: false
  performance_pruning:
    enabled: true
    defer_during_evolution: true
  triggers:
    batched_interval_seconds: 900
  guards:
    cooldown_seconds: 3600
    max_hires_per_day: 3
    max_prunes_per_day: 1
    approval_expiry_days: 7

Dashboard

The /scaling page shows:

Signal gauges: utilisation, budget burn, declining agent count
Strategy controls: enabled status, priority order
Pending decisions: awaiting human approval
Recent decisions: history with outcome and rationale

Module: src/synthorg/hr/scaling/ (models, protocols, strategies, signals, triggers, guards, context, config, factory, service).

Boot wiring and rollout

The pipeline is opt-in but "ghost-wired": it is constructed whenever its collaborators are available, and the feature switch gates it at request time. build_scaling_service (in hr/scaling/factory.py) assembles the ScalingService over the hiring and offboarding services, and wire_scaling (in api/lifecycle_helpers/scaling_wiring.py) constructs it at startup whenever its collaborators are present, regardless of hr.scaling_enabled.

The hr.scaling_enabled switch (off by default) is enforced live at the /scaling endpoints via ensure_feature_enabled, which returns HTTP 503 while the switch is off; toggling it takes effect on the next request, with no restart.

wire_scaling also constructs the durable HiringService, attaches the per-backend hiring_requests repository, and reloads in-flight requests so that an approved hire is not orphaned by a restart. Wiring requires a connected persistence backend plus a wired registry, performance tracker, and approval store; if any of these is missing, the service stays unwired and the endpoints return 503.

Firing / Offboarding

Offboarding is triggered by: budget cuts, poor performance metrics, project completion, or human decision.

Agent's memory is archived (not deleted)
Active tasks are reassigned
Team is notified

Design decisions (Decision Log D9, D10)

Each decision below names the protocol that is currently implemented and the concrete Initial strategy that the default factory wires. "Initial strategy" is the shipped default, not aspirational scaffolding; operators replace it by registering an alternative strategy on the relevant factory.

D9: Task Reassignment. Pluggable TaskReassignmentStrategy protocol. Initial strategy: queue-return (concrete: QueueReturnStrategy in src/synthorg/hr/queue_return_strategy.py). Tasks return to the unassigned queue, and the existing TaskRoutingService re-routes them, giving reassigned tasks a priority boost. Future strategies on the backlog: same-department / lowest-load, manager-decides (LLM), HR agent decides.
D10: Memory Archival. Pluggable MemoryArchivalStrategy protocol. Initial strategy: full snapshot, read-only (concrete: FullSnapshotStrategy in src/synthorg/hr/full_snapshot_strategy.py). Pipeline: retrieve all memories, archive to ArchivalStore, selectively promote semantic+procedural memories to OrgMemoryBackend (rule-based), clean hot store, mark agent TERMINATED. Rehiring restores archived memories into a new AgentIdentity. Future strategies on the backlog: selective discard, full-accessible.
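A minimal sketch of the queue-return idea: the departing agent's active tasks are pushed back onto an unassigned priority queue with a boost, so they come out ahead of comparable new work. UnassignedQueue, queue_return, and the boost value of 10 are hypothetical; the real QueueReturnStrategy interface is not described here, and the actual re-routing is done by TaskRoutingService.

```python
import heapq
import itertools
from dataclasses import dataclass, field

REASSIGNMENT_BOOST = 10  # hypothetical boost added to returned tasks


@dataclass(order=True)
class _QueuedTask:
    sort_key: int                        # negated priority (heapq is a min-heap)
    seq: int                             # insertion order breaks ties (FIFO)
    task_id: str = field(compare=False)


class UnassignedQueue:
    """Max-priority queue of unassigned task IDs."""

    def __init__(self) -> None:
        self._heap: list[_QueuedTask] = []
        self._seq = itertools.count()

    def push(self, task_id: str, priority: int) -> None:
        heapq.heappush(self._heap, _QueuedTask(-priority, next(self._seq), task_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap).task_id


def queue_return(active_tasks: list[tuple[str, int]], queue: UnassignedQueue) -> None:
    """On offboarding, return the agent's (task_id, priority) pairs to the queue, boosted."""
    for task_id, priority in active_tasks:
        queue.push(task_id, priority + REASSIGNMENT_BOOST)
```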

Performance Tracking

Performance data is exposed via three API sub-routes on /api/v1/agents/{agent_id} (the agent's stable id):

Sub-route	Response model	Description
`GET /performance`	`AgentPerformanceSummary`	Flat summary: tasks completed (total/7d/30d), success rate, cost per task, quality/collaboration scores, trend direction, plus raw window metrics and trend results
`GET /activity`	`PaginatedResponse[ActivityEvent]`	Paginated chronological timeline merging lifecycle events, task metrics, cost records, tool invocations, and delegation records (most recent first). Supports typed `ActivityEventType` enum filtering (invalid values return 400). Cost events are redacted for read-only roles. Response includes `degraded_sources` field for partial data detection
`GET /history`	`ApiResponse[tuple[CareerEvent, ...]]`	Career-relevant lifecycle events (hired, fired, promoted, demoted, onboarded) in chronological order

The framework tracks detailed per-agent metrics:

agent_metrics:
  tasks_completed: 42
  tasks_failed: 2
  average_quality_score: 8.5     # from code reviews, peer feedback
  average_cost_per_task: 0.45
  average_completion_time: "2h"
  collaboration_score: 7.8       # peer ratings
  last_review_date: "2026-02-20"

Design decisions (Decision Log D2, D3, D11, D12)

D2: Quality Scoring. Pluggable QualityScoringStrategy protocol. Initial strategy: layered combination, comprising:

FREE: Objective CI signals (test pass/fail, lint, coverage delta)
Small daily cost (illustrative): a small-model LLM judge, from a different model family than the agent, evaluates output against the acceptance criteria (actual spend depends on the operator's configured provider and currency)
On-demand: Human override via API, highest weight

All three layers are implemented via CompositeQualityStrategy (configurable CI/LLM weights, human override short-circuits with highest priority). Human override CRUD is exposed at /agents/{agent_id}/quality/override. Config fields: quality_judge_model, quality_judge_provider, quality_ci_weight, quality_llm_weight in PerformanceConfig. Future strategies: CI-only, LLM-only, human-only.
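The layering can be sketched as a weighted mean of whichever automated scores are present, with the human override short-circuiting. The 0 to 10 scale, the 0.6/0.4 default weights, and the renormalisation when a layer is missing are assumptions; the real weights come from quality_ci_weight and quality_llm_weight.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualityInputs:
    ci_score: Optional[float]        # 0-10, from test/lint/coverage signals
    llm_score: Optional[float]       # 0-10, from the small-model judge
    human_override: Optional[float]  # 0-10, set via the override API


def composite_quality(inputs: QualityInputs, ci_weight: float = 0.6,
                      llm_weight: float = 0.4) -> Optional[float]:
    """Human override wins outright; otherwise a weighted mean of available layers."""
    if inputs.human_override is not None:
        return inputs.human_override  # short-circuit with highest priority
    parts = [
        (score, weight)
        for score, weight in ((inputs.ci_score, ci_weight), (inputs.llm_score, llm_weight))
        if score is not None and weight > 0
    ]
    if not parts:
        return None  # no signal at all
    total_weight = sum(w for _, w in parts)
    return sum(s * w for s, w in parts) / total_weight  # renormalise over present layers
```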

D3: Collaboration Scoring. Pluggable CollaborationScoringStrategy protocol. Initial strategy: automated behavioural telemetry, computed as:

collaboration_score = weighted_average(
    delegation_success_rate,
    delegation_response_latency,
    conflict_resolution_constructiveness,
    meeting_contribution_rate,
    loop_prevention_score,
    handoff_completeness
)

Weights are configurable per-role. Periodic LLM sampling (1%, configurable) for calibration is implemented via LlmCalibrationSampler (opt-in, requires llm_sampling_model config). Human override via API is implemented via CollaborationOverrideStore + CollaborationController at /agents/{agent_id}/collaboration. Future strategies: LLM evaluation, peer ratings, human-provided.
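A runnable version of the weighted average above, assuming each signal is already normalised to [0, 1] with higher meaning better (so latency would be inverted first). The weight dictionaries are hypothetical examples of the configurable per-role weights.

```python
COMPONENTS = (
    "delegation_success_rate",
    "delegation_response_latency",
    "conflict_resolution_constructiveness",
    "meeting_contribution_rate",
    "loop_prevention_score",
    "handoff_completeness",
)


def weighted_average(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean over components that have both a signal and a positive weight."""
    used = {k: w for k, w in weights.items() if w > 0 and k in signals}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no weighted signals available")
    return sum(signals[k] * w for k, w in used.items()) / total


# Hypothetical per-role weights: a manager role might weight delegation more heavily.
EQUAL_WEIGHTS = {c: 1.0 for c in COMPONENTS}
MANAGER_WEIGHTS = {**EQUAL_WEIGHTS, "delegation_success_rate": 4.0}
```

Dividing by the sum of the weights actually used means a zero weight (or a missing signal) drops that component instead of dragging the score toward zero.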

D11: Rolling Windows. Pluggable MetricsWindowStrategy protocol. Initial strategy: multiple simultaneous windows:

7d for acute regressions
30d for sustained patterns
90d for baseline/drift

Minimum 5 data points per window; below that, the system reports "insufficient data." Future strategies: fixed single window, per-metric configurable.
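The multi-window view with the five-point minimum can be sketched as below. The window_means function, the use of a plain mean, and None as the "insufficient data" marker are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import Optional

WINDOWS = {"7d": timedelta(days=7), "30d": timedelta(days=30), "90d": timedelta(days=90)}
MIN_DATA_POINTS = 5


def window_means(points: list[tuple[datetime, float]], now: datetime) -> dict[str, Optional[float]]:
    """Mean per window over (timestamp, value) pairs; None means insufficient data."""
    result: dict[str, Optional[float]] = {}
    for name, span in WINDOWS.items():
        values = [v for ts, v in points if now - span <= ts <= now]
        result[name] = mean(values) if len(values) >= MIN_DATA_POINTS else None
    return result
```

Because the windows overlap, a recent regression shows up first in 7d while 90d still reflects the baseline, which is what lets the three windows play the acute/sustained/drift roles described above.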

D12: Trend Detection. Pluggable TrendDetectionStrategy protocol. Initial strategy: the Theil-Sen regression slope per window, with configurable thresholds that classify each trend as improving, stable, or declining. Theil-Sen has a breakdown point of about 29.3%, so it tolerates arbitrarily bad values in up to roughly 3 of every 10 data points. Minimum 5 data points. Future strategies: period-over-period, OLS regression, threshold-only.
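A compact Theil-Sen sketch: the slope is the median of all pairwise slopes, and placeholder thresholds then classify it. The function names and threshold values are hypothetical (the real thresholds are configurable). This O(n²) version is adequate for window-sized inputs.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(xs: list[float], ys: list[float]) -> float:
    """Median of the slopes between every pair of points; equal-x pairs are skipped."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    return median(slopes)


def classify_trend(xs: list[float], ys: list[float], improving_above: float = 0.05,
                   declining_below: float = -0.05, min_points: int = 5) -> str:
    if len(ys) < min_points:
        return "insufficient data"
    slope = theil_sen_slope(xs, ys)
    if slope > improving_above:
        return "improving"
    if slope < declining_below:
        return "declining"
    return "stable"
```

With one outlier in six points (about 17%, below the breakdown point), for example xs = [0, 1, 2, 3, 4, 5] and ys = [10, 9, 8, 7, 6, 50], the median slope stays at -1 and the trend reads as declining; an ordinary least-squares fit of the same points would give a slope of about +5.4.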

Evaluation Loop

The closed-loop evaluation framework continuously measures agent performance and identifies improvement opportunities, built on top of the five-pillar evaluation, performance tracking, and trajectory scoring described elsewhere on this page. It captures traces, tags behaviour, enriches each turn with five-pillar evaluation, and proposes targeted fixes validated on the next run. The framework has its own design page: Evaluation Loop.

Agent Evolution

Agents improve over time through a pluggable evolution pipeline that closes the loop between execution outcomes, learned knowledge, and agent behaviour. The system follows the EvoSkill three-agent separation principle: the executing agent does not propose its own identity changes; a separate analyser does.

Architecture

The pipeline is orchestrated by EvolutionService in engine/evolution/service.py.

Pluggable Axes

Each axis below is a @runtime_checkable Protocol; the classes listed for it are its strategy implementations:

Triggers (engine/evolution/triggers/): BatchedTrigger, InflectionTrigger, PerTaskTrigger, CompositeTrigger
Proposers (engine/evolution/proposers/): SeparateAnalyzerProposer (EvoSkill strict), SelfReportProposer (heuristic), CompositeProposer (routes by outcome)
Adapters (engine/evolution/adapters/): IdentityAdapter (identity mutation via version store), StrategySelectionAdapter (preference memory), PromptTemplateAdapter (prompt injection)
Guards (engine/evolution/guards/): RateLimitGuard, ReviewGateGuard, RollbackGuard, ShadowEvaluationGuard (runs the adapted agent on a probe task suite via a pluggable ShadowTaskProvider and ShadowAgentRunner, and rejects the adaptation when its score or pass rate regresses beyond configured tolerances), ApproveAllGuard (no-op fallback used when every real guard is disabled), CompositeGuard (chains guards; all must pass)

Identity Version Store

engine/identity/store/ provides versioned identity storage with rollback:

IdentityVersionStore protocol: put, get_current, get_version, list_versions, set_current (rollback)
AppendOnlyIdentityStore: Every mutation appends a new version (full audit trail). set_current writes a new version pointing to the restored content.
CopyOnWriteIdentityStore: Maintains a separate version pointer. set_current only updates the pointer (cheaper, but loses rollback audit trail).

Both wrap AgentRegistryService + VersioningService[AgentIdentity].
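The difference between the two stores shows up in what set_current does. In this in-memory sketch (hypothetical classes storing plain dicts, with versions numbered from 1), append-only rollback creates a new version while copy-on-write rollback only moves a pointer; the real stores wrap AgentRegistryService and VersioningService[AgentIdentity].

```python
from dataclasses import dataclass, field


@dataclass
class _InMemoryVersions:
    """Shared storage: versions are numbered from 1 in insertion order."""
    versions: list[dict] = field(default_factory=list)

    def get_version(self, version: int) -> dict:
        if not 1 <= version <= len(self.versions):
            raise KeyError(version)
        return self.versions[version - 1]

    def list_versions(self) -> list[int]:
        return list(range(1, len(self.versions) + 1))

    def _append(self, identity: dict) -> int:
        self.versions.append(dict(identity))  # store a copy
        return len(self.versions)


@dataclass
class AppendOnlySketch(_InMemoryVersions):
    def put(self, identity: dict) -> int:
        return self._append(identity)

    def get_current(self) -> dict:
        return self.versions[-1]

    def set_current(self, version: int) -> int:
        # Rollback appends the restored content as a new version: full audit trail.
        return self._append(self.get_version(version))


@dataclass
class CopyOnWriteSketch(_InMemoryVersions):
    current: int = 0

    def put(self, identity: dict) -> int:
        self.current = self._append(identity)
        return self.current

    def get_current(self) -> dict:
        return self.get_version(self.current)

    def set_current(self, version: int) -> int:
        # Rollback only moves the pointer: cheaper, but leaves no version behind.
        self.get_version(version)  # validate that the target exists
        self.current = version
        return version
```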

Performance Inflection Events

PerformanceTracker emits PerformanceInflection events via an InflectionSink protocol when a metric's trend direction changes (e.g., stable to declining). InflectionTrigger implements InflectionSink and queues events for the evolution service.
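The emit-on-direction-change behaviour can be sketched with a structural Protocol. The event fields, the emit method name, and the TrendWatcher and QueueingTrigger helpers are assumptions; only the roles (a tracker emitting into an InflectionSink, a trigger queuing events) come from the text.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class InflectionSketch:
    agent_id: str
    metric: str
    window: str
    old_direction: str
    new_direction: str


@runtime_checkable
class InflectionSinkSketch(Protocol):
    def emit(self, event: InflectionSketch) -> None: ...


class QueueingTrigger:
    """Stands in for InflectionTrigger: queues events for the evolution service."""

    def __init__(self) -> None:
        self.pending: list[InflectionSketch] = []

    def emit(self, event: InflectionSketch) -> None:
        self.pending.append(event)


class TrendWatcher:
    """Stands in for the tracker side: emits only when a direction actually changes."""

    def __init__(self, sink: InflectionSinkSketch) -> None:
        self.sink = sink
        self._last: dict[tuple[str, str, str], str] = {}

    def record(self, agent_id: str, metric: str, window: str, direction: str) -> None:
        key = (agent_id, metric, window)
        previous = self._last.get(key)
        self._last[key] = direction
        if previous is not None and previous != direction:
            self.sink.emit(InflectionSketch(agent_id, metric, window, previous, direction))
```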

Safe Defaults

Axis	Default	Rationale
Triggers	batched (daily) + inflection	Low cost, reactive
Proposer	composite (analyser for failures, self-report for success)	EvoSkill separation
Adapters	prompt_template ON, strategy_selection ON, identity OFF	Identity is highest risk
Guards	review_gate + rollback + rate_limit ON; shadow OFF	Safety first
Identity store	append_only	Audit trail by default
Propagation	none	Opt-in per org

Configuration

evolution:
  enabled: true
  triggers:
    types: [batched, inflection]
    batched_interval_seconds: 86400
  proposer:
    type: composite
    model: example-small-001
    temperature: 0.3
    max_tokens: 2000
  adapters:
    identity: false
    strategy_selection: true
    prompt_template: true
  guards:
    review_gate: true
    rollback: true
    rollback_window_tasks: 20
    rollback_regression_threshold: 0.1
    rate_limit: true
    rate_limit_per_day: 3
    shadow_evaluation: null        # null disables; set a ShadowEvaluationConfig to enable
  memory:
    capture:
      type: hybrid           # failure | success | hybrid
      min_quality_score: 8.0
    pruning:
      type: ttl              # ttl | pareto | hybrid
      max_age_days: 90
    propagation:
      type: none             # none | role_scoped | department_scoped
  identity_store:
    type: append_only

Runtime wiring status

The evolution config, service, and factory are implemented and wired: build_evolution_service() is called from the worker engine assembly (workers/_engine_assembly.py). Runtime evolution management has no REST API or dashboard UI; it is configured in the application code that wires the service.

Five-Pillar Evaluation Framework

Performance data is also evaluated through a structured five-pillar framework (InfoQ: Evaluating AI Agents):

Pillar	Measures	Data Sources
Intelligence/Accuracy	Quality of task output, reasoning coherence	`QualityScoreResult`, `LlmCalibrationRecord`
Performance/Efficiency	Cost, latency, token usage	`WindowMetrics` (cost, time, tokens)
Reliability/Resilience	Consistency, failure recovery, streaks	`TaskMetricRecord` sequences
Responsibility/Governance	Compliance, trust stability, autonomy adherence	Audit log, trust system, autonomy system
User Experience	Clarity, helpfulness, tone, satisfaction	`InteractionFeedback` records

Each pillar and its individual metrics can be independently enabled or disabled via EvaluationConfig. The weight of a disabled pillar or metric is redistributed proportionally across the remaining enabled ones. All pillars ship enabled by default with equal recommended weights of 0.2 each.

The EvaluationService orchestrates scoring, delegating to a pluggable PillarScoringStrategy per pillar. The default per-pillar strategy is ConfigurablePillarScorer composed with the corresponding per-pillar MetricExtractor (one extractor per file under hr/evaluation/extractors/). The composite owns the shared "redistribute weights → weighted-average → clamp → confidence → log → PillarScore" pipeline so each extractor stays focused on the per-pillar data extraction. Human-calibrated LLM labelling uses the existing LlmCalibrationSampler infrastructure; calibration drift above a configurable threshold reduces the intelligence pillar's confidence (via the extractor's confidence_multiplier), signalling the need for more human labels.
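The shared composite pipeline can be sketched for one pillar, using the Resilience weights listed under D24 as sample input. Linear confidence growth up to a saturation count, the [0, 1] score range, and the function and class names are assumptions, and the logging step is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PillarScoreSketch:
    score: float
    confidence: float


def redistribute(weights: dict[str, float], enabled: set[str]) -> dict[str, float]:
    """Drop disabled metrics and rescale the rest proportionally so they sum to 1."""
    active = {k: w for k, w in weights.items() if k in enabled}
    total = sum(active.values())
    if total <= 0:
        raise ValueError("no enabled metrics")
    return {k: w / total for k, w in active.items()}


def score_pillar(metrics: dict[str, float], weights: dict[str, float], enabled: set[str],
                 data_points: int, saturation: int = 10,
                 confidence_multiplier: float = 1.0) -> PillarScoreSketch:
    """Redistribute weights, weighted-average, clamp to [0, 1], then attach a confidence."""
    w = redistribute(weights, enabled & set(metrics))
    raw = sum(metrics[k] * wk for k, wk in w.items())
    clamped = min(max(raw, 0.0), 1.0)
    # Confidence grows linearly until `saturation` data points, then scales by e.g. a drift multiplier.
    confidence = min(1.0, data_points / saturation) * confidence_multiplier
    return PillarScoreSketch(clamped, confidence)


RESILIENCE_WEIGHTS = {
    "success_rate": 0.40,
    "recovery_rate": 0.25,
    "quality_consistency": 0.20,
    "streak_bonus": 0.15,
}
```

Disabling streak_bonus leaves 0.85 of the original weight, so success_rate's share rises from 0.40 to 0.40/0.85, about 0.47, while the relative proportions of the remaining metrics are unchanged.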

Design decisions (Decision Log D24)

D24: Five-Pillar Evaluation. Pluggable PillarScoringStrategy protocol with single EvaluationContext bag. The default per-pillar strategy is ConfigurablePillarScorer composed with a per-pillar MetricExtractor:

Intelligence: IntelligenceMetricExtractor blends CI quality score (70%) with LLM calibration score (30%). High calibration drift reduces confidence via the extractor's drift multiplier.
Efficiency: EfficiencyMetricExtractor normalises cost (40%), time (30%), and token (30%) sub-metrics from the 30d window (with 7d fallback). The cost and time sub-metrics are runtime-gated by the hr.evaluation_cost_enabled and hr.evaluation_latency_enabled kill switches via the optional ConfigResolver.
Resilience: ResilienceMetricExtractor; success rate (40%), recovery rate (25%), quality consistency (20%), streak bonus (15%).
Governance: GovernanceMetricExtractor; audit compliance (50%), trust level (30%), autonomy compliance (20%).
Experience: ExperienceMetricExtractor; clarity (25%), helpfulness (25%), trust (20%), tone (15%), satisfaction (15%). Uses a custom confidence saturation point of min_feedback_count * 3 data points.

All metrics are toggleable via the per-pillar sub-configs of EvaluationConfig. Weight redistribution follows the BehavioralTelemetryStrategy pattern. Evaluation is pull-based (no background daemon).

HR Service Layer

MCP handlers and REST controllers never reach into HR repositories directly; every read goes through a narrow service facade so auditing, pagination, and optional-dependency degradation stay in one place per domain. The services follow the standard protocol + strategy + factory + config-discriminator pattern where interchangeable backends exist (e.g. AutonomyPolicyService, ScalingConfigService), and collapse to a single class where the behaviour is strictly orchestration (e.g. ActivityFeedService).

Service	Module	Role
`ActivityFeedService`	`src/synthorg/hr/activity_service.py`	Aggregates lifecycle events, task metrics, cost records, tool invocations, and delegation records into a single agent-scoped timeline for `synthorg_agents_get_activity`. Uses `asyncio.TaskGroup` with per-source safe-default helpers so one failing tracker cannot abort the merge.
`AgentHealthService`	`src/synthorg/hr/health/service.py`	Derives a compact `AgentHealthReport` (`healthy` / `degraded` / `unavailable`) from the tightest populated `PerformanceTracker` window. Rejects reports where `recent_failed_count > recent_task_count` via a cross-field validator.
`AgentVersionService`	`src/synthorg/hr/identity/version_service.py`	Reads paged identity-version history for `synthorg_agents_get_history`. Lifted out of the REST controller so the MCP surface doesn't depend on HTTP request/response shapes.
`PersonalityService`	`src/synthorg/hr/personalities/service.py`	Thin facade over `PersonalityPresetService` for MCP list/get endpoints.
`ScalingDecisionService`	`src/synthorg/hr/scaling/decision_service.py`	Wraps the scaling decision repository + trigger. MCP tools list paged decisions, look up a specific one, read current config, and trigger an evaluation.
`TrainingService` (extended)	`src/synthorg/hr/training/service.py`	Already owned the training pipeline; now additionally owns a bounded in-memory session store (FIFO, cap 500) used by `synthorg_training_list_sessions` / `_get_session` / `_start_session`.

See Also