Architecture

System Layout

┌─────────────────── Dell k3s (100.95.212.93) ──────────────────┐
│                                                                │
│  rig-conductor namespace                                       │
│  ┌──────────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  │
│  │ rig-conductor│  │ Valkey   │  │ Postgres │  │  Cost    │  │
│  │ API (.NET)   │  │ (Redis)  │  │ (Marten) │  │Dashboard │  │
│  │ - webhooks   │  │ - streams│  │ - events │  │          │  │
│  │ - merge      │  │ - signals│  │ - logs   │  │          │  │
│  │ - dashboard  │  │ - session│  │ - costs  │  │          │  │
│  └──────┬───────┘  └─────┬────┘  └──────────┘  └──────────┘  │
│         │                │                                     │
│  dev-e namespace (KEDA scale-to-zero)                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
│  │ Dev-E (Node) │  │Dev-E (Dotnet)│  │Dev-E (Python)│         │
│  │ :node image  │  │ :dotnet image│  │ :python image│         │
│  │ 0-1 replicas │  │ 0-1 replicas │  │ 0-1 replicas │         │
│  └──────────────┘  └──────────────┘  └──────────────┘         │
│                                                                │
│  review-e namespace (KEDA scale-to-zero)                       │
│  ┌──────────────┐                                              │
│  │  Review-E    │                                              │
│  │ 0-1 replicas │                                              │
│  └──────────────┘                                              │
│                                                                │
│  keda namespace                                                │
│  ┌──────────────┐                                              │
│  │ KEDA 2.16    │  Watches signal:{agentId} lists              │
│  └──────────────┘                                              │
└────────────────────────────────────────────────────────────────┘

┌──── Mac Mini M4 ────┐     ┌──── Human Dev ───────┐
│ iBuild-E (launchd)  │     │ Claude Code / Codex  │
│ Polls rig-conductor │     │ conductor-e-hook.sh  │
│ iOS/macOS tasks     │     │ Reports to Conductor │
└─────────────────────┘     └──────────────────────┘

Event-Driven Pipeline

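Every action is recorded as an event in rig-conductor's event store, and each step is triggered by an event (a GitHub webhook or an agent report) rather than by a scheduled job. The few polls that remain are KEDA's 15-second signal check, the merge step's mergeable_state check, and iBuild-E polling rig-conductor.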

Issue labeled "agent-ready"
  │
  ▼ GitHub webhook
rig-conductor
  ├─ Records ISSUE_APPROVED event
  ├─ Reads .rig-agent.yaml → determines stack (node/dotnet/python/ios)
  ├─ Routes to agent: XADD assignments:dev-e-{stack} + LPUSH signal:dev-e-{stack}
  ├─ Records ISSUE_ASSIGNED event
  │
  ▼ KEDA detects signal (LLEN > 0) → scales 0→1
Dev-E pod starts (~25s)
  ├─ Consumes from Valkey stream (XREADGROUP)
  ├─ Creates execution log in rig-conductor
  ├─ Runs Claude CLI with issue prompt
  ├─ Clones repo, creates branch, implements, tests, commits, pushes
  ├─ Creates PR with "Closes #N"
  │
  ▼ GitHub webhook (pull_request.opened)
rig-conductor
  ├─ Records PR_CREATED event (links PR to issue)
  ├─ Records REVIEW_ASSIGNED event
  ├─ Routes to Review-E: XADD assignments:review-e + LPUSH signal:review-e
  │
  ▼ KEDA wakes Review-E
Review-E pod starts
  ├─ Reviews PR diff
  ├─ Approves or requests changes
  │
  ├─ If APPROVED:
  │   ├─ Stream consumer detects "approved" in output
  │   ├─ Posts REVIEW_PASSED event
  │   ├─ Calls POST /api/merge
  │   │
  │   ▼ rig-conductor merges
  │   ├─ Waits for CI clean (polls mergeable_state)
  │   ├─ Checks for do-not-merge label
  │   ├─ Squash merges via GitHub API
  │   ├─ Records MERGED + ISSUE_DONE events
  │
  ├─ If CHANGES_REQUESTED:
  │   ├─ Records REVIEW_DISPUTED event
  │   ├─ Routes back to Dev-E with review feedback
  │   ├─ Clears review dedup (allows re-review after fix)
  │   ▼ Dev-E iterates on same branch, pushes fix
  │     └─ Webhook (synchronize) → re-routes to Review-E
  │
  ▼ KEDA cooldown (5 min) → scales 1→0
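
Because every step is an event, an issue's current state can be rebuilt by replaying its events in order. The sketch below shows such a projection, assuming a simple transition table; the state names, the table itself, and the assumption that a re-review records REVIEW_ASSIGNED again are hypothetical. Only the event names come from the pipeline above, and rig-conductor itself does this in .NET with Marten, not Python.

```python
from enum import Enum, auto


class IssueState(Enum):
    # Hypothetical state names; only the event names come from the pipeline.
    NEW = auto()
    APPROVED = auto()
    ASSIGNED = auto()
    PR_OPEN = auto()
    IN_REVIEW = auto()
    REVIEW_PASSED = auto()
    MERGED = auto()
    DONE = auto()


# (current state, event type) -> next state
TRANSITIONS = {
    (IssueState.NEW, "ISSUE_APPROVED"): IssueState.APPROVED,
    (IssueState.APPROVED, "ISSUE_ASSIGNED"): IssueState.ASSIGNED,
    (IssueState.ASSIGNED, "PR_CREATED"): IssueState.PR_OPEN,
    (IssueState.PR_OPEN, "REVIEW_ASSIGNED"): IssueState.IN_REVIEW,
    (IssueState.IN_REVIEW, "REVIEW_PASSED"): IssueState.REVIEW_PASSED,
    # Changes requested: back to Dev-E; the next push re-assigns review
    # (assumed here to record REVIEW_ASSIGNED again).
    (IssueState.IN_REVIEW, "REVIEW_DISPUTED"): IssueState.PR_OPEN,
    (IssueState.REVIEW_PASSED, "MERGED"): IssueState.MERGED,
    (IssueState.MERGED, "ISSUE_DONE"): IssueState.DONE,
}


def replay(events):
    """Fold an issue's ordered event types into its current state."""
    state = IssueState.NEW
    for event in events:
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            raise ValueError(f"{event} is not valid in state {state.name}")
        state = next_state
    return state
```

Replaying the happy path (ISSUE_APPROVED through ISSUE_DONE) ends in DONE, and a REVIEW_DISPUTED detour simply loops through PR_OPEN and IN_REVIEW again. Rejecting out-of-order events makes a missed or duplicated webhook visible instead of silently corrupting state.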

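The merge step (wait for a clean mergeable_state, honour do-not-merge, then squash) can be sketched as a polling loop. GitHub's REST pull request object does expose `mergeable_state` and `labels`, and its merge endpoint accepts `merge_method="squash"`; here those calls are injected as plain callables so the logic runs without a network. The function name, poll interval, and timeout are hypothetical, not rig-conductor's actual values.

```python
import time
from typing import Callable


def merge_when_clean(
    get_pr: Callable[[], dict],
    squash_merge: Callable[[], None],
    poll_s: float = 10.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Wait for mergeable_state == "clean", honour do-not-merge, then squash merge."""
    deadline = clock() + timeout_s
    pr = get_pr()
    # 1. Wait for CI: keep re-fetching the PR until GitHub reports it clean.
    while pr.get("mergeable_state") != "clean":
        if clock() >= deadline:
            return "timeout"
        sleep(poll_s)
        pr = get_pr()
    # 2. A do-not-merge label always wins, even on a clean PR.
    if any(label.get("name") == "do-not-merge" for label in pr.get("labels", [])):
        return "blocked"
    # 3. Squash merge (in practice: the GitHub merge API with merge_method="squash").
    squash_merge()
    return "merged"
```

Injecting `sleep` and `clock` keeps the timeout testable; a real caller would pass functions that fetch the PR and call the merge API.
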
Repos

Repo	Visibility	Purpose
rig-agent-runtime	Public	Agent runtime — Dockerfiles, Helm chart, stream consumer, CLI providers
rig-conductor	Private	.NET API — event store, webhooks, dashboard, merge logic
rig-gitops	Private	FluxCD manifests — HelmReleases, KEDA ScaledObjects, secrets
rig-tools	Private	Developer hooks, workflow sync script, install.sh
infra	Private	Terraform — GitHub, Cloudflare, GCP, k8s config

Multi-Stack Images

All images extend rig-agent-runtime:base (git, gh, claude-cli, codex-cli, Node.js 22):

Image	Extra Tools	For
`:node`	TypeScript, Jest, ESLint, Prettier	JS/TS repos
`:dotnet`	.NET 10 SDK	C# repos
`:python`	Python 3, pytest, black, ruff	Python repos
`:base`	Core tools only	Default

Agents can install additional tools at runtime (npm, pip, apt-get).

Per-Repo Config

Each repo has a .rig-agent.yaml:

stack: node          # which image to use
tools:               # extra tools to install
  - firebase-tools
testCommand: npm test
buildCommand: npm run build
escalate:
  - "needs Xcode (requires-macos)"

rig-conductor reads this on every assignment to determine routing.
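
A minimal sketch of that routing step, assuming the YAML has already been parsed into a dict. The agent ids and key patterns follow the pipeline above (`assignments:dev-e-{stack}`, `signal:dev-e-{stack}`) and the image tag follows the Multi-Stack Images table; the function name and the shape of the returned dict are hypothetical.

```python
# Hypothetical: key patterns mirror the pipeline text; the dict shape is invented.
DEV_E_STACKS = ("node", "dotnet", "python")


def route_assignment(config: dict) -> dict:
    """Turn a parsed .rig-agent.yaml into routing targets for one assignment."""
    stack = config.get("stack")
    if stack not in DEV_E_STACKS:
        # iOS/macOS work is handled by iBuild-E, which this sketch does not model.
        raise ValueError(f"no Dev-E deployment for stack {stack!r}")
    agent = f"dev-e-{stack}"
    return {
        "agent": agent,
        "stream": f"assignments:{agent}",  # XADD target (the work)
        "signal": f"signal:{agent}",       # LPUSH target (the KEDA wake-up)
        "image": f"rig-agent-runtime:{stack}",
        "tools": list(config.get("tools", [])),
        "test": config.get("testCommand"),
        "build": config.get("buildCommand"),
    }
```

For the example config above, this yields `assignments:dev-e-node`, `signal:dev-e-node`, and the `:node` image with `firebase-tools` as an extra tool.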

KEDA Scale-to-Zero

Agents scale to zero when idle. Wake-up uses a signal list pattern:

1. rig-conductor publishes: XADD assignments:{agent} (work) + LPUSH signal:{agent} (wake signal)
2. KEDA checks LLEN signal:{agent} every 15 seconds
3. When LLEN > 0, KEDA scales the deployment 0→1
4. The agent starts, deletes the signal key, and processes work from the stream
5. After 5 min idle, KEDA scales 1→0

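This sidesteps a chicken-and-egg problem with scaling on Redis Streams directly: KEDA's stream scaler measures work held by a consumer group, but the only consumer runs in the very pod that is at zero replicas. A list's length, by contrast, is visible to KEDA without any consumer.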
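
The signal-list pattern can be simulated end to end with an in-memory stand-in for Valkey. This is a sketch under stated assumptions: `FakeValkey`, `publish`, `agent_startup`, and `ScalerSim` are hypothetical names, `read_new` is only a rough analogue of `XREADGROUP ... >` for a single consumer, and `ScalerSim.poll` models one KEDA decision for a 0 to 1 deployment, using the 15 s interval and 5 min cooldown from the text.

```python
from collections import deque

POLL_INTERVAL_S = 15   # KEDA polling interval from the text
COOLDOWN_S = 300       # 5 min idle before scaling 1 -> 0


class FakeValkey:
    """In-memory stand-in for the few Valkey commands this pattern uses."""

    def __init__(self):
        self.streams = {}   # key -> list of (entry id, fields)
        self.lists = {}     # key -> deque
        self.cursor = {}    # key -> index of the next undelivered entry
        self._seq = 0

    def xadd(self, key, fields):
        self._seq += 1
        entry_id = f"{self._seq}-0"
        self.streams.setdefault(key, []).append((entry_id, fields))
        return entry_id

    def lpush(self, key, value):
        self.lists.setdefault(key, deque()).appendleft(value)
        return len(self.lists[key])

    def llen(self, key):
        return len(self.lists.get(key, ()))

    def delete(self, key):
        return 1 if self.lists.pop(key, None) is not None else 0

    def read_new(self, key):
        """Rough analogue of XREADGROUP ... '>' for a single consumer."""
        entries = self.streams.get(key, [])
        start = self.cursor.get(key, 0)
        self.cursor[key] = len(entries)
        return [fields for _, fields in entries[start:]]


def publish(db, agent, task):
    db.xadd(f"assignments:{agent}", task)  # the work, written first
    db.lpush(f"signal:{agent}", "wake")    # the wake-up KEDA watches


def agent_startup(db, agent):
    db.delete(f"signal:{agent}")                # clear the signal ...
    return db.read_new(f"assignments:{agent}")  # ... then drain the stream


class ScalerSim:
    """One KEDA-style scaling decision per poll for a single 0-1 deployment."""

    def __init__(self, db, agent):
        self.db, self.agent = db, agent
        self.replicas = 0
        self.last_active = None

    def poll(self, now):
        if self.db.llen(f"signal:{self.agent}") > 0:
            self.last_active = now
            self.replicas = 1
        elif self.replicas and now - self.last_active >= COOLDOWN_S:
            # KEDA's cooldown counts from the last poll where the trigger was active.
            self.replicas = 0
        return self.replicas
```

Writing the stream entry before the signal means a woken agent should always find its work. In this model the 5-minute cooldown starts at the last poll that saw the signal, not when the agent finishes processing.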

Human Developer Integration

Humans using any AI tool report to rig-conductor via rig-tools:

# Install (one time)
git clone git@github.com:Stig-Johnny/rig-tools.git && cd rig-tools && ./install.sh

# Automatic for Claude Code (via hooks in settings.json)
# Manual for other tools:
conductor-e-hook WORK_STARTED
conductor-e-hook PR_CREATED --pr 42 --url https://...

Dashboard shows human developers alongside AI agents.
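
The real hook is a shell script (conductor-e-hook.sh), and its payload format and endpoint are not documented here. As a hedged illustration only, this Python sketch shows how the command-line forms above might map to an event record; the function name and the `developer`, `pr`, and `url` field names are hypothetical, and sending the event is deliberately left out.

```python
import argparse


def build_hook_event(argv, developer):
    """Turn conductor-e-hook style arguments into an event dict (hypothetical shape)."""
    parser = argparse.ArgumentParser(prog="conductor-e-hook")
    parser.add_argument("event_type", help="e.g. WORK_STARTED or PR_CREATED")
    parser.add_argument("--pr", type=int, help="pull request number")
    parser.add_argument("--url", help="pull request URL")
    args = parser.parse_args(argv)

    event = {"type": args.event_type, "developer": developer}
    # Only include the optional fields that were actually supplied.
    if args.pr is not None:
        event["pr"] = args.pr
    if args.url is not None:
        event["url"] = args.url
    return event
```

`build_hook_event(["WORK_STARTED"], "alice")` produces just the type and developer, while the PR_CREATED form adds the PR number and URL.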

Dashboard

URL: https://rig-conductor.dashecorp.com/

Tabs:
- Overview: agent status (online/offline, provider, task) and queue depth
- Issues: all tracked issues with state, PR, agent, and cost; sortable and filterable
- Events: live SSE event stream
- Costs: per-agent cost breakdown
- Logs: live agent CLI output (Valkey pub/sub → SSE)

Issue detail panel shows:
- Event timeline (every webhook and agent event)
- Cost per step
- Execution runs (turns, tokens in/out/cache, duration, PR link)

Supports light/dark mode and an agent filter (online, 24h, 7d, all).
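
The Events and Logs tabs are streamed to the browser as Server-Sent Events. The sketch below shows how a single event could be framed in the `text/event-stream` wire format. Using the rig-conductor event type as the SSE `event:` name and a compact JSON body as `data:` are assumptions, not the dashboard's documented format.

```python
import json
from typing import Optional


def sse_frame(event_type: str, payload: dict, event_id: Optional[str] = None) -> str:
    """Encode one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")   # lets a reconnecting browser resume via Last-Event-ID
    lines.append(f"event: {event_type}")  # dispatched to addEventListener(event_type, ...)
    # Compact JSON has no raw newlines, so it fits in a single data: line.
    lines.append("data: " + json.dumps(payload, separators=(",", ":")))
    return "\n".join(lines) + "\n\n"      # a blank line terminates the event
```

Giving each frame an `id` is what lets the browser's EventSource resume after a dropped connection instead of missing events.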

Execution Logs

Execution logs are stored in Marten (PostgreSQL) as ExecutionLog documents. An example run:

Run: dev-e-node/5n72v · 12 Apr 09:20 · PR #131
Status: completed · 93s · $0.21 · 14 turns
Tokens: 1,234 in / 567 out / 10,921 cache-read / 4,305 cache-write

Steps:
  assigned   ✓ Assigned to dev-e-node
  implement  ✓ Created feature branch, implemented, pushed

Retention: raw logs 30 days, summaries 90 days.
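
A hedged sketch of the summary shown above and of the retention rule. The real ExecutionLog is a .NET document stored by Marten; the Python field names, the `summary` layout, and the inclusive age comparison are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RAW_RETENTION = timedelta(days=30)
SUMMARY_RETENTION = timedelta(days=90)


@dataclass
class ExecutionLog:
    # Hypothetical field names; the real document is a .NET type stored by Marten.
    run: str
    status: str
    duration_s: int
    cost_usd: float
    turns: int
    tokens_in: int
    tokens_out: int
    cache_read: int
    cache_write: int
    finished_at: datetime

    def summary(self) -> str:
        return (
            f"Status: {self.status} · {self.duration_s}s · "
            f"${self.cost_usd:.2f} · {self.turns} turns\n"
            f"Tokens: {self.tokens_in:,} in / {self.tokens_out:,} out / "
            f"{self.cache_read:,} cache-read / {self.cache_write:,} cache-write"
        )

    def retention(self, now: datetime) -> dict:
        """Which parts of this run are still within their retention window."""
        age = now - self.finished_at
        return {"raw": age <= RAW_RETENTION, "summary": age <= SUMMARY_RETENTION}
```

With the example run's numbers (93 s, $0.21, 14 turns, 1,234 / 567 / 10,921 / 4,305 tokens), `summary()` reproduces the Status and Tokens lines above. A run 45 days old keeps its summary but not its raw log.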

Resource Requests

Single-node cluster rule: every pod requests 1m CPU and 1Mi memory, with no CPU limits (Burstable QoS). The scheduler places pods by their requests, so near-zero requests let pods schedule even when the node's actual CPU and memory usage is high.
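
One way to enforce the rule is a small check run over rendered manifests, for example in CI for rig-gitops. This is a hypothetical sketch: it assumes each container spec is already parsed into a dict, and it compares quantity strings exactly rather than parsing Kubernetes quantities. The rule says nothing about memory limits, so they are not checked.

```python
def resource_violations(container: dict) -> list:
    """List ways a container spec (as a parsed manifest dict) breaks the rule."""
    resources = container.get("resources", {})
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    problems = []
    # Exact string match: "0.001" is the same quantity as "1m" but is flagged here.
    if requests.get("cpu") != "1m":
        problems.append(f"cpu request is {requests.get('cpu')!r}, expected '1m'")
    if requests.get("memory") != "1Mi":
        problems.append(f"memory request is {requests.get('memory')!r}, expected '1Mi'")
    if "cpu" in limits:
        problems.append("cpu limit is set; the rule is no CPU limits")
    return problems
```
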

Credentials

Secret	Type	Rotation
Claude OAuth token	`sk-ant-oat01-...`	1-year validity, stored in Bitwarden
GitHub App PEM	Per-agent apps	Auto-refreshed (installation tokens last 1h)
RELEASE_PAT	Review-E PAT	Expires 2026-05-30
GHCR pull secret	Container registry	Long-lived
Discord bot token	Per-agent bots	Long-lived
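
GitHub App installation tokens expire after one hour. They are minted by signing a JWT with the App's private key (the PEM) and calling `POST /app/installations/{installation_id}/access_tokens`. A common approach is to cache the token and mint a new one shortly before expiry. In this sketch, `mint` is a hypothetical callable standing in for that API call, and the 5-minute refresh margin is an assumption.

```python
import time
from typing import Callable, Optional, Tuple


class InstallationTokenCache:
    """Reuse a short-lived GitHub App installation token until shortly before expiry."""

    def __init__(
        self,
        mint: Callable[[], Tuple[str, float]],
        margin_s: float = 300.0,
        clock: Callable[[], float] = time.time,
    ):
        self._mint = mint            # hypothetical: returns (token, expiry as epoch seconds)
        self._margin_s = margin_s    # refresh this long before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin_s:
            self._token, self._expires_at = self._mint()
        return self._token
```

Refreshing early, rather than on the first 401, keeps a long agent run from failing partway through a push or PR call.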