Operations¶
Deployment¶
Prerequisites¶
- Dell k3s cluster accessible via Tailscale (100.95.212.93)
- vCluster Platform running (vcluster.invotek.no)
- FluxCD installed on host cluster
- SealedSecrets controller available
Create Rig vCluster¶
# SSH to Dell
ssh -i ~/.ssh/dell-stig-1 claude@100.95.212.93
# Create vCluster via vCluster Platform CLI or API
vcluster create rig --namespace rig-vcluster
# Verify access: run kubectl against the vCluster
vcluster connect rig --namespace rig-vcluster -- kubectl get ns
Bootstrap FluxCD¶
# In rig-gitops repo
flux bootstrap github \
--owner=Stig-Johnny \
--repository=rig-gitops \
--branch=main \
--path=clusters/rig \
--personal
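`flux bootstrap github` reads a GitHub personal access token from the `GITHUB_TOKEN` environment variable, so checking for it first avoids a half-finished bootstrap. A minimal preflight sketch follows; `require_env` is a hypothetical helper name, not part of the flux CLI.

```shell
# Hypothetical helper: succeed only if the named environment variable is non-empty.
require_env() {
  # Indirect lookup of the variable whose name is in $1 (POSIX-compatible).
  eval "_value=\${$1:-}"
  if [ -z "$_value" ]; then
    echo "missing required environment variable: $1" >&2
    return 1
  fi
}

# Preflight before bootstrapping (shown as a comment so nothing runs on load):
#   require_env GITHUB_TOKEN && flux check --pre
```

`flux check --pre` only verifies cluster prerequisites; it does not install anything.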
Deploy Components¶
FluxCD reconciles from rig-gitops:
rig-gitops/
├── clusters/rig/
│   └── flux-system/            # FluxCD bootstrap
├── base/
│   ├── namespaces.yaml         # conductor, dev-agents
│   └── postgres/               # PostgreSQL StatefulSet
├── apps/
│   ├── atl-e/                  # Conductor Deployment + Service
│   └── dev-e/                  # Dev-E Deployment
└── secrets/
    ├── conductor-secrets.yaml  # SealedSecret
    ├── dev-e-secrets.yaml      # SealedSecret
    └── ghcr-pull.yaml          # Image pull secret
Monitoring¶
Health Checks¶
# Conductor health
curl https://rig.dashecorp.com/health
# Agent status
curl https://rig.dashecorp.com/api/agents
# Queue status
curl https://rig.dashecorp.com/api/queue
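The three checks above can be wrapped in one loop. This is a sketch under an assumption the source does not state: that each endpoint answers with a 2xx status when healthy. `classify_status` and `check_all` are illustrative names.

```shell
BASE_URL="${BASE_URL:-https://rig.dashecorp.com}"

# Treat any 2xx HTTP status as healthy; everything else (including curl's
# 000 for "no response") counts as a failure.
classify_status() {
  case "$1" in
    2??) echo ok ;;
    *)   echo fail ;;
  esac
}

# Probe the three endpoints listed above and print one line per endpoint:
# <path> <http status> <ok|fail>
check_all() {
  for path in /health /api/agents /api/queue; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$BASE_URL$path" || true)
    printf '%s %s %s\n' "$path" "$code" "$(classify_status "$code")"
  done
}
```

Run `check_all` from a host that can reach the dashboard; set `BASE_URL` to point it elsewhere.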
Logs¶
# SSH to Dell, connect to vCluster
ssh -i ~/.ssh/dell-stig-1 claude@100.95.212.93
# Conductor logs
kubectl logs -n conductor -l app=atl-e-conductor --tail=50
# Dev-E logs
kubectl logs -n dev-agents -l app=dev-e --tail=50
# PostgreSQL logs
kubectl logs -n conductor -l app=postgres --tail=30
Discord Notifications¶
Conductor-E posts to these channels:
| Channel | What |
|---|---|
| #tasks | Assignments, PR created, merged |
| #dev-e | Dev-E progress, errors |
| #admin | Escalations, human gates, agent offline |
Troubleshooting¶
Dev-E Not Picking Up Work¶
- Check agent health:
  curl https://rig.dashecorp.com/api/agents
- Check pod status:
  kubectl get pods -n dev-agents
- Check logs:
  kubectl logs -n dev-agents -l app=dev-e --tail=20
- Check Claude OAuth token expiry (deadline-tracker #17)
Assignment Stuck¶
- Check Conductor queue:
  curl https://rig.dashecorp.com/api/queue
- Check if issue has unmet dependencies
- Check if issue is human-gated
- Check assignment status in Conductor logs
Claude Code Hangs¶
Dev-E has a per-issue timeout (default: 30 minutes). If Claude Code hangs:
- Pod restarts via liveness probe failure
- Conductor-E marks assignment as failed
- On second failure: escalates to human
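To make the deadline and escalation rule concrete, here is a small sketch. It is not Dev-E's implementation (the text above describes a liveness-probe restart); it only illustrates the same semantics using GNU coreutils `timeout`, which exits with status 124 when the deadline is hit. The function names and the `mark-failed` label are assumptions.

```shell
# 30 minutes, the documented default per-issue timeout.
ISSUE_TIMEOUT_SECONDS="${ISSUE_TIMEOUT_SECONDS:-1800}"

# Run a command under a deadline; exit status 124 means the deadline was hit.
run_with_deadline() {
  _seconds="$1"
  shift
  timeout "$_seconds" "$@"
}

# Map the number of failures for an assignment to the action described above:
# first failure marks it failed, the second escalates to a human.
action_for_failures() {
  if [ "$1" -ge 2 ]; then
    echo escalate
  else
    echo mark-failed
  fi
}
```

For example, `run_with_deadline "$ISSUE_TIMEOUT_SECONDS" some-command` would stop `some-command` after 30 minutes.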
Review Loop¶
If Dev-E keeps iterating on review feedback without converging:
- Max review iterations: 3
- After 3 iterations without approval: mark as failed, escalate
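The cap can be expressed as a tiny decision function. This is a sketch of the rule as stated, with hypothetical names and outcome labels; how Conductor-E actually tracks iterations is not described in this document.

```shell
MAX_REVIEW_ITERATIONS=3

# $1 = review iterations completed so far
# $2 = latest review result: "approved" or anything else
review_outcome() {
  if [ "$2" = "approved" ]; then
    echo done
  elif [ "$1" -ge "$MAX_REVIEW_ITERATIONS" ]; then
    echo escalate   # mark as failed and hand over to a human
  else
    echo iterate
  fi
}
```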
Database Recovery¶
PostgreSQL stores its data on a PVC. If data loss is suspected, connect and inspect the current state:
# Connect to PostgreSQL
kubectl exec -it -n conductor postgres-0 -- psql -U conductor
# Check state
SELECT * FROM assignments WHERE status = 'assigned';
SELECT * FROM agents;
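The same checks can be run non-interactively by passing SQL to `psql -c`. A minimal sketch: the pod, namespace, user, and the `assignments.status` column come from the commands above, while `pg_query`, `count_by_status`, and the `KUBECTL` override (useful for a dry run) are assumptions.

```shell
# Override with KUBECTL=echo to print the command instead of running it.
KUBECTL="${KUBECTL:-kubectl}"

# Run one SQL statement in the postgres-0 pod without an interactive session.
pg_query() {
  $KUBECTL exec -n conductor postgres-0 -- psql -U conductor -c "$1"
}

# Summarize assignments by status to see what state survived.
count_by_status() {
  pg_query "SELECT status, count(*) FROM assignments GROUP BY status;"
}
```

Calling `pg_query "SELECT * FROM agents;"` mirrors the second query above.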
Updating Components¶
Update Conductor-E¶
- Push changes to Stig-Johnny/atl-agent → CI builds image
- Update image tag in rig-gitops/apps/atl-e/deployment.yaml
- Push to rig-gitops → FluxCD reconciles
Update Dev-E¶
- Push changes to Stig-Johnny/dev-e → CI builds image
- Update image tag in rig-gitops/apps/dev-e/deployment.yaml
- Push to rig-gitops → FluxCD reconciles
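The image tag step in either update flow can be scripted. The sketch below assumes the deployment manifest contains a plain `image: <repository>:<tag>` line; `set_image_tag` is a hypothetical helper, and digest-pinned images (`@sha256:...`) are not handled.

```shell
# Rewrite the tag on every `image:` line in a manifest, leaving the
# repository part (including any registry port) untouched.
# $1 = manifest path, $2 = new tag (must not contain "|" or "&")
set_image_tag() {
  _file="$1"
  _tag="$2"
  _tmp=$(mktemp)
  sed -E "s|^([[:space:]]*(-[[:space:]]+)?image:[[:space:]]*[^[:space:]]+):[^:/[:space:]]+[[:space:]]*\$|\\1:${_tag}|" "$_file" > "$_tmp" &&
    cat "$_tmp" > "$_file"   # write back in place, keeping the file's mode
  _rc=$?
  rm -f "$_tmp"
  return "$_rc"
}
```

For example, `set_image_tag apps/dev-e/deployment.yaml <new-tag>` followed by a commit and push. If waiting for the next sync is not acceptable, `flux reconcile kustomization flux-system --with-source` requests an immediate reconcile, assuming the Kustomization keeps the default `flux-system` name created by bootstrap.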
Rotate Secrets¶
# Generate new SealedSecret
kubeseal --format yaml < secret.yaml > sealed-secret.yaml
# Commit to rig-gitops
git add secrets/ && git commit -m "chore: rotate secrets" && git push
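Before committing, it can help to confirm that nothing in `secrets/` is still a plain `Secret`. The guard below is a sketch that assumes every YAML file in that directory is meant to be a SealedSecret; the document confirms this for the conductor and dev-e secrets only, so adjust it if `ghcr-pull.yaml` is stored differently. The function names are hypothetical.

```shell
# Succeed only if the manifest declares kind: SealedSecret.
is_sealed() {
  grep -Eq '^kind:[[:space:]]*SealedSecret[[:space:]]*$' "$1"
}

# Check every .yaml file in a directory; report offenders on stderr and
# return non-zero if any file is not a SealedSecret.
check_secrets_dir() {
  _rc=0
  for _f in "$1"/*.yaml; do
    [ -e "$_f" ] || continue
    if ! is_sealed "$_f"; then
      echo "not a SealedSecret: $_f" >&2
      _rc=1
    fi
  done
  return "$_rc"
}
```

Running `check_secrets_dir secrets && git add secrets/` keeps an unsealed manifest from reaching the repository.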
Cloudflare Access¶
Dashboard at rig.dashecorp.com is protected by Cloudflare Access:
- Policy: Allow invotekas@gmail.com
- Tunnel: Configured via cluster-gitops Cloudflare Tunnel config
- Zero Trust: No public access without authentication