
Operations

Deployment

Prerequisites

  • Dell k3s cluster accessible via Tailscale (100.95.212.93)
  • vCluster Platform running (vcluster.invotek.no)
  • FluxCD installed on host cluster
  • SealedSecrets controller available
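
Before starting, it can help to confirm that the CLIs used on this page are installed locally. The helper below is a minimal sketch, not part of the rig; the function name missing_tools is hypothetical.

```shell
# Hypothetical helper: print every named command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    # `command -v` is the POSIX way to test whether a command resolves.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example call, covering the CLIs used on this page:
# missing_tools ssh vcluster flux kubectl kubeseal curl
```

An empty result means every listed tool was found.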

Create Rig vCluster

# SSH to Dell
ssh -i ~/.ssh/dell-stig-1 claude@100.95.212.93

# Create vCluster via vCluster Platform CLI or API
vcluster create rig --namespace rig-vcluster

# Verify access: run kubectl inside the vCluster and list its namespaces
vcluster connect rig --namespace rig-vcluster -- kubectl get ns

Bootstrap FluxCD

# In rig-gitops repo
flux bootstrap github \
  --owner=Stig-Johnny \
  --repository=rig-gitops \
  --branch=main \
  --path=clusters/rig \
  --personal
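
flux bootstrap github reads a GitHub personal access token from the GITHUB_TOKEN environment variable, and flux check reports whether the controllers came up. The sketch below guards the first and wraps the second; the function names are hypothetical, and it assumes the current kube context points at the cluster being bootstrapped.

```shell
# Hypothetical guard: fail early when the token that flux bootstrap needs is missing.
require_github_token() {
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set" >&2
    return 1
  fi
}

# Hypothetical post-bootstrap check.
verify_flux() {
  # Controllers and CRDs are installed and healthy.
  flux check || return 1
  # Reconciliation status of every Kustomization.
  flux get kustomizations --all-namespaces
}
```

Run require_github_token before the bootstrap command and verify_flux after it.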

Deploy Components

FluxCD reconciles from rig-gitops:

rig-gitops/
├── clusters/rig/
│   └── flux-system/          # FluxCD bootstrap
├── base/
│   ├── namespaces.yaml       # conductor, dev-agents
│   └── postgres/             # PostgreSQL StatefulSet
├── apps/
│   ├── atl-e/                # Conductor Deployment + Service
│   └── dev-e/                # Dev-E Deployment
└── secrets/
    ├── conductor-secrets.yaml  # SealedSecret
    ├── dev-e-secrets.yaml      # SealedSecret
    └── ghcr-pull.yaml          # Image pull secret

Monitoring

Health Checks

# Conductor health
curl https://rig.dashecorp.com/health

# Agent status
curl https://rig.dashecorp.com/api/agents

# Queue status
curl https://rig.dashecorp.com/api/queue
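
The three checks above can be combined into one pass. This is a sketch that assumes a healthy endpoint answers with HTTP 200; the function name check_rig_health and the RIG_URL override are hypothetical.

```shell
# Base URL taken from the commands above; RIG_URL is a hypothetical override.
RIG_URL="${RIG_URL:-https://rig.dashecorp.com}"

# Print "<endpoint> <http_code>" per endpoint; return non-zero if any is not 200.
check_rig_health() {
  failed=0
  for endpoint in /health /api/agents /api/queue; do
    # -s silences progress, -o discards the body, -w prints only the status code.
    # curl prints 000 when no response arrives, so its own exit code is ignored.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$RIG_URL$endpoint")" || true
    printf '%s %s\n' "$endpoint" "$code"
    [ "$code" = "200" ] || failed=1
  done
  return "$failed"
}
```

The non-zero return makes the function usable in a cron job or a CI step.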

Logs

# SSH to Dell, then connect to the vCluster
ssh -i ~/.ssh/dell-stig-1 claude@100.95.212.93
vcluster connect rig --namespace rig-vcluster

# Conductor logs
kubectl logs -n conductor -l app=atl-e-conductor --tail=50

# Dev-E logs
kubectl logs -n dev-agents -l app=dev-e --tail=50

# PostgreSQL logs
kubectl logs -n conductor -l app=postgres --tail=30
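
The namespace and label pairs above can be folded into one wrapper so they do not have to be remembered. A minimal sketch; the function name rig_logs is hypothetical, and the namespaces and labels are the ones used in the commands above.

```shell
# Hypothetical wrapper around the three log commands above.
# Usage: rig_logs <conductor|dev-e|postgres> [tail_lines]
rig_logs() {
  component="$1"
  tail_lines="${2:-50}"
  case "$component" in
    conductor) ns=conductor;  selector=app=atl-e-conductor ;;
    dev-e)     ns=dev-agents; selector=app=dev-e ;;
    postgres)  ns=conductor;  selector=app=postgres ;;
    *) echo "unknown component: $component" >&2; return 2 ;;
  esac
  kubectl logs -n "$ns" -l "$selector" --tail="$tail_lines"
}
```

For example, rig_logs dev-e 200 shows the last 200 Dev-E log lines.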

Discord Notifications

Conductor-E posts to these channels:

Channel   What is posted
#tasks    Assignments, PRs created, PRs merged
#dev-e    Dev-E progress, errors
#admin    Escalations, human gates, agent offline

Troubleshooting

Dev-E Not Picking Up Work

  1. Check agent health: curl https://rig.dashecorp.com/api/agents
  2. Check pod status: kubectl get pods -n dev-agents
  3. Check logs: kubectl logs -n dev-agents -l app=dev-e --tail=20
  4. Check Claude OAuth token expiry (deadline-tracker #17)
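
For step 2, the pod list can be narrowed to the pods that need attention. The sketch below flags Dev-E pods that are not Running or that have restarted at least once; the function name is hypothetical, and it assumes the app=dev-e label used in the log command.

```shell
# Hypothetical triage helper: list Dev-E pods that are not Running or have restarted.
dev_e_unhealthy_pods() {
  # jsonpath prints one tab-separated line per pod: name, phase, restart count.
  kubectl get pods -n dev-agents -l app=dev-e \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}' |
    awk -F '\t' '$2 != "Running" || $3 > 0 { print $1 ": phase=" $2 ", restarts=" $3 }'
}
```

No output means every Dev-E pod is Running with zero restarts.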

Assignment Stuck

  1. Check Conductor queue: curl https://rig.dashecorp.com/api/queue
  2. Check if issue has unmet dependencies
  3. Check if issue is human-gated
  4. Check assignment status in Conductor logs
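
For step 4, the Conductor logs can be searched for a specific issue. A minimal sketch: the function name is hypothetical, and the pattern to search for depends on how Conductor-E formats its log lines, which this page does not specify.

```shell
# Hypothetical helper: show recent Conductor log lines that match a pattern.
# Usage: conductor_log_grep <pattern> [since]
conductor_log_grep() {
  pattern="$1"
  since="${2:-1h}"
  # With a label selector, kubectl logs defaults to the last 10 lines per pod,
  # so --tail=-1 is needed to search the whole --since window.
  kubectl logs -n conductor -l app=atl-e-conductor --since="$since" --tail=-1 |
    grep -i -- "$pattern"
}
```

The second argument widens the window, for example conductor_log_grep "assignment" 6h.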

Claude Code Hangs

Dev-E enforces a per-issue timeout (default: 30 minutes). If Claude Code hangs, recovery proceeds in this order:

  1. The pod restarts after its liveness probe fails
  2. Conductor-E marks the assignment as failed
  3. On the second failure, Conductor-E escalates to a human
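
The timeout and escalation rule can be illustrated in a few lines of shell. This is a hypothetical sketch, not Dev-E's or Conductor-E's actual code: it uses GNU coreutils timeout in place of whatever mechanism Dev-E uses, and the function and variable names are invented for illustration.

```shell
# Hypothetical default, mirroring the 30-minute per-issue timeout.
ISSUE_TIMEOUT_SECONDS="${ISSUE_TIMEOUT_SECONDS:-1800}"

# Run a command under a deadline and print ok, failed, or timed_out.
run_with_deadline() {
  deadline="$1"
  shift
  rc=0
  timeout "$deadline" "$@" >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0)   echo ok ;;
    124) echo timed_out ;;   # GNU timeout exits with 124 when the deadline is hit
    *)   echo failed ;;
  esac
}

# Map the failure count for one assignment to the action described above.
action_after_failure() {
  if [ "$1" -ge 2 ]; then echo escalate; else echo mark_failed; fi
}
```

A caller would pass "$ISSUE_TIMEOUT_SECONDS" as the deadline and feed the running failure count to action_after_failure.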

Review Loop

If Dev-E keeps iterating on review feedback without converging:

  • Max review iterations: 3
  • After 3 iterations without approval: mark as failed, escalate
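
The iteration cap amounts to a small decision rule. The sketch below is hypothetical and only restates the two bullets above; the function name and the result labels are invented for illustration.

```shell
MAX_REVIEW_ITERATIONS=3

# Usage: review_next_step <completed_iterations> <approved: yes|no>
review_next_step() {
  if [ "$2" = "yes" ]; then
    echo approved
  elif [ "$1" -ge "$MAX_REVIEW_ITERATIONS" ]; then
    # Cap reached without approval: mark as failed and escalate.
    echo fail_and_escalate
  else
    echo iterate
  fi
}
```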

Database Recovery

PostgreSQL stores its data on a PVC. If data loss is suspected, connect to the database and inspect the current state:

# Connect to PostgreSQL
kubectl exec -it -n conductor postgres-0 -- psql -U conductor

# Check state (run these at the psql prompt)
SELECT * FROM assignments WHERE status = 'assigned';
SELECT * FROM agents;
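
The same queries can be run without an interactive session, which is convenient in scripts. A minimal sketch, assuming the pod, user, and the assignments table with its status column shown above; the function names are hypothetical.

```shell
# Hypothetical helper: run one SQL statement inside the postgres-0 pod.
conductor_sql() {
  # -c runs a single statement and exits; -At prints unaligned rows without headers.
  kubectl exec -n conductor postgres-0 -- psql -U conductor -At -c "$1"
}

# Example: number of assignments per status.
assignment_counts() {
  conductor_sql "SELECT status, count(*) FROM assignments GROUP BY status ORDER BY status;"
}
```

With -At, each row comes back as status|count, which is easy to parse further.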

Updating Components

Update Conductor-E

  1. Push changes to Stig-Johnny/atl-agent → CI builds image
  2. Update image tag in rig-gitops/apps/atl-e/deployment.yaml
  3. Push to rig-gitops → FluxCD reconciles

Update Dev-E

  1. Push changes to Stig-Johnny/dev-e → CI builds image
  2. Update image tag in rig-gitops/apps/dev-e/deployment.yaml
  3. Push to rig-gitops → FluxCD reconciles
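
FluxCD picks up a push on its next sync interval; a reconcile can also be requested by hand. The sketch below assumes the default flux-system names that flux bootstrap creates and a kube context pointing at the cluster where Flux runs. The function name rig_rollout is hypothetical, and the deployment name is passed in because this page does not state it.

```shell
# Hypothetical helper: reconcile now, then wait for a Deployment to roll out.
# Usage: rig_rollout <namespace> <deployment>
rig_rollout() {
  # --with-source also refreshes the Git source before applying the Kustomization.
  flux reconcile kustomization flux-system --with-source || return 1
  # Blocks until the rollout completes or the timeout expires.
  kubectl rollout status "deployment/$2" -n "$1" --timeout=180s
}
```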

Rotate Secrets

# Generate a new SealedSecret (written under secrets/ so the commit below picks it up)
kubeseal --format yaml < secret.yaml > secrets/sealed-secret.yaml

# Commit to rig-gitops
git add secrets/ && git commit -m "chore: rotate secrets" && git push
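
The plaintext secret.yaml does not need to exist on disk: kubectl can render a Secret locally and pipe it straight into kubeseal. A minimal sketch, run from the root of rig-gitops; the function name seal_secret is hypothetical, and kubeseal still needs access to the SealedSecrets controller to fetch its certificate.

```shell
# Hypothetical helper: render a Secret locally and write only the sealed form.
# Usage: seal_secret <name> <namespace> <kubectl create secret flags...>
seal_secret() {
  name="$1"
  namespace="$2"
  shift 2
  # --dry-run=client renders the manifest without sending it to the cluster.
  kubectl create secret generic "$name" -n "$namespace" "$@" --dry-run=client -o yaml |
    kubeseal --format yaml > "secrets/$name.yaml"
}
```

The remaining arguments are passed through to kubectl, for example --from-env-file with a local file that stays out of Git.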

Cloudflare Access

Dashboard at rig.dashecorp.com is protected by Cloudflare Access:

  • Policy: Allow invotekas@gmail.com
  • Tunnel: Configured via cluster-gitops Cloudflare Tunnel config
  • Zero Trust: No public access without authentication