
How I Run a Personal AI Team Without Chaos
Most people don't need a giant agent framework. They need a setup that works day to day.
This is the setup I use: five roles, clear handoffs, and one rule that matters more than anything else: safety before release.
The Five Roles
I work through a Coordinator bot, then route work to four specialist roles. In practice, I steer the Coordinator while training it to mirror more of my actions and thought process over time, expanding its autonomy in small, controlled increments. The point is simple: each role has one job, so ownership stays obvious.
- Coordinator (bot): turns the request into a scoped task and assigns the right role
- Analyst: verifies context, defines success criteria, and surfaces unknowns
- Worker: builds the solution inside the approved scope
- Guardian: blocks risky actions and verifies quality before anything ships
- Archivist: records decisions, outcomes, and reusable patterns for future runs
Five Roles, Clear Boundaries
I let them design their own avatars via NanoBanana, so do not blame me.
- Coordinator (routes work): traffic control and clear handoffs.
- Worker (executes): builds code and ships outputs using Codex + Gemini.
- Analyst (verifies): checks facts and compares options.
- Guardian (protects): enforces approval and risk gates.
- Archivist (remembers): turns outcomes into durable team memory.
How a Request Moves
A typical request moves like this:
- Coordinator receives the request and sets scope.
- Analyst confirms what "done" means and flags risks early.
- Guardian runs preflight checks before any build starts.
- Worker implements only inside the approved scope.
- Guardian verifies quality and safety before release.
- Archivist captures what happened so the next run starts with less guesswork.
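The flow above can be sketched in code. This is a minimal illustration, not my actual runtime: the names (`TaskPacket`, `runPipeline`, the demo task id) are hypothetical, and each stage is stubbed to show the ordering and the Guardian's two gates.

```typescript
// Hypothetical sketch of the request flow; stage bodies are stubs.
type Role = "coordinator" | "analyst" | "worker" | "guardian" | "archivist";

type TaskPacket = {
  taskId: string;
  objective: string;
  scope: string[];
  approved: boolean;
};

type StageResult = { role: Role; ok: boolean; note: string };

function runPipeline(request: string): StageResult[] {
  const trace: StageResult[] = [];
  // 1. Coordinator receives the request and sets scope.
  const packet: TaskPacket = {
    taskId: "demo-001",
    objective: request,
    scope: ["status endpoint"],
    approved: false,
  };
  trace.push({ role: "coordinator", ok: true, note: "scoped" });
  // 2. Analyst confirms what "done" means and flags risks early.
  trace.push({ role: "analyst", ok: true, note: "success criteria set" });
  // 3. Guardian preflight gate runs before any build starts.
  packet.approved = packet.scope.length > 0;
  trace.push({ role: "guardian", ok: packet.approved, note: "preflight" });
  // 4. Worker implements only inside the approved scope.
  if (packet.approved) {
    trace.push({ role: "worker", ok: true, note: "built in scope" });
  }
  // 5. Guardian verifies quality and safety before release.
  trace.push({ role: "guardian", ok: true, note: "release check" });
  // 6. Archivist captures the outcome for the next run.
  trace.push({ role: "archivist", ok: true, note: "recorded" });
  return trace;
}
```

Note the shape: the Guardian appears twice (preflight and release), and the Worker never runs unless the preflight gate passed.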
The Actual Prompt Contracts
This is where most articles get vague. The value is in the handoff prompts.
Coordinator dispatch
{
  "instructions": {
    "system": "You are CoordinatorBot. You do not write implementation code.",
    "role": "Create a task packet for exactly one downstream role.",
    "user_request": "Make /status_all return strict pass/fail for Worker, Guardian, Archivist, and include timestamp."
  },
  "task_id": "coord-2026-03-12-001",
  "objective": "Implement strict machine-readable status_all response format.",
  "acceptance_criteria": [
    "Response includes worker|guardian|archivist|coordinator with pass|fail",
    "Includes ISO timestamp",
    "Backward-compatible human-readable summary still present"
  ],
  "constraints": [
    "No breaking API changes",
    "No secrets in logs",
    "Keep response under 2KB"
  ],
  "assignee": "analyst",
  "risk_level": "medium"
}
Analyst output contract
{
  "task_id": "coord-2026-03-12-001",
  "facts": [
    "Current status_all mixes human prose with machine status lines",
    "Automation parser fails when role wording changes",
    "Coordinator process is online and receiving Telegram commands"
  ],
  "unknowns": [
    "Whether downstream consumers require legacy prose fields",
    "Whether all roles always report heartbeat data"
  ],
  "recommended_scope": "Add deterministic checks object with pass/fail + last_check; retain existing summary text.",
  "risks": [
    "Parser regressions if field names change later",
    "False negatives when a role is intentionally idle"
  ],
  "confidence": 0.86
}
Worker output contract
{
  "task_id": "coord-2026-03-12-001",
  "implementation": "Added checks payload to status_all handler with fixed keys and enum pass|fail; preserved existing summary block.",
  "tests_run": [
    "unit: status formatter returns stable keys",
    "integration: Telegram command -> coordinator -> response shape",
    "regression: legacy summary still rendered"
  ],
  "artifacts": [
    "src/agents/coordinator/status-all.ts",
    "scripts/tests/status-all-format.test.ts"
  ],
  "open_issues": [
    "Need one follow-up to expose check latency metric"
  ]
}
Guardian decision contract
{
  "task_id": "coord-2026-03-12-001",
  "status": "approved|rejected|needs_changes",
  "security_findings": [
    "PID value exposed in public channel response"
  ],
  "policy_findings": [
    "No explicit timeout guard for role health checks"
  ],
  "required_changes": [
    "Redact PID from non-admin responses",
    "Add 2s timeout + fail-safe status for missing heartbeat"
  ]
}
Archivist memory write contract
{
  "task_id": "coord-2026-03-12-001",
  "decision_summary": "Rolled out strict status contract after guardian-mandated PID redaction and timeout guard.",
  "why_it_worked": [
    "Stable schema reduced parser ambiguity",
    "Backward compatibility prevented client breakage",
    "Human approval gate blocked unsafe output"
  ],
  "reusable_patterns": [
    "Dual-format responses (machine block + human summary)",
    "Guardian preflight on observability endpoints",
    "Contract tests for chat-command outputs"
  ],
  "rollback_notes": [
    "Revert to legacy formatter if Telegram parser error rate > 2% for 15m"
  ]
}
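Contracts like these only pay off if you enforce them at every handoff. Here is a dependency-free sketch of a validator for the Guardian decision contract above; the field names mirror the example, but the function name and error messages are mine, not from a real framework.

```typescript
// Dependency-free runtime validation at a handoff boundary.
// Field names mirror the Guardian decision contract in the article.
type GuardianStatus = "approved" | "rejected" | "needs_changes";

interface GuardianDecision {
  task_id: string;
  status: GuardianStatus;
  required_changes: string[];
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Parse at every handoff so a malformed packet fails fast
// instead of silently flowing to the next role.
export function parseGuardianDecision(raw: unknown): GuardianDecision {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("decision must be an object");
  }
  const o = raw as Record<string, unknown>;
  if (typeof o.task_id !== "string" || o.task_id.length === 0) {
    throw new Error("missing task_id");
  }
  const statuses: GuardianStatus[] = ["approved", "rejected", "needs_changes"];
  if (!statuses.includes(o.status as GuardianStatus)) {
    throw new Error("invalid status");
  }
  const required = o.required_changes ?? [];
  if (!isStringArray(required)) {
    throw new Error("required_changes must be string[]");
  }
  return {
    task_id: o.task_id,
    status: o.status as GuardianStatus,
    required_changes: required,
  };
}
```

In a real deployment you would likely use a schema library (the implementation example later in this article uses zod for exactly this), but the principle is the same: validate at the boundary, reject loudly.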
Do Agents Need Skill Reminders?
Short answer: not every turn, but they do need a reliable skill-loading rule.
- If your runtime injects a skills manifest at startup, agents are usually aware of what they can use.
- If you need strict behavior, include allowed skills in each task packet.
- For high-risk work, hard-code skill allowlists per role.
I use this rule of thumb:
- Coordinator can reference many skills, but only dispatches.
- Worker gets build/test/tooling skills.
- Analyst gets research/verification skills.
- Guardian gets safety/audit skills.
- Archivist gets memory/documentation skills.
That keeps the system predictable without repeating giant skill lists in every message.
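That rule of thumb can live in a single deny-by-default table. This is a hypothetical sketch: the skill names are illustrative placeholders, not a real manifest.

```typescript
// Hypothetical per-role skill allowlist; skill names are illustrative.
const SKILL_ALLOWLIST: Record<string, readonly string[]> = {
  coordinator: ["dispatch", "escalate"],
  worker: ["build", "test", "tooling"],
  analyst: ["research", "verify"],
  guardian: ["audit", "policy_check"],
  archivist: ["memory_write", "doc_update"],
};

// Deny by default: a skill not on the role's list is never allowed,
// and an unknown role gets no skills at all.
export function canUseSkill(role: string, skill: string): boolean {
  return (SKILL_ALLOWLIST[role] ?? []).includes(skill);
}
```

Checking this table at dispatch time means the allowlist never needs to be repeated in any prompt.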
What Makes This Production-Grade
This setup works because the boundaries are explicit:
- Roles are clear.
- Handoffs are structured.
- Risky actions are gated.
- Every important step is traceable.
Safe Self-Improvement Per Agent
This is where most systems drift if you are not careful.
A useful reference is the self-improving-agent pattern, but only with strict controls in place.
Rule: Agents can propose improvements, but they cannot approve their own permission changes.
Each role should improve different things:
- Coordinator: routing heuristics and escalation logic
- Analyst: source ranking and uncertainty reporting
- Worker: decomposition, retries, and output robustness
- Guardian: deny rules and policy tests
- Archivist: memory quality, tagging, and retrieval cues
Safe rollout:
- Propose change.
- Run automated checks.
- Guardian reviews risk.
- Human approves.
- Roll out gradually.
- Keep rollback ready.
Implementation Example (TypeScript)
import { z } from "zod";
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

const ImprovementProposal = z.object({
  proposalId: z.string().min(1),
  role: z.enum(["coordinator", "analyst", "worker", "guardian", "archivist"]),
  changeType: z.enum(["prompt", "routing", "test_policy", "memory_strategy"]),
  diff: z.string().min(1),
  expectedImpact: z.string().min(1),
  riskLevel: z.enum(["low", "medium", "high"]),
  requiresPermissionChange: z.boolean().default(false),
});

type Proposal = z.infer<typeof ImprovementProposal>;

type Decision = {
  status: "approved" | "rejected";
  reason: string;
  reviewer: "guardian" | "human";
};

type TelegramUpdate = {
  update_id: number;
  callback_query?: {
    id: string;
    data?: string;
    message?: { message_id: number; chat: { id: number } };
  };
};

async function runCheck(
  name: string,
  command: string,
  args: string[],
): Promise<{ name: string; passed: boolean; output?: string }> {
  try {
    await execFileAsync(command, args, {
      timeout: 8 * 60 * 1000,
      maxBuffer: 1024 * 1024 * 4,
    });
    return { name, passed: true };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { name, passed: false, output: message };
  }
}

async function runAutomatedChecks(p: Proposal): Promise<{ ok: boolean; failures: string[] }> {
  const failures: string[] = [];
  const checks: Array<{ name: string; command: string; args: string[] }> = [
    { name: "type-check", command: "pnpm", args: ["type-check"] },
    { name: "unit-tests", command: "pnpm", args: ["test:unit"] },
    { name: "contract-tests", command: "pnpm", args: ["test:contracts"] },
    { name: "policy-lint", command: "pnpm", args: ["lint:policy"] },
  ];
  for (const check of checks) {
    const result = await runCheck(check.name, check.command, check.args);
    if (!result.passed) {
      failures.push(result.name);
    }
  }
  return { ok: failures.length === 0, failures };
}

function guardianReview(p: Proposal): Decision {
  if (p.requiresPermissionChange) {
    return {
      status: "rejected",
      reason: "Agents cannot self-approve permission changes.",
      reviewer: "guardian",
    };
  }
  if (p.riskLevel === "high") {
    return {
      status: "rejected",
      reason: "High-risk proposals require explicit human approval first.",
      reviewer: "guardian",
    };
  }
  return { status: "approved", reason: "Passed policy gate.", reviewer: "guardian" };
}

async function requestHumanApproval(p: Proposal): Promise<Decision> {
  const botToken = process.env.TELEGRAM_BOT_TOKEN;
  const chatId = process.env.TELEGRAM_CHAT_ID;
  if (!botToken || !chatId) {
    return {
      status: "rejected",
      reason: "Missing TELEGRAM_BOT_TOKEN or TELEGRAM_CHAT_ID.",
      reviewer: "human",
    };
  }
  const text = [
    "Approval request: canary rollout",
    `Proposal: ${p.proposalId}`,
    `Role: ${p.role}`,
    `Type: ${p.changeType}`,
    `Risk: ${p.riskLevel}`,
    `Impact: ${p.expectedImpact}`,
  ].join("\n");
  await fetch(`https://api.telegram.org/bot${botToken}/sendMessage`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      chat_id: chatId,
      text,
      reply_markup: {
        inline_keyboard: [[
          { text: "Approve", callback_data: `approve:${p.proposalId}` },
          { text: "Reject", callback_data: `reject:${p.proposalId}` },
        ]],
      },
    }),
  });
  const timeoutMs = 5 * 60 * 1000;
  const start = Date.now();
  let offset = 0;
  while (Date.now() - start < timeoutMs) {
    const poll = await fetch(
      `https://api.telegram.org/bot${botToken}/getUpdates?timeout=25&offset=${offset}`,
    );
    const json = (await poll.json()) as { ok: boolean; result: TelegramUpdate[] };
    if (!json.ok) continue;
    for (const update of json.result) {
      offset = update.update_id + 1;
      const cb = update.callback_query;
      const data = cb?.data;
      if (!data) continue;
      const [action, proposalId] = data.split(":");
      if (proposalId !== p.proposalId) continue;
      if (action === "approve") {
        return { status: "approved", reason: "Human approved in Telegram DM.", reviewer: "human" };
      }
      if (action === "reject") {
        return { status: "rejected", reason: "Human rejected in Telegram DM.", reviewer: "human" };
      }
    }
  }
  return {
    status: "rejected",
    reason: "Approval timed out in Telegram DM.",
    reviewer: "human",
  };
}

async function canaryRollout(p: Proposal): Promise<{ promoted: boolean; rollbackNote?: string }> {
  // Replace with deploy logic + metric gates.
  const metricsHealthy = true;
  if (!metricsHealthy) {
    return { promoted: false, rollbackNote: "Canary metrics regressed; rolled back." };
  }
  return { promoted: true };
}

export async function processSelfImprovement(raw: unknown) {
  const proposal = ImprovementProposal.parse(raw);
  const checks = await runAutomatedChecks(proposal);
  if (!checks.ok) return { ok: false, stage: "checks", failures: checks.failures };
  const guardian = guardianReview(proposal);
  if (guardian.status === "rejected") return { ok: false, stage: "guardian", decision: guardian };
  const human = await requestHumanApproval(proposal);
  if (human.status === "rejected") return { ok: false, stage: "human", decision: human };
  const rollout = await canaryRollout(proposal);
  return rollout.promoted
    ? { ok: true, stage: "complete" }
    : { ok: false, stage: "rollout", note: rollout.rollbackNote };
}
This is the core model:
- proposals are structured
- checks are automatic
- Guardian gates risk
- human approval is mandatory for promotion
- rollout is canary-first with rollback
Never allow:
- self-granted privileges
- unreviewed tool installation
- self-edits to safety policy
- irreversible actions outside approval gates
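The "never allow" list works best as hard deny rules that no approval can override. A minimal sketch, with made-up action names standing in for whatever your runtime actually calls these operations:

```typescript
// Hypothetical hard-deny rules encoding the list above.
// Deny wins over any Guardian or human approval.
const FORBIDDEN_ACTIONS = [
  "grant_permission_self",     // self-granted privileges
  "install_tool_unreviewed",   // unreviewed tool installation
  "edit_safety_policy_self",   // self-edits to safety policy
  "irreversible_outside_gate", // irreversible actions outside approval gates
] as const;

// Returns the subset of requested actions that hit a hard deny rule.
export function violatesHardDeny(actions: string[]): string[] {
  return actions.filter((a) =>
    (FORBIDDEN_ACTIONS as readonly string[]).includes(a),
  );
}
```

Run this check before the Guardian review, so a forbidden action is rejected even when everything else about the proposal looks clean.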
Where This Is Headed
If you made it this far, you have the full core pattern: role clarity, strict handoffs, safety gates, and canary-first rollout.
I am still actively building this system, so this write-up is intentionally in progress. Next updates will add real production traces, failure cases, and tighter operational guardrails from live runs.
For now, treat this as the current blueprint, not the finished playbook.