How I Run a DeepAgents Personal AI Team Without Chaos

Most teams do not need a giant agent framework. They need a system that can run every day without surprises.

This is the one I use. It has one Mayor plus five specialist roles, clear handoffs, and one hard rule: nothing ships without safety checks.

Recent updates (March 2026)

This guide is a live document. Here are the major changes now running in practice:

  • Interactive scoping: the Analyst can return multiple-choice clarifying questions when a request is vague.
  • Mid-run clarifications: the Worker pauses and asks one direct question instead of guessing.
  • Provider-flexible model config: model provider and model name are env-driven (google_genai, openai, anthropic, xai, openai_compatible).
  • Startup signal update: Telegram subscribers now get a clear "Village is Online" boot message.
  • Memory changes: conversation history and semantic memory now flow into analysis, followed by periodic fact consolidation.
  • Verification hardening: incomplete execution output gets rejected before any completion claim.
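
The provider-flexible config above can be sketched as a small loader. This is a minimal sketch, not my exact config code: the env var names `MODEL_PROVIDER` and `MODEL_NAME` are illustrative assumptions; only the provider list comes from the article.

```typescript
// Minimal sketch of env-driven model selection.
// Assumptions: env var names MODEL_PROVIDER / MODEL_NAME are illustrative.
const SUPPORTED_PROVIDERS = [
  "google_genai",
  "openai",
  "anthropic",
  "xai",
  "openai_compatible",
] as const;

type Provider = (typeof SUPPORTED_PROVIDERS)[number];

interface ModelConfig {
  provider: Provider;
  model: string;
}

function loadModelConfig(env: Record<string, string | undefined>): ModelConfig {
  const provider = env.MODEL_PROVIDER;
  const model = env.MODEL_NAME;
  if (!provider || !(SUPPORTED_PROVIDERS as readonly string[]).includes(provider)) {
    throw new Error(`Unsupported or missing MODEL_PROVIDER: ${provider}`);
  }
  if (!model) {
    throw new Error("MODEL_NAME is required");
  }
  return { provider: provider as Provider, model };
}
```

Failing fast on an unknown provider means a typo in the environment surfaces at boot, not mid-run.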

Current stack: LangChain -> LangGraph -> DeepAgents (github.com/langchain-ai/deepagents).

The team roles

I use a LangGraph Mayor plus five specialist roles. The goal is simple: each role owns one class of work, so responsibility stays obvious.

  • Mayor: scopes work, manages state, and routes to exactly one role at a time.
  • Analyst: verifies context, defines success criteria, and surfaces unknowns.
  • Worker: implements inside approved scope.
  • Guardian: blocks risky output and enforces quality gates.
  • Verifier: checks whether the Worker actually completed required execution before archival.
  • Archivist: records what happened so future runs start with better context.

Six roles, clear boundaries

I let them design their own avatars via NanoBanana, so do not blame me.

Mayor

routes work

Traffic control and clear handoffs.

/google-workspace-mcp/agent-team-orchestration

Worker

executes

Builds code and ships outputs using Codex + Gemini.

/gemini/codex-orchestrator/agent-browser/google-workspace-mcp/obsidian-sync

Analyst

verifies

Checks facts, compares options.

/web-search/tavily-search/agent-browser

Guardian

protects

Enforces approval and risk gates.

/security-monitor/exec-auditor/system-health

Verifier

validates

Checks execution completeness and test outcomes before archival.

/verification-before-completion/test-driven-development/systematic-debugging

Archivist

remembers

Turns outcomes into durable team memory.

/memory-log/decision-journal/knowledge-handoff

(Diagram: Village role handoff among the Mayor, Analyst, Guardian, Worker, Verifier, and Archivist.)

How a request moves through the system

A normal request looks like this:

  1. Mayor receives the request and sets scope.
  2. Analyst defines "done" and flags risks.
  3. Guardian runs preflight checks.
  4. Worker builds only inside approved boundaries.
  5. Verifier checks output quality and execution completeness.
  6. Archivist writes decisions and outcomes to memory.

No role skips the line.
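
The six steps above can be sketched as a strict, in-order pipeline. The role names and ordering come from the flow above; the `TaskState` shape and stage functions are illustrative assumptions, not my actual LangGraph graph.

```typescript
// Sketch of the strict handoff order. A failure at any stage stops the run:
// no role skips the line, and nothing reaches the Archivist unverified.
type Role = "mayor" | "analyst" | "guardian" | "worker" | "verifier" | "archivist";

interface TaskState {
  taskId: string;
  completedBy: Role[];
  failed?: { role: Role; reason: string };
}

// Order from the article: Guardian preflight runs before the Worker builds.
const PIPELINE: Role[] = ["mayor", "analyst", "guardian", "worker", "verifier", "archivist"];

// Each stage returns null on success, or a failure reason.
type Stage = (state: TaskState) => string | null;

function runPipeline(taskId: string, stages: Record<Role, Stage>): TaskState {
  const state: TaskState = { taskId, completedBy: [] };
  for (const role of PIPELINE) {
    const reason = stages[role](state);
    if (reason !== null) {
      state.failed = { role, reason };
      return state; // stop immediately; later roles never run
    }
    state.completedBy.push(role);
  }
  return state;
}
```

The real system routes through a LangGraph Mayor rather than a flat loop, but the invariant is the same: every completed task has passed every gate in order.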

Prompt contracts that keep handoffs clean

Most write-ups get handoffs wrong by staying abstract. The contracts are what make this reliable.

Mayor dispatch

{
  "task_id": "coord-2026-03-12-001",
  "objective": "Implement strict machine-readable /status_all response",
  "acceptance_criteria": [
    "Include worker|guardian|verifier|archivist|mayor pass|fail",
    "Include ISO timestamp",
    "Keep backward-compatible human summary"
  ],
  "assignee": "analyst",
  "risk_level": "medium"
}

Analyst output

{
  "task_id": "coord-2026-03-12-001",
  "facts": [
    "Current /status_all mixes prose and machine status lines",
    "Parser fails when role wording drifts",
    "Mayor process is online and handling Telegram commands"
  ],
  "unknowns": [
    "Do all consumers need legacy prose fields?",
    "Do all roles always report heartbeat data?"
  ],
  "recommended_scope": "Add deterministic checks object with pass|fail + last_check"
}

Worker output

{
  "task_id": "coord-2026-03-12-001",
  "implementation": "Added stable checks payload to /status_all; preserved summary block",
  "tests_run": [
    "unit: stable formatter keys",
    "integration: Telegram command to response shape",
    "regression: legacy summary still renders"
  ]
}

Guardian decision

{
  "task_id": "coord-2026-03-12-001",
  "status": "needs_changes",
  "security_findings": ["PID exposed in public response"],
  "required_changes": [
    "Redact PID for non-admin views",
    "Add timeout + fail-safe for missing heartbeat"
  ]
}

Archivist memory write

{
  "task_id": "coord-2026-03-12-001",
  "decision_summary": "Rolled out strict status contract after PID redaction and timeout guard",
  "reusable_patterns": [
    "Dual-format responses: machine block + human summary",
    "Guardian preflight for observability endpoints",
    "Contract tests for chat-command outputs"
  ]
}

Do agents still need skill reminders?

In my setup, yes.

Startup manifests help, but they are not enough for reliable execution. Every task packet includes explicit skill scope, and high-risk tasks run with strict allowlists.

Practical role split:

  • Mayor: planning and routing skills only.
  • Analyst: research and verification skills.
  • Worker: build and testing skills.
  • Guardian: safety and audit skills.
  • Verifier: verification and testing skills.
  • Archivist: memory and documentation skills.

That keeps behavior predictable without dumping giant skill lists into every message.
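
The role split can be enforced mechanically with per-role allowlists. The skill names below are taken from the role cards earlier in this article; the enforcement helper itself is an illustrative sketch, not my actual dispatcher.

```typescript
// Sketch of per-role skill allowlists. Skill names come from the role
// cards above; the helper is illustrative.
const SKILL_ALLOWLIST: Record<string, ReadonlySet<string>> = {
  mayor: new Set(["google-workspace-mcp", "agent-team-orchestration"]),
  analyst: new Set(["web-search", "tavily-search", "agent-browser"]),
  worker: new Set(["gemini", "codex-orchestrator", "agent-browser", "google-workspace-mcp", "obsidian-sync"]),
  guardian: new Set(["security-monitor", "exec-auditor", "system-health"]),
  verifier: new Set(["verification-before-completion", "test-driven-development", "systematic-debugging"]),
  archivist: new Set(["memory-log", "decision-journal", "knowledge-handoff"]),
};

// Throw before dispatch if a role reaches for a skill outside its lane.
function assertSkillAllowed(role: string, skill: string): void {
  const allowed = SKILL_ALLOWLIST[role];
  if (!allowed || !allowed.has(skill)) {
    throw new Error(`Skill "${skill}" is not allowlisted for role "${role}"`);
  }
}
```

Checking at dispatch time is what lets the task packet carry a short explicit scope instead of a giant skill list.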

Guardrails that actually matter

These are non-negotiable in my current workflow:

  • Secrets come from op run and op:// references, not plaintext .env values.
  • Config changes get syntax and compatibility checks before any restart.
  • External actions require explicit approval.
  • Risky outputs do not pass without Guardian review.
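
The "syntax and compatibility checks before any restart" gate can be as simple as a preflight parse. This is a minimal sketch assuming a JSON config file; the required-key check stands in for whatever compatibility rules a real deployment needs.

```typescript
// Minimal preflight sketch: parse the config and check required keys
// before allowing a restart. The key names a caller passes are
// deployment-specific; nothing here is my exact check.
function preflightConfig(
  rawConfig: string,
  requiredKeys: string[],
): { ok: boolean; errors: string[] } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawConfig); // syntax check
  } catch (e) {
    return { ok: false, errors: [`Syntax error: ${(e as Error).message}`] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, errors: ["Config root must be a JSON object"] };
  }
  const obj = parsed as Record<string, unknown>;
  const errors = requiredKeys
    .filter((key) => !(key in obj)) // compatibility check
    .map((key) => `Missing required key: ${key}`);
  return { ok: errors.length === 0, errors };
}
```

A restart only proceeds when `ok` is true; anything else goes back to the proposer with the error list.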

Safe self-improvement loop

Agents can propose improvements. They cannot approve their own permission changes.

I use this rollout sequence every time:

  1. Propose change.
  2. Run automated checks.
  3. Guardian risk review.
  4. Human approval.
  5. Canary rollout.
  6. Roll back immediately if metrics regress.

TypeScript example

import { z } from "zod";

const ImprovementProposal = z.object({
  proposalId: z.string().min(1),
  role: z.enum(["mayor", "analyst", "worker", "guardian", "verifier", "archivist"]),
  changeType: z.enum(["prompt", "routing", "test_policy", "memory_strategy"]),
  diff: z.string().min(1),
  expectedImpact: z.string().min(1),
  riskLevel: z.enum(["low", "medium", "high"]),
  requiresPermissionChange: z.boolean().default(false),
});

function guardianReview(p: z.infer<typeof ImprovementProposal>) {
  if (p.requiresPermissionChange) {
    return { status: "rejected", reason: "No self-approved permission changes." };
  }
  if (p.riskLevel === "high") {
    return { status: "rejected", reason: "High-risk changes require explicit human approval." };
  }
  return { status: "approved" };
}

Why this setup works

It is not magic. It is discipline.

  • Roles are clear.
  • Handoffs are structured.
  • Safety gates are enforced.
  • Decisions are traceable.

That is what turns an agent demo into a system you can trust on a normal workday.

What I am updating next

I am still evolving this system. Next revisions will add:

  • more real production traces,
  • concrete failure-case writeups,
  • and tighter operational guardrails from live runs.

For now, this is the current playbook I actually use.