Design a Loop Get the tool
CASE STUDY · THREE OF THE MOST-USED OPEN-SOURCE AI AGENTS

Same loop.
Wildly different harnesses.

Hermes Agent, Kilo Code, and OpenClaw are three of the most-used open-source AI agents running today. Each one's own documentation describes its core loop as a conventional, commodity pattern — and each one is right. Here's what's actually identical, and what isn't. Read the harness-first version on Build A Harness →

3/3 projects that describe their own loop as "conventional" in their own docs — in nearly identical language
8/11 harness layers where none of the three reaches full coverage, mapped against the canonical 11-layer architecture
0/3 ship a formal output/reviewer-pass quality gate before a reply goes out

Three agents, three very different scales.

Each project reports its own headline numbers in its own units — stars, users, tokens — so the table below shows each on its own terms rather than forcing a false comparison.

Hermes Agent
MakerNous Research
LaunchedFeb 2026
LicenseMIT
GitHub stars~140,000 (in 3 mo.)
OpenRouter volume770B tokens/day
Primary interfaceCLI · messaging · IDE · API
Kilo Code
MakerKilo (fork of Roo/Cline)
LaunchedMar 2025
LicenseOpen source + hosted gateway
Users1.5M+ (3M+ "Kilo Coders")
Tokens processed40T+ (cumulative)
OpenRouter volume235B tokens/day
Primary interfaceVS Code · JetBrains · CLI · mobile
OpenClaw
MakerPeter Steinberger
LaunchedNov 2025 (as Clawdbot)
LicenseMIT
GitHub stars~247,000
OpenRouter volume161B tokens/day
Primary interfaceMessaging apps · device nodes
OpenRouter daily token volume — global app rankings
Hermes Agent
770B/day
Kilo Code
235B/day
OpenClaw
161B/day
Live figures from OpenRouter's global app rankings (openrouter.ai/apps), accessed July 2026 — all three now report through OpenRouter, correcting an earlier assumption on this page that Kilo Code was gateway-only. Hermes overtook OpenClaw on this metric back in mid-May 2026 and has grown substantially since. For context, Claude Code (not one of the three agents covered here) ranks between Kilo Code and OpenClaw at 218B/day.

Mapped against the seven-stage loop.

Rather than inventing a one-off comparison, each agent's documented behavior is mapped against the same seven-stage loop used across this site — so the comparison means the same thing here as everywhere else on Design a Loop.

present — a real, documented mechanism partial — an ad hoc or static substitute not described in public docs
Loop stage Hermes Agent Kilo Code OpenClaw
1. Perceive
(World Model)
Raw session history (SQLite) — no typed beliefs or contradiction detection Memory Bank files (context.md / brief.md / history.md) — static, no contradiction detection Static bootstrap files (AGENTS.md / SOUL.md / IDENTITY.md) — persona context only
2. Reason
(Evidence & Hypothesis)
Not described — model reasons directly over assembled context Not described — single tool call per turn, no hypothesis step Not described — model reasons directly over the assembled prompt
3. Decide
(Control State)
Provider/error fallback only — no tiered risk resolver Binary human-approval gate per action type substitutes for tiers Session/global queueing prevents races — not a risk-tier resolver
4. Act
(Execution)
70+ tools, ~28 toolsets, concurrent dispatch — no pre-action risk review Git-based checkpoint before every edit — built-in reversibility Broadest action surface (device/OS tools) — no described pre-action gating
5. Verify Not described Informal "nothing looks broken" check, plus diff-repair as a syntactic self-check Reply shaping filters/dedupes text — cosmetic, not a correctness check
6. Recover Backup-provider fallback on 429/5xx/401; fixed 90-turn budget as a stop condition Checkpoint rollback; diff-repair recovers malformed edits Auto-compaction can trigger a retry — no named strategies beyond that
7. Learn & Repeat
(Memory)
Persistent SQLite+FTS5 across sessions; self-improving skills abstracted from successful runs Memory Bank persists project context; context condensing as the window fills Raw JSONL transcript only — bootstrap files are static, explicitly no self-improvement

Reading the pattern: Execution and Memory/Learning are the only two stages where any agent reaches full (●) coverage — Kilo's checkpoints and Hermes's persistent memory, respectively. Reasoning is not described by any of the three. This isn't a knock on any one project; it's a reasonably faithful reading of what's in their own public documentation, and it's exactly the gap loop engineering as a discipline is meant to close.

See each agent's full, unsimplified step list
Hermes Agent — 8 steps
  1. Generate a task ID, append the user message
  2. Build or reuse the cached system prompt
  3. Check whether preflight compression is needed (>50% context full)
  4. Build API messages per API mode
  5. Inject ephemeral prompt layers (budget/context warnings)
  6. Apply prompt caching markers (Anthropic)
  7. Make an interruptible API call
  8. Parse response: execute tool calls and loop to step 4, or persist + return
Kilo Code — 7 steps
  1. User gives a task, selects an agent mode (code/ask/plan/debug)
  2. Kilo assembles context: system prompt, auto-found files, @-mentions, Memory Bank
  3. Prompt + tools sent to the configured model via the Kilo Gateway
  4. Model replies with reasoning plus typically one tool call
  5. State-changing actions pause for user approval unless auto-approved
  6. Approved action executes; result captured
  7. If complete, present for review; else append observation and loop to step 3
OpenClaw — 7 steps
  1. Intake — Gateway RPC/CLI validates params, returns {runId, acceptedAt} immediately
  2. Queue & resolve — serialize via per-session + global queues, resolve model/auth
  3. Session & workspace prep — resolve workspace, load skills, inject bootstrap files
  4. Prompt assembly — base + skills + bootstrap context, token limits enforced
  5. Model inference ⇄ tool execution — stream text/tool-call deltas, execute, repeat
  6. Reply shaping — assemble final payload, filter internal tokens
  7. Persist & emit — write transcript to JSONL, emit lifecycle:end
Hermes Agent docs

"On its own, this loop is a fairly conventional ReAct-style tool-calling loop — structurally similar to what Claude Code, OpenClaw, and most other agent harnesses do..."

Kilo Code docs

"On its own, this is a conventional ReAct-style tool-calling loop — structurally the same shape used by Cline, Roo Code, Claude Code, Cursor, and most other agent harnesses."

OpenClaw docs

"On its own, this is a conventional ReAct-style tool-calling loop... The loop mechanics are not OpenClaw's differentiator."

Mapped against the eleven-layer harness.

Same method as the loop: each agent's harness is mapped against the same eleven canonical layers used across Build A Harness, rather than an ad hoc feature list.

present partial not described
Harness layer Hermes Agent Kilo Code OpenClaw
1. Caller State Not described Agent "modes" (code/ask/plan/debug) fix tools + instructions per task Messaging channel/persona selects context — not a formal constraint system
2. World Model Raw history only Static Memory Bank files Static persona/bootstrap files
3. Reasoning Not described Not described Not described
4. Control State Not described Binary approval gate, not a tiered resolver Queueing prevents races, isn't a risk control
5. Planning Subagents get their own capped budget via delegate_task — no full dependency graph Subagent delegation for isolated subtasks Not described
6. Execution 70+ tools, ~28 toolsets, 6 backends, concurrent dispatch Git-snapshot checkpoints + diff-repair + tool-call translation Broadest action surface (device/OS tools), weakly gated
7. Verification Not described Informal completion check + diff-repair Not described
8. Recovery Provider fallback, fixed iteration budget Checkpoint rollback, diff-repair Auto-compaction retry only
9. Memory SQLite+FTS5, context compression with a 20-message floor Memory Bank, context condensing Raw JSONL transcript, token-limit + compaction reserve
10. Learning Self-improving skills — the project's headline differentiator Not described Explicitly static — no self-improvement
11. Output & Reviewer Pass Not described Not described — human review substitutes Reply shaping is cosmetic, not a quality gate

Reading the pattern: Execution, Memory, and Learning are the only layers where any agent reaches full coverage — Kilo's checkpoints, and Hermes's persistent memory and self-improving skills, respectively. Reasoning and Output & Reviewer Pass get zero coverage across all three. The rest — Caller State, World Model, Control State, Planning, Verification, and Recovery — are partial at best everywhere: never fully absent, never fully built out either. Which is a fair description of where most production agents are today, popular or not.

What it's actually like to run one.

Beyond the layer-by-layer mapping, each project reads differently in practice:

Practical dimension Hermes Agent Kilo Code OpenClaw
Model / provider breadth 18+ providers, open-weight models 500+ models, 60+ providers via the Kilo Gateway Claude, GPT, DeepSeek, or local — OAuth piggyback on existing subscriptions
Primary reach CLI, 20 messaging platforms, IDE (ACP), API, cron VS Code, JetBrains, CLI, cloud agent, mobile WhatsApp, Telegram, Discord, Signal, iMessage, Slack, Teams, native device apps
Business model Free, MIT; runs on a few-dollar-a-month VPS Free/BYOK or Kilo credits, zero markup on provider rates; $8M seed round Free, MIT, self-hosted; foundation stewardship after the founder joined OpenAI
Known risk Newest of the three (Feb 2026) — memory and self-improving skills lack a long track record Parsing reliability is a constant fight across a 500-model matrix — not a novel loop, an inherited one RCE CVE (CVSS 8.8), 1,000+ malicious marketplace skills reported, prompt-injection susceptible; restricted for state use in China
The takeaway

Every row in this table is a harness decision, not a loop decision. That's the whole thesis of loop engineering: design the cycle deliberately, then spend your real engineering effort on what wraps around it.

Questions, answered.

What do Hermes Agent, Kilo Code, and OpenClaw have in common?

All three run a conventional ReAct-style tool-calling loop: call the model, get tool calls, execute them, append the results, repeat until the model returns plain text. Each project's own documentation says as much, in nearly identical language.

If the loops are the same, why are these products so different?

Because the differentiation lives entirely in the harness: cross-session memory, platform reach, model-matrix reliability, business model, and security posture. See the harness comparison above.

Is one of these three loops better engineered than the others?

Not meaningfully. All three collapse to the same four moves with different names and step counts for the same operations. The engineering effort that shows up in outcomes went into the harness, not the loop.

Where does Build A Harness fit in?

Build A Harness is the open-source tool for designing that harness layer deliberately — world model, control state, verification, recovery — as drawable canvas nodes, instead of accumulating it ad hoc the way all three of these projects did.

Primary documentation.

This comparison draws on each project's own architecture docs and independent reporting, current as of mid-2026. Token-volume figures are live from OpenRouter's global app rankings, accessed July 2026 — those numbers move daily and will drift from this snapshot.

The layer-by-layer coverage mapping above is our own reading of each project's public documentation against Build A Harness's canonical taxonomy, not an official audit or a claim endorsed by any of the three projects.

Don't let your harness
happen by accident.

Three of the most-used agents in production prove the point: the loop is commodity, the harness is everything. Build A Harness implements the harness layer as drawable, composable nodes, so you design it on purpose from the start.