Hermes Agent, Kilo Code, and OpenClaw are three of the most-used open-source AI agents running today. Each one's own documentation describes its core loop as a conventional, commodity pattern — and each one is right. Here's what's actually identical, and what isn't. Read the harness-first version on Build A Harness →
Each project reports its own headline numbers in its own units — stars, users, tokens — so the table below shows each on its own terms rather than forcing a false comparison.
Rather than inventing a one-off comparison, each agent's documented behavior is mapped against the same seven-stage loop used across this site — so the comparison means the same thing here as everywhere else on Design a Loop.
| Loop stage | Hermes Agent | Kilo Code | OpenClaw |
|---|---|---|---|
| 1. Perceive (World Model) |
○Raw session history (SQLite) — no typed beliefs or contradiction detection | ◐Memory Bank files (context.md / brief.md / history.md) — static, no contradiction detection | ◐Static bootstrap files (AGENTS.md / SOUL.md / IDENTITY.md) — persona context only |
| 2. Reason (Evidence & Hypothesis) |
○Not described — model reasons directly over assembled context | ○Not described — single tool call per turn, no hypothesis step | ○Not described — model reasons directly over the assembled prompt |
| 3. Decide (Control State) |
○Provider/error fallback only — no tiered risk resolver | ◐Binary human-approval gate per action type substitutes for tiers | ○Session/global queueing prevents races — not a risk-tier resolver |
| 4. Act (Execution) |
◐70+ tools, ~28 toolsets, concurrent dispatch — no pre-action risk review | ●Git-based checkpoint before every edit — built-in reversibility | ◐Broadest action surface (device/OS tools) — no described pre-action gating |
| 5. Verify | ○Not described | ◐Informal "nothing looks broken" check, plus diff-repair as a syntactic self-check | ○Reply shaping filters/dedupes text — cosmetic, not a correctness check |
| 6. Recover | ◐Backup-provider fallback on 429/5xx/401; fixed 90-turn budget as a stop condition | ◐Checkpoint rollback; diff-repair recovers malformed edits | ◐Auto-compaction can trigger a retry — no named strategies beyond that |
| 7. Learn & Repeat (Memory) |
●Persistent SQLite+FTS5 across sessions; self-improving skills abstracted from successful runs | ◐Memory Bank persists project context; context condensing as the window fills | ○Raw JSONL transcript only — bootstrap files are static, explicitly no self-improvement |
Reading the pattern: Execution and Memory/Learning are the only two stages where any agent reaches full (●) coverage — Kilo's checkpoints and Hermes's persistent memory, respectively. Reasoning is not described by any of the three. This isn't a knock on any one project; it's a reasonably faithful reading of what's in their own public documentation, and it's exactly the gap loop engineering as a discipline is meant to close.
{runId, acceptedAt} immediately"On its own, this loop is a fairly conventional ReAct-style tool-calling loop — structurally similar to what Claude Code, OpenClaw, and most other agent harnesses do..."
"On its own, this is a conventional ReAct-style tool-calling loop — structurally the same shape used by Cline, Roo Code, Claude Code, Cursor, and most other agent harnesses."
"On its own, this is a conventional ReAct-style tool-calling loop... The loop mechanics are not OpenClaw's differentiator."
Same method as the loop: each agent's harness is mapped against the same eleven canonical layers used across Build A Harness, rather than an ad hoc feature list.
| Harness layer | Hermes Agent | Kilo Code | OpenClaw |
|---|---|---|---|
| 1. Caller State | ○Not described | ◐Agent "modes" (code/ask/plan/debug) fix tools + instructions per task | ◐Messaging channel/persona selects context — not a formal constraint system |
| 2. World Model | ○Raw history only | ◐Static Memory Bank files | ◐Static persona/bootstrap files |
| 3. Reasoning | ○Not described | ○Not described | ○Not described |
| 4. Control State | ○Not described | ◐Binary approval gate, not a tiered resolver | ○Queueing prevents races, isn't a risk control |
| 5. Planning | ◐Subagents get their own capped budget via delegate_task — no full dependency graph | ◐Subagent delegation for isolated subtasks | ○Not described |
| 6. Execution | ◐70+ tools, ~28 toolsets, 6 backends, concurrent dispatch | ●Git-snapshot checkpoints + diff-repair + tool-call translation | ◐Broadest action surface (device/OS tools), weakly gated |
| 7. Verification | ○Not described | ◐Informal completion check + diff-repair | ○Not described |
| 8. Recovery | ◐Provider fallback, fixed iteration budget | ◐Checkpoint rollback, diff-repair | ◐Auto-compaction retry only |
| 9. Memory | ●SQLite+FTS5, context compression with a 20-message floor | ◐Memory Bank, context condensing | ◐Raw JSONL transcript, token-limit + compaction reserve |
| 10. Learning | ●Self-improving skills — the project's headline differentiator | ○Not described | ○Explicitly static — no self-improvement |
| 11. Output & Reviewer Pass | ○Not described | ○Not described — human review substitutes | ○Reply shaping is cosmetic, not a quality gate |
Reading the pattern: Execution, Memory, and Learning are the only layers where any agent reaches full coverage — Kilo's checkpoints, and Hermes's persistent memory and self-improving skills, respectively. Reasoning and Output & Reviewer Pass get zero coverage across all three. The rest — Caller State, World Model, Control State, Planning, Verification, and Recovery — are partial at best everywhere: never fully absent, never fully built out either. Which is a fair description of where most production agents are today, popular or not.
Beyond the layer-by-layer mapping, each project reads differently in practice:
| Practical dimension | Hermes Agent | Kilo Code | OpenClaw |
|---|---|---|---|
| Model / provider breadth | 18+ providers, open-weight models | 500+ models, 60+ providers via the Kilo Gateway | Claude, GPT, DeepSeek, or local — OAuth piggyback on existing subscriptions |
| Primary reach | CLI, 20 messaging platforms, IDE (ACP), API, cron | VS Code, JetBrains, CLI, cloud agent, mobile | WhatsApp, Telegram, Discord, Signal, iMessage, Slack, Teams, native device apps |
| Business model | Free, MIT; runs on a few-dollar-a-month VPS | Free/BYOK or Kilo credits, zero markup on provider rates; $8M seed round | Free, MIT, self-hosted; foundation stewardship after the founder joined OpenAI |
| Known risk | Newest of the three (Feb 2026) — memory and self-improving skills lack a long track record | Parsing reliability is a constant fight across a 500-model matrix — not a novel loop, an inherited one | RCE CVE (CVSS 8.8), 1,000+ malicious marketplace skills reported, prompt-injection susceptible; restricted for state use in China |
Every row in this table is a harness decision, not a loop decision. That's the whole thesis of loop engineering: design the cycle deliberately, then spend your real engineering effort on what wraps around it.
All three run a conventional ReAct-style tool-calling loop: call the model, get tool calls, execute them, append the results, repeat until the model returns plain text. Each project's own documentation says as much, in nearly identical language.
Because the differentiation lives entirely in the harness: cross-session memory, platform reach, model-matrix reliability, business model, and security posture. See the harness comparison above.
Not meaningfully. All three collapse to the same four moves with different names and step counts for the same operations. The engineering effort that shows up in outcomes went into the harness, not the loop.
Build A Harness is the open-source tool for designing that harness layer deliberately — world model, control state, verification, recovery — as drawable canvas nodes, instead of accumulating it ad hoc the way all three of these projects did.
This comparison draws on each project's own architecture docs and independent reporting, current as of mid-2026. Token-volume figures are live from OpenRouter's global app rankings, accessed July 2026 — those numbers move daily and will drift from this snapshot.
The layer-by-layer coverage mapping above is our own reading of each project's public documentation against Build A Harness's canonical taxonomy, not an official audit or a claim endorsed by any of the three projects.
Three of the most-used agents in production prove the point: the loop is commodity, the harness is everything. Build A Harness implements the harness layer as drawable, composable nodes, so you design it on purpose from the start.