CASE STUDY · THREE OF THE MOST-USED OPEN-SOURCE AI AGENTS

Same loop.
Wildly different harnesses.

Q: If the loops are the same, why are these products so different?

Because the differentiation lives entirely in the harness built around the loop: what memory persists across sessions, which platforms and devices it reaches, how it handles a heterogeneous model matrix, what business model funds it, and what security posture it ships with. Hermes differentiates on persistent memory and self-improving skills; Kilo Code differentiates on model-agnostic reliability plumbing and pricing; OpenClaw differentiates on messaging-app presence and a persona layer.

Hermes Agent, Kilo Code, and OpenClaw are three of the most-used open-source AI agents running today. Each one's own documentation describes its core loop as a conventional, commodity pattern — and each one is right. Here's what's actually identical, and what isn't. Read the harness-first version on Build A Harness →

3/3 projects that describe their own loop as "conventional" in their own docs — in nearly identical language

8/11 harness layers where none of the three reaches full coverage, mapped against the canonical 11-layer architecture

0/3 ship a formal output/reviewer-pass quality gate before a reply goes out

See the loop ↓ See the harness ↓ Sources ↓

AT A GLANCE

Three agents, three very different scales.

Each project reports its own headline numbers in its own units — stars, users, tokens — so the table below shows each on its own terms rather than forcing a false comparison.

Hermes Agent

MakerNous Research

LaunchedFeb 2026

LicenseMIT

GitHub stars~140,000 (in 3 mo.)

OpenRouter volume770B tokens/day

Primary interfaceCLI · messaging · IDE · API

Kilo Code

MakerKilo (fork of Roo/Cline)

LaunchedMar 2025

LicenseOpen source + hosted gateway

Users1.5M+ (3M+ "Kilo Coders")

Tokens processed40T+ (cumulative)

OpenRouter volume235B tokens/day

Primary interfaceVS Code · JetBrains · CLI · mobile

OpenClaw

MakerPeter Steinberger

LaunchedNov 2025 (as Clawdbot)

LicenseMIT

GitHub stars~247,000

OpenRouter volume161B tokens/day

Primary interfaceMessaging apps · device nodes

OpenRouter daily token volume — global app rankings

Hermes Agent

770B/day

Kilo Code

235B/day

OpenClaw

161B/day

Live figures from OpenRouter's global app rankings (openrouter.ai/apps), accessed July 2026 — all three now report through OpenRouter, correcting an earlier assumption on this page that Kilo Code was gateway-only. Hermes overtook OpenClaw on this metric back in mid-May 2026 and has grown substantially since. For context, Claude Code (not one of the three agents covered here) ranks between Kilo Code and OpenClaw at 218B/day.

THE LOOP

Mapped against the seven-stage loop.

Rather than inventing a one-off comparison, each agent's documented behavior is mapped against the same seven-stage loop used across this site — so the comparison means the same thing here as everywhere else on Design a Loop.

● present — a real, documented mechanism ◐ partial — an ad hoc or static substitute ○ not described in public docs

Loop stage	Hermes Agent	Kilo Code	OpenClaw
1. Perceive (World Model)	○Raw session history (SQLite) — no typed beliefs or contradiction detection	◐Memory Bank files (context.md / brief.md / history.md) — static, no contradiction detection	◐Static bootstrap files (AGENTS.md / SOUL.md / IDENTITY.md) — persona context only
2. Reason (Evidence & Hypothesis)	○Not described — model reasons directly over assembled context	○Not described — single tool call per turn, no hypothesis step	○Not described — model reasons directly over the assembled prompt
3. Decide (Control State)	○Provider/error fallback only — no tiered risk resolver	◐Binary human-approval gate per action type substitutes for tiers	○Session/global queueing prevents races — not a risk-tier resolver
4. Act (Execution)	◐70+ tools, ~28 toolsets, concurrent dispatch — no pre-action risk review	●Git-based checkpoint before every edit — built-in reversibility	◐Broadest action surface (device/OS tools) — no described pre-action gating
5. Verify	○Not described	◐Informal "nothing looks broken" check, plus diff-repair as a syntactic self-check	○Reply shaping filters/dedupes text — cosmetic, not a correctness check
6. Recover	◐Backup-provider fallback on 429/5xx/401; fixed 90-turn budget as a stop condition	◐Checkpoint rollback; diff-repair recovers malformed edits	◐Auto-compaction can trigger a retry — no named strategies beyond that
7. Learn & Repeat (Memory)	●Persistent SQLite+FTS5 across sessions; self-improving skills abstracted from successful runs	◐Memory Bank persists project context; context condensing as the window fills	○Raw JSONL transcript only — bootstrap files are static, explicitly no self-improvement

Reading the pattern: Execution and Memory/Learning are the only two stages where any agent reaches full (●) coverage — Kilo's checkpoints and Hermes's persistent memory, respectively. Reasoning is not described by any of the three. This isn't a knock on any one project; it's a reasonably faithful reading of what's in their own public documentation, and it's exactly the gap loop engineering as a discipline is meant to close.

See each agent's full, unsimplified step list

Hermes Agent — 8 steps

Generate a task ID, append the user message
Build or reuse the cached system prompt
Check whether preflight compression is needed (>50% context full)
Build API messages per API mode
Inject ephemeral prompt layers (budget/context warnings)
Apply prompt caching markers (Anthropic)
Make an interruptible API call
Parse response: execute tool calls and loop to step 4, or persist + return

Kilo Code — 7 steps

User gives a task, selects an agent mode (code/ask/plan/debug)
Kilo assembles context: system prompt, auto-found files, @-mentions, Memory Bank
Prompt + tools sent to the configured model via the Kilo Gateway
Model replies with reasoning plus typically one tool call
State-changing actions pause for user approval unless auto-approved
Approved action executes; result captured
If complete, present for review; else append observation and loop to step 3

OpenClaw — 7 steps

Intake — Gateway RPC/CLI validates params, returns {runId, acceptedAt} immediately
Queue & resolve — serialize via per-session + global queues, resolve model/auth
Session & workspace prep — resolve workspace, load skills, inject bootstrap files
Prompt assembly — base + skills + bootstrap context, token limits enforced
Model inference ⇄ tool execution — stream text/tool-call deltas, execute, repeat
Reply shaping — assemble final payload, filter internal tokens
Persist & emit — write transcript to JSONL, emit lifecycle:end

Hermes Agent docs

"On its own, this loop is a fairly conventional ReAct-style tool-calling loop — structurally similar to what Claude Code, OpenClaw, and most other agent harnesses do..."

Kilo Code docs

"On its own, this is a conventional ReAct-style tool-calling loop — structurally the same shape used by Cline, Roo Code, Claude Code, Cursor, and most other agent harnesses."

OpenClaw docs

"On its own, this is a conventional ReAct-style tool-calling loop... The loop mechanics are not OpenClaw's differentiator."

THE HARNESS

Mapped against the eleven-layer harness.

Same method as the loop: each agent's harness is mapped against the same eleven canonical layers used across Build A Harness, rather than an ad hoc feature list.

● present ◐ partial ○ not described

Harness layer	Hermes Agent	Kilo Code	OpenClaw
1. Caller State	○Not described	◐Agent "modes" (code/ask/plan/debug) fix tools + instructions per task	◐Messaging channel/persona selects context — not a formal constraint system
2. World Model	○Raw history only	◐Static Memory Bank files	◐Static persona/bootstrap files
3. Reasoning	○Not described	○Not described	○Not described
4. Control State	○Not described	◐Binary approval gate, not a tiered resolver	○Queueing prevents races, isn't a risk control
5. Planning	◐Subagents get their own capped budget via delegate_task — no full dependency graph	◐Subagent delegation for isolated subtasks	○Not described
6. Execution	◐70+ tools, ~28 toolsets, 6 backends, concurrent dispatch	●Git-snapshot checkpoints + diff-repair + tool-call translation	◐Broadest action surface (device/OS tools), weakly gated
7. Verification	○Not described	◐Informal completion check + diff-repair	○Not described
8. Recovery	◐Provider fallback, fixed iteration budget	◐Checkpoint rollback, diff-repair	◐Auto-compaction retry only
9. Memory	●SQLite+FTS5, context compression with a 20-message floor	◐Memory Bank, context condensing	◐Raw JSONL transcript, token-limit + compaction reserve
10. Learning	●Self-improving skills — the project's headline differentiator	○Not described	○Explicitly static — no self-improvement
11. Output & Reviewer Pass	○Not described	○Not described — human review substitutes	○Reply shaping is cosmetic, not a quality gate

Reading the pattern: Execution, Memory, and Learning are the only layers where any agent reaches full coverage — Kilo's checkpoints, and Hermes's persistent memory and self-improving skills, respectively. Reasoning and Output & Reviewer Pass get zero coverage across all three. The rest — Caller State, World Model, Control State, Planning, Verification, and Recovery — are partial at best everywhere: never fully absent, never fully built out either. Which is a fair description of where most production agents are today, popular or not.

What it's actually like to run one.

Beyond the layer-by-layer mapping, each project reads differently in practice:

Practical dimension	Hermes Agent	Kilo Code	OpenClaw
Model / provider breadth	18+ providers, open-weight models	500+ models, 60+ providers via the Kilo Gateway	Claude, GPT, DeepSeek, or local — OAuth piggyback on existing subscriptions
Primary reach	CLI, 20 messaging platforms, IDE (ACP), API, cron	VS Code, JetBrains, CLI, cloud agent, mobile	WhatsApp, Telegram, Discord, Signal, iMessage, Slack, Teams, native device apps
Business model	Free, MIT; runs on a few-dollar-a-month VPS	Free/BYOK or Kilo credits, zero markup on provider rates; $8M seed round	Free, MIT, self-hosted; foundation stewardship after the founder joined OpenAI
Known risk	Newest of the three (Feb 2026) — memory and self-improving skills lack a long track record	Parsing reliability is a constant fight across a 500-model matrix — not a novel loop, an inherited one	RCE CVE (CVSS 8.8), 1,000+ malicious marketplace skills reported, prompt-injection susceptible; restricted for state use in China

The takeaway

Every row in this table is a harness decision, not a loop decision. That's the whole thesis of loop engineering: design the cycle deliberately, then spend your real engineering effort on what wraps around it.

FAQ

Questions, answered.

What do Hermes Agent, Kilo Code, and OpenClaw have in common?

All three run a conventional ReAct-style tool-calling loop: call the model, get tool calls, execute them, append the results, repeat until the model returns plain text. Each project's own documentation says as much, in nearly identical language.

If the loops are the same, why are these products so different?

Because the differentiation lives entirely in the harness: cross-session memory, platform reach, model-matrix reliability, business model, and security posture. See the harness comparison above.

Is one of these three loops better engineered than the others?

Not meaningfully. All three collapse to the same four moves with different names and step counts for the same operations. The engineering effort that shows up in outcomes went into the harness, not the loop.

Where does Build A Harness fit in?

Build A Harness is the open-source tool for designing that harness layer deliberately — world model, control state, verification, recovery — as drawable canvas nodes, instead of accumulating it ad hoc the way all three of these projects did.

SOURCES

Primary documentation.

This comparison draws on each project's own architecture docs and independent reporting, current as of mid-2026. Token-volume figures are live from OpenRouter's global app rankings, accessed July 2026 — those numbers move daily and will drift from this snapshot.

Hermes Agent

Kilo Code

Using Agents — Kilo Code docs
Checkpoints — Kilo Code docs
Inside Kilo Code — Tessl
Kilo's own OpenClaw-vs-Hermes page (vendor comparison — used only for corroborating color, not as a primary technical source)

OpenClaw

The layer-by-layer coverage mapping above is our own reading of each project's public documentation against Build A Harness's canonical taxonomy, not an official audit or a claim endorsed by any of the three projects.

DESIGN A LOOP · POWERED BY BUILD A HARNESS

Don't let your harness
happen by accident.

Three of the most-used agents in production prove the point: the loop is commodity, the harness is everything. Build A Harness implements the harness layer as drawable, composable nodes, so you design it on purpose from the start.

Star on GitHub Explore Build A Harness → Read loop engineering →

Same loop.Wildly different harnesses.

Three agents, three very different scales.

Mapped against the seven-stage loop.

Mapped against the eleven-layer harness.

What it's actually like to run one.

Questions, answered.

What do Hermes Agent, Kilo Code, and OpenClaw have in common?

If the loops are the same, why are these products so different?

Is one of these three loops better engineered than the others?

Where does Build A Harness fit in?

Primary documentation.

Don't let your harnesshappen by accident.

Same loop.
Wildly different harnesses.

Don't let your harness
happen by accident.