The discipline of loop engineering.

Every AI agent runs on a loop: perceive, reason, decide, act, verify, recover, repeat. Loop engineering is the discipline of designing that cycle on purpose instead of letting it emerge from whatever glue code happened to ship first. This page covers the discipline; Build A Harness is the open-source tool for putting it into practice.

88% of AI agent projects never reach production — the bottleneck is the loop, not the model
7 stages in a fully engineered agent loop — perceive through learn, then repeat
~1.6% of a production agent is the model's decision logic — the rest is the loop around it

What is loop engineering?

Loop engineering is the discipline of designing the iterative cycle an AI agent runs on every turn: what it perceives, what it believes, what it is allowed to do, how it verifies its own output, and how it recovers when a step fails — before looping back and doing it again.

Every agent that does more than answer a single question runs on a loop, whether anyone designed one or not. Call a model, look at what it produced, decide what to do next, call it again. Left undesigned, that loop degrades in predictable ways: it retries forever when a tool fails, it drifts when early mistakes compound across iterations, it loses track of what it already tried, and it has no way to tell the difference between "done" and "stuck."
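
The failure modes above can be made concrete with a small sketch. This is a minimal illustration, not Build A Harness code: the names `LoopState`, `run_loop`, `MAX_ITERATIONS`, and `MAX_RETRIES` are hypothetical, and the budgets are arbitrary. It shows a loop that caps retries, remembers what it already tried, and reports "done" and "stuck" as different outcomes.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ITERATIONS = 8  # hypothetical budget for the whole run
MAX_RETRIES = 2     # hypothetical cap on retries of a single action


@dataclass
class LoopState:
    tried: dict = field(default_factory=dict)    # action -> attempts so far
    history: list = field(default_factory=list)  # (action, result) pairs
    status: str = "running"                      # running | done | stuck


def run_loop(choose_action: Callable[[LoopState], str],
             execute: Callable[[str], str],
             is_done: Callable[[str], bool]) -> LoopState:
    state = LoopState()
    for _ in range(MAX_ITERATIONS):
        action = choose_action(state)
        attempts = state.tried.get(action, 0)
        if attempts > MAX_RETRIES:
            # The same action has already failed too often: stop, do not spin.
            state.status = "stuck"
            return state
        state.tried[action] = attempts + 1
        try:
            result = execute(action)
        except RuntimeError as exc:
            # Record the failure so the next pass can see it.
            state.history.append((action, f"error: {exc}"))
            continue
        state.history.append((action, result))
        if is_done(result):
            state.status = "done"
            return state
    # Iteration budget exhausted without reaching a finished result.
    state.status = "stuck"
    return state


def always_fail(action: str) -> str:
    raise RuntimeError("tool unavailable")


stuck = run_loop(lambda s: "search", always_fail, lambda r: True)
print(stuck.status, stuck.tried)  # stuck {'search': 3}
```

The one design choice worth noting is that "stuck" is an explicit status rather than an exception or a silent timeout, so the caller can route it to a recovery path.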

Loop engineering is the practice of designing that cycle deliberately — the same way you'd design a state machine, not glue code you patch after something breaks in production. It's the same underlying discipline as harness engineering, viewed through the lens of the cycle rather than the constraint structure around it.

Build A Harness — the tool built for this discipline

Build A Harness is the open-source visual canvas that implements a fully engineered loop as drawable, composable nodes — world model, control state, verification, recovery, all wired into the cycle your agent actually runs on. Apache 2.0. See the tool →

Loop engineering vs prompt engineering

Both matter. But they operate at different points in the cycle and solve different problems. In production, the returns on each are not equal.

Prompt engineering

  • Optimises the input to a single model call
  • Asks: "How should I phrase this?"
  • Scope: one call, one response
  • Reliability gain beyond a baseline: <3%
  • Breaks down once the agent needs to loop more than once

Loop engineering

  • Designs what happens between calls, and how many times the cycle runs
  • Asks: "What should this loop do after each pass, and when should it stop?"
  • Scope: every iteration, every tool call, every failure, every run
  • Reliability gain from loop-level changes: 28–47%
  • The layer that determines whether an agent survives past the demo

In Build A Harness

Every node type on the canvas implements a piece of the loop, not a prompt tweak. Build A Harness makes loop-level reliability gains in the 28–47% range achievable without writing the cycle from scratch. See the node library →

A loop vs a workflow

A workflow is a straight line: input, LLM call, tool call, output, done once. It gets data where it needs to go, and it's the right tool when the number of steps is known ahead of time and the failure modes are predictable.

A loop is designed to run an unknown number of times, carrying state forward from one pass to the next. Where a workflow routes, a loop governs: after each pass it decides whether to continue, escalate, retry, or stop.

Loops are the right tool for any agent operating in an environment where the correct number of steps isn't known in advance — most agentic use cases qualify, even ones that look like simple workflows at first.
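
The structural difference can be sketched in a few lines. This is an illustrative toy under stated assumptions, not FlowSpec or Build A Harness code: `run_workflow`, `run_until_stopped`, `Decision`, and `move_one` are hypothetical names, and the decision set is reduced to continue and stop for brevity.

```python
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    STOP = "stop"


def run_workflow(steps, data):
    """A workflow: a fixed list of steps, each run exactly once, in order."""
    for step in steps:
        data = step(data)
    return data


def run_until_stopped(step, decide, state, max_passes=20):
    """A loop: repeat one step, carry state forward, decide after each pass."""
    for _ in range(max_passes):
        state = step(state)
        if decide(state) is Decision.STOP:
            return state
    raise RuntimeError("pass budget exhausted before a stop decision")


def move_one(state):
    # Stand-in for one unit of agent work: handle the next pending item.
    head, *rest = state["todo"]
    return {"todo": rest,
            "done": state["done"] + [head],
            "passes": state["passes"] + 1}


def decide(state):
    return Decision.STOP if not state["todo"] else Decision.CONTINUE


print(run_workflow([str.strip, str.upper], "  hi "))  # HI

final = run_until_stopped(move_one, decide,
                          {"todo": ["a", "b", "c"], "done": [], "passes": 0})
print(final["passes"], final["done"])  # 3 ['a', 'b', 'c']
```

The workflow's step count is fixed by the length of the list it is given. The loop's pass count falls out of the state it carries, bounded only by the pass budget.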

In Build A Harness

world_model · control_state · verify_gate · recovery · exp_store — these are the node types that implement what a plain workflow doesn't. Draw them on the canvas; Build A Harness generates the loop. See the tool →

The seven stages of an engineered loop

You don't need every stage for every use case. But each one addresses a specific failure mode that will surface in production if left unhandled. Start with the stages your agent needs today; add the rest as the failure modes appear.

Stage 1: Perceive (World Model)
Typed beliefs about the environment, contradiction detection, and generation_id tracking, so the agent knows what it knows and when that knowledge was formed.
Without it: the next iteration acts on stale or contradictory beliefs without knowing it.

Stage 2: Reason (Evidence & Hypothesis)
An evidence store, hypotheses from four generation sources, and a value-of-information (VOI) gate that fires before every action.
Without it: the loop commits to the first plausible hypothesis instead of the best-supported one.

Stage 3: Decide (Control State)
A five-tier state resolver (NORMAL → CAUTIOUS → RESTRICTED → BLOCKED → HALT) evaluated fresh every iteration, plus deadlock detection that stops escalation loops.
Without it: there are no brakes, and each pass escalates further when conditions degrade.

Stage 4: Act (Execution)
VOI-gated actions, a pre-execution review gate across safety, relevance, reversibility, resource, and alignment, and reversibility classification before any action fires.
Without it: an iteration takes an irreversible action without assessing the cost first.

Stage 5: Verify (Verification & Review)
Nine verification layers and an adversarial reviewer pass that check this iteration's output before the loop either continues or exits.
Without it: a bad pass feeds straight into the next iteration, or straight out to the user.

Stage 6: Recover (Recovery & Replanning)
Six named recovery strategies, a typed failure library, and local vs. global replanning scope, so a failed step is handled systematically rather than ending the run.
Without it: one failed tool call ends the session instead of triggering a recovery path.

Stage 7: Learn & Repeat (Memory & Experience)
Context compression so the loop doesn't truncate its own history, and an experience store so future runs warm-start instead of beginning cold. Then the cycle repeats from Stage 1.
Without it: every run starts cold and long-running loops silently lose the thread of what they've done.

In Build A Harness

All seven stages are implemented as drawable, composable node types, covered by 379 tests. You do not write the loop's infrastructure; Build A Harness ships it. See the full node library →
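
To make Stage 3 more tangible, here is a minimal sketch of a five-tier control state resolved fresh on every pass. Only the tier names come from the description above. The `Signals` fields, the thresholds, and the `may_execute` policy are assumptions made for illustration and should not be read as how Build A Harness resolves control state.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    NORMAL = 0
    CAUTIOUS = 1
    RESTRICTED = 2
    BLOCKED = 3
    HALT = 4


@dataclass(frozen=True)
class Signals:
    # Hypothetical inputs; a real resolver would define its own.
    consecutive_failures: int
    contradictions: int
    budget_remaining: float  # fraction of the run budget left, 0.0 to 1.0


def resolve_tier(s: Signals) -> Tier:
    """Derive the tier from this pass's signals only.

    Nothing is carried over from the previous tier, so the state can relax
    again as soon as conditions improve. Thresholds are illustrative.
    """
    if s.budget_remaining <= 0.0:
        return Tier.HALT
    if s.consecutive_failures >= 5:
        return Tier.BLOCKED
    if s.consecutive_failures >= 3 or s.contradictions >= 2:
        return Tier.RESTRICTED
    if s.consecutive_failures >= 1 or s.contradictions >= 1:
        return Tier.CAUTIOUS
    return Tier.NORMAL


def may_execute(tier: Tier, reversible: bool) -> bool:
    """Hypothetical policy: irreversible actions only run in NORMAL."""
    if tier >= Tier.BLOCKED:
        return False
    if tier >= Tier.CAUTIOUS:
        return reversible
    return True


tier = resolve_tier(Signals(consecutive_failures=3, contradictions=0,
                            budget_remaining=0.5))
print(tier.name, may_execute(tier, reversible=False))  # RESTRICTED False
```

Using an ordered enum keeps comparisons such as `tier >= Tier.BLOCKED` readable, and resolving from current signals alone is one simple way to avoid a state that can only ever escalate.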

Design once. Run any framework.

Build A Harness is the open-source tool built for this discipline. It covers the full loop — from drawing the cycle on a visual canvas to compiling and running it in production — without locking you into a single framework.

Step 1: Design (Visual canvas)
Draw your loop using 27 node types: 14 execution nodes and 13 loop-control nodes that implement the full seven-stage cycle.

  • input · output · llm_call · tool_invoke
  • agent_role · agent_debate · condition
  • parallel_fork · parallel_join · hitl_breakpoint
  • world_model · control_state · verify_gate
  • recovery · exp_store · reviewer_pass · +8 more

Step 2: Compile (FlowSpec)
The canvas exports to FlowSpec, a runtime-neutral, portable JSON format. One FlowSpec file runs on any supported framework without rewriting.

  • Open format · version-controlled
  • Framework-agnostic
  • Third-party node packs supported (@buildaharness/nodes/…)

Step 3: Run (Any framework)
The same FlowSpec compiles to LangGraph, CrewAI, Mastra, or Microsoft Agent Framework. Switch runtimes without touching the loop design.

  • LangGraph (Python / JS)
  • CrewAI (Python)
  • Mastra (TypeScript)
  • MS Agent Framework (C# / Python / Java)
  • A2A protocol (any A2A runtime)

Step 4: Observe (Langfuse)
Every iteration is traced automatically. Spans extend to world model generation_id, control state transitions, recovery strategy changes, and reviewer-pass findings.

  • Per-iteration tracing
  • Compare runs across frameworks
  • HITL pause and resume
  • REST · MCP · A2A deployment
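
As a rough picture of what per-iteration tracing captures, the sketch below models one record per pass in plain Python and extracts control state transitions from a run. It does not use the Langfuse SDK. `IterationSpan`, its field names, and the `"retry"` strategy label are hypothetical stand-ins for the attributes listed in Step 4.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class IterationSpan:
    iteration: int
    generation_id: int                       # world model version this pass saw
    control_state: str                       # e.g. "NORMAL", "CAUTIOUS"
    recovery_strategy: Optional[str] = None  # set only when recovery ran
    reviewer_findings: list = field(default_factory=list)


def control_state_transitions(spans):
    """Return (iteration, old_state, new_state) wherever the state changed."""
    changes = []
    for prev, cur in zip(spans, spans[1:]):
        if prev.control_state != cur.control_state:
            changes.append((cur.iteration, prev.control_state, cur.control_state))
    return changes


spans = [
    IterationSpan(1, generation_id=7, control_state="NORMAL"),
    IterationSpan(2, generation_id=7, control_state="CAUTIOUS",
                  recovery_strategy="retry"),
    IterationSpan(3, generation_id=8, control_state="CAUTIOUS"),
]

print(control_state_transitions(spans))  # [(2, 'NORMAL', 'CAUTIOUS')]
print(json.dumps(asdict(spans[1])))      # one serialisable record per pass
```

Keeping each pass as a flat, serialisable record is what makes runs comparable afterwards, whichever tracing backend receives them.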

Up and running in two commands.

Everything runs locally via Docker. No cloud account, no API key required to start — use Ollama to run a free local model if you prefer.

shell
# 1. Clone the repository, then generate secrets and configure the environment
git clone https://github.com/3IVIS/buildaharness.git && cd buildaharness
./scripts/setup-env.sh

# 2. Start all nine services
docker compose up

  • Canvas: localhost:3000 (visual loop editor)
  • API: localhost:8000 (compiles and runs FlowSpec)
  • Langfuse: localhost:3001 (observability dashboard)
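
Once the containers are up, a quick way to confirm the three endpoints are listening is a small port check. This is an optional convenience sketch, not part of the project. The `SERVICES` mapping simply restates the ports listed above, and `is_port_open` is a hypothetical helper name.

```python
import socket

# Ports as listed above; adjust if your environment maps them differently.
SERVICES = {"Canvas": 3000, "API": 8000, "Langfuse": 3001}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if is_port_open("localhost", port) else "not reachable"
        print(f"{name:<9} localhost:{port}  {status}")
```

An open port only shows that something is listening. It does not prove the service behind it has finished starting, so a first run may need a short wait before the canvas loads.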

Apache 2.0. Source on GitHub →

Questions, answered.

What is loop engineering?

The discipline of designing the iterative cycle an AI agent runs on every turn — what it perceives, what it believes, what it is allowed to do, how it verifies its own output, and how it recovers when a step fails — before looping back and doing it again.

What is the difference between loop engineering and prompt engineering?

Prompt engineering optimises a single model call. Loop engineering designs what happens between calls — the state carried forward, the stopping condition, the verification each pass, and the recovery path when a pass fails. In production, loop-level design accounts for the large majority of agent reliability gains; prompt refinement beyond a reasonable baseline accounts for a small fraction.

What is the difference between a loop and a workflow?

A workflow is a straight line: input, LLM call, tool call, output, done once. A loop is designed to run an unknown number of times, carrying state forward and deciding after each pass whether to continue, escalate, retry, or stop. Workflows suit deterministic linear tasks; loops suit agents operating where the right number of steps isn't known in advance.

Is loop engineering the same as harness engineering?

Yes. They are two names for the same discipline. Harness engineering emphasises the constraint structure around the model; loop engineering emphasises the iterative cycle that structure runs on. Build A Harness is the tool that implements both views at once.

What is the "agent loop" people talk about?

The repeating cycle an autonomous agent runs through: observe the current state, reason about it, decide on an action, execute it, check the result, and repeat until a stopping condition is met. Most agent failures — infinite retries, drift, compounding hallucination, silent context loss — trace back to a loop that was never explicitly designed, not to the underlying model.

What is FlowSpec?

FlowSpec is the runtime-neutral JSON format at the centre of Build A Harness. You design a loop once on the visual canvas and it compiles to a FlowSpec file. That single file then runs on LangGraph, CrewAI, Mastra, or Microsoft Agent Framework without rewriting.

How do I get started with loop engineering?

Clone github.com/3IVIS/buildaharness, run ./scripts/setup-env.sh && docker compose up, and open the canvas on localhost:3000. Five ready-made harnesses are included to fork and build on. No cloud account required.

Stop letting the loop happen by accident.

Build A Harness implements the full seven-stage loop — world model, control state, verification, recovery, and cross-run learning — as drawable, composable nodes. Apache 2.0. Runs locally via Docker.