The Three Feedback Loops of Agentic Software
Why better feedback loops let agents work with more autonomy
Most teams try to improve agents through prompt work. They rewrite instructions, add context, and ask for a better plan. That helps, especially when the instructions are weak. But once the basics are good enough, the main bottleneck is usually the feedback loop.
If an agent cannot see what happened, check its own work, and recover quickly, you cannot trust it with much autonomy. You end up doing courier work: take the screenshot, paste the log, restate the bug, and ask it to try again. The system looks agentic, but the human is still carrying evidence back and forth.
I think about this as three nested loops. They are not three competing philosophies. They are three layers of feedback that expand what the agent can do on its own and narrow what the human must do by hand.
- Loop 1: the human verifies after the fact
- Loop 2: the agent verifies its own work
- Loop 3: the human closes the loop on reality
Better feedback loops are what let agents work with more autonomy without losing contact with reality.
Loop 1: The human verifies after the fact
This is where most teams start.
Imagine a small web demo with an SVG orbit and a moving marker. You ask the agent to adjust the motion, line up the marker with a target, or change the animation. The agent edits the code and says it is done. Then you open the app, inspect the result, take a screenshot, and describe what is still wrong.
The loop works. It is also slow.
The human becomes the transport layer between the agent and the finished state. Because the agent cannot inspect the result on its own, it cannot correct itself. Every iteration depends on another round trip through you.
Loop 2: The agent verifies its own work
The second loop starts when the agent can inspect the system directly.
Give the same orbit demo browser automation, screenshots, logs, DOM assertions, and a fast test runner. Now the agent can change the code, open the page, capture the result, check whether the marker reached the target, and try again without waiting for a human to relay the evidence.
That is the key step up from Loop 1. The agent no longer needs the human to act as courier for basic evidence. It can compare the code against the rendered result itself. But the human often still has to do the last piece of work: open the final state, decide whether it really makes sense, and translate that judgment back.
That change is larger than it sounds. Better tools do not merely make the workflow nicer. They increase how much autonomy you can safely allow.
The same pattern holds outside toy demos. A broken onboarding form, a flaky CLI flow, or a staging-only UI regression all become easier when the agent can run the app, inspect structured output, and verify the result directly.
The analogy is simple. A developer can write software in a bare text editor. The same developer will move faster and make fewer mistakes with search, debugging, tests, and fast feedback. Agents are no different. A shell and a pile of text are enough to start. Logs, structured file access, screenshots, and browser tooling are what make self-correction practical.
That is Loop 2. The agent can inspect its own work, verify the result, and retry without waiting for a human to relay the evidence.
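The shape of that loop is easy to write down. Here is a minimal sketch in Python, where `render_marker_angle` and `nudge_code` are stand-ins for the real edit-and-inspect steps (browser automation, screenshots, DOM assertions); the orbit numbers are invented for illustration:

```python
# Loop 2 as code: the agent applies a change, inspects the result
# itself, and retries until the check passes. No human round trip.

TARGET_ANGLE = 90.0
TOLERANCE = 1.0

def render_marker_angle(state):
    """Stand-in for opening the page and measuring the marker."""
    return state["angle"]

def nudge_code(state, error):
    """Stand-in for the agent editing the code based on evidence."""
    state["angle"] += error * 0.5  # move halfway toward the target

def self_verify_loop(state, max_attempts=20):
    for attempt in range(1, max_attempts + 1):
        observed = render_marker_angle(state)
        error = TARGET_ANGLE - observed
        if abs(error) <= TOLERANCE:
            return attempt  # verified by the agent itself
        nudge_code(state, error)
    raise RuntimeError("could not verify within the attempt budget")

attempts = self_verify_loop({"angle": 40.0})
print(f"verified after {attempts} attempts")
```

The structure, not the arithmetic, is the point: observe, compare against the goal, and retry, with a human needed only when the budget runs out.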
If you want agents to do more unsupervised work, start here. Improve what they can see and shorten the time it takes them to check themselves. But do not stop there. The next step is to build a shared playground around the app.
Loop 3: The human closes the loop on reality
Loop 3 starts when you build a shared playground around the app.
Loop 2 is about self-verification. The agent can run checks, inspect output, and correct itself. Loop 3 adds a shared surface for inspection. The agent and the human can open the exact same state on demand, look at the same evidence, and talk about the same thing.
That distinction matters. Passing tests is not the same as shipping the right thing. A screenshot can prove that the marker is on the target. It cannot tell you whether the motion feels awkward, whether the interaction matches the real intent, or whether the change solves the user’s problem.
Take the orbit demo one step further. Add query parameters that pin the marker to an exact angle, set the target, pause the motion, and render known scenarios on demand. Now the agent can open a precise state by itself. It can run checks and capture screenshots. The human can open the same URL and inspect the same state. The agent closes the verification loop; the human closes the product loop.
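Concretely, the playground can be as small as a function that turns a shareable URL into one deterministic scene. This sketch uses Python's standard `urllib.parse`; the parameter names (`angle`, `target`, `paused`) are illustrative, not from any real demo:

```python
# Parse query parameters into one reproducible scene state, so the
# agent and the human can open the same URL and see the same thing.
from urllib.parse import parse_qs, urlparse

DEFAULTS = {"angle": 0.0, "target": 90.0, "paused": False}

def scene_state(url):
    """Turn a shareable URL into an exact, pinned scene state."""
    params = parse_qs(urlparse(url).query)
    state = dict(DEFAULTS)
    if "angle" in params:
        state["angle"] = float(params["angle"][0])
    if "target" in params:
        state["target"] = float(params["target"][0])
    if "paused" in params:
        state["paused"] = params["paused"][0] == "1"
    return state

# Both sides open the same link and inspect the same state:
print(scene_state("https://demo.example/orbit?angle=45&paused=1"))
```

Because the state lives in the URL, "look at this exact situation" becomes a link you paste, not a sequence of manual steps you hope the other side reproduces.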
The form of that playground depends on the app. In a web app, it might be query parameters, fixtures, and debug routes. In a CLI app, it might be a scripted tmux session with fixed terminal size, seeded input, known fixtures, and a command that recreates the flow on demand. The tool changes. The principle does not: build a surface where the agent and the human can inspect the same reality.
That is the job the human should keep. Not micromanaging every step. Not approving every command. The human anchors the work to product intent, taste, and real-world constraints.
What to add in practice
If you want agents to do more useful work, shape the environment around them. In a real repo, that often means adding a few boring, high-leverage affordances:
- Build a shared playground for the app. For a web app, that may mean fixtures, query parameters, and debug routes. For a CLI app, it may mean a scripted tmux session with fixed dimensions, seeded input, and known fixtures.
- Expose important state on purpose. Add seed scripts, stable fixtures, and one-command setup paths that let an agent open a known state without guessing.
- Make verification cheap. Keep one fast command for focused tests, one reliable way to run the app, and one short path to capture logs or screenshots.
- Prefer structured tools over scraping. Return JSON from diagnostics, use stable DOM locators, and expose clear file or API interfaces instead of forcing brittle shell parsing.
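The last point is the cheapest to adopt. A hedged sketch of what "structured over scraping" means in practice, with invented field names, is a diagnostics command that emits JSON the agent can parse directly instead of log text it has to scrape:

```python
# Emit machine-readable diagnostics instead of a human-only log line.
# Field names here are illustrative, not a real schema.
import json

def diagnostics():
    """Return current state as data an agent can check directly."""
    return {
        "marker_angle": 88.4,
        "target_angle": 90.0,
        "within_tolerance": abs(88.4 - 90.0) <= 2.0,
        "last_error": None,
    }

report = json.dumps(diagnostics())
print(report)

# The agent verifies with a parse, not a regex over free text:
parsed = json.loads(report)
assert parsed["within_tolerance"] is True
```

One stable JSON field replaces a brittle grep, and the same output works for the agent's checks, the test suite, and the human reading the log.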
None of this is glamorous. It is infrastructure for feedback. But it changes the shape of the work. When the system is legible, the agent can move faster. When checks are cheap, the agent can recover faster. When the human and the agent can inspect the same state, trust gets easier to build.
A small toy project helps because it makes these loops easy to see. But the lesson is not about toys. In real teams, the ceiling on agent autonomy is usually set less by intelligence than by feedback. If agents can inspect state, run checks, and recover on their own, they stop needing a human to ferry evidence between attempts. If they can step into a shared playground built for the app, the human can stop acting as courier and start acting as judge.
Prompting still matters. But once the task is understood, the practical questions are simpler: What can the agent see? What can it verify? What shared playground can it open with you? Better answers to those questions are what turn a clever demo into a reliable way of working.

