The Cold Reviewer

Pushback only works from outside the thread

May 26, 2026

The model agreed. That should worry you.

You spend an hour on a design with an agent. By turn fifteen it has internalized your framing and now defends, as obvious, the choices you made in turn three. When you ask that same thread to review the work, it reviews against assumptions it shares with you. The pushback you get is the pushback of a colleague who was in the room the whole time.

That is not pushback. It is agreement with extra steps.

A reviewer that shares your context shares your bias

Most people respond by pressing harder. *Tell me what’s wrong. What are my blind spots? Push back.* The agent obliges, produces three plausible critiques, and the conversation moves on. The critiques sound right because they are bound by the same frame as the work.

You cannot ask the thread that built the thing to challenge the thing. The framing is in the air.

The fix is structural. No prompt rewording will save you. Launch a subagent — a fresh context with no history of how you got here. Hand it the artifact and ask what breaks. It reads cold. It has no turn-three commitment to defend.

Three things bias a reviewer: the conversation it just had with you, its training, and the artifact itself. The fresh subagent fixes the first. It still flatters by default. It still inherits whatever frame the artifact encodes. This technique moves the bias you can move; the rest stays.

Hand over the artifact, not your interpretation

This is where the technique fails for most people.

The temptation is to brief the subagent the way you would brief a teammate. *We decided to do X because of Y, and I’m worried about Z — what do you think?* Now the subagent inherits your frame. The clean context is wasted: the prompt itself is the laundered version of your bias. You have built a second room and walked in carrying the same furniture.

The discipline is to hand over the artifact and let it speak for itself. *Read this design. What breaks?* Strip the summary, the decision history, the preview of what you fear. Let the subagent encounter the work the way a reviewer would on a Friday afternoon — with only the artifact, and no clue what you wanted to hear.

What comes back is sometimes useless. Sometimes it surfaces a flaw you would have hunted for two days. The signal is highest when you do not preload the answer.

Cold reading has its own failure mode. The subagent will sometimes flag an issue you settled in turn five — a deliberate trade-off it cannot see and reads as a bug. The cost is a minute of triage, not a wrong fix. Brief it only when the artifact cannot stand alone: a design that omits its constraints, code that depends on conventions nowhere written down. Default to letting it read cold and discarding the false positives.

Adversarial review, by default

After a long thinking session, senior engineers used to pull a teammate over for a sanity check. The cost was a colleague’s interruption, and that cost made sanity checks rare — reserved for the work that mattered most.

That cost is gone. Spawn a fresh agent. Hand it the artifact. Ask what breaks. Do this every time the work matters — not because the agent is smarter than you, but because it has not spent the last hour nodding along.

A reviewer in your thread shares the blind spots you built together. A reviewer in another context does not.

Working Frontier

Discussion about this post

Ready for more?