Kill the Pull Request
Code review was designed for human-speed development
Code review made sense when humans wrote the code. A developer spent hours on a change, opened a pull request, and a colleague read it. The reviewer caught mistakes, shared context, and guarded quality. Waiting a few hours for feedback cost little compared to writing the code itself.
That ratio has flipped.
The bottleneck moved
An agent produces a candidate change in minutes. The review queue becomes the constraint. A single developer with agents can generate changes faster than reviewers can meaningfully evaluate them. The backlog grows. Context-switching multiplies. Reviewers skim because the volume makes careful reading impossible.
This is a structural problem, not a discipline one. Code review assumed writing was slow and reviewing was fast relative to writing. The opposite is now true. Writing is nearly instant; reviewing still takes the same human time it always did.
You cannot hire enough reviewers to match the pace of agents. You cannot speed up reviewers without gutting the review. Throttling agents uniformly defeats the purpose of using them. The way out is the one asymmetry left: not all changes carry equal risk.
What review actually provided
Before killing the pull request, understand what it gave us.
Code review served four purposes: catching bugs, enforcing style, sharing knowledge, and maintaining quality standards.
Catching bugs is better handled by tests — but only if the tests themselves are trustworthy. A test suite that runs on every commit is more thorough, more consistent, and faster than a human scanning a diff. The risk is that when the agent writes both tests and implementation, it can share the same blind spots across both. TDD ordering — test first, watch it fail, then implement — ensures the test is not vacuous, but it does not ensure the test matches the actual requirement.

That gap is why the specification matters more than the diff. A human-written spec, precise enough that test correctness becomes verifiable, is where human attention belongs. The agent proves correctness; the harness verifies the proof is honest — through coverage thresholds, mutation testing, or property-based tests that generate cases the agent did not anticipate.
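What a non-vacuous check looks like in practice: a property-based test states invariants that must hold for inputs nobody hand-picked, including the agent. A minimal sketch using the Hypothesis library; `slugify` is a hypothetical function standing in for whatever the agent produced:

```python
# A minimal sketch of a property-based test using Hypothesis.
# `slugify` is a toy implementation so the sketch is self-contained;
# the properties are stated independently of any implementation.
import re
import string

from hypothesis import given, strategies as st


def slugify(title: str) -> str:
    # Toy stand-in for an agent-written function under test.
    cleaned = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return cleaned.strip("-")


@given(st.text())
def test_slug_uses_safe_alphabet(title):
    # Property: output only ever contains URL-safe characters,
    # regardless of what input Hypothesis generates.
    assert set(slugify(title)) <= set(string.ascii_lowercase + string.digits + "-")


@given(st.text())
def test_slug_is_idempotent(title):
    # Property: slugifying twice changes nothing. An example-based
    # test written alongside the implementation could miss this.
    assert slugify(slugify(title)) == slugify(title)
```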
Enforcing style is partly automated. Linters, formatters, and static analysis handle syntactic style without a human. Architectural consistency — API design, abstraction choices, pattern adherence — is harder to automate and falls under quality standards.
Sharing knowledge is the most underrated purpose. Pull requests are often the only way teams stay aware of what is changing. Kill the PR without replacing that signal and developers lose track of the codebase fast — especially when agents churn out changes at volume. The replacement must be concrete: an auto-generated changelog from merged commits, a notable-changes channel, a daily digest. Something that arrives without anyone asking for it.
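What that can look like, as a minimal sketch: a script that builds the digest straight from git history. The git invocation is standard; `post_to_channel` is a hypothetical stand-in for whatever the team actually reads — Slack webhook, email, dashboard:

```python
# A sketch of the replacement awareness signal: a daily digest of
# merged commits. Delivery is stubbed out so the sketch runs anywhere.
import subprocess


def merged_in_last_day(branch: str = "main") -> list[str]:
    # One-line summaries of everything that landed in the last day.
    out = subprocess.run(
        ["git", "log", branch, "--since=1.day.ago", "--oneline", "--no-merges"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]


def post_to_channel(lines: list[str]) -> None:
    # Hypothetical delivery; printed here for illustration.
    print(f"{len(lines)} changes merged in the last day:")
    for line in lines:
        print(f"  {line}")


if __name__ == "__main__":
    post_to_channel(merged_in_last_day())
```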
That leaves quality standards — the judgment calls. Does this change belong? Is this the right approach? Does it add complexity we will regret? Human attention still matters here. But it need not happen on every change.
Auto-merge by default, human review by exception
The alternative is simple. A change that passes the test suite, type checks, linting, and whatever other automated verification you trust is merged and deployed. No pull request. No reviewer. No queue.
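The gate itself is small. A sketch with illustrative check names; the point is that the default path is a pure function of verification results, with no human in the loop:

```python
# A sketch of the merge gate. Check names are assumptions for
# illustration, not a real CI schema.
from dataclasses import dataclass


@dataclass
class Verification:
    tests_passed: bool
    types_clean: bool
    lint_clean: bool
    touches_critical_path: bool  # computed by the classifier sketched below


def decide(v: Verification) -> str:
    if not (v.tests_passed and v.types_clean and v.lint_clean):
        return "reject"         # automation already said no
    if v.touches_critical_path:
        return "request-human"  # the exception, not the rule
    return "auto-merge"         # the default: no PR, no reviewer, no queue
```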
A change that touches a critical path — payment processing, authentication, database migrations, core abstractions — pulls in a human automatically. Not because a colleague chose to request a review, but because the system knows which paths demand human judgment.
Defining the critical path requires mapping not just files but dependency chains — a change to a shared utility can affect payments indirectly. The boundary will be wrong sometimes, and changes that should have been reviewed will slip through. That is acceptable if the layered defenses catch the failures quickly. The question is not whether the boundary is perfect but whether the cost of occasional misclassification is lower than the cost of reviewing everything. For most teams, it is.
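What that classification can look like, as a minimal sketch. The dependency map here is a hypothetical hand-written stand-in; a real system would derive the import graph from the build tool and walk it transitively:

```python
# A sketch of critical-path tagging: direct hits on tagged paths,
# plus one hop through a (hypothetical) reverse-dependency map.
CRITICAL_PATHS = {"payments/", "auth/", "migrations/"}

# Who is affected by whom: a change to the key can affect the values.
REVERSE_DEPS = {
    "shared/money.py": {"payments/checkout.py"},
    "shared/session.py": {"auth/login.py"},
}


def is_critical(path: str) -> bool:
    return any(path.startswith(prefix) for prefix in CRITICAL_PATHS)


def touches_critical_path(changed_files: list[str]) -> bool:
    # A real implementation would walk the graph transitively,
    # not stop after one hop.
    for f in changed_files:
        if is_critical(f):
            return True
        if any(is_critical(dep) for dep in REVERSE_DEPS.get(f, ())):
            return True
    return False


# A change to a shared utility pulls in a human even though the file
# itself sits outside the tagged directories:
assert touches_critical_path(["shared/money.py"]) is True
assert touches_critical_path(["docs/readme.md"]) is False
```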
We already trust automation in constrained domains — CI pipelines, auto-scaling, infrastructure provisioning. Those systems operate within pre-reviewed boundaries. The challenge here is defining boundaries tight enough to make auto-merge safe for novel code changes.
The changes you did not tag
The honest objection concerns the change nobody expected to be dangerous. A config tweak that breaks production. A refactor that subtly shifts behavior. A dependency update that introduces a vulnerability.
Pull requests caught some of these — but inconsistently, and at a cost that scales poorly when change volume increases by an order of magnitude. The question is not whether review has value, but whether that value justifies the bottleneck when alternatives exist.
The real defense is layered: automated config validation, canary deployments, real-time alerting, fast rollbacks. No single layer is perfect, but the combination covers the same ground more reliably. A reviewer might notice a dangerous config value one time in ten. A validation rule catches it every time.
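One of those layers, sketched. The keys and bounds are illustrative assumptions, not a real schema; what matters is that the rule fires every time:

```python
# A sketch of one validation layer for config changes.
def validate_config(cfg: dict) -> list[str]:
    errors = []
    # A tired reviewer might skim past a pool size of 0; the rule cannot.
    if not 1 <= cfg.get("db_pool_size", 0) <= 100:
        errors.append("db_pool_size must be between 1 and 100")
    if cfg.get("request_timeout_s", 0) <= 0:
        errors.append("request_timeout_s must be positive")
    return errors


assert validate_config({"db_pool_size": 0, "request_timeout_s": 30}) == [
    "db_pool_size must be between 1 and 100"
]
```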
Acknowledge the limits: not everything rolls back cleanly. Database migrations mutate state. External API contracts change. These belong on the critical path list alongside payments and auth — not because a human will catch every problem, but because an unreviewed mistake here costs too much.
The real objection
The objection is rarely about bugs. It is about trust.
The natural counter: fix review culture, do not kill the mechanism. Make reviews more thorough. But more thorough at what? If the answer is catching bugs and enforcing style, that is work machines already do better. The reviews worth keeping are the ones no machine can replace: is this the right architecture? Is the product heading in the right direction? Is this system decoupled enough to change later? Those deserve human attention. The rest does not.
Humans should plan. They should decide what the product becomes, why a user would care, and what problem the team solves next. They should write the specifications that make agent output verifiable — not scan diffs for off-by-one errors. The pull request, as it exists today, puts the reviewer in the wrong seat: reading implementation details instead of shaping direction.
Some teams face compliance constraints. Regulated industries require documented approval as an audit trail. That is a real constraint, not theater. For those teams, the immediate goal is narrowing approval to what regulators actually require. Longer term, automated verification with immutable audit logs may satisfy the same regulatory intent. The pull request is not the only way to prove a change was vetted.
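One shape the longer-term answer could take: a hash-chained log in which every automated merge records what was checked and what decided it. This is a sketch of the mechanism, not a compliance claim; whether it satisfies a given regulator is a question for the team's auditors:

```python
# A sketch of an append-only, hash-chained audit log. Assumes a hash
# chain serves the regulatory intent; that is the open question here,
# not the code.
import hashlib
import json
import time


def append_entry(log: list[dict], event: dict) -> list[dict]:
    # Each entry commits to its predecessor, so altering history
    # breaks every later hash.
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"ts": time.time(), "event": event, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "hash": digest})
    return log


# A verified auto-merge records what was checked and what decided it.
log = append_entry([], {"change": "abc123", "checks": ["tests", "types", "lint"],
                        "decision": "auto-merge"})
```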
For everyone else, trust should come from the system, not from a ritual. Build the system that deserves trust, and the ritual becomes unnecessary.
Better, not just faster
Speed is the obvious gain. Code in minutes instead of days. Iterations in seconds instead of hours. But speed alone is the wrong goal. Faster bad code is still bad code.
The pull request never made code better. It made code reviewed. A tired reviewer approving a mediocre diff did not raise the bar — it just added a stamp. The real opportunity is not to ship the same quality faster. It is to ship higher quality at the same pace agents already move.
That means investing the time the review queue used to consume. Write better specifications so agents produce cleaner first drafts. Build stricter test harnesses so regressions die before they merge. Define the standards — naming, structure, error handling — so agents follow them by default. The discipline shifts from inspecting output to defining what good looks like.
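Operationally, a stricter harness can be as blunt as thresholds that fail the build instead of leaning on reviewer stamina. The metric names and numbers below are illustrative, not a recommendation:

```python
# A sketch of harness floors enforced in CI.
FLOORS = {
    "line_coverage": 0.90,   # below this, the suite proves too little
    "mutation_score": 0.75,  # surviving mutants mean vacuous tests
}


def harness_passes(results: dict[str, float]) -> bool:
    return all(results.get(metric, 0.0) >= floor
               for metric, floor in FLOORS.items())


assert harness_passes({"line_coverage": 0.93, "mutation_score": 0.80})
assert not harness_passes({"line_coverage": 0.93, "mutation_score": 0.60})
```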
Kill the pull request. Invest in verification. Tag the critical paths. Let the agents ship better code, and let the humans shape what gets built.

