Small Threads

The million-token window did not change the discipline

May 19, 2026

Early on, the case for short sessions was practical. Context windows were small. Long sessions overflowed and broke. Splitting work was not a discipline — it was a constraint.

Then the windows grew. A million tokens now. More soon. The constraint is gone, and the temptation is to fill the space: pour the whole task in, every file, every ticket, every stray thought, and let the agent sort it out.

The discipline should stay.

A long session degrades in ways more context cannot repair. The agent picks up false constraints from earlier turns. It rationalizes around a mistake it made in turn three. Decisions made for one slice bleed into slices where they do not belong. You can watch it happen: the agent quotes its own earlier wrong assumption back at you in turn nine as if it were a requirement. The transcript becomes a thicket the agent must carry but cannot prune.

A session is a function

A session has inputs, a scope, and an output. The discipline you apply to a function — one responsibility, narrow interface, verifiable result — applies here. Cram three responsibilities into one session and you get what a thousand-line function gives you: hidden coupling, surprising side effects, untestable behavior.

Engineers do not write the whole system in one function. They do not write the whole feature in one PR when they can avoid it. The same instinct should govern how you scope an agent’s work. If two parts can be done independently, give them to two agents. If one part must finish before the other can start, that is a sequence, not a single session.

The groundwork is its own thread

Some work has true shared groundwork. A schema migration before two features that both depend on the new fields. A refactor that lifts a common abstraction before three callers can use it. Do that work first, in its own session, and capture the result — a merged PR, a written summary, a pinned doc.

The dependent work runs in fresh threads with the groundwork as input, not as conversation history. The context they need is the artifact, not the journey that produced it.

Chunk like an engineer

A senior engineer, given a feature, breaks it into pieces a junior could pick up cold. Each piece has a clear input, a clear output, and a way to verify it. The pieces fit together — in parallel, in sequence, or both — but each one stands on its own.

Apply that to agents. Before starting a session, ask: what is the smallest piece of work that produces something verifiable? That piece is the session. The next piece is another session, started fresh, with whatever the first produced as input.

This sounds like overhead. The overhead is the long session you debug, restart, and re-explain because something went sideways in turn ten. Small threads fail small. Big threads fail expensively.

A bigger window is a capacity, not an instruction. Use it when more is unavoidable. More is not better.

Scope the thread. Capture the output. Start fresh.

Working Frontier

Discussion about this post

Ready for more?