<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Working Frontier]]></title><description><![CDATA[Trying to make sense of the frontier of software, and how to work in it.]]></description><link>https://workingfrontier.nicolaeandrei.com</link><image><url>https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png</url><title>Working Frontier</title><link>https://workingfrontier.nicolaeandrei.com</link></image><generator>Substack</generator><lastBuildDate>Fri, 19 Jun 2026 09:25:19 GMT</lastBuildDate><atom:link href="https://workingfrontier.nicolaeandrei.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Andrei-Mihai Nicolae]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[andrei@nicolaeandrei.com]]></webMaster><itunes:owner><itunes:email><![CDATA[andrei@nicolaeandrei.com]]></itunes:email><itunes:name><![CDATA[Andrei-Mihai Nicolae]]></itunes:name></itunes:owner><itunes:author><![CDATA[Andrei-Mihai Nicolae]]></itunes:author><googleplay:owner><![CDATA[andrei@nicolaeandrei.com]]></googleplay:owner><googleplay:email><![CDATA[andrei@nicolaeandrei.com]]></googleplay:email><googleplay:author><![CDATA[Andrei-Mihai Nicolae]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[What Compounds]]></title><description><![CDATA[The career advice that doesn&#8217;t sound like advice]]></description><link>https://workingfrontier.nicolaeandrei.com/p/what-compounds</link><guid 
isPermaLink="false">https://workingfrontier.nicolaeandrei.com/p/what-compounds</guid><dc:creator><![CDATA[Andrei-Mihai Nicolae]]></dc:creator><pubDate>Tue, 02 Jun 2026 08:00:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most career advice is a trick &#8212; a hack, a framework, a way around the work. The advice that compounds is dull. That is why it works.</p><p>If I could send a few notes back to a younger me, none of them would be technical.</p><h2>Own the work</h2><p>Take responsibility for the work in front of you, and do it well. Over years, the single largest leverage you have in any job is that you can be relied on. That sounds boring. The compounding advantages usually are.</p><p>Be nice to everyone. Not as a strategy. Not because it pays off. Because the alternative is exhausting, and most of the people around you are doing harder work than you can see.</p><p>Take feedback seriously, without taking it personally. Junior engineers do one of two things with feedback: dismiss it as wrong, or absorb it as identity damage. Both are expensive. The skill is separating the signal from the sting &#8212; pulling out what is true and discarding the rest, including the parts that hurt for reasons that have nothing to do with the work.</p><p>Before asking someone for help, do the work yourself first. The exception is a deadline at risk &#8212; then ask early, not late. Outside of that, the struggle <em>is</em> the work &#8212; not the friction of being stuck, but the act of forming your own model of the problem before someone else hands you theirs. Skip that and you skip the thing you are actually paid to learn.</p><h2>Serve the organization</h2><p>The job is not to be a good engineer. 
It is to help the organization move forward. The two are correlated, but not the same thing.</p><p>That means asking the questions other people are not asking. Understanding what the product is for, who pays for it, where it is going. Pushing on decisions that look suspect even when they sit above your level. Plenty of capable engineers spend years optimizing their craft and never wonder whether the work is the right work for the company.</p><p>This is also why job-hopping every twelve months is expensive. The first year you learn where the bathrooms are. The second year, the system actually becomes legible &#8212; you start to see the joints, the politics, the parts that are real and the parts that are scaffolding. That legibility is when your judgment becomes useful. Leave at month thirteen and you burn the investment just before it pays. There are exceptions &#8212; a company unraveling, a role that was misrepresented, a manager you cannot work for &#8212; but make sure the reason is real, not a restless year mistaken for a broken job.</p><p>Stay long enough to be useful in a way only someone who has been there can be.</p><h2>Pick the right thing</h2><p>Most engineers I have watched fail were not unskilled. They were busy doing impressive work on the wrong thing.</p><p>Ask yourself, repeatedly: is this where my time should be right now? The question is uncomfortable, because an honest answer often invalidates the last week of effort. Ask it anyway.</p><p>The same restraint applies to the code itself. Solve the problem you have, not the one you might have at a hundred thousand users you will never reach. Over-engineering is not an architectural sin &#8212; it is an attention tax. Every hour spent hardening the wrong code is an hour not spent on what matters now.</p><p>Ship small, then iterate. A series of small changes beats one big release. Smaller diffs are easier to review, faster to revert, and quicker to expose a wrong assumption. 
The instinct to ship one big polished thing often hides a fear of being seen mid-thought &#8212; but even when the motive is pure, the mechanics still favor small.</p><p>And resist dogma. DDD is not the answer. Neither is TDD. Neither is the framework you read about last week. None of them survives contact with every problem. Each was forged in a specific context and breaks the moment you carry it somewhere else. The job is to pick the right tool for the work in front of you &#8212; language, framework, architecture, design. People who hold one methodology tightly have mistaken their preference for a principle.</p><h2>Build your leverage</h2><p>Invest in AI, then put in the reps. A cheap subscription gets you curiosity, not fluency. The judgment for when a model will handle a task cleanly, when it will bluff, when to push back, when to switch &#8212; that comes from volume. Pay for the better plan and use it until the tools stop feeling mystical. Calibrated intuition is the goal &#8212; knowing, without thinking about it, what these tools are good and bad at. You get there only by working at the frontier long enough that it stops feeling like the frontier.</p><p>Then use AI as a tutor, not a vending machine. Before writing code, talk to the model. Ask it to walk you through the codebase in your own language. Have it explain a component the way a patient senior engineer would, calibrated to what you already know. The real unfair advantage is not the code it generates &#8212; it is the synthesis it produces for you specifically. Engineers who use AI well are not the ones who type fastest. They are the ones who understand the systems they touch.</p><p>Then sharpen your environment. Engineers waste astonishing amounts of time fighting their tools &#8212; switching windows, hunting for the same command, breaking flow to look up syntax. Spend a couple of days getting your editor, shell, and daily workflow exactly how you want them. The setup feels indulgent. 
It pays back every working day for years.</p><h2>The shortcut is the trap</h2><p>None of this is clever. None of it is a hack. That is the point. The shortcuts do not compound &#8212; the unglamorous habits do.</p><p>Younger me would have skimmed this looking for the trick. There isn&#8217;t one.</p>]]></content:encoded></item><item><title><![CDATA[The Cold Reviewer]]></title><description><![CDATA[Pushback only works from outside the thread]]></description><link>https://workingfrontier.nicolaeandrei.com/p/the-cold-reviewer</link><guid isPermaLink="false">https://workingfrontier.nicolaeandrei.com/p/the-cold-reviewer</guid><dc:creator><![CDATA[Andrei-Mihai Nicolae]]></dc:creator><pubDate>Tue, 26 May 2026 06:01:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The model agreed. That should worry you.</p><p>You spend an hour on a design with an agent. By turn fifteen it has internalized your framing and now defends, as obvious, the choices you made in turn three. When you ask that same thread to review the work, it reviews against assumptions it shares with you. The pushback you get is the pushback of a colleague who was in the room the whole time.</p><p>That is not pushback. It is agreement with extra steps.</p><h2>A reviewer that shares your context shares your bias</h2><p>Most people respond by pressing harder. <em>Tell me what&#8217;s wrong. What are my blind spots? Push back.</em> The agent obliges, produces three plausible critiques, and the conversation moves on. The critiques sound right because they are bound by the same frame as the work.</p><p>You cannot ask the thread that built the thing to challenge the thing. The framing is in the air.</p><p>The fix is structural. No prompt rewording will save you. 
Launch a subagent &#8212; a fresh context with no history of how you got here. Hand it the artifact and ask what breaks. It reads cold. It has no turn-three commitment to defend.</p><p>Three things bias a reviewer: the conversation it just had with you, its training, and the artifact itself. The fresh subagent fixes the first. It still flatters by default. It still inherits whatever frame the artifact encodes. This technique moves the bias you can move; the rest stays.</p><h2>Hand over the artifact, not your interpretation</h2><p>This is where the technique fails for most people.</p><p>The temptation is to brief the subagent the way you would brief a teammate. <em>We decided to do X because of Y, and I&#8217;m worried about Z &#8212; what do you think?</em> Now the subagent inherits your frame. The clean context is wasted: the prompt itself is the laundered version of your bias. You have built a second room and walked in carrying the same furniture.</p><p>The discipline is to hand over the artifact and let it speak for itself. <em>Read this design. What breaks?</em> Strip the summary, the decision history, the preview of what you fear. Let the subagent encounter the work the way a reviewer would on a Friday afternoon &#8212; with only the artifact, and no clue what you wanted to hear.</p><p>What comes back is sometimes useless. Sometimes it surfaces a flaw you would have hunted for two days. The signal is highest when you do not preload the answer.</p><p>Cold reading has its own failure mode. The subagent will sometimes flag an issue you settled in turn five &#8212; a deliberate trade-off it cannot see and reads as a bug. The cost is a minute of triage, not a wrong fix. Brief it only when the artifact cannot stand alone: a design that omits its constraints, code that depends on conventions nowhere written down. 
Default to letting it read cold and discarding the false positives.</p><h2>Adversarial review, by default</h2><p>After a long thinking session, senior engineers used to pull a teammate over for a sanity check. The cost was a colleague&#8217;s interruption, and that cost made sanity checks rare &#8212; reserved for the work that mattered most.</p><p>That cost is gone. Spawn a fresh agent. Hand it the artifact. Ask what breaks. Do this every time the work matters &#8212; not because the agent is smarter than you, but because it has not spent the last hour nodding along.</p><p>A reviewer in your thread shares the blind spots you built together. A reviewer in another context does not.</p>]]></content:encoded></item><item><title><![CDATA[Small Threads]]></title><description><![CDATA[The million-token window did not change the discipline]]></description><link>https://workingfrontier.nicolaeandrei.com/p/small-threads</link><guid isPermaLink="false">https://workingfrontier.nicolaeandrei.com/p/small-threads</guid><dc:creator><![CDATA[Andrei-Mihai Nicolae]]></dc:creator><pubDate>Tue, 19 May 2026 06:01:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Early on, the case for short sessions was practical. Context windows were small. Long sessions overflowed and broke. Splitting work was not a discipline &#8212; it was a constraint.</p><p>Then the windows grew. A million tokens now. More soon. The constraint is gone, and the temptation is to fill the space: pour the whole task in, every file, every ticket, every stray thought, and let the agent sort it out.</p><p>The discipline should stay.</p><p>A long session degrades in ways more context cannot repair. The agent picks up false constraints from earlier turns. 
It rationalizes around a mistake it made in turn three. Decisions made for one slice bleed into slices where they do not belong. You can watch it happen: the agent quotes its own earlier wrong assumption back at you in turn nine as if it were a requirement. The transcript becomes a thicket the agent must carry but cannot prune.</p><h2>A session is a function</h2><p>A session has inputs, a scope, and an output. The discipline you apply to a function &#8212; one responsibility, narrow interface, verifiable result &#8212; applies here. Cram three responsibilities into one session and you get what a thousand-line function gives you: hidden coupling, surprising side effects, untestable behavior.</p><p>Engineers do not write the whole system in one function. They do not write the whole feature in one PR when they can avoid it. The same instinct should govern how you scope an agent&#8217;s work. If two parts can be done independently, give them to two agents. If one part must finish before the other can start, that is a sequence, not a single session.</p><h2>The groundwork is its own thread</h2><p>Some work has true shared groundwork. A schema migration before two features that both depend on the new fields. A refactor that lifts a common abstraction before three callers can use it. Do that work first, in its own session, and capture the result &#8212; a merged PR, a written summary, a pinned doc.</p><p>The dependent work runs in fresh threads with the groundwork as input, not as conversation history. The context they need is the artifact, not the journey that produced it.</p><h2>Chunk like an engineer</h2><p>A senior engineer, given a feature, breaks it into pieces a junior could pick up cold. Each piece has a clear input, a clear output, and a way to verify it. The pieces fit together &#8212; in parallel, in sequence, or both &#8212; but each one stands on its own.</p><p>Apply that to agents. 
Before starting a session, ask: what is the smallest piece of work that produces something verifiable? That piece is the session. The next piece is another session, started fresh, with whatever the first produced as input.</p><p>This sounds like overhead. The overhead is the long session you debug, restart, and re-explain because something went sideways in turn ten. Small threads fail small. Big threads fail expensively.</p><p>A bigger window is a capacity, not an instruction. Use it when more is unavoidable. More is not better.</p><p>Scope the thread. Capture the output. Start fresh.</p>]]></content:encoded></item><item><title><![CDATA[Build Three, Keep One]]></title><description><![CDATA[Exploration got fast. Deliberation didn't.]]></description><link>https://workingfrontier.nicolaeandrei.com/p/build-three-keep-one</link><guid isPermaLink="false">https://workingfrontier.nicolaeandrei.com/p/build-three-keep-one</guid><dc:creator><![CDATA[Andrei-Mihai Nicolae]]></dc:creator><pubDate>Tue, 12 May 2026 06:01:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The worst outcome in software is finishing the wrong thing. You spend three months on a feature, ship it, and watch users ignore it. The code works. It solves the wrong problem.</p><p>Teams built entire disciplines around that fear. PRDs, design reviews, estimation, planning poker &#8212; every ceremony existed to prevent expensive work from producing something nobody wanted. The instinct was rational. But the economics that justified it have changed. An agent builds in an afternoon what once took a week. Specification and decision-making now take longer than the code they describe.</p><p>Once building gets fast, the old instincts misfire. 
You debate an approach for two days that a prototype would answer in two hours. You write a PRD for a feature that a working demo would describe more precisely.</p><h2>Try it</h2><p>The fastest way to be right about something is to try it.</p><p>That sentence used to be an aspiration. Building an MVP still took a team, a quarter, a roadmap. The unit of experimentation was a whole product.</p><p>Now the unit of experimentation is an afternoon. You build two versions of a UI and look at them side by side. You spike an architecture, see that it breaks, and spike a different one &#8212; all in the time a meeting would take to schedule.</p><h2>The discipline</h2><p>When the direction is unclear, build more than one answer.</p><p>Not as a thought experiment. Not as a whiteboard diagram. Working code, end to end, that you can run and touch. Two or three versions. Then pick.</p><p>This sounds wasteful. The old instinct treats all code as expensive: three implementations means three times the cost. That was true. Two of the three will be thrown away &#8212; that is the point. They were fast to make, and their job was never to ship. Their job was to teach you which one should.</p><p>The real cost is not building three things. It is evaluating them. You have to understand each version well enough to compare. That takes attention. But attention spent comparing working code is more productive than attention spent debating abstractions. One produces evidence; the other produces consensus.</p><p>Most teams spec one approach, commit to it, and rationalize afterward. The waste is invisible because nothing was thrown away. The team built the wrong thing once instead of the right thing after two quick tries. The first outcome feels cheaper. 
It almost never is.</p><h2>Wear the experimentation hat</h2><p>Exploration is a posture, not a phase.</p><p>The team that gets this right does not run a &#8220;discovery sprint&#8221; followed by a &#8220;build sprint.&#8221; They wear the experimentation hat by default. Every ambiguous decision becomes a prototype instead of a debate. Every strong opinion gets verified fast, so opinions carry less weight than evidence.</p><p>That posture changes what seniority looks like. The old senior engineer predicted which approach would work. Their value was taste built over years of expensive mistakes. That taste still matters &#8212; a prototype answers whether something works mechanically, not whether it belongs architecturally.</p><p>But the leverage has shifted. The senior engineer who leans in builds three, tests three, and keeps the one that works. The one who resists still gets the right answer sometimes. The difference is that the prototyping team learns something new every afternoon, while the tasteful team mostly confirms what it already believed.</p><h2>Keep the lesson, not the code</h2><p>The hardest part is throwing work away.</p><p>When code was expensive, working code was too valuable to discard. You shipped what you built because building it had cost too much to abandon. That sunk-cost reflex persists. Teams protect prototypes &#8212; extend them, ship them, pretend they were meant to be real all along. The prototype becomes the product. Exploration collapses back into commitment.</p><p>Resist that. A prototype is a question you ask the codebase. Once you have the answer, the prototype has done its job. Keep the lesson, not the code. The real implementation should be written cleanly, with the benefit of everything the prototypes taught you. Sometimes the best prototype is already clean enough to ship. Often it is not. Either way, the decision is easier when you have seen the alternatives.</p><p>The scarce resource is no longer code. 
It is clarity about what to build. Prototypes buy clarity. The clarity is what you keep.</p><h2>The new default</h2><p>For years, the best advice was plan carefully before you build. The cost of the wrong answer was too high to skip the planning.</p><p>That advice no longer holds.</p><p>Plan less. Build more. When two approaches seem plausible, try both. When a feature feels uncertain, prove it with a working thing you can touch.</p><p>Build three. Keep one. Throw the rest away.</p>]]></content:encoded></item><item><title><![CDATA[Kill the Pull Request]]></title><description><![CDATA[Code review was designed for human-speed development]]></description><link>https://workingfrontier.nicolaeandrei.com/p/kill-the-pull-request</link><guid isPermaLink="false">https://workingfrontier.nicolaeandrei.com/p/kill-the-pull-request</guid><dc:creator><![CDATA[Andrei-Mihai Nicolae]]></dc:creator><pubDate>Tue, 05 May 2026 06:01:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Code review made sense when humans wrote the code. A developer spent hours on a change, opened a pull request, and a colleague read it. The reviewer caught mistakes, shared context, and guarded quality. Waiting a few hours for feedback cost little compared to writing the code itself.</p><p>That ratio has flipped.</p><h2>The bottleneck moved</h2><p>An agent produces a candidate change in minutes. The review queue becomes the constraint. A single developer with agents can generate changes faster than reviewers can meaningfully evaluate them. The backlog grows. Context-switching multiplies. Reviewers skim because the volume makes careful reading impossible.</p><p>This is a structural problem, not a discipline one. 
Code review assumed writing was slow and reviewing was fast relative to writing. The opposite is now true. Writing is nearly instant; reviewing still takes the same human time it always did.</p><p>You cannot hire enough reviewers to match the pace of agents. You cannot speed up reviewers without gutting the review. Throttling agents uniformly defeats the purpose &#8212; but not all changes carry equal risk.</p><h2>What review actually provided</h2><p>Before killing the pull request, understand what it gave us.</p><p>Code review served four purposes: catching bugs, enforcing style, sharing knowledge, and maintaining quality standards.</p><p><strong>Catching bugs</strong> is better handled by tests &#8212; but only if the tests themselves are trustworthy. A test suite that runs on every commit is more thorough, more consistent, and faster than a human scanning a diff. The risk is that when the agent writes both tests and implementation, it can share the same blind spots across both. TDD ordering &#8212; test first, watch it fail, then implement &#8212; ensures the test is not vacuous, but it does not ensure the test matches the actual requirement. That gap is why the specification matters more than the diff. A human-written spec, precise enough that test correctness becomes verifiable, is where human attention belongs. The agent proves correctness; the harness verifies the proof is honest &#8212; through coverage thresholds, mutation testing, or property-based tests that generate cases the agent did not anticipate.</p><p><strong>Enforcing style</strong> is partly automated. Linters, formatters, and static analysis handle syntactic style without a human. Architectural consistency &#8212; API design, abstraction choices, pattern adherence &#8212; is harder to automate and falls under quality standards.</p><p><strong>Sharing knowledge</strong> is the most underrated purpose. Pull requests are often the only way teams stay aware of what is changing. 
Kill the PR without replacing that signal and developers lose track of the codebase fast &#8212; especially when agents churn out changes at volume. The replacement must be concrete: an auto-generated changelog from merged commits, a notable-changes channel, a daily digest. Something that arrives without anyone asking for it.</p><p>That leaves <strong>quality standards</strong> &#8212; the judgment calls. Does this change belong? Is this the right approach? Does it add complexity we will regret? Human attention still matters here. But it need not happen on every change.</p><h2>Auto-merge by default, human review by exception</h2><p>The alternative is simple. A change that passes the test suite, type checks, linting, and whatever other automated verification you trust merges and deploys. No pull request. No reviewer. No queue.</p><p>A change that touches a critical path &#8212; payment processing, authentication, database migrations, core abstractions &#8212; pulls in a human automatically. Not because a colleague chose to request a review, but because the system knows which paths demand human judgment.</p><p>Defining the critical path requires mapping not just files but dependency chains &#8212; a change to a shared utility can affect payments indirectly. The boundary will be wrong sometimes, and changes that should have been reviewed will slip through. That is acceptable if the layered defenses catch the failures quickly. The question is not whether the boundary is perfect but whether the cost of occasional misclassification is lower than the cost of reviewing everything. For most teams, it is.</p><p>We already trust automation in constrained domains &#8212; CI pipelines, auto-scaling, infrastructure provisioning. Those systems operate within pre-reviewed boundaries. 
The challenge here is defining boundaries tight enough to make auto-merge safe for novel code changes.</p><h2>The changes you did not tag</h2><p>The honest objection concerns the change nobody expected to be dangerous. A config tweak that breaks production. A refactor that subtly shifts behavior. A dependency update that introduces a vulnerability.</p><p>Pull requests caught some of these &#8212; but inconsistently, and at a cost that scales poorly when change volume increases by an order of magnitude. The question is not whether review has value, but whether that value justifies the bottleneck when alternatives exist.</p><p>The real defense is layered. Automated config validation, canary deployments, real-time alerting, fast rollbacks. No single layer is perfect, but the combination covers the same ground more reliably. A reviewer might notice a dangerous config value once in ten times. A validation rule catches it every time.</p><p>Acknowledge the limits: not everything rolls back cleanly. Database migrations mutate state. External API contracts change. These belong on the critical path list alongside payments and auth &#8212; not because a human will catch every problem, but because an unreviewed mistake here costs too much.</p><h2>The real objection</h2><p>The objection is rarely about bugs. It is about trust.</p><p>The natural counter: fix review culture, do not kill the mechanism. Make reviews more thorough. But more thorough at what? If the answer is catching bugs and enforcing style, that is work machines already do better. The reviews worth keeping are the ones no machine can replace: is this the right architecture? Is the product heading in the right direction? Is this system decoupled enough to change later? Those deserve human attention. The rest does not.</p><p>Humans should plan. They should decide what the product becomes, why a user would care, and what problem the team solves next. 
They should write the specifications that make agent output verifiable &#8212; not scan diffs for off-by-one errors. The pull request, as it exists today, puts the reviewer in the wrong seat: reading implementation details instead of shaping direction.</p><p>Some teams face compliance constraints. Regulated industries require documented approval as an audit trail. That is a real constraint, not theater. For those teams, the immediate goal is narrowing approval to what regulators actually require. Longer term, automated verification with immutable audit logs may satisfy the same regulatory intent. The pull request is not the only way to prove a change was vetted.</p><p>For everyone else, trust should come from the system, not from a ritual. Build the system that deserves trust, and the ritual becomes unnecessary.</p><h2>Better, not just faster</h2><p>Speed is the obvious gain. Code in minutes instead of days. Iterations in seconds instead of hours. But speed alone is the wrong goal. Faster bad code is still bad code.</p><p>The pull request never made code better. It made code reviewed. A tired reviewer approving a mediocre diff did not raise the bar &#8212; it just added a stamp. The real opportunity is not to ship the same quality faster. It is to ship higher quality at the same pace agents already move.</p><p>That means investing the time the review queue used to consume. Write better specifications so agents produce cleaner first drafts. Build stricter test harnesses so regressions die before they merge. Define the standards &#8212; naming, structure, error handling &#8212; so agents follow them by default. The discipline shifts from inspecting output to defining what good looks like.</p><p>Kill the pull request. Invest in verification. Tag the critical paths. 
Let the agents ship better code, and let the humans shape what gets built.</p>]]></content:encoded></item><item><title><![CDATA[Cognitive Debt]]></title><description><![CDATA[Your agents build faster than you can understand]]></description><link>https://workingfrontier.nicolaeandrei.com/p/cognitive-debt</link><guid isPermaLink="false">https://workingfrontier.nicolaeandrei.com/p/cognitive-debt</guid><dc:creator><![CDATA[Andrei-Mihai Nicolae]]></dc:creator><pubDate>Tue, 28 Apr 2026 06:01:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before agents, a natural governor kept software complexity in check. Humans wrote the code. Humans are slow. Because we were slow, we kept up with what we built. Building speed and comprehension speed moved roughly together.</p><p>That coupling is gone.</p><p>An agent produces in minutes what a developer once needed days to write. It writes correct code, passes the tests, ships. But the developer who triggered the task understands that code no better than if a stranger had written it. The codebase grew. The team&#8217;s comprehension did not.</p><p>This is cognitive debt. Not technical debt&#8212;the code works. Cognitive debt is the widening gap between what your system does and what your team understands about how it does it.</p><h2>Technical debt has a sibling</h2><p>Technical debt is familiar. You cut a corner, you know you cut it, you plan to fix it later. The debt is visible.</p><p>Cognitive debt is invisible. The code works. Nothing looks wrong. But nobody on the team can explain why the service handles retries the way it does, or what happens when the cache layer fails. 
The system is correct and opaque.</p><p>Cognitive debt is worse than technical debt: you discover it only when you need the understanding you lack. A production incident hits a module an agent built three months ago. Nobody remembers the design&#8212;it passed CI and shipped. Now you are debugging a system you do not understand, under pressure, with no working picture to guide you.</p><p>Technical debt slows you down. Cognitive debt leaves you lost.</p><h2>The compounding problem</h2><p>Cognitive debt compounds faster than technical debt.</p><p>Every module an agent builds that you do not understand makes your next decision worse. Not because the code is bad&#8212;because your mental model of the system is incomplete. You approve an architectural choice that conflicts with a service an agent built last month. You make integration decisions from an understanding that is months out of date.</p><p>When humans built every module, they understood it. The team&#8217;s collective understanding stayed current with the codebase. At agent speed, the codebase outruns your mental model within weeks. Catching up means reading code you did not write, for a design you did not choose. Most teams will not do it. The debt accumulates in silence.</p><h2>Existing tools fall short</h2><p>Code review shares knowledge, but it was built for human-speed delivery. When agents produce changes faster than reviewers can read them, review becomes a bottleneck or a rubber stamp.</p><p>Documentation is always stale. It is worse now because the rate of change has accelerated while documentation habits stayed frozen.</p><p>Tests prove behavior, not understanding. A passing test suite does not tell you why the system works, how the pieces connect, or what assumptions underpin the design. Full test coverage and full cognitive debt coexist easily.</p><p>AI-generated summaries help, but only at the surface. An agent can walk you through a module, answer questions, diagram dependencies. 
That aids comprehension. It does not aid navigation. When a production incident strikes, the problem is rarely &#8220;I cannot understand this code.&#8221; The problem is &#8220;I do not know which of forty services to look at first.&#8221; You can ask an agent to explain a module, but you cannot ask it which of three modules you have never read is causing your outage. The navigational problem precedes the explanatory one.</p><p>If full comprehension is impossible, the question becomes: what replaces it?</p><h2>Nobody knows every street</h2><p>You will not solve cognitive debt by understanding everything. Not at agent speed. The volume is too high, the rate of change too fast.</p><p>Think about a city. Nobody knows every street&#8212;not the mayor, not the taxi drivers, not the lifelong residents. But cities work. People navigate them every day, not because they memorized the layout, but because cities are navigable. Street signs, grid systems, neighborhoods with distinct character.</p><p>Your codebase must become a city, not a maze. A maze is complex and unsigned&#8212;you navigate it only by memorizing the path. A city is complex and signed everywhere. The complexity remains, but it is legible. Most codebases built at agent speed will become mazes by default, because agents build for correctness, not human navigation.</p><h2>Making the codebase navigable</h2><p>Forget full comprehension. Navigability becomes the goal&#8212;making a system findable and orientable for a human who did not build it and lacks time to read all of it.</p><p><strong>Explicit architecture.</strong> A top-level document describes the major components, their responsibilities, and how they connect. Each component carries its own doc: the data model it owns, the contracts it exposes, the assumptions it depends on. When an agent adds a service, someone updates both layers. 
Without them, every exploration starts from scratch.</p><p><strong>Consistent patterns and clear boundaries.</strong> When every service handles errors, retries, and logging the same way, a developer who understands one service can reason about all of them. Pair that consistency with explicit module interfaces, and you get a neighborhood structure: you need not know the whole city, just the block you are working on and where it connects. Agents follow patterns well when told to. Enforce them.</p><p><strong>Decision records.</strong> When someone makes a significant design choice&#8212;human or agent&#8212;write down what was chosen and why. Not a novel. A paragraph. Not in a wiki. Not in a Slack thread. In the repo, next to the code it affects. This history lets future developers understand intent, not just implementation.</p><p><strong>Orientation before implementation.</strong> Before building, read. Before an agent writes a feature, trace the relevant paths in the system. Not to review the code&#8212;to build the understanding that makes future decisions sound. Five minutes of orientation saves hours of debugging.</p><h2>The discipline is new</h2><p>Technical debt has decades of shared language&#8212;linters, refactoring sprints, architecture reviews. Cognitive debt has none. Most teams feel the symptoms&#8212;slow debugging, bad decisions, fragile changes&#8212;but never name the cause.</p><p>Name it. 
Then build a city, not a maze.</p>]]></content:encoded></item><item><title><![CDATA[Technical Debt Is Dead]]></title><description><![CDATA[The shortcut costs the same as the right solution now]]></description><link>https://workingfrontier.nicolaeandrei.com/p/technical-debt-is-dead</link><guid isPermaLink="false">https://workingfrontier.nicolaeandrei.com/p/technical-debt-is-dead</guid><dc:creator><![CDATA[Andrei-Mihai Nicolae]]></dc:creator><pubDate>Tue, 21 Apr 2026 06:02:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Technical debt was always an economic problem. Not a laziness problem, not a skill problem &#8212; an economics problem. Good code took time. The right abstraction, the clean interface, the proper error handling &#8212; all of it cost hours or days of a developer&#8217;s attention. And developer attention was the scarcest resource on any team.</p><p>So teams cut corners. They shipped the quick version because the right version cost too much. They hardcoded the config, skipped the validation, duplicated the logic, and wrote a TODO. Not because they wanted to &#8212; because the alternative was missing the deadline. The tradeoff was real: ship now with debt, or ship later with quality. Every team chose ship. This applied most directly to implementation-level debt &#8212; code quality, test coverage, validation, naming &#8212; the kind that accumulates line by line under deadline pressure.</p><p>That tradeoff no longer exists.</p><h2>The price of good code collapsed</h2><p>An AI agent writes the clean version as fast as the messy version. The proper abstraction costs the same as the shortcut &#8212; minutes, not days. 
Ask an agent for a well-structured service with proper error handling, clear interfaces, and comprehensive tests. It delivers. Ask it for a quick hack. It delivers that too, just as fast. Agents still need review and still make mistakes &#8212; but a clean first draft that needs minor correction is a different problem than a mess that needs a rewrite.</p><p>When the clean solution and the shortcut cost the same, choosing the shortcut is not pragmatism. It is habit.</p><p>Teams still carry the old instinct. Decades of expensive code trained the instinct to cut corners. Developers learned that &#8220;do it right&#8221; meant &#8220;do it slow.&#8221; That association runs deep. But the underlying economics have changed, and the instinct has not caught up.</p><h2>The old excuses</h2><p>Every excuse for technical debt traces back to the same root: we did not have time.</p><p>We did not have time to write the abstraction. We did not have time to add the validation. We did not have time to write the tests. We shipped what worked and promised to come back later. We rarely came back.</p><p>Time was the constraint, and cutting corners was the release valve. Remove the constraint and the valve is useless.</p><p>An agent does not get tired on a Friday afternoon and skip the edge cases. It does not get bored writing the third integration test. It does not decide that &#8220;good enough&#8221; is good enough because the sprint ends tomorrow. It applies the same rigor to the last test as the first. The discipline failures that created technical debt were human failures rooted in fatigue, boredom, and deadlines. The constraints are gone.</p><h2>What remains</h2><p>Not all technical debt came from time pressure. Some came from ignorance &#8212; the team did not know the right approach when they built it. Some came from changing requirements &#8212; the right approach last year is the wrong approach now. 
Some came from dependencies &#8212; a library forced an awkward integration.</p><p>These sources still exist. An agent cannot solve a problem the team has not yet understood. It cannot predict next year&#8217;s requirements. It cannot fix a bad dependency API.</p><p>But these are qualitatively different from time-pressure debt and require different solutions &#8212; better discovery, better planning, better vendor choices. Most technical debt, in most codebases, is the accumulated residue of a simple calculation: we could not afford the right solution at the time. That calculation no longer holds.</p><h2>The new standard</h2><p>If good code is cheap, the standard changes. Technical debt stops being the cost of shipping and becomes a choice &#8212; a bad one.</p><p>This means teams need to stop treating debt as normal. Stop planning &#8220;tech debt sprints&#8221; to clean up messes that agents could have avoided. Stop accepting PRs that take shortcuts when the shortcut saves no time.</p><p>The conversation shifts from &#8220;when will we pay down the debt?&#8221; to &#8220;why did we take on debt at all?&#8221;</p><h2>The discipline is different now</h2><p>The old discipline was prioritization. Which debt do we pay down first? How much time do we allocate? What is the interest rate on this shortcut?</p><p>The new discipline is specification. Tell the agent what good looks like. Define the patterns. Enforce the standards. Make the right way the default way, and agents will follow it consistently.</p><p>This is easier and cheaper than paying down debt. Instead of writing clean code yourself &#8212; which is what made it expensive &#8212; you describe what clean means and the agent writes it. The cost of quality shifted from execution to definition.</p><p>Teams that still accumulate technical debt in 2026 are not under pressure. They are under-specified. They left &#8220;good&#8221; undefined, so the agents produce whatever works. 
The debt is not a tradeoff &#8212; it is a configuration error.</p><h2>The irony</h2><p>For decades, the industry built tools, processes, and entire careers around managing technical debt. Linters to catch it. Refactoring tools to fix it. Sprint ceremonies to prioritize it.</p><p>Now the same technology that made code cheap enough to write well makes most of those tools obsolete. You do not need a tech debt sprint if you never take on the debt. You do not need a refactoring tool if the agent writes it clean the first time.</p><p>The irony: AI agents kill technical debt not by paying it down but by making it pointless to accumulate.</p><p>Good code used to be a luxury. Now it is the default &#8212; if you ask for it.</p><p>Ask for it.</p>]]></content:encoded></item><item><title><![CDATA[Not Every Session Needs a Plan]]></title><description><![CDATA[Plan mode is a tool, not a default]]></description><link>https://workingfrontier.nicolaeandrei.com/p/not-every-session-needs-a-plan</link><guid isPermaLink="false">https://workingfrontier.nicolaeandrei.com/p/not-every-session-needs-a-plan</guid><dc:creator><![CDATA[Andrei-Mihai Nicolae]]></dc:creator><pubDate>Tue, 14 Apr 2026 06:00:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most coding agents ship with a plan mode. Some have already removed it &#8212; Amp dropped theirs, deciding it added ceremony without adding value. Others still treat it as the recommended starting point: outline the approach, list the files, describe the changes, get approval, then execute. It sounds disciplined. It also sounds familiar. It is waterfall with a different font.</p><p>The instinct comes from a reasonable place. You want the session to go well. 
You want to avoid wasted work. So you front-load specification. But the problem is the same one that sank waterfall in teams: you cannot specify what you do not yet understand. A plan written before you have touched the code is a guess dressed as a decision. It feels productive. It is often wrong.</p><h2>Planning is not the problem</h2><p>Planning itself is fine. Boris Cherny, one of the creators of Claude Code, starts most sessions in plan mode:</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://x.com/bcherny/status/2007179832300581177&quot;,&quot;full_text&quot;:&quot;I'm Boris and I created Claude Code. Lots of people have asked how I use Claude Code, so I wanted to show off my setup a bit.\n\nMy setup might be surprisingly vanilla! Claude Code works great out of the box, so I personally don't customize it much. There is no one correct way to&quot;,&quot;username&quot;:&quot;bcherny&quot;,&quot;name&quot;:&quot;Boris Cherny&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/1902044548936953856/J2jeik0t_normal.jpg&quot;,&quot;date&quot;:&quot;2026-01-02T19:58:58.000Z&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:1305,&quot;retweet_count&quot;:7006,&quot;like_count&quot;:54279,&quot;impression_count&quot;:8039420,&quot;expanded_url&quot;:null,&quot;video_url&quot;:null,&quot;belowTheFold&quot;:false}" data-component-name="Twitter2ToDOM"></div><p>and iterates on the plan until he likes it. That works because he treats the plan as a conversation, not a contract. His tasks are pull-request-sized features with known scope. The plan aligns direction. Then he moves.</p><p>The problem starts when planning becomes a default for every session. When you reach for plan mode before a bug fix, before a spike, before an exploration &#8212; before you even know what the task demands. That is not discipline. That is comfort. 
You are writing a PRD because it feels productive, not because the task needs one.</p><h2>Match the workflow to the task</h2><p>Not every task needs the same starting point.</p><p>If you are building a feature with clear scope &#8212; a new API endpoint, a settings page, a migration &#8212; plan mode earns its keep. You know roughly what the result looks like. A brief plan aligns direction before the work begins.</p><p>If you are fixing a bug where the behavior is already clear, a plan is overhead. A failing test is a better starting point. Pin the contract, confirm the failure, then fix it. I wrote about this approach <a href="https://workingfrontier.nicolaeandrei.com/p/make-the-test-fail-first">recently</a>: make the test fail first, then make the narrowest change that turns it green.</p><p>If you are exploring &#8212; spiking an idea, trying a library, feeling out an architecture &#8212; you do not know enough to plan. Start coding. Let the shape emerge. Planning an exploration is a contradiction.</p><h2>The session is a conversation</h2><p>The deeper issue is how you think about the session itself.</p><p>Plan mode tempts you into a handoff mentality: specify, then execute. But the best sessions are not handoffs. They are conversations. You start with a direction, the model makes something, you react, it adjusts, you push further. The work improves because both sides contribute judgment along the way.</p><p>That is the agile insight applied to a different scale. You do not need a two-week sprint to benefit from iteration. A thirty-minute session works the same way. Start with enough direction to move. Adjust as you learn. Trust that course correction is cheaper than perfect specification.</p><h2>Pick the right tool</h2><p>Plan mode is a tool. Red-green TDD is a tool. Freeform exploration is a tool. The mistake is picking one and applying it everywhere.</p><p>Plan when the scope is known. Test-first when the contract is clear. Explore when it is not. 
And in every case, stay in the loop. The session is not a spec you hand off. It is a conversation you steer.</p>]]></content:encoded></item><item><title><![CDATA[Software Feels Free Again]]></title><description><![CDATA[How collapsing build costs are killing the subscription trap]]></description><link>https://workingfrontier.nicolaeandrei.com/p/software-feels-free-again</link><guid isPermaLink="false">https://workingfrontier.nicolaeandrei.com/p/software-feels-free-again</guid><dc:creator><![CDATA[Andrei-Mihai Nicolae]]></dc:creator><pubDate>Tue, 07 Apr 2026 06:02:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I installed a popular read-later app last month. It worked well. I saved articles, organized links, built a small library. Nothing told me I was on a trial.</p><p>Then the trial ended. The app locked me out. Not &#8220;you can still read what you saved, but you need to upgrade to add more.&#8221; Locked out entirely. My data, behind a paywall, with no warning that the clock had been running.</p><p>I did not pay. I opened my editor and started building a replacement.</p><h2>The reflex that changed</h2><p>Two years ago, that impulse would have been a fantasy. Building a read-later app from scratch meant weeks of work: storage, syncing, parsing, a decent interface. The rational move was to pay, grumble, and move on.</p><p>Now the rational move is different. With an AI agent and a weekend, I had a working app. It did exactly what I needed, nothing more. No pricing tiers, no dark patterns, no trial clock ticking in the background. Mine.</p><p>That is the shift. The cost of building a simple tool for yourself has collapsed. What used to take weeks takes hours. 
What used to require a team requires one person with a clear idea and good tools.</p><h2>The subscription trap loses its leverage</h2><p>Subscription software depends on a gap between what users want and what they can build. The wider the gap, the more leverage the vendor holds. You pay monthly because the alternative&#8212;building it yourself&#8212;is too expensive, too slow, or too hard.</p><p>That gap is closing fast.</p><p>When building a basic replacement takes a day, the threat changes direction. The vendor no longer holds your data hostage. You hold the option to leave. Not in theory&#8212;users could always leave in theory. In practice, with current tools, they actually can.</p><p>Some subscriptions will survive. Some products carry genuine ongoing costs: server infrastructure, real-time data, large-scale collaboration. A monthly price makes sense when the service costs money to run every month. But most subscription software is not infrastructure. Most of it is static functionality repackaged as a recurring charge&#8212;a tax on the gap between wanting and building.</p><p>That tax just got much harder to collect.</p><h2>Software feels free again</h2><p>For a long time, software felt like something imposed on you.</p><p>You picked from the options available, accepted the pricing, tolerated the dark patterns, and worked around the limitations. If the vendor changed the terms, you adapted or left. That was the relationship: you rented, they decided.</p><p>Once you know you can build, that posture changes.</p><p>You open a new app and notice the pricing page before the features. You calculate how long it would take to build the parts you actually need. The question shifts from &#8220;can I afford this?&#8221; to &#8220;does this earn my use over what I would build myself?&#8221; That makes you a peer, not a captive.</p><p>Software stops feeling like renting and starts feeling like making. 
You run into a problem and reach for your editor before you reach for the App Store. If someone already built it well, you pay for the craft. If not, you build your own. Either way, the choice is yours.</p><p>Software feels free again. Not free as in price. Free as in agency. Free as in unlocked.</p><h2>What is left to sell</h2><p>If anyone can build a basic version, what justifies charging others?</p><p>The answer is polish. When you build for yourself, you solve your own problem. You skip the edge cases that do not affect you, ignore the platforms you do not use, and tolerate rough spots you understand. That is fine for a personal tool. It is not enough for a product.</p><p>A product means someone else solved the edge cases, tested on devices you do not own, and made it reliable for people whose workflows differ from yours. That work has real value. It deserves to be paid for.</p><p>But it deserves to be paid for once. Polish has ongoing costs&#8212;platform updates, security patches, new devices. But the same shift that collapsed build costs collapsed maintenance costs too. A one-time payment&#8212;the price of a couple of coffees&#8212;is a fair exchange for someone else&#8217;s craft. It says: I could build this myself, but you already did, and you did it well. That is worth a few dollars.</p><p>A monthly subscription that holds your data hostage says something different. It says: you need me, and I intend to keep collecting.</p><h2>The new deal</h2><p>The old deal was simple: you pay monthly because you cannot build it yourself. The new deal is simpler: you pay once for craft, or you build your own.</p><p>One-time payments for polished tools. Subscriptions only where there is genuine ongoing cost. Everything else, build it yourself. Not every case is clear-cut. But when you can build the alternative, the vendor has to earn each month. The tools are good enough. The time is short enough. 
The ceiling on what one person can attempt has moved.</p><p>That is where software is heading. Not back to some nostalgic era of shareware and hobbyist code. Forward, to a place where building is cheap, agency is real, and the subscription trap has lost the only leverage it ever had.</p>]]></content:encoded></item><item><title><![CDATA[The Cheap Subscription Is Not Enough]]></title><description><![CDATA[Why real fluency with agents requires volume]]></description><link>https://workingfrontier.nicolaeandrei.com/p/the-cheap-plan-is-not-enough</link><guid isPermaLink="false">https://workingfrontier.nicolaeandrei.com/p/the-cheap-plan-is-not-enough</guid><dc:creator><![CDATA[Andrei-Mihai Nicolae]]></dc:creator><pubDate>Tue, 31 Mar 2026 06:01:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I used to think a cheap AI subscription was enough to stay current. Pay for a basic plan, use the models when needed, let work cover the rest, and move on. That is enough to sample the tools. It is not enough to build the tacit knowledge that makes them truly useful.</p><h2>Reps build judgment</h2><p>Heavy AI use gives you something harder to describe than a prompt library: a feel for the system.</p><p>You learn when a model will handle a task cleanly and when it will bluff, how much context it can carry, when a vague instruction is fine, when the task needs tighter scaffolding, when to retry, when to switch models, and when to stop. After enough use, you can often predict the shape of the reply before it arrives.</p><p>That judgment does not come from benchmarks or one impressive demo. It comes from volume. 
You get it by pushing models into real work, watching them fail, trying again, and seeing enough patterns that they stop feeling mystical. They become legible. That is how trust forms. Not blind trust. Working trust.</p><h2>Volume costs money</h2><p>This is the part I underestimated.</p><p>A cheap plan, even with some work usage on top, is enough to keep you curious. It is not enough to make you fluent.</p><p>Fluency requires more usage than most people admit. You need long sessions, failed attempts, exploration, side projects, repeated prompting, and enough room to compare approaches until you understand why one worked and another did not.</p><p>That is where the more expensive plans stop looking indulgent. They start looking like the price of serious practice.</p><p>Cheap models are useful. Open models matter. But if you want tacit knowledge of what frontier AI can actually do, you have to spend real time at the frontier. Otherwise, you build your intuitions around the limitations of weaker systems and mistake that for realism.</p><h2>A rare moment to lean in</h2><p>This is also an extraordinary moment to be alive and to build.</p><p>I do not mean that in a childish sense. It is simply rare to live through a tooling shift that expands what one motivated person can attempt this much. You can prototype faster, explore more directions, and try projects that would have been too tedious or too lonely to begin a few years ago. The tools are uneven. They still fail in stupid ways. The larger fact remains: the ceiling has moved.</p><p>That is why I think engineers who can afford serious usage should stop treating it as optional software spend. This is not a moral argument, and it is not aimed at people who genuinely cannot afford a higher subscription. It is aimed at people who can afford it and still file it under &#8220;nice to have.&#8221;</p><p>For many engineers, a high-usage frontier subscription is one of the best career investments available right now. 
Not because paying more is virtuous, but because the return is not only output. It is speed, instinct, ambition, and better judgment.</p>]]></content:encoded></item><item><title><![CDATA[Make the Test Fail First]]></title><description><![CDATA[A practical way to use red-green TDD with AI agents]]></description><link>https://workingfrontier.nicolaeandrei.com/p/make-the-test-fail-first</link><guid isPermaLink="false">https://workingfrontier.nicolaeandrei.com/p/make-the-test-fail-first</guid><dc:creator><![CDATA[Andrei-Mihai Nicolae]]></dc:creator><pubDate>Tue, 24 Mar 2026 07:02:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At university, and for years after, TDD felt backward to me.</p><p>I understood the theory. I still preferred to write the code first, handle the edge cases, and add tests around what I had built. When implementation was the slow part, that order felt natural. The hard work was getting the code onto the page.</p><p>Working with agents changed that.</p><p>Now an implementation can appear in seconds. The expensive step is no longer producing code. It is deciding whether the result deserves to stay. That changes the job of the test. I no longer think of test-first as a ritual for disciplined programmers. I think of it as a way to pin the contract before the implementation starts drifting.</p><p>That is why red-green TDD feels useful to me again.</p><h2>Make red specific</h2><p>If you already know the behavior you want, start by making the failure concrete.</p><p>Suppose your billing job should skip paused subscriptions, but a paused account is still getting an invoice. 
A vague prompt would say, `Fix billing for paused subscriptions.` A better prompt would say, `Write a test that proves a paused subscription is invoiced by the monthly job. Run it. Confirm that it fails for the expected reason. Then make the narrowest correct change that makes it pass.`</p><p>That difference matters. The first prompt asks the agent to guess. The second asks it to establish a contract.</p><p>The test must fail, and it must fail for the right reason. That means the agent has to run it before touching the implementation. If it passes immediately, you have learned something important: the bug report is wrong, the fixture is wrong, or the test never touched the path you care about. If it fails because a seed script broke or a factory cannot build the record, you still do not have the contract. Fix the setup first.</p><p>This part is easy to skip, especially with fast code generation. A model can write a plausible test file and jump straight into the change. That is exactly what you do not want. Until the agent has executed the test and seen the right failure, it is still working from a story, not from evidence.</p><p>Red is not paperwork. Red is the moment when the expected behavior stops being prose and becomes something the codebase can reject.</p><h2>Then make green small</h2><p>Once the failure is real, the implementation gets simpler.</p><p>Now the agent is no longer coding against a paragraph. It is coding against an executable example. That narrows the solution space. It also makes wrong answers easier to spot. A change that sounds plausible but does not satisfy the example is not done.</p><p>This is the part that feels different in the agent era. Before, test-first could feel like extra typing before the real work. Now the implementation is often the easy part. The hard part is avoiding an answer that is locally convincing and globally wrong.</p><p>That shift helps in review too. 
I can ask two concrete questions instead of one fuzzy one: does the test capture the intended behavior, and does the change satisfy it without breaking adjacent cases? That is a much cheaper judgment than trying to reconstruct intent after the fact.</p><h2>Where I would not force it</h2><p>I would not use this pattern for everything.</p><p>If I am exploring a new API, tuning UI feel, or spiking an architecture change, I may not know the right assertion yet. In that kind of work, writing the test first can become theater. The contract is still moving.</p><p>Red-green pays off when the behavior is clear enough to state precisely: a bug, a business rule, a parser edge case, a regression, a transformation, an authorization boundary. In those cases, the test does not slow the work down. It prevents the work from drifting.</p><p>That is also why I care less about strict dogma than about order. I do not need every change to follow a perfect textbook cycle. I do want the contract to become executable before I trust the result.</p><h2>The prompt can stay short</h2><p>In practice, the instruction can be brief: `Use red-green TDD for this bug.`</p><p>That small prompt carries more structure than it seems to. It means: write the focused test, run it before touching the implementation, check that the failure matches the bug, fix the setup if the failure is wrong, make the narrowest correct change that turns the test green, and rerun the checks.</p><p>Sometimes that test is unit-level. Sometimes it is an integration test, a CLI snapshot, or a browser assertion. The important point is not the layer. The important point is that the acceptance criteria exist before the implementation lands.</p><p>I used to think TDD felt unnatural because it asked me to describe the answer before I had written it. With agents, it feels natural for the opposite reason. I am not using the test to help me type. 
I am using it to decide whether a fast answer is trustworthy.</p><p>When code arrives cheaply, the scarce resource is confidence. A good failing test is one of the fastest ways to buy it.</p>]]></content:encoded></item><item><title><![CDATA[What an AI Agent Should Find in a New Component]]></title><description><![CDATA[A four-question test for whether the component explains itself]]></description><link>https://workingfrontier.nicolaeandrei.com/p/what-an-ai-agent-should-find-in-a</link><guid isPermaLink="false">https://workingfrontier.nicolaeandrei.com/p/what-an-ai-agent-should-find-in-a</guid><dc:creator><![CDATA[Andrei-Mihai Nicolae]]></dc:creator><pubDate>Tue, 17 Mar 2026 07:01:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you want an AI agent to become useful fast in an unfamiliar component, do not start with a bigger prompt. Start with a component that explains itself.</p><p>When an agent lands in `shipping-quotes`, it should be able to answer four questions quickly:</p><p>- What does this component own?</p><p>- How do I work here?</p><p>- How do I verify a change?</p><p>- Where does the tricky context live?</p><p>If the component cannot answer those questions locally, the agent reads too much, touches too much, and asks too many basic questions. Ownership is fuzzy. The right commands are tribal knowledge. The important context lives in a slide deck, a spreadsheet, or somebody&#8217;s memory. So the agent wanders.</p><p>That is not always a model problem. Often, it is a component problem.</p><p>This is also a different problem from feedback after the edit. 
Before an agent can verify its own work, it has to orient itself in the component.</p><h2>What does this component own?</h2><p>An agent works best when it can reason locally.</p><p>That means the component should have a clear boundary, a small blast radius, and obvious contracts. In `shipping-quotes`, the agent should be able to tell what this component owns, what it depends on, and what it should not touch. It should know whether this component computes carrier quotes, consumes package and destination data, applies zone rules, or also owns checkout totals. Those are different jobs. A good boundary makes that visible in the interfaces, the file layout, and the local docs.</p><p>This matters more than context-window size. Even with a huge context window, a tangled component is still hard to change. The problem is not only how much code the agent can read. The problem is how much uncertainty the component creates. If every small edit may break three adjacent systems, the cost of acting goes up fast.</p><p>The first test is simple: can the agent make a local change without dragging half the repo into scope?</p>
<h2>How do I work here?</h2><p>Once the boundary is clear, the local path should be obvious.</p><p>If I drop an agent into `shipping-quotes`, I want it to find a short local guide. That might be `AGENTS.md`, `CLAUDE.md`, or a tight `README`. The name matters less than the job. The file should say what this component owns, which invariants matter, which commands to run, where the tests live, and what not to edit casually.</p><p>The component should also expose one obvious way to work. There should be one obvious command to run the local flow, one obvious command to exercise the tests, and one obvious place to look when something fails. If the repo offers four test commands, three half-working scripts, and two undocumented setup paths, the agent spends its first half hour on archaeology.</p><p>Good output belongs here too. When something breaks, the error should narrow the problem. &#8220;Missing carrier rate for zone&#8221; is useful. &#8220;Package dimensions exceed supported service limits&#8221; is useful. A stack trace with no domain signal is not. Legible components produce legible failures.</p><h2>How do I verify a change?</h2><p>Before an agent can fix a component, it needs a fast way to learn what the component does.</p><p>That is why tests matter so much. They do not only verify the change after the edit. They orient the agent before the edit. 
A strong suite shows what inputs matter, which edge cases the team cares about, and how wide the surface really is.</p><p>In `shipping-quotes`, a small suite might reveal the expected output for a domestic order, what happens to oversized packages, how missing carrier rates fail, and which rounding rules matter. In five minutes, that tells the agent more than a long design doc full of abstractions.</p><p>If a component is hard to understand, the fix is often not more prose. It is better tests. A good test file is both a guardrail and a map.</p><h2>Where does the tricky context live?</h2><p>The hard part of a component is often not the syntax. It is the domain logic.</p><p>That context should live where the agent can read it: in or near the component. Examples, fixtures, schemas, expected outputs, notes on invariants, and short explanations of weird cases all help. They are discoverable, versioned, searchable, and easy for an agent to inspect.</p><p>If the meaning of `shipping-quotes` lives in a slide deck, a spreadsheet, or somebody&#8217;s memory, the agent starts blind. The operator then has to shuttle basic context into the prompt by hand. That is slow, fragile, and easy to get wrong.</p><p>This does not mean every component needs a Markdown essay. It means the important context should exist in agent-readable form and sit beside the work. A canonical fixture is better than a paragraph. A sample output is better than a vague comment. A short glossary beside the code is better than a forgotten deck.</p><h2>The standard to aim for</h2><p>A good operator should be able to drop an agent into `shipping-quotes` without writing a custom tour guide every time.</p><p>The agent should find a clear boundary, one local path, tests that teach the terrain, and context that lives beside the code. If it cannot, a bigger prompt will not solve the underlying problem. The component is still illegible.</p><p>That is the standard I care about. 
Not whether an agent can eventually muddle through, but whether it can land in an unfamiliar component and become useful fast.</p><p>If you want better work from agents, stop treating the prompt as the only interface. The component is an interface too.</p>]]></content:encoded></item><item><title><![CDATA[The Three Feedback Loops of Agentic Software]]></title><description><![CDATA[Why better feedback loops let agents work with more autonomy]]></description><link>https://workingfrontier.nicolaeandrei.com/p/the-three-feedback-loops-of-agentic</link><guid isPermaLink="false">https://workingfrontier.nicolaeandrei.com/p/the-three-feedback-loops-of-agentic</guid><dc:creator><![CDATA[Andrei-Mihai Nicolae]]></dc:creator><pubDate>Fri, 13 Mar 2026 11:34:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ouag!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd27fafa-f800-429d-991e-df6e0f8b5394_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most teams try to improve agents through prompt work. They rewrite instructions, add context, and ask for a better plan. 
That helps, especially when the instructions are weak. But once the basics are good enough, the main bottleneck is usually the feedback loop.</p><p>If an agent cannot see what happened, check its own work, and recover quickly, you cannot trust it with much autonomy. You end up doing courier work: take the screenshot, paste the log, restate the bug, and ask it to try again. The system looks agentic, but the human is still carrying evidence back and forth.</p><p>I think about this as three nested loops. They are not three competing philosophies. They are three layers of feedback that tighten what the agent can do and narrow what the human must do.</p><p>- Loop 1: the human verifies after the fact</p><p>- Loop 2: the agent verifies its own work</p><p>- Loop 3: the human closes the loop on reality</p><p>Better feedback loops are what let agents work with more autonomy without losing contact with reality.</p><h2>Loop 1: The human verifies after the fact</h2><p>This is where most teams start.</p><p>Imagine a small web demo with an SVG orbit and a moving marker. You ask the agent to adjust the motion, line up the marker with a target, or change the animation. The agent edits the code and says it is done. Then you open the app, inspect the result, take a screenshot, and describe what is still wrong.</p><p>The loop works. It is also slow.</p><p>The human becomes the transport layer between the agent and the finished state. Because the agent cannot inspect the result on its own, it cannot correct itself. Every iteration depends on another round trip through you.</p><h2>Loop 2: The agent verifies its own work</h2><p>The second loop starts when the agent can inspect the system directly.</p><p>Give the same orbit demo browser automation, screenshots, logs, DOM assertions, and a fast test runner. 
Now the agent can change the code, open the page, capture the result, check whether the marker reached the target, and try again without waiting for a human to relay the evidence.</p><p>That is the key step up from Loop 1. The agent no longer needs the human to act as courier for basic evidence. It can compare the code against the rendered result itself. But the human often still has to do the last piece of work: open the final state, decide whether it really makes sense, and translate that judgment back.</p><p>That change is larger than it sounds. Better tools do not merely make the workflow nicer. They increase how much autonomy you can safely allow.</p><p>The same pattern holds outside toy demos. A broken onboarding form, a flaky CLI flow, or a staging-only UI regression all become easier when the agent can run the app, inspect structured output, and verify the result directly.</p><p>The analogy is simple. A developer can write software in a bare text editor. The same developer will move faster and make fewer mistakes with search, debugging, tests, and fast feedback. Agents are no different. A shell and a pile of text are enough to start. Logs, structured file access, screenshots, and browser tooling are what make self-correction practical.</p><p>That is Loop 2. The agent can inspect its own work, verify the result, and retry without waiting for a human to relay the evidence.</p><p>If you want agents to do more unsupervised work, start here. Improve what they can see and shorten the time it takes them to check themselves. But do not stop there. The next step is to build a shared playground around the app.</p><h2>Loop 3: The human closes the loop on reality</h2><p>Loop 3 starts when you build a shared playground around the app.</p><p>Loop 2 is about self-verification. The agent can run checks, inspect output, and correct itself. Loop 3 adds a shared surface for inspection. 
The agent and the human can open the same exact state on demand, look at the same evidence, and talk about the same thing.</p><p>That distinction matters. Passing tests is not the same as shipping the right thing. A screenshot can prove that the marker is on the target. It cannot tell you whether the motion feels awkward, whether the interaction matches the real intent, or whether the change solves the user&#8217;s problem.</p><p>Take the orbit demo one step further. Add query parameters that pin the marker to an exact angle, set the target, pause the motion, and render known scenarios on demand. Now the agent can open a precise state by itself. It can run checks and capture screenshots. The human can open the same URL and inspect the same state. The agent closes the verification loop; the human closes the product loop.</p><p>The form of that playground depends on the app. In a web app, it might be query parameters, fixtures, and debug routes. In a CLI app, it might be a scripted tmux session with fixed terminal size, seeded input, known fixtures, and a command that recreates the flow on demand. The tool changes. The principle does not: build a surface where the agent and the human can inspect the same reality.</p><p>That is the job the human should keep. Not micromanaging every step. Not approving every command. The human anchors the work to product intent, taste, and real-world constraints.</p><h2>What to add in practice</h2><p>If you want agents to do more useful work, shape the environment around them. In a real repo, that often means adding a few boring, high-leverage affordances:</p><p>- Build a shared playground for the app. For a web app, that may mean fixtures, query parameters, and debug routes. For a CLI app, it may mean a scripted tmux session with fixed dimensions, seeded input, and known fixtures.</p><p>- Expose important state on purpose. 
Add seed scripts, stable fixtures, and one-command setup paths that let an agent open a known state without guessing.</p><p>- Make verification cheap. Keep one fast command for focused tests, one reliable way to run the app, and one short path to capture logs or screenshots.</p><p>- Prefer structured tools over scraping. Return JSON from diagnostics, use stable DOM locators, and expose clear file or API interfaces instead of forcing brittle shell parsing.</p><p>None of this is glamorous. It is infrastructure for feedback. But it changes the shape of the work. When the system is legible, the agent can move faster. When checks are cheap, the agent can recover faster. When the human and the agent can inspect the same state, trust gets easier to build.</p><p>A small toy project helps because it makes these loops easy to see. But the lesson is not about toys. In real teams, the ceiling on agent autonomy is usually set less by intelligence than by feedback. If agents can inspect state, run checks, and recover on their own, they stop needing a human to ferry evidence between attempts. If they can step into a shared playground built for the app, the human can stop acting as courier and start acting as judge.</p><p>Prompting still matters. But once the task is understood, the practical questions are simpler: What can the agent see? What can it verify? What shared playground can it open with you? Better answers to those questions are what turn a clever demo into a reliable way of working.</p>]]></content:encoded></item></channel></rss>