Agentic Coding in Practice

May 15, 2026

Tony Karrer

Join us on June 12: Agentic Coding in Practice

Presenters will demo how their teams are actually wiring up agents, skills, rules, hooks, and review loops to make AI coding tools work inside real engineering processes, from spec to PR to QA. This session is designed for senior engineering, product, and QA leaders who want practical, ready-to-apply examples, not theory. Register here

Almost every CTO or VPE we talk to is asking some version of the same question: “Are we picking the right tools, building out the right workflows, and putting our people in the right places?” In other words: how should engineering teams actually operate now, with agents writing most of the code?

Teams are landing in pretty different places. Some are pushing toward full autonomy with minimal human review. Others are keeping tight control, using AI more as a reviewer or assistant. Most are somewhere in the middle.

But which workflow you pick isn’t really the challenge. The inputs are.

The bottleneck has moved. Writing code is getting cheaper, but building reliable systems requires more than just code. Specs, tests, and review are now the limiting factors, and agents amplify whatever you feed them. Strong inputs get strong results. Weak ones fail faster, at scale, and with less visibility into why.

One theme for the second half of 2026: engineering leaders need much better visibility into what other teams are actually doing. Too much of this is still being figured out in private, and some of the public conversation is noise.

What’s actually working

From what we’re seeing across teams and practitioner write-ups, a few patterns are emerging. Not as a single “right way,” but as things that consistently hold up.

Front-load the thinking: spec, tests, then code

The teams getting reliable output are putting more effort into shaping the problem up front (acceptance criteria, edge cases, tests) before letting agents implement. The shift is subtle but important: less time writing code, more time defining what “correct” looks like.

Parallelize aggressively, but with boundaries

Running multiple agents in parallel is becoming common: one exploring, one implementing, one cleaning up. (Stripe built a harness that ships >1,000 agent-driven PRs per week.) But the teams doing this well isolate each agent in its own worktree, container, or sandbox, so mistakes don’t cascade. Parallelism helps, but only if it’s contained.
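As a rough illustration of that isolation, here is a minimal Python sketch that gives each agent its own git worktree and branch. The agent names, the `agent/<name>` branch scheme, and the sibling-directory layout are assumptions made up for this example, not something any of the teams mentioned here have published. The function only builds the commands, so you can inspect them before running anything.

```python
from pathlib import Path


def worktree_commands(repo: Path, agent_names: list[str],
                      base_branch: str = "main") -> list[list[str]]:
    """Build one `git worktree add` command per agent.

    Each agent gets its own branch and its own checkout directory next to
    the main repo, so a bad edit in one session cannot touch another.
    The naming scheme is hypothetical.
    """
    commands = []
    for name in agent_names:
        branch = f"agent/{name}"                    # assumed branch convention
        path = repo.parent / f"{repo.name}-{name}"  # assumed directory layout
        commands.append([
            "git", "-C", str(repo),
            "worktree", "add", "-b", branch, str(path), base_branch,
        ])
    return commands


if __name__ == "__main__":
    for cmd in worktree_commands(Path("/tmp/repo"), ["explore", "implement", "cleanup"]):
        print(" ".join(cmd))
```

Each command could then be passed to `subprocess.run(cmd, check=True)`. Containers or sandboxes give stronger isolation than worktrees; this sketch only covers the file-level case.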

Where this breaks

A few failure modes show up just as consistently.

Tests written after the fact

This is amplification working against you. When you ask an agent to write tests after the implementation, the existing code becomes the spec. Agents don’t push back. They complete the task you gave them, even if the task is wrong. The result: tests that lock in whatever’s broken. Discipline matters more with agents, not less: tests first, then implementation.
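To make the lock-in concrete, here is a small Python sketch. The `shipping_cost` function and its “free shipping at 50 and above” rule are invented for the example. The implementation has a boundary bug; the check written after the fact encodes the bug, while the check written from the acceptance criterion catches it.

```python
def shipping_cost(order_total: float) -> float:
    """Hypothetical implementation. The (assumed) spec says orders of
    50 or more ship free, but the code uses > instead of >=."""
    return 0.0 if order_total > 50 else 5.0


def check_written_after() -> None:
    # Derived by reading the code: it asserts whatever the code does today,
    # so the boundary bug becomes "expected behavior".
    assert shipping_cost(50) == 5.0


def check_written_from_spec() -> None:
    # Derived from the acceptance criterion before any code existed.
    assert shipping_cost(50) == 0.0


if __name__ == "__main__":
    check_written_after()  # passes, locking in the bug
    try:
        check_written_from_spec()
    except AssertionError:
        print("spec-first check caught the boundary bug")
```

The two checks differ only in where the expected value came from, which is the whole point: the agent needs the spec, not the code, as its source of truth.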

Context can be a liability

Many teams are leaning on rules files (CLAUDE.md, AGENTS.md) to guide behavior. But they go stale quickly. Codebases evolve, and these files rarely keep up. Without regular review, they can make things worse, with agents following outdated or rigid instructions. Structure is necessary for good results, but bad structure gets amplified.
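One lightweight way to catch some of that drift is to check whether the file paths a rules file mentions still exist. The sketch below is a minimal Python example under two assumptions: that the rules file quotes paths in backticks, and that you can supply the set of paths currently in the repo (for example from `git ls-files`). It only detects dead path references, not outdated advice.

```python
import re

# Assumption: paths appear in backticks and contain at least one slash.
PATH_PATTERN = re.compile(r"`([\w./-]+/[\w./-]+)`")


def stale_references(rules_text: str, existing_paths: set[str]) -> list[str]:
    """Return backtick-quoted paths in a rules file that are not in the repo."""
    referenced = PATH_PATTERN.findall(rules_text)
    return [path for path in referenced if path not in existing_paths]


if __name__ == "__main__":
    rules = (
        "Handlers live in `src/api/handlers.py`.\n"
        "Legacy code is in `src/legacy/handlers.py`.\n"
        "Run `make test` before committing.\n"
    )
    repo_files = {"src/api/handlers.py"}  # hypothetical repo contents
    print(stale_references(rules, repo_files))
```

Running a check like this in CI turns “review the rules file regularly” into something that fails loudly when a referenced file is moved or deleted.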

What actually changes for teams

Coding agents don’t remove the need for senior engineering judgment. They concentrate it. The work shifts toward defining problems clearly, validating outputs, and keeping the whole system reliable. Teams with weak specs, inconsistent tests, or overloaded review processes feel more pain, not less. Teams with strong fundamentals move faster.

The work moves up the stack, and the pressure moves with it. And the systems you build around the agents matter more than the agents themselves.

Reading list

  • Embracing the parallel coding agent lifestyle – by Simon Willison. Concrete patterns for running several coding agents at once. Covers worktrees and Docker for isolation, what kinds of work to delegate to parallel sessions, and what to supervise more tightly.
  • How Stripe built “minions” – by Steve Kaliski via Lenny’s Newsletter. Inside Stripe’s production agent harness shipping ~1,300 PRs/week from Slack reactions. Covers the harness layers, how they handle code review at that scale, and what they had to build vs. adopt off-the-shelf.
  • Why Testing After with AI Is Even Worse – by Matti Bar-Zeev. Why asking an agent to write tests after the implementation produces tests that validate existing bugs instead of catching them. Makes the case for TDD with agents, not against it.
  • How System Prompts Define Agent Behavior – by Srihari Sriraman and Drew Breunig (nilenso). A close read of system prompts across Claude Code, Cursor, Codex, Gemini, and others. Shows the same model producing dramatically different workflows depending on the prompt wrapped around it.
  • Your CLAUDE.md Is Making Your Agent Dumber – by Cordero Core. Recent research finding that LLM-generated CLAUDE.md / AGENTS.md files actively decrease agent success rates compared to having no context file at all. Practical guidance on what to do about it.
  • Ralph Wiggum as a “Software Engineer” – by Geoffrey Huntley. Walks through “Ralph,” a bash-loop technique for autonomous coding agents. Concrete on what kinds of projects it suits and where senior engineering judgment stays non-negotiable.
 

Product meets Engineering in the AI Era

March 13, 2026

Tony Karrer

Join us on April 10: Product and Engineering Working Together in the Agentic Coding Era

We’ve assembled four product and engineering leaders to share exactly how they’ve retooled their processes. This virtual mini-conference is designed for CPOs, VPs of Product, CTOs, and Heads of Engineering who want practical, ready-to-apply examples — not theory. Register here

CPOs, VPs of Product, and CTOs are experiencing a common challenge: while agentic coding tools accelerate product development, they also introduce new friction between product and engineering. A product manager (PM) creates a spec that tells engineering what they want built, and then one of two things happens:

  • The engineer appropriately asks the agentic coding tool what questions it has. The agent immediately surfaces 15 questions, 12 of which need input from product. You take a cycle-time hit and add more context switching.
  • The engineer doesn’t surface the questions and builds it anyway. After PR reviews and QA, they realize the implementation does the wrong thing.

One theme for the first half of 2026: product and engineering leaders need to reduce this new friction.

What changed

A PM’s spec has two audiences.

First, people:

  • Reviewers (customers, leadership, other PMs) who need to confirm the product intent.
  • Engineers who need to reason about tradeoffs, durability, and how it fits the architecture.

Second, agents:

  • The agentic coding tool that will try to execute what you wrote, literally, at speed.

So what do we do?

PMs should use codebase-aware tools before handoff

I would highly recommend that product leaders and product managers try out the new Claude Desktop app, which bundles Claude, Claude Cowork, and Claude Code into a more PM-friendly interface. You can use it for a LOT more product needs than creating specs – see the additional reading below.

To get your PMs onboard, consider using the tool to ask:

“What does the product do today in scenario X?”

If Claude Desktop is connected to your code, it can often answer those types of questions. It can also answer:

“Given this draft spec, what questions do we need to answer before someone starts work?”

This helps PMs clarify ambiguity so you avoid the new friction points.

It’s time to change the default from “PMs don’t have visibility into the repo.” That policy actively works against speed and alignment. Giving the AI tooling access to the codebase gives PMs insight while maintaining the separation of responsibilities with engineering.

Side note: Markdown is quickly becoming the shared format for specs because it’s easy to diff, easy to reuse, and plays nicely with repos and agent workflows. Pick a Markdown editor you like (Obsidian is a good choice) and make it part of the standard toolkit.

PRDs and Tickets => Specs

You may want to start calling PRDs, tickets, and other definitions of what’s to be built “specs” internally. Not because “PRD” is wrong, but because the name communicates a shift: the output is meant to be fed into an agentic coding tool, so it needs more specifics.

The upcoming virtual mini-conference and the additional reading have lots of help on this front. For example, acceptance criteria and edge cases are critical.

AI supports PMs but does not replace their judgment; it should make their decision-making more efficient. Use AI to accelerate drafting, decomposition, and edge case discovery. But the final tradeoffs, priorities, and product decisions still belong to the PM. And we engineers still get to rely on PM judgment to know what to build.

Engineering still has to engineer

A clear spec does not eliminate engineering responsibilities. Strong teams do two things consistently:

  1. Architecture and technical planning: fit the spec into the system in a durable way (constraints, data flows, integration points, performance, security).
  2. Task shaping: break the spec into finer-grained development tasks that are independently testable, so agentic execution stays controlled and reviewable.
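Task shaping can be made checkable. The Python sketch below is one hypothetical way to represent a shaped spec: each task lists the acceptance tests that prove it done and the tasks it depends on. The `Task` structure, the task names, and the test names are all assumptions for illustration. It flags tasks that are not independently testable and produces an order in which agents could pick them up.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Task:
    name: str
    acceptance_tests: list[str] = field(default_factory=list)
    depends_on: list[str] = field(default_factory=list)


def untestable(tasks: list[Task]) -> list[str]:
    """Names of tasks with no acceptance test, i.e. not yet reviewable."""
    return [task.name for task in tasks if not task.acceptance_tests]


def execution_order(tasks: list[Task]) -> list[str]:
    """Order tasks so each one comes after everything it depends on."""
    graph = {task.name: set(task.depends_on) for task in tasks}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    # Hypothetical breakdown of a "CSV export" spec.
    tasks = [
        Task("schema", ["test_migration_applies"]),
        Task("api", ["test_export_endpoint_returns_csv"], depends_on=["schema"]),
        Task("ui", [], depends_on=["api"]),
    ]
    print(untestable(tasks))       # tasks that still need acceptance tests
    print(execution_order(tasks))  # dependencies first
```

A task that shows up in `untestable` is a signal to go back to the spec before handing it to an agent, not a task to skip.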

A good spec allows the engineers to focus on the work that actually requires engineering judgment.

Reading list