If you’re an engineering or product leader, you’re probably already getting the question: “Are AI tools getting us the 30% productivity boost that other organizations are reporting?”

You likely don’t have a good, honest answer to that question. Getting there takes a bit of patience, and it means facing an age-old problem in software engineering: how do we measure it?

One caution at the start – let adoption mature. In almost every rollout I’ve seen, the first 3-6 months are a time of rapid improvement:

  • Engineers are learning how best to use the tools, including where they help, how to prompt, and how to sanity-check outputs.
  • Teams are still evolving rules and example prompts, and figuring out what approach to use in different scenarios.
  • Tooling, tests, and repo structures are still tuned for human-only workflows.

AI tool adoption is the biggest knowledge and skills shift that engineers and engineering teams have faced in any of our careers. Competence takes time. Early on, your measurement should focus on adoption and usage to enable coaching, rather than pushing too hard on other metrics. But that doesn’t get you off the hook for figuring out how to answer the measurement question. Side note: if you haven’t yet incorporated AI coding tools into your SDLC, check out our recent blog post 2-week spike to ramp up on AI Coding Tools.

Want to learn more? We’re hosting a special two-hour deep dive for engineering and product leaders about how to measure the real impact of AI coding tools, what metrics actually matter, and how high-performing teams are handling the transition.

AI Coding Tool Metrics: DORA and CTOs Deep Dive
Friday, January 9, 2026 • 8–10 AM PST / 11 AM–1 PM EST

Reserve Your Spot

Can’t attend live? Register anyway and we’ll send you the full session recording.
This two-hour, high-impact mini-conference includes:
  • A DORA researcher sharing new findings on how high-performing teams are adopting AI-assisted development — what’s changing in their workflows and which metrics actually correlate with better outcomes.
  • Two CTOs breaking down how they measure AI tools inside their organizations: the utilization, quality, and satisfaction metrics they track, what surprised them, and how they manage the non-code work.
  • A moderated discussion among CTOs and attendees to surface real questions and compare approaches.
You’ll learn:
  • What metrics leading organizations are using — and which tools help you capture them.
  • How to find where your team sits on the AI adoption curve, and what to do if you’re behind.
  • Where AI tools create hidden value that doesn’t show up as “more code.”
This is the first time the LA CTO Forum has opened one of its online sessions to a broader audience. Don’t miss this opportunity!

What most teams actually track

Once you’re past the initial rollout, most orgs end up tracking some subset of these:

  • Utilization: AI tool usage (DAU/WAU, sessions or prompts per dev), percentage of committed code that’s AI-generated, and percentage of PRs or tickets that are AI-assisted.
  • Throughput: PR, ticket, and story-point rates and cycle time, compared for work done with and without AI tools; productivity improvement is often estimated qualitatively (see the sketch after this list for one way to compute the cycle-time comparison).
  • Quality: commit acceptance rates, rework rates, and incident/defect trends over time for AI-touched work versus non-AI.
  • Developer satisfaction: typically survey-based, tracked over time.
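
To make a couple of these concrete, here’s a minimal sketch of how they could be computed from pull request records. The record shape (the ai_assisted and reworked flags, the timestamps) is a hypothetical stand-in for whatever your Git host and AI tooling actually export.

```python
# Hedged sketch: compute a few of the metrics above from PR records.
# The PullRequest shape is hypothetical -- substitute whatever your
# Git host and AI tooling actually export.
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class PullRequest:
    opened_at: datetime
    merged_at: datetime
    ai_assisted: bool   # e.g. a PR label or tool telemetry (assumed available)
    reworked: bool      # e.g. a follow-up fix landed soon after merge

def summarize(prs: list[PullRequest]) -> dict:
    ai = [p for p in prs if p.ai_assisted]
    non_ai = [p for p in prs if not p.ai_assisted]

    def cycle_hours(group):
        # End-to-end cycle time in hours, from PR opened to merged.
        return [(p.merged_at - p.opened_at).total_seconds() / 3600 for p in group]

    def rework_rate(group):
        return sum(p.reworked for p in group) / len(group) if group else None

    return {
        "pct_ai_assisted": len(ai) / len(prs) if prs else None,
        "median_cycle_hours_ai": median(cycle_hours(ai)) if ai else None,
        "median_cycle_hours_non_ai": median(cycle_hours(non_ai)) if non_ai else None,
        "rework_rate_ai": rework_rate(ai),
        "rework_rate_non_ai": rework_rate(non_ai),
    }
```

None of these numbers means much in isolation; the point is to put utilization, throughput, and quality side by side rather than celebrating any one of them.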

That said, you quickly run into the same problems we’ve always had with measuring developer productivity, and AI coding tools just layer complexity on top.

The widely varying studies you read play directly into this, as does the fact that you are likely measuring immature adoption.

High-value AI work that doesn’t result in “more lines of code”

The other trap is that many of the best AI use cases don’t involve code generation at all and may not move “throughput” numbers:

  1. Errors, stack traces, and debugging

    Using an assistant to explain logs, propose hypotheses, and narrow in on fixes is incredibly valuable. The final fix might be three lines of code, but the time saved in root cause analysis is where the win lives.

  2. Understanding existing codebases

    Having an agent walk an engineer through modules, data flows, and edge cases is gold for onboarding and cross-team work, and really day-to-day work as well. The output might be a short design note, a diagram, or just a better mental model, but often not code itself.

  3. Requirements analysis and development strategy

    Turning fuzzy business goals into crisp acceptance criteria, edge cases, migration plans, and trade-off analyses is real engineering work. Good use of AI here usually means more iterating and more thinking up front. This work itself is not yet code.

  4. Code review assistance

    AI can act as a second set of eyes: flagging missing tests, odd edge cases, or inconsistencies with past patterns. It may not change the size of the diff, but it can quietly improve quality and shorten the path from PR to deployment.

If you rely too heavily on lines of code produced, you will fall into all the old traps, and you will especially undervalue these use cases.

The new friction AI introduces

Even when AI tools are helping, they create some early friction that can make metrics look worse before they look better:

  1. Requirements friction

    Once engineers get good with AI, they tend to ask more – and better – questions about requirements and acceptance criteria. Tickets that used to be “good enough” start getting challenged. That’s healthy, but in the short term it can make cycle times look longer and frustrate product managers who weren’t expecting that level of scrutiny.

  2. Code review overload

    If you think of AI as multiplying your number of junior developers, your ratio just shifted dramatically. You now have far more “entry-level” code being submitted for review. Without changes to review practices and guardrails, senior and mid-level engineers get swamped in AI-generated diffs and everything slows down.

This is why you can’t just stare at velocity charts and “% AI-generated code” and call it a day. You have to look at the whole system: how long work takes end-to-end, how quality and incidents move, how much time seniors spend reviewing, and whether the non-code work (requirements, debugging, comprehension) is getting easier.

Pragmatic measurement stance for 2026

If you’re getting pressure to “show me the numbers,” a reasonable stance looks like:

  • Acknowledge that you need at least 3–6 months of adoption maturity before any hard conclusions.
  • Track a small set of utilization and quality signals, and compare AI and non-AI work within the same teams over time (sketched below).
  • Explicitly call out the non-code use cases you care about—debugging, codebase understanding, requirements, code review—and capture their impact with a mix of targeted metrics and narrative examples.
  • Use external studies as framing, not as your baseline; your systems, codebase, and people will be different.
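
To make the “within the same teams over time” point concrete, here is a minimal sketch, again with hypothetical record fields (team, month, ai_assisted, cycle_hours) standing in for whatever your tooling exports.

```python
# Hedged sketch: median cycle time per (team, month), split by AI-assisted
# or not, so comparisons stay inside one team rather than across teams.
# Field names are hypothetical stand-ins for whatever your tooling exports.
from collections import defaultdict
from statistics import median

def within_team_trend(records: list[dict]) -> dict:
    buckets = defaultdict(list)
    for r in records:
        key = (r["team"], r["month"], r["ai_assisted"])
        buckets[key].append(r["cycle_hours"])
    return {key: median(vals) for key, vals in buckets.items()}

# The gap between ("payments", "2026-03", True) and ("payments", "2026-03", False)
# is a within-team signal; comparing the payments team to the platform team is not.
```

Keeping the comparison inside a team controls for the biggest confounders (codebase, domain, seniority mix) while still showing how the gap moves as adoption matures.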

Reading list