When AI writes the code: The hidden bottleneck slowing software releases

At JPMorgan Chase, managers now track how often developers use AI coding tools like GitHub Copilot, grouping engineers by usage level and factoring that into performance. The logic is simple: more AI usage should mean faster code, faster releases, and a stronger competitive edge. But the data tells a different story. A 2026 survey of over 1,100 developers by Sonar shows that while 72% of engineers now use AI tools daily and generate code faster than ever, release cycles are getting longer, and failure rates are rising. The issue is not AI itself, but where the work has moved. AI has shifted the bottleneck from writing code to reviewing and verifying it, and most engineering organizations have not yet adapted their processes, team structure, or metrics to this new reality.

Key takeaway


  • AI accelerates code generation, but shifts the bottleneck to review and verification, slowing overall release cycles
  • 96% of developers do not fully trust AI-generated code, yet only 48% consistently verify it before committing, allowing unreviewed code to accumulate rapidly
  • 88% of developers report increased technical debt, driven by duplicated and unnecessary AI-generated code that adds long-term complexity
  • 57% of developers are concerned about exposing sensitive data through AI tools, and in many organizations this risk remains unmanaged
  • Measuring productivity by lines of code is no longer effective; DORA metrics provide a more accurate view of system performance
  • In regulated industries, AI cannot replace engineers with domain and compliance knowledge; gaps here surface as production failures, not development delays

The bottleneck moved, and most teams missed it

Before AI coding tools, writing code was the primary constraint in software delivery. Turning a business requirement into working logic took time. Teams reasonably assumed that if they could speed up writing, they would speed up everything.

AI removed the writing bottleneck. But it did not remove the work. It shifted it downstream.

The new constraint is verification and comprehension. AI generates code in seconds. But that code still requires a human to read it, understand it, test it, and confirm that it does what the business actually needs – not just what the prompt described. Because reading and validating machine-generated code is significantly harder than reading code you wrote yourself, the process is now slower at the review stage than it was before AI accelerated the writing stage.

This is the core problem most AI adoption strategies are missing. They measure input (how fast code is generated) and assume output follows. It does not.

The data makes this concrete. Currently, 96% of developers admit they do not fully trust AI-generated code. Yet only 48% verify it before committing it to the main codebase. The gap between distrust and verification is not negligence. It is exhaustion.

61% of developers say AI tools produce code that looks correct but contains unreliable logic. When engineers write code themselves, they understand every decision in it. When a machine writes a block of code, errors hide inside logic that appears syntactically clean. Finding those errors requires reading the entire block with the scrutiny you would apply to unfamiliar code written by someone else, which is cognitively expensive and time-consuming.

In practical terms: in a team of 20 engineers each committing two AI-generated functions per day, that 48% verification rate means approximately 20 unreviewed functions entering the codebase daily. Over a two-week sprint, that is 200 functions – each a potential defect, security gap, or compliance violation that will surface in production rather than in review.
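The arithmetic above can be sketched as a back-of-envelope model. The figures are the article's hypothetical example (20 engineers, two functions a day, a 48% verification rate), not measured data:

```python
def unreviewed_functions(engineers: int, functions_per_day: int,
                         verification_rate: float, working_days: int) -> int:
    """Approximate count of functions entering the codebase without
    human verification over the given period."""
    daily_total = engineers * functions_per_day
    daily_unreviewed = daily_total * (1 - verification_rate)
    return round(daily_unreviewed * working_days)

# 20 engineers x 2 AI-generated functions/day, 48% verified before commit,
# over a 10-working-day sprint:
per_sprint = unreviewed_functions(engineers=20, functions_per_day=2,
                                  verification_rate=0.48, working_days=10)
print(per_sprint)  # 208 -- roughly the 200 unreviewed functions cited above
```

The point of writing it down is that the accumulation rate is linear in team size and commit volume: doubling either doubles the unreviewed backlog unless the verification rate rises with it.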

This is why teams report feeling busier despite AI being designed to save time. The work did not disappear. It relocated to a stage where it is harder to catch and more expensive to fix.

The hidden costs accumulating in your codebase

The volume problem compounds the quality problem.

Because AI generates code quickly, it generates low-quality code at a higher volume than any development team previously had to manage. At a recent software engineering conference in London, principal engineer Joy Ebertz described this directly: AI produces a high volume of poorly structured code, and the primary consequence is duplicated, unnecessary logic that accumulates in the codebase and makes every future change harder to implement safely. 88% of developers report that AI has increased their technical debt.

The more important point is the one that gets less attention. Not all code carries the same consequences when it fails. An AI-generated script that restarts a server on a monthly schedule can be imperfect, and the business absorbs the cost. An AI-generated function inside a core payment processing system or a patient record database cannot carry the same tolerance for error. Engineering leaders need to make this distinction explicit and structural, defining clearly which systems require rigorous human verification and which can tolerate a faster, lighter review. Without that distinction, developers default to inconsistency under deadline pressure.
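One way to make that distinction structural rather than informal is a policy table that review tooling can consult. A minimal sketch, where the system names and tier labels are illustrative assumptions, not a standard:

```python
# Hypothetical risk-tier policy: which systems get rigorous human review
# of AI-generated code, and which can take a lighter, faster pass.
RISK_TIERS = {
    "monthly_restart_script": "light",     # imperfection is absorbable
    "internal_dashboard":     "light",
    "payment_processing":     "rigorous",  # core system: full human review
    "patient_records":        "rigorous",  # regulated data
}

def review_level(system: str) -> str:
    """Return the required review level for a system.

    Unknown systems default to rigorous review, never to the fast path,
    so the safe behavior is the one that happens by accident."""
    return RISK_TIERS.get(system, "rigorous")

print(review_level("internal_dashboard"))  # light
print(review_level("new_service"))         # rigorous (safe default)
```

The design choice worth noting is the default: anything not explicitly classified falls into the rigorous tier, so deadline pressure cannot quietly route a sensitive system through the light path.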

The security dimension of uncontrolled AI usage compounds this further, and it is largely invisible in most organizations.

Small and mid-sized businesses report the highest speed gains from AI tools. They also report the highest defect rates, the most rework, and the most security exposure. Developers at smaller organizations frequently use personal accounts to access AI tools, pasting proprietary code, internal business logic, and customer data into public systems to get answers. There is often no policy governing this, no visibility into what is being shared, and no audit trail. 57% of developers are concerned that their AI tool usage is exposing sensitive data. In most organizations, that concern is entirely unmanaged.

Large enterprises move more slowly with AI adoption but achieve more stable results. They enforce approved tooling, route AI requests through controlled environments, and require generated code to pass automated quality checks before it reaches production. The speed advantage is smaller. The risk exposure is dramatically lower.

For engineering leaders at growing companies, the question is not whether to use AI. It is whether the governance structure around AI usage matches the sensitivity of the systems being built.

AI does not know your business, and that gap shows up in production

AI coding tools understand programming languages. They do not understand your compliance requirements, your data architecture, your customer contracts, or the specific business rules your system was built to enforce.

Research by METR, a technical AI safety organization, tested AI coding tools across different project environments. Their finding is counterintuitive: in large, complex codebases, AI tools slowed developers down. The primary reason is that AI lacks context about the specific constraints (regulatory, architectural, and operational) that govern how code in that system must behave. Without that context, the AI produces technically valid code that fails operationally.

In industries where that failure has regulatory consequences (banking under ISO 27001 and PCI DSS, healthcare under patient data privacy frameworks, public sector under procurement and security requirements), the gap between what AI generates and what the system requires is not a minor inefficiency. It is a liability.

This is not an argument against using AI in regulated environments. It is an argument for where human judgment must remain in the loop. Engineers with deep domain knowledge (who understand why a system is built the way it is, not just how it functions) are the only reliable check on AI output in these contexts. Teams that remove this layer of verification to accelerate delivery will recover the time they saved during a production incident.

The transition from AI-generated code to production-ready code in a regulated system is not a technical handoff. It is a judgment call that requires someone who understands the business well enough to know when technically correct is not actually correct.

You are measuring the wrong thing, and AI is making it worse

Because AI makes generating code effortless, measuring developer performance by lines of code produced has become not just inaccurate but actively counterproductive. If engineers are assessed on volume, they will use AI to generate volume. The metric will look healthy. The system underneath it will not be.

The most reliable way to evaluate whether AI is improving your engineering operation is through DORA metrics – four measurements that track system performance rather than developer activity:

  • Deployment frequency measures how often new updates reach production. Elite teams deploy multiple times per day. If AI is improving your delivery process, this number should increase over time.
  • Change lead time measures how long it takes from a committed code change to that change running in production. This captures the full cycle, including review, testing, and deployment, not just the writing phase. If AI is accelerating writing but review is getting slower, this metric will show it.
  • Change failure rate measures how often new deployments cause a production incident. Industry benchmarks for elite teams sit below 5%. If your team is using AI heavily and this number is rising, you are shipping defects faster, not delivering value faster.
  • Failed deployment recovery time measures how quickly your team restores service after a production failure. This reflects the quality of your system architecture, which AI-generated technical debt degrades over time if left unmanaged.

These metrics tell you whether AI is improving your engineering system or just making parts of it faster while degrading others.

The answer to what AI should handle follows these measurements. The organizations seeing durable improvement from AI adoption have made a specific structural decision: they use AI for predictable, well-defined, low-risk work such as documentation, standard test scaffolding, boilerplate configuration, and routine code patterns. 70% of developers find AI highly effective for generating documentation precisely because the requirements are fully specified and the failure modes are low-risk.

Where AI consistently underperforms is in work requiring contextual judgment: system architecture decisions, compliance boundary enforcement, business rule implementation, and code that sits at the intersection of technical logic and operational consequence. These are the tasks where the 96% trust gap and the 61% hidden error rate matter most, and where the cost of undetected failure is highest.

The practical implication is not a blanket policy on AI usage. It is a deliberate mapping of which tasks in your delivery process belong in each category, and a governance structure that enforces the distinction rather than leaving it to individual developer judgment under deadline pressure.

Why the answer is not more AI but better judgment

The shift in where verification sits in the delivery process has a direct implication for team composition that most AI adoption strategies do not address.

Junior developers do not have the domain knowledge required to catch AI’s mistakes in a complex, regulated system. They can identify syntax errors and obvious logic failures. They cannot reliably identify a compliance gap in a credit risk model or a patient data handling error in a healthcare workflow, because catching those errors requires knowing what correct behavior should look like, not just whether the code runs.

This is why, at Synodus, 87% of our engineering team is at the middle or senior level. It is not a credential preference. It is a functional requirement for the industries we work in.

When we worked with BOC Aviation, we spent time understanding their credit processes before writing a line of code. The result was an automated script that reduced their processing time from ten days to two days – a gain that required getting the business logic right, not just getting the code written fast. When we took over a project for Arctx, we reduced development costs by 50% and delivered three times faster than the previous engagement not because we used more AI, but because senior engineers understood the system well enough to make the right architectural decisions before the work began.

AI is a productive tool in both of those contexts. It is not the source of the result. The engineers who understood the business well enough to direct it correctly are.

The organizations that will build durable engineering advantages from AI are not the ones that adopt it most aggressively. They are the ones that adopt it most deliberately with clear policies on where AI belongs, governance structures that match their risk profile, measurement frameworks that capture system performance rather than individual output, and senior engineers who understand the business well enough to validate what the machine produces.

AI will not run your business for you. The teams that understand its limits are the ones building systems that will still be running cleanly in three years.
