The narrative around AI coding tools started on a positive note. In theory, they could speed up entire workloads, and the initial studies did show just that. Later on, however, opposing research began to suggest the gains were illusory, and that AI actually makes coding slower. Worse, it makes the work all the more repetitive.
So… which is it really? Unfortunately, there are competing technical assessments of this complex subject. It is not simply that we were optimistic at first and only learned of AI's failings later. The basic question still remains: are we even putting the results into their proper perspective in the first place?
The Initial Boom in AI-Assisted Development
Early successes weren’t confined to mainstream IDE plug-ins. Consumer-facing apps such as Candy AI, a virtual-companion platform whose chat flows are built almost entirely from layered system prompts, show the same acceleration pattern.
A technical teardown on Hugging Face explains that Candy AI stitches GPT-based “prompt memory,” vector-database recall, and real-time text-to-speech into a modular stack, all orchestrated by carefully engineered prompts rather than hard-coded dialogue trees. The same guide notes that the architecture has already scaled to thousands of users worldwide without the need for exponential head-count growth, underscoring how disciplined prompt engineering can substitute for traditional rule-based development when the product is an AI companion.
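To make that pattern concrete, here is a minimal Python sketch of a prompt-orchestrated stack of that kind: a persona prompt layered over vector-recalled memory, with the LLM and text-to-speech clients injected as plain callables. The function names, the toy in-memory vector store, and the `llm_call`/`tts_call`/`embed` hooks are all illustrative assumptions, not Candy AI's actual implementation.

```python
# Minimal sketch of a prompt-orchestrated companion stack, assuming three
# hypothetical layers: system-prompt assembly, vector-based memory recall,
# and a pluggable speech backend. This is not Candy AI's real code.
from dataclasses import dataclass, field
import math


@dataclass
class MemoryStore:
    """Toy vector store: cosine similarity over pre-computed embeddings."""
    entries: list = field(default_factory=list)  # (embedding, text) pairs

    def add(self, embedding, text):
        self.entries.append((embedding, text))

    def recall(self, query_embedding, top_k=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0
        ranked = sorted(self.entries,
                        key=lambda e: cosine(query_embedding, e[0]),
                        reverse=True)
        return [text for _, text in ranked[:top_k]]


def build_prompt(persona, memories, user_message):
    """Layer the persona prompt, recalled memories, and the new message,
    rather than branching through a hard-coded dialogue tree."""
    memory_block = "\n".join(f"- {m}" for m in memories) or "- (no prior context)"
    return [
        {"role": "system", "content": persona},
        {"role": "system", "content": f"Relevant conversation memory:\n{memory_block}"},
        {"role": "user", "content": user_message},
    ]


def respond(llm_call, tts_call, embed, store, persona, user_message):
    """Orchestrate one turn: recall -> prompt assembly -> LLM -> optional TTS."""
    memories = store.recall(embed(user_message))
    messages = build_prompt(persona, memories, user_message)
    reply_text = llm_call(messages)           # injected chat-model client
    store.add(embed(reply_text), reply_text)  # write the reply back into memory
    audio = tts_call(reply_text)              # injected text-to-speech backend
    return reply_text, audio
```

The design point, under those assumptions, is that swapping the persona, memory contents, or speech backend changes the product's behavior without touching control-flow code, which is the "prompts instead of dialogue trees" trade-off the teardown describes.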
On the enterprise side of things, digging a bit deeper, research from MIT, Princeton, and the University of Pennsylvania covering 4,800+ developers at Microsoft, Accenture, and a Fortune 100 company found a 26% increase in completed tasks when using GitHub Copilot. The study also revealed a 13.55% increase in code commits and a 38.38% increase in builds, with less experienced developers showing both higher adoption rates and greater productivity gains. Quality measures remained stable or improved, with pull request approval rates increasing by approximately 10%.

This general conclusion is further supported by another survey from Stack Overflow during the same year. At least 81% of participants cited increased productivity as the biggest benefit of using AI tools, and 71% agreed the tools are also very helpful (perhaps even more so) in speeding up learning. GitHub's own research also found that 46% of code was completed by Copilot in enabled files, with developers generating over three billion accepted lines of AI-assisted code. The source even pointed strongly at economic implications: the researchers estimate AI-powered developer tools could boost global GDP by over $1.5 trillion through productivity gains.
Okay, before anybody screams conflict-of-interest bias here (especially with Stack Overflow’s infographics), let me first point out that there is still a significant amount of variation across the companies that participated. We are looking at consensus results. In fact, even though it was buried in the appendix, the first study showed Accenture at a staggering -17.40% build success rate, and the tool had a low adoption rate across the board. Still, the overall effect shows that AI-assisted coding had a somewhat promising start.
Opposing Negative Results, But Not the Way You Think
Of course, following the narrative introduced earlier, the productivity gains aren't evenly distributed. In reality, they follow a clear experience “gradient” that explains much of the contradictory evidence against AI coding productivity that surfaced later on.
The 4,800+ developer study, for example, actually found that junior developers saw the most dramatic improvements, with at least 25% productivity boosts across the board. Senior developers, on the other hand, gained at most 16%. At the more extreme end, the infamous METR study of just 16 veteran developers working on familiar, large-scale open-source projects found their workflows slowed down by a significant 19%.
Despite the sample-size criticisms, the METR methodology was rigorous: a randomized controlled trial with 246 real-world tasks, screen recordings, and tasks averaging two hours each. The initial prediction was a 25% speedup, and even afterward, most participants still believed AI had sped them up by around 20% despite the measured slowdown.
So, what happened? Well, developers in the METR study accepted less than 44% of AI-generated code suggestions, with 75% reading every line of AI output and 56% making major modifications. Apparently, when you know a codebase intimately and have (reasonably) high quality standards, AI's suggestions often create more review overhead than they save.

One METR participant noted he "wasted at least an hour first trying to solve a specific issue with AI" before reverting all changes and implementing it manually. This inadvertently confirms the "dopamine shortcut button" phenomenon described by another participant: "Do you keep pressing the button that has a 1% chance of fixing everything? It's a lot more enjoyable than the grueling alternative." An implementation bias, sure, but a notable part of the potential issue nonetheless.
The Trust Paradox
Perhaps the most revealing aspect of AI coding adoption is the massive gap between usage and trust. The same Stack Overflow survey cited earlier revealed that only 43% of developers trust the accuracy of these tools, a figure that has barely budged from previous surveys.
This distrust shows up in behavior rather directly. Google's 2024 DORA report found that a 25% increase in AI adoption produced mixed results: while correlating with a 7.5% increase in documentation quality, 3.4% increase in code quality, and 3.1% faster code reviews, it also correlated with a 1.5% drop in delivery speed and 7.2% drop in system stability. Though over 75% of developers use AI for daily work responsibilities and more than one-third report moderate to extreme productivity gains, 39% express little to no trust in AI-generated code.
As hinted earlier, professional developers are twice as likely as their peers to cite lack of trust or understanding of the codebase as top challenges with AI tools. This isn't user error; it's a fundamental mismatch between expectations of what AI can do and the complex, context-heavy nature of production codebases.
In addition, this perception gap extends beyond individual developers. Economics experts predicted AI would improve productivity by 39%, and machine learning experts forecast 38% gains (as referenced by the METR study). Both dramatically overestimated the actual impact, as later findings added nuance to the once overly positive consensus on AI-assisted coding.
AI Efficiency Is a Matter of Mental Malleability?
Case in point: AI coding productivity isn't about the tools themselves, but about how well they match your context. From what we can see so far, AI-assisted coding still serves its purpose well for users who have yet to establish their complete cadence and workflow.
After all, junior and mid-level developers consistently see immediate gains and build confidence with AI tools, while senior developers still benefit when working outside their expertise zones (for example, unfamiliar codebases or greenfield projects). The METR study's finding that developers with high prior experience showed greater slowdowns is more of an observation that forcing AI adoption on experts working on familiar systems often backfires.
That being said, the behavioral warning signs matter just as much. When developers start experimenting beyond what's productive, a pattern METR identified as contributing to the slowdown, the so-called dopamine hit of "maybe this time AI will solve everything" becomes extremely counterproductive. Organizations seeing declining PR merge rates or increasing rework cycles should treat these as red flags, not temporary adjustment periods.