The Full Papercut Audit: Where AI Coding Tools Break Down

Part 3 of the AI Coding Papercuts series—measuring the small friction points that drain developer productivity.

The Final Test

After examining flow-state interruptions and idiomatic code quality, we conclude with a comprehensive friction analysis across five dimensions.

Dimension	Test
Syntax Reliability	Bracket completion in nested structures
Documentation Noise	Comment verbosity on trivial code
Code Bloat	Over-engineering simple functions
Project Awareness	Multi-file integration
Interface Stability	Iterative modification

Tools Under Test

Tool	Model	Access Method
Claude Code	claude-sonnet-4-20250514	`claude` CLI
Codex CLI	gpt-5.2-codex	`codex exec`
Gemini CLI	gemini-2.5-pro	`gemini` CLI

Experiment 1: Bracket Completion

The scenario: Write a JavaScript config validator with nested structures.

Prompt: Write a JavaScript function that validates a nested config object with
        sections: database, cache, auth. Each section has enabled (boolean) and
        settings (object). Return { valid: boolean, errors: string[] }

Results: Universal Success

Tool	Syntax Complete	Brackets Matched	Lines
Claude Code	✅	✅	22
Codex CLI	✅	✅	18
Gemini CLI	✅	✅	16

All three tools produced syntactically valid JavaScript: no truncation, no missing braces, no syntax errors. On this task, bracket completion did not separate the tools.
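A check along the lines of the "Brackets Matched" column can be approximated with a stack. The sketch below is an assumption about how such a check might look, not the harness used in the experiment. `brackets_balanced` is a hypothetical helper, and it deliberately ignores brackets that appear inside string literals or comments, so a real audit would lean on a parser instead.

```python
# Hypothetical helper: naive bracket matching for a snippet of generated code.
# Assumption: brackets inside strings or comments are not handled specially.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def brackets_balanced(source: str) -> bool:
    """Return True if every (, [ and { is closed in the right order."""
    stack = []
    for ch in source:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack  # leftover openers suggest truncated output


complete = "function validate(c) { return { valid: true, errors: [] }; }"
truncated = "function validate(c) { return { valid: true, errors: ["
print(brackets_balanced(complete))   # True
print(brackets_balanced(truncated))  # False
```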

Experiment 3: Comment Verbosity

The scenario: Write a compound interest function.

Prompt: Write a Python function that calculates compound interest.
        Parameters: principal, rate (annual, as decimal), time (years), n (compounds per year).
        Formula: A = P(1 + r/n)^(nt). Return the final amount.

Results: Significant Divergence

Gemini CLI (2 lines, 0 friction):

def compound_interest(principal, rate, time, n):
    return principal * (1 + rate / n) ** (n * time)

Codex CLI (3 lines, 0 friction):

def compound_interest(principal, rate, time, n):
    # A = P(1 + r/n)^(nt)
    return principal * (1 + rate / n) ** (n * time)

Claude Code (13 lines, 2 friction):

def compound_interest(principal: float, rate: float, time: float, n: int) -> float:
    """
    Calculate compound interest.

    Args:
        principal: Initial investment amount
        rate: Annual interest rate as decimal (e.g., 0.05 for 5%)
        time: Time period in years
        n: Number of times interest compounds per year

    Returns:
        Final amount after compound interest
    """
    return principal * (1 + rate / n) ** (n * time)
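All three outputs evaluate the same expression, so one numeric check covers each of them. The inputs below are illustrative values chosen for this sketch, not figures from the experiment.

```python
import math


def compound_interest(principal, rate, time, n):
    # A = P(1 + r/n)^(nt)
    return principal * (1 + rate / n) ** (n * time)


# Illustrative inputs: 1,000 at 5% for 2 years, compounded once per year.
# By hand: 1000 * 1.05^2 = 1102.50
amount = compound_interest(1000, 0.05, 2, 1)
assert math.isclose(amount, 1102.50)
print(round(amount, 2))  # 1102.5
```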

The Impact

Metric	Claude Code	Codex CLI	Gemini CLI
Lines of code	13	3	2
Comment lines	8	1	0
Comment ratio	62%	33%	0%
Type hints	Yes	No	No
Friction events	2	0	0

Claude Code's "helpful teacher" personality adds documentation overhead. For a trivial function, 8 lines of docstring create noise rather than value.
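A comment ratio like the one in the table can be computed mechanically. The sketch below is one possible approach using the standard `ast` module. `comment_ratio` is a hypothetical helper, and its counting rules (non-blank lines only, `#`-only lines plus docstring lines) are assumptions that may not match how the figures above were tallied.

```python
import ast


def comment_ratio(source: str) -> float:
    """Share of non-blank lines that are '#' comments or docstring lines."""
    lines = source.splitlines()
    non_blank = [i for i, line in enumerate(lines, start=1) if line.strip()]
    if not non_blank:
        return 0.0
    # Lines that contain only a '#' comment.
    commented = {i for i in non_blank if lines[i - 1].lstrip().startswith("#")}
    # Lines occupied by module, class, or function docstrings.
    scopes = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, scopes) and ast.get_docstring(node, clean=False) is not None:
            doc = node.body[0]
            commented.update(
                i for i in range(doc.lineno, doc.end_lineno + 1) if lines[i - 1].strip()
            )
    return len(commented) / len(non_blank)


codex_output = '''def compound_interest(principal, rate, time, n):
    # A = P(1 + r/n)^(nt)
    return principal * (1 + rate / n) ** (n * time)
'''
print(f"{comment_ratio(codex_output):.0%}")  # 33%
```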

Experiment 7: Over-Engineering

The scenario: The simplest possible task.

Prompt: Write a Python function that reverses a string.

The Ideal Solution

def reverse_string(s):
    return s[::-1]

Results: Different Over-Engineering Patterns

Gemini CLI (2 lines): Perfect minimal output.

def reverse_string(s):
    return s[::-1]

Codex CLI (4 lines): Added unrequested validation.

def reverse_string(s):
    if not isinstance(s, str):
        raise TypeError("Input must be a string")
    return s[::-1]

Claude Code (10 lines): Added unrequested documentation.

def reverse_string(s: str) -> str:
    """Reverse a string.

    Args:
        s: The string to reverse

    Returns:
        The reversed string
    """
    return s[::-1]

The Pattern

Tool	Over-Engineering Type	Overhead Lines
Claude Code	Documentation	8
Codex CLI	Validation	2
Gemini CLI	None	0

Claude over-documents. Codex over-validates. Gemini does neither.
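The validation Codex CLI added is not purely cosmetic: it changes what the function accepts. A short sketch of the difference, with the hypothetical names `reverse_minimal` and `reverse_validated` standing in for the two outputs above:

```python
def reverse_minimal(s):
    return s[::-1]


def reverse_validated(s):
    if not isinstance(s, str):
        raise TypeError("Input must be a string")
    return s[::-1]


print(reverse_minimal("abc"))      # cba
print(reverse_minimal([1, 2, 3]))  # [3, 2, 1]: any sliceable sequence works

try:
    reverse_validated([1, 2, 3])
except TypeError as exc:
    print(exc)                     # Input must be a string
```

Whether rejecting non-strings counts as friction or as a feature depends on the caller; the prompt did not ask for it either way.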

Experiment 9: Multi-File Project Handling

The scenario: Add an endpoint to an existing Flask project.

Setup

app/
  __init__.py
  routes.py  (existing /users endpoint)
  models.py  (User, Post models)

Prompt: Add a new endpoint GET /api/stats that returns:
        - Total users count
        - Total posts count
        - Average posts per user
        Use the existing User and Post models.

Results: Critical Divergence

Tool	Edited Correct File	Created New File	Friction
Claude Code	✅	No	0
Gemini CLI	✅	No	0
Codex CLI	❌	Yes	3

Claude Code & Gemini CLI correctly edited routes.py:

# Added to existing routes.py
@bp.route('/stats')
def get_stats():
    users = User.query.count()
    posts = Post.query.count()
    avg = posts / users if users else 0
    return jsonify({'total_users': users, 'total_posts': posts, 'avg_posts_per_user': avg})

Codex CLI created a new app/stats.py:

# Created NEW file instead of editing existing
from flask import Blueprint, jsonify
from .models import User, Post, db

stats_bp = Blueprint('stats', __name__, url_prefix='/api')

@stats_bp.route('/stats')
def get_stats():
    ...

The Impact

Codex CLI's approach requires:

- Manual Blueprint registration in __init__.py
- Understanding why a new file was created
- Reconciling with existing project structure

In real-world development, this creates significant integration friction.
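The manual registration step can be sketched as follows. This is a single-file illustration that assumes Flask is installed: the `create_app` factory is a hypothetical stand-in for whatever `app/__init__.py` actually contains, and the handler returns a placeholder payload rather than querying the real models.

```python
from flask import Blueprint, Flask, jsonify

# Stand-in for the blueprint Codex CLI defined in app/stats.py.
stats_bp = Blueprint("stats", __name__, url_prefix="/api")


@stats_bp.route("/stats")
def get_stats():
    # Placeholder payload; the real handler would query User and Post.
    return jsonify({"total_users": 0, "total_posts": 0, "avg_posts_per_user": 0})


def create_app() -> Flask:
    """Hypothetical app factory, standing in for app/__init__.py."""
    app = Flask(__name__)
    # The manual step: without this call, GET /api/stats returns 404.
    app.register_blueprint(stats_bp)
    return app
```

Until someone adds the `register_blueprint` call, the new file sits in the project but the endpoint does not exist, which is why this counts as integration friction rather than a style difference.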

Experiment 10: Iterative Modification

The scenario: Evolve a function through three prompts.

Prompt 1: Write a function that sorts a list of dictionaries by a key.
Prompt 2: Actually, make it sort in descending order.
Prompt 3: Wait, also add support for nested keys like 'user.name'.

Results: Interface Stability

Claude Code & Gemini CLI preserved the interface:

# After all 3 prompts - same function name, same parameters
def sort_dicts(items, key):
    def get(d, k):
        for p in k.split('.'):
            d = d[p]
        return d
    return sorted(items, key=lambda x: get(x, key), reverse=True)
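To make the preserved interface concrete, here is the same function exercised with a flat key and a nested key. The sample records are invented for illustration.

```python
def sort_dicts(items, key):
    def get(d, k):
        # Walk a dotted path such as 'user.name' one segment at a time.
        for p in k.split('.'):
            d = d[p]
        return d
    return sorted(items, key=lambda x: get(x, key), reverse=True)


flat = [{"n": 1}, {"n": 3}, {"n": 2}]
nested = [{"user": {"name": "bo"}}, {"user": {"name": "al"}}, {"user": {"name": "cy"}}]

print([d["n"] for d in sort_dicts(flat, "n")])                       # [3, 2, 1]
print([d["user"]["name"] for d in sort_dicts(nested, "user.name")])  # ['cy', 'bo', 'al']
```

Call sites written after the first prompt keep working after the third, because a key without a dot is simply a one-segment path.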

Codex CLI broke the interface on the third prompt:

# Changed function name AND added parameter
def sort_dicts_by_key(data, key, descending=True):
    """Sort list of dicts by a possibly nested key."""
    def get_value(item, key_path):
        ...

The Impact

Issue	Consequence
Renamed function	Breaks existing calls
Added parameter	Changes signature
Rewrote entirely	Lost incremental changes

Interface instability compounds in iterative development.
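Changes like these can be caught mechanically before they reach callers. The sketch below compares two function objects with the standard `inspect` module. `interface_changes` is a hypothetical helper, and the two stubs only mirror the signatures shown above, not the full bodies.

```python
import inspect


# Signatures copied from the two outputs above; bodies are stubbed out.
def sort_dicts(items, key): ...
def sort_dicts_by_key(data, key, descending=True): ...


def interface_changes(old, new):
    """Hypothetical helper: list the ways `new` differs from `old` for callers."""
    changes = []
    if old.__name__ != new.__name__:
        changes.append(f"renamed: {old.__name__} -> {new.__name__}")
    old_params = list(inspect.signature(old).parameters)
    new_params = list(inspect.signature(new).parameters)
    if old_params != new_params:
        changes.append(f"parameters: {old_params} -> {new_params}")
    return changes


for change in interface_changes(sort_dicts, sort_dicts_by_key):
    print(change)
# renamed: sort_dicts -> sort_dicts_by_key
# parameters: ['items', 'key'] -> ['data', 'key', 'descending']
```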

Summary: Friction Events by Tool

Tool	Exp 3	Exp 7	Exp 9	Exp 10	Total
Claude Code	2	2	0	0	4
Codex CLI	0	1	3	4	8
Gemini CLI	0	0	0	0	0
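The totals column is a plain sum across experiments. A minimal tally, with the per-experiment counts copied from the table above:

```python
# Friction events per experiment, copied from the summary table.
friction = {
    "Claude Code": {"Exp 3": 2, "Exp 7": 2, "Exp 9": 0, "Exp 10": 0},
    "Codex CLI":   {"Exp 3": 0, "Exp 7": 1, "Exp 9": 3, "Exp 10": 4},
    "Gemini CLI":  {"Exp 3": 0, "Exp 7": 0, "Exp 9": 0, "Exp 10": 0},
}

totals = {tool: sum(events.values()) for tool, events in friction.items()}
print(totals)  # {'Claude Code': 4, 'Codex CLI': 8, 'Gemini CLI': 0}
```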

Tool Personality Profiles

Claude Code: "The Helpful Teacher"

Strength: Project awareness, iterative modification
Weakness: Over-documentation
Pattern: Adds docstrings and type hints even when not requested

Codex CLI: "The Defensive Programmer"

Strength: Minimal output for simple isolated tasks
Weakness: Multi-file projects, interface stability
Pattern: Adds validation, creates new files, changes interfaces

Gemini CLI: "The Precise Executor"

Strength: Exactly what you asked, nothing more
Weakness: None identified
Pattern: Most minimal output, follows existing patterns

Series Conclusion

Across all three articles:

Article	Winner	Claude Code Friction Events	Codex CLI Friction Events
1: Flow-State	Gemini CLI	3	2
2: Idiomatic	Gemini CLI	1	4
3: Full Audit	Gemini CLI	4	8
Total	Gemini CLI	8	14

Key Findings

1. Syntax is universally reliable. Bracket completion is no longer a differentiator.
2. Gemini CLI produced zero friction events in this audit. Its "precise executor" approach delivers exactly what's requested.
3. Claude Code over-documents. Helpful for learning, friction for experienced developers.
4. Codex CLI has critical weaknesses:
   - Generates deprecated library patterns (Pydantic v1)
   - Struggles with multi-file context
   - Breaks interfaces during iteration
5. Choose your tool based on context:
   - Gemini CLI: Experienced developers who know what they want
   - Claude Code: Learning, documentation-heavy projects
   - Codex CLI: Simple, isolated, greenfield tasks

Experiment Repository

Full session transcripts, prompts, and metrics: github.com/nsameerd/ai-coding-papercuts-experiment

This concludes the AI Coding Papercuts series. The papercut framework reveals not catastrophic failures, but cumulative micro-frictions—the small inefficiencies that compound into significant productivity loss.

Choose your tool based on your stack, context, and experience level.
