The Full Papercut Audit: Where AI Coding Tools Break Down

A comprehensive friction analysis across five dimensions: bracket completion, comment verbosity, over-engineering, multi-file projects, and iterative modification. Gemini CLI records zero friction events across all five.

February 5, 2026 · 12 min read · By Mathematicon

Part 3 of the AI Coding Papercuts series—measuring the small friction points that drain developer productivity.


The Final Test

After examining flow-state interruptions and idiomatic code quality, we conclude with a comprehensive friction analysis across five dimensions.

| Dimension | Test |
|---|---|
| Syntax Reliability | Bracket completion in nested structures |
| Documentation Noise | Comment verbosity on trivial code |
| Code Bloat | Over-engineering simple functions |
| Project Awareness | Multi-file integration |
| Interface Stability | Iterative modification |

Tools Under Test

| Tool | Model | Access Method |
|---|---|---|
| Claude Code | claude-sonnet-4-20250514 | claude CLI |
| Codex CLI | gpt-5.2-codex | codex exec |
| Gemini CLI | gemini-2.5-pro | gemini CLI |

Experiment 1: Bracket Completion

The scenario: Write a JavaScript config validator with nested structures.

Prompt: Write a JavaScript function that validates a nested config object with
        sections: database, cache, auth. Each section has enabled (boolean) and
        settings (object). Return { valid: boolean, errors: string[] }

Results: Universal Success

| Tool | Syntax Complete | Brackets Matched | Lines |
|---|---|---|---|
| Claude Code | ✅ | ✅ | 22 |
| Codex CLI | ✅ | ✅ | 18 |
| Gemini CLI | ✅ | ✅ | 16 |

All three tools produced syntactically valid JavaScript. Bracket completion is a solved problem—no truncation, no missing braces, no syntax errors.
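
A "Brackets Matched" check like the one in the table can be automated with a stack. The sketch below is an assumption about how such a check might look, not the method used in the experiment, and the name `brackets_balanced` is hypothetical. It also counts brackets inside string literals and comments, so treat it as a rough first pass rather than a parser.

```python
def brackets_balanced(source: str) -> bool:
    """Return True if every (, [ and { in source is closed in the right order.

    Simplification: brackets inside string literals or comments are counted too.
    """
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in source:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
    # Leftover openers mean the output was truncated or unbalanced.
    return not stack


print(brackets_balanced("return { valid: errors.length === 0, errors };"))  # True
print(brackets_balanced("function f(cfg) { if (cfg.db) { return [1, 2];"))  # False
```

The second call returns False because two opening braces are never closed, which is the shape a truncated completion would take.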


Experiment 3: Comment Verbosity

The scenario: Write a compound interest function.

Prompt: Write a Python function that calculates compound interest.
        Parameters: principal, rate (annual, as decimal), time (years), n (compounds per year).
        Formula: A = P(1 + r/n)^(nt). Return the final amount.

Results: Significant Divergence

Gemini CLI (2 lines, 0 friction):

def compound_interest(principal, rate, time, n):
    return principal * (1 + rate / n) ** (n * time)

Codex CLI (3 lines, 0 friction):

def compound_interest(principal, rate, time, n):
    # A = P(1 + r/n)^(nt)
    return principal * (1 + rate / n) ** (n * time)

Claude Code (13 lines, 2 friction):

def compound_interest(principal: float, rate: float, time: float, n: int) -> float:
    """
    Calculate compound interest.

    Args:
        principal: Initial investment amount
        rate: Annual interest rate as decimal (e.g., 0.05 for 5%)
        time: Time period in years
        n: Number of times interest compounds per year

    Returns:
        Final amount after compound interest
    """
    return principal * (1 + rate / n) ** (n * time)

The Impact

| Metric | Claude Code | Codex CLI | Gemini CLI |
|---|---|---|---|
| Lines of code | 13 | 3 | 2 |
| Comment lines | 8 | 1 | 0 |
| Comment ratio | 62% | 33% | 0% |
| Type hints | Yes | No | No |
| Friction events | 2 | 0 | 0 |

Claude Code's "helpful teacher" personality adds documentation overhead. For a trivial function, 8 lines of docstring create noise rather than value.
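
The comment ratio reported above is comment lines divided by total lines, rounded to a whole percentage. A minimal sketch of that arithmetic, using the counts from the table (the function name `comment_ratio` is illustrative and not part of the experiment's tooling):

```python
def comment_ratio(comment_lines: int, total_lines: int) -> int:
    """Comment lines as a whole-number percentage of total lines."""
    if total_lines == 0:
        return 0
    return round(100 * comment_lines / total_lines)


# (comment lines, total lines) as reported in the table above
reported = {"Claude Code": (8, 13), "Codex CLI": (1, 3), "Gemini CLI": (0, 2)}
for tool, (comments, total) in reported.items():
    print(f"{tool}: {comment_ratio(comments, total)}%")
# Claude Code: 62%
# Codex CLI: 33%
# Gemini CLI: 0%
```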


Experiment 7: Over-Engineering

The scenario: The simplest possible task.

Prompt: Write a Python function that reverses a string.

The Ideal Solution

def reverse_string(s):
    return s[::-1]

Results: Different Over-Engineering Patterns

Gemini CLI (2 lines): Perfect minimal output.

def reverse_string(s):
    return s[::-1]

Codex CLI (4 lines): Added unrequested validation.

def reverse_string(s):
    if not isinstance(s, str):
        raise TypeError("Input must be a string")
    return s[::-1]

Claude Code (10 lines): Added unrequested documentation.

def reverse_string(s: str) -> str:
    """Reverse a string.

    Args:
        s: The string to reverse

    Returns:
        The reversed string
    """
    return s[::-1]

The Pattern

| Tool | Over-Engineering Type | Overhead Lines |
|---|---|---|
| Claude Code | Documentation | 8 |
| Codex CLI | Validation | 2 |
| Gemini CLI | None | 0 |

Claude over-documents. Codex over-validates. Gemini does neither.
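
The added validation is not only extra lines: it changes what the function accepts. In the sketch below the two variants are renamed `reverse_minimal` and `reverse_validated` purely for illustration. The minimal version reverses any sliceable sequence, while the validated version rejects everything except `str`.

```python
def reverse_minimal(s):
    return s[::-1]


def reverse_validated(s):
    if not isinstance(s, str):
        raise TypeError("Input must be a string")
    return s[::-1]


print(reverse_minimal("abc"))      # cba
print(reverse_validated("abc"))    # cba
print(reverse_minimal([1, 2, 3]))  # [3, 2, 1]  (lists support slicing too)

try:
    reverse_validated([1, 2, 3])
except TypeError as exc:
    print(exc)                     # Input must be a string
```

Whether rejecting non-strings is desirable depends on the caller; the prompt did not ask for it either way.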


Experiment 9: Multi-File Project Handling

The scenario: Add an endpoint to an existing Flask project.

Setup

app/
  __init__.py
  routes.py  (existing /users endpoint)
  models.py  (User, Post models)

Prompt: Add a new endpoint GET /api/stats that returns:
        - Total users count
        - Total posts count
        - Average posts per user
        Use the existing User and Post models.

Results: Critical Divergence

| Tool | Edited Correct File | Created New File | Friction |
|---|---|---|---|
| Claude Code | ✅ | No | 0 |
| Gemini CLI | ✅ | No | 0 |
| Codex CLI | ❌ | Yes | 3 |

Claude Code & Gemini CLI correctly edited routes.py:

# Added to existing routes.py
@bp.route('/stats')
def get_stats():
    users = User.query.count()
    posts = Post.query.count()
    avg = posts / users if users else 0
    return jsonify({'total_users': users, 'total_posts': posts, 'avg_posts_per_user': avg})

Codex CLI created a new app/stats.py:

# Created NEW file instead of editing existing
from flask import Blueprint, jsonify
from .models import User, Post, db

stats_bp = Blueprint('stats', __name__, url_prefix='/api')

@stats_bp.route('/stats')
def get_stats():
    ...

The Impact

Codex CLI's approach requires:

  • Manual Blueprint registration in __init__.py
  • Understanding why a new file was created
  • Reconciling with existing project structure

In real-world development, this creates significant integration friction.


Experiment 10: Iterative Modification

The scenario: Evolve a function through three prompts.

Prompt 1: Write a function that sorts a list of dictionaries by a key.
Prompt 2: Actually, make it sort in descending order.
Prompt 3: Wait, also add support for nested keys like 'user.name'.

Results: Interface Stability

Claude Code & Gemini CLI preserved the interface:

# After all 3 prompts - same function name, same parameters
def sort_dicts(items, key):
    def get(d, k):
        for p in k.split('.'):
            d = d[p]
        return d
    return sorted(items, key=lambda x: get(x, key), reverse=True)

Codex CLI broke the interface on the third prompt:

# Changed function name AND added parameter
def sort_dicts_by_key(data, key, descending=True):
    """Sort list of dicts by a possibly nested key."""
    def get_value(item, key_path):
        ...

The Impact

| Issue | Consequence |
|---|---|
| Renamed function | Breaks existing calls |
| Added parameter | Changes signature |
| Rewrote entirely | Lost incremental changes |

Interface instability compounds in iterative development.
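
For comparison, an option can be added without breaking existing callers. The sketch below is hypothetical and is not output from any of the three tools. It keeps the name `sort_dicts` and the positional parameters `items` and `key`, and adds a keyword-only `descending` argument whose default matches the behavior after Prompt 2.

```python
from functools import reduce


def sort_dicts(items, key, *, descending=True):
    """Sort a list of dicts by key; dotted keys such as 'user.name' walk nested dicts."""
    def get(d):
        # Follow each path segment in turn: 'user.name' -> d['user']['name']
        return reduce(lambda acc, part: acc[part], key.split("."), d)
    return sorted(items, key=get, reverse=descending)


rows = [
    {"user": {"name": "ada"}, "score": 2},
    {"user": {"name": "cy"}, "score": 3},
    {"user": {"name": "bo"}, "score": 1},
]

# Calls written against the original two-argument signature still work.
print([r["score"] for r in sort_dicts(rows, "score")])             # [3, 2, 1]
print([r["user"]["name"] for r in sort_dicts(rows, "user.name")])  # ['cy', 'bo', 'ada']
# The new option is opt-in.
print([r["score"] for r in sort_dicts(rows, "score", descending=False)])  # [1, 2, 3]
```

Because the new argument is keyword-only and has a default, no existing call site has to change, which is the property the renamed `sort_dicts_by_key` loses.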


Summary: Friction Events by Tool

| Tool | Exp 1 | Exp 3 | Exp 7 | Exp 9 | Exp 10 | Total |
|---|---|---|---|---|---|---|
| Claude Code | 0 | 2 | 2 | 0 | 0 | 4 |
| Codex CLI | 0 | 0 | 1 | 3 | 4 | 8 |
| Gemini CLI | 0 | 0 | 0 | 0 | 0 | 0 |
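
The totals are plain sums of the per-experiment counts. A quick check of the arithmetic, using the numbers from the table above:

```python
# Friction events per experiment (Exp 1, 3, 7, 9, 10), copied from the summary table
friction = {
    "Claude Code": [0, 2, 2, 0, 0],
    "Codex CLI": [0, 0, 1, 3, 4],
    "Gemini CLI": [0, 0, 0, 0, 0],
}

totals = {tool: sum(events) for tool, events in friction.items()}
# Print from least to most friction.
for tool, total in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{tool}: {total}")
# Gemini CLI: 0
# Claude Code: 4
# Codex CLI: 8
```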

Tool Personality Profiles

Claude Code: "The Helpful Teacher"

  • Strength: Project awareness, iterative modification
  • Weakness: Over-documentation
  • Pattern: Adds docstrings and type hints even when not requested

Codex CLI: "The Defensive Programmer"

  • Strength: Minimal output for simple isolated tasks
  • Weakness: Multi-file projects, interface stability
  • Pattern: Adds validation, creates new files, changes interfaces

Gemini CLI: "The Precise Executor"

  • Strength: Exactly what you asked, nothing more
  • Weakness: None identified
  • Pattern: Most minimal output, follows existing patterns

Series Conclusion

Across all three articles:

| Article | Winner | Claude Code | Codex CLI | Gemini CLI |
|---|---|---|---|---|
| 1: Flow-State | Gemini CLI | 3 | 2 | 0 |
| 2: Idiomatic | Gemini CLI | 1 | 4 | 0 |
| 3: Full Audit | Gemini CLI | 4 | 8 | 0 |
| Total | Gemini CLI | 8 | 14 | 0 |

Key Findings

  1. Syntax is universally reliable. Bracket completion is no longer a differentiator.

  2. Gemini CLI produces zero friction. Its "precise executor" approach delivers exactly what's requested.

  3. Claude Code over-documents. Helpful for learning, friction for experienced developers.

  4. Codex CLI has critical weaknesses:

    • Generates deprecated library patterns (Pydantic v1)
    • Struggles with multi-file context
    • Breaks interfaces during iteration

  5. Choose your tool based on context:

    • Gemini CLI: Experienced developers who know what they want
    • Claude Code: Learning, documentation-heavy projects
    • Codex CLI: Simple, isolated, greenfield tasks

Experiment Repository

Full session transcripts, prompts, and metrics: github.com/nsameerd/ai-coding-papercuts-experiment


This concludes the AI Coding Papercuts series. The papercut framework reveals not catastrophic failures, but cumulative micro-frictions—the small inefficiencies that compound into significant productivity loss.

Choose your tool based on your stack, context, and experience level.
