The Full Papercut Audit: Where AI Coding Tools Break Down
Part 3 of the AI Coding Papercuts series—measuring the small friction points that drain developer productivity.
The Final Test
After examining flow-state interruptions and idiomatic code quality, we conclude with a comprehensive friction analysis across five dimensions.
| Dimension | Test |
|---|---|
| Syntax Reliability | Bracket completion in nested structures |
| Documentation Noise | Comment verbosity on trivial code |
| Code Bloat | Over-engineering simple functions |
| Project Awareness | Multi-file integration |
| Interface Stability | Iterative modification |
Tools Under Test
| Tool | Model | Access Method |
|---|---|---|
| Claude Code | claude-sonnet-4-20250514 | claude CLI |
| Codex CLI | gpt-5.2-codex | codex exec |
| Gemini CLI | gemini-2.5-pro | gemini CLI |
Experiment 1: Bracket Completion
The scenario: Write a JavaScript config validator with nested structures.
Prompt: Write a JavaScript function that validates a nested config object with
sections: database, cache, auth. Each section has enabled (boolean) and
settings (object). Return { valid: boolean, errors: string[] }
Results: Universal Success
| Tool | Syntax Complete | Brackets Matched | Lines |
|---|---|---|---|
| Claude Code | ✅ | ✅ | 22 |
| Codex CLI | ✅ | ✅ | 18 |
| Gemini CLI | ✅ | ✅ | 16 |
All three tools produced syntactically valid JavaScript. In this test, bracket completion was not a differentiator: no truncation, no missing braces, no syntax errors.
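The article does not say how the "Brackets Matched" column was checked. One way to approximate it is a stack-based scan, sketched below. The function name `brackets_balanced` is hypothetical, and the check is deliberately naive: it does not skip brackets that appear inside string literals or comments, so it is a rough filter rather than a parser.

```python
def brackets_balanced(source: str) -> bool:
    """Return True if (), [] and {} in `source` are properly nested.

    Simplification: brackets inside strings or comments are not ignored.
    """
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in source:
        if ch in "([{":
            stack.append(ch)
        elif ch in closers:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != closers[ch]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack


print(brackets_balanced("function f(c) { return { valid: [true] }; }"))
print(brackets_balanced("function f(c) { return { valid: [true }; }"))
```

The first call prints `True` and the second prints `False`, because `}` arrives while `[` is still open.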
Experiment 3: Comment Verbosity
The scenario: Write a compound interest function.
Prompt: Write a Python function that calculates compound interest.
Parameters: principal, rate (annual, as decimal), time (years), n (compounds per year).
Formula: A = P(1 + r/n)^(nt). Return the final amount.
Results: Significant Divergence
Gemini CLI (2 lines, 0 friction):
def compound_interest(principal, rate, time, n):
    return principal * (1 + rate / n) ** (n * time)
Codex CLI (3 lines, 0 friction):
def compound_interest(principal, rate, time, n):
    # A = P(1 + r/n)^(nt)
    return principal * (1 + rate / n) ** (n * time)
Claude Code (13 lines, 2 friction):
def compound_interest(principal: float, rate: float, time: float, n: int) -> float:
    """
    Calculate compound interest.
    Args:
        principal: Initial investment amount
        rate: Annual interest rate as decimal (e.g., 0.05 for 5%)
        time: Time period in years
        n: Number of times interest compounds per year
    Returns:
        Final amount after compound interest
    """
    return principal * (1 + rate / n) ** (n * time)
The Impact
| Metric | Claude Code | Codex CLI | Gemini CLI |
|---|---|---|---|
| Lines of code | 13 | 3 | 2 |
| Comment lines | 8 | 1 | 0 |
| Comment ratio | 62% | 33% | 0% |
| Type hints | Yes | No | No |
| Friction events | 2 | 0 | 0 |
Claude Code's "helpful teacher" personality adds documentation overhead. For a one-line function, 8 lines of docstring create noise rather than value.
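The comment ratio row is comment lines divided by total lines, rounded to a whole percent. The sketch below reproduces the table's figures from its own line counts. The helper name `comment_ratio` is hypothetical, and the counts are taken from the table as given, since the article does not spell out how blank lines or docstring delimiters were counted.

```python
def comment_ratio(comment_lines: int, total_lines: int) -> int:
    """Return comment lines as a whole-number percentage of total lines."""
    if total_lines == 0:
        return 0
    return round(100 * comment_lines / total_lines)


# (comment lines, total lines) as reported in the table above
reported = {
    "Claude Code": (8, 13),
    "Codex CLI": (1, 3),
    "Gemini CLI": (0, 2),
}

for tool, (comments, total) in reported.items():
    print(f"{tool}: {comment_ratio(comments, total)}%")
```

This prints 62%, 33%, and 0%, matching the table.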
Experiment 7: Over-Engineering
The scenario: The simplest possible task.
Prompt: Write a Python function that reverses a string.
The Ideal Solution
def reverse_string(s):
    return s[::-1]
Results: Different Over-Engineering Patterns
Gemini CLI (2 lines): Perfect minimal output.
def reverse_string(s):
    return s[::-1]
Codex CLI (4 lines): Added unrequested validation.
def reverse_string(s):
    if not isinstance(s, str):
        raise TypeError("Input must be a string")
    return s[::-1]
Claude Code (10 lines): Added unrequested documentation.
def reverse_string(s: str) -> str:
    """Reverse a string.
    Args:
        s: The string to reverse
    Returns:
        The reversed string
    """
    return s[::-1]
The Pattern
| Tool | Over-Engineering Type | Overhead Lines |
|---|---|---|
| Claude Code | Documentation | 8 |
| Codex CLI | Validation | 2 |
| Gemini CLI | None | 0 |
Claude over-documents. Codex over-validates. Gemini does neither.
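The two kinds of overhead are not equivalent. A docstring leaves behavior untouched, while the added type check changes what the function accepts. The sketch below puts the minimal and validated variants side by side under hypothetical names (`reverse_minimal`, `reverse_validated`) so both can live in one file. The bodies are the ones shown above.

```python
def reverse_minimal(s):
    # Slicing works on any sequence, not only str.
    return s[::-1]


def reverse_validated(s):
    # The unrequested check narrows the accepted input to str.
    if not isinstance(s, str):
        raise TypeError("Input must be a string")
    return s[::-1]


print(reverse_minimal("abc"))        # cba
print(reverse_minimal([1, 2, 3]))    # [3, 2, 1]

try:
    reverse_validated([1, 2, 3])
except TypeError as exc:
    print(f"TypeError: {exc}")       # TypeError: Input must be a string
```

Whether rejecting non-string input is a bug or a feature depends on the caller, which is why the article counts it as unrequested rather than wrong.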
Experiment 9: Multi-File Project Handling
The scenario: Add an endpoint to an existing Flask project.
Setup
app/
__init__.py
routes.py (existing /users endpoint)
models.py (User, Post models)
Prompt: Add a new endpoint GET /api/stats that returns:
- Total users count
- Total posts count
- Average posts per user
Use the existing User and Post models.
Results: Critical Divergence
| Tool | Edited Correct File | Created New File | Friction |
|---|---|---|---|
| Claude Code | ✅ | No | 0 |
| Gemini CLI | ✅ | No | 0 |
| Codex CLI | ❌ | Yes | 3 |
Claude Code & Gemini CLI correctly edited routes.py:
# Added to existing routes.py
@bp.route('/stats')
def get_stats():
    users = User.query.count()
    posts = Post.query.count()
    avg = posts / users if users else 0
    return jsonify({'total_users': users, 'total_posts': posts, 'avg_posts_per_user': avg})
Codex CLI created a new app/stats.py:
# Created NEW file instead of editing existing
from flask import Blueprint, jsonify
from .models import User, Post, db
stats_bp = Blueprint('stats', __name__, url_prefix='/api')
@stats_bp.route('/stats')
def get_stats():
    ...
The Impact
Codex CLI's approach requires:
- Manual Blueprint registration in __init__.py
- Understanding why a new file was created
- Reconciling with existing project structure
In real-world development, this creates significant integration friction.
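The registration step in the list above might look roughly like the sketch below. The article does not show `__init__.py`, so the application-factory function `create_app` is an assumption, and the route body is a placeholder because the article elides the real one. Everything is collapsed into a single module here so it can run on its own; in the project, the blueprint would live in app/stats.py and the factory in app/__init__.py.

```python
from flask import Blueprint, Flask, jsonify

# Blueprint as declared in the new app/stats.py
stats_bp = Blueprint("stats", __name__, url_prefix="/api")


@stats_bp.route("/stats")
def get_stats():
    # Placeholder payload: the article elides the real body.
    return jsonify({"status": "placeholder"})


def create_app():
    """Hypothetical application factory standing in for app/__init__.py."""
    app = Flask(__name__)
    # Without this line the new route is never served.
    app.register_blueprint(stats_bp)
    return app


if __name__ == "__main__":
    client = create_app().test_client()
    print(client.get("/api/stats").status_code)
```

If the `register_blueprint` call is left out, a request to /api/stats returns 404, which is the kind of silent integration gap the friction score is meant to capture.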
Experiment 10: Iterative Modification
The scenario: Evolve a function through three prompts.
Prompt 1: Write a function that sorts a list of dictionaries by a key.
Prompt 2: Actually, make it sort in descending order.
Prompt 3: Wait, also add support for nested keys like 'user.name'.
Results: Interface Stability
Claude Code & Gemini CLI preserved the interface:
# After all 3 prompts - same function name, same parameters
def sort_dicts(items, key):
    def get(d, k):
        for p in k.split('.'):
            d = d[p]
        return d
    return sorted(items, key=lambda x: get(x, key), reverse=True)
Codex CLI broke the interface on the third prompt:
# Changed function name AND added parameter
def sort_dicts_by_key(data, key, descending=True):
    """Sort list of dicts by a possibly nested key."""
    def get_value(item, key_path):
        ...
The Impact
| Issue | Consequence |
|---|---|
| Renamed function | Breaks existing calls |
| Added parameter | Changes signature |
| Rewrote entirely | Lost incremental changes |
Interface instability compounds in iterative development.
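Interface drift of this kind can be detected mechanically by comparing function names and parameter lists between iterations. The sketch below does that with the standard library's `inspect.signature`. Only the two signatures come from the article; the bodies are stubbed out, and the helpers `interface_of` and `interface_changes` are hypothetical names.

```python
import inspect


def sort_dicts(items, key):
    """Signature kept by Claude Code and Gemini CLI (body omitted here)."""
    raise NotImplementedError


def sort_dicts_by_key(data, key, descending=True):
    """Signature Codex CLI produced on the third prompt (body omitted here)."""
    raise NotImplementedError


def interface_of(func):
    """Return the function name and its parameter names, in order."""
    return func.__name__, tuple(inspect.signature(func).parameters)


def interface_changes(before, after):
    """List the differences a caller of `before` would notice in `after`."""
    changes = []
    name_before, params_before = interface_of(before)
    name_after, params_after = interface_of(after)
    if name_before != name_after:
        changes.append(f"renamed: {name_before} -> {name_after}")
    if params_before != params_after:
        changes.append(f"parameters: {params_before} -> {params_after}")
    return changes


for change in interface_changes(sort_dicts, sort_dicts_by_key):
    print(change)
```

Run against the two signatures, this reports both the rename and the changed parameter list. A check like this only looks at the signature, so it would not notice a full rewrite that keeps the same name and parameters.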
Summary: Friction Events by Tool
| Tool | Exp 1 | Exp 3 | Exp 7 | Exp 9 | Exp 10 | Total |
|---|---|---|---|---|---|---|
| Claude Code | 0 | 2 | 2 | 0 | 0 | 4 |
| Codex CLI | 0 | 0 | 1 | 3 | 4 | 8 |
| Gemini CLI | 0 | 0 | 0 | 0 | 0 | 0 |
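The totals are plain sums of the per-experiment counts. A short tally like the one below makes it easy to recheck them or to add further experiments. The dictionary layout is just one convenient representation, not the format used in the experiment repository.

```python
# Friction events per tool and experiment, copied from the table above
friction = {
    "Claude Code": {"Exp 1": 0, "Exp 3": 2, "Exp 7": 2, "Exp 9": 0, "Exp 10": 0},
    "Codex CLI": {"Exp 1": 0, "Exp 3": 0, "Exp 7": 1, "Exp 9": 3, "Exp 10": 4},
    "Gemini CLI": {"Exp 1": 0, "Exp 3": 0, "Exp 7": 0, "Exp 9": 0, "Exp 10": 0},
}

# Sum each tool's row to get its total
totals = {tool: sum(events.values()) for tool, events in friction.items()}

for tool, total in totals.items():
    print(f"{tool}: {total}")
```

The output is 4, 8, and 0, in line with the Total column.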
Tool Personality Profiles
Claude Code: "The Helpful Teacher"
- Strength: Project awareness, iterative modification
- Weakness: Over-documentation
- Pattern: Adds docstrings and type hints even when not requested
Codex CLI: "The Defensive Programmer"
- Strength: Minimal output for simple isolated tasks
- Weakness: Multi-file projects, interface stability
- Pattern: Adds validation, creates new files, changes interfaces
Gemini CLI: "The Precise Executor"
- Strength: Exactly what you asked, nothing more
- Weakness: None identified
- Pattern: Most minimal output, follows existing patterns
Series Conclusion
Across all three articles:
| Article | Winner | Claude Code | Codex CLI | Gemini CLI |
|---|---|---|---|---|
| 1: Flow-State | Gemini CLI | 3 | 2 | 0 |
| 2: Idiomatic | Gemini CLI | 1 | 4 | 0 |
| 3: Full Audit | Gemini CLI | 4 | 8 | 0 |
| Total | Gemini CLI | 8 | 14 | 0 |
Key Findings
- Syntax is universally reliable. Bracket completion is no longer a differentiator.
- Gemini CLI produces zero friction. Its "precise executor" approach delivers exactly what's requested.
- Claude Code over-documents. Helpful for learning, friction for experienced developers.
- Codex CLI has critical weaknesses:
  - Generates deprecated library patterns (Pydantic v1)
  - Struggles with multi-file context
  - Breaks interfaces during iteration
- Choose your tool based on context:
  - Gemini CLI: Experienced developers who know what they want
  - Claude Code: Learning, documentation-heavy projects
  - Codex CLI: Simple, isolated, greenfield tasks
Experiment Repository
Full session transcripts, prompts, and metrics: github.com/nsameerd/ai-coding-papercuts-experiment
This concludes the AI Coding Papercuts series. The papercut framework reveals not catastrophic failures, but cumulative micro-frictions—the small inefficiencies that compound into significant productivity loss.
Choose your tool based on your stack, context, and experience level.