The LLM Collaboration Guide: How to Avoid 20 Critical Bugs in Production

Learn how to turn LLMs from a liability into your most powerful engineering tool. Discover the three-phase workflow, constraint matrix framework, and production readiness checklist that prevents critical bugs in production.

March 12, 2026 · 24 min read · By Mathematicon

You ask Claude to "migrate 68 payment endpoints to DynamoDB."

Hours later, it delivers 15 DAOs and 3,800 lines of perfect, compiling code. The happy path works in staging. You deploy.

Then, production hits you with 20 critical bugs: race conditions, duplicate donations, lost commissions, and data divergence. The kind of bugs that lose money and break user trust.

Was it Claude's fault? No. It was the usage pattern.

This guide breaks down the exact strategy you need to turn LLMs from a liability into your most powerful engineering tool—not by being smarter about prompts, but by understanding what LLMs can and cannot do.


Part 1: The Expectation Gap

To use an LLM effectively, you have to understand its mental model.

What LLMs are actually great at:

  • Pattern Replication: "Here's one API endpoint, build me 100 more just like it."
  • Syntax & Boilerplate: Generating CRUD operations, SDK usage, and framework code.
  • Refactoring: Cleaning up messy code and enforcing consistency.
  • Explanation: Breaking down complex code into digestible pieces.

What LLMs are surprisingly bad at (unless you guide them):

  • Domain Logic: It doesn't know that "money" is special.
  • Failure Mode Thinking: It won't anticipate network failures, race conditions, or webhook retries unless you explicitly say so.
  • Hidden Requirements: If you don't say "atomic," it won't use atomic operations.
  • Trade-off Analysis: It can't weigh the cost of eventual consistency vs. immediate consistency without guidance.

The Core Insight:

Claude isn't thinking about 1,000 concurrent users. It's thinking about making your code compile and pass basic logic tests. It's your job to bridge the gap between "code that works" and "code that survives production."

This isn't a limitation of the model—it's a feature. Claude optimizes for what you ask for. The problem is that developers often forget to ask for what matters.
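The gap between "code that works" and "code that survives production" is easy to demonstrate without any real database. A minimal sketch in plain Node.js (all names here are illustrative): two concurrent deposits using a naive read-modify-write pattern silently lose an update.

```javascript
// An in-memory "database" with a single wallet balance
const store = { balance: 100 };

async function depositNaive(amount) {
  const current = store.balance;              // 1. read
  await new Promise(r => setTimeout(r, 10));  // 2. simulated I/O latency
  store.balance = current + amount;           // 3. write back a stale value
}

async function main() {
  // Both deposits read balance=100 before either one writes
  await Promise.all([depositNaive(50), depositNaive(50)]);
  console.log(store.balance); // 150, not 200 — one deposit was lost
}

main();
```

Every basic unit test on `depositNaive` passes; the bug only appears when two calls overlap, which is exactly what production load does.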


Part 2: The Anatomy of a Failure

Here's the flawed workflow that leads to production disasters:

  1. The Vague Ask: "Migrate 68 endpoints to DynamoDB."
  2. Claude's Naive Execution: Copies existing patterns, assumes single-user scenarios, optimizes for the happy path.
  3. The Superficial Review: You check if it compiles and passes basic unit tests (which don't cover concurrency).
  4. The Production Meltdown: Race conditions, data corruption, and idempotency failures emerge under real load.

The Assumption vs. The Painful Reality:

  • "Claude knows what matters." → Claude only knows what you explicitly tell it.
  • "Happy path = production ready." → Production is thousands of users plus constant failures.
  • "Code that compiles is correct." → Correctness requires atomicity, idempotency, and consistency.
  • "Tests pass = ship it." → Tests that only cover the happy path give a false sense of security.

The Real Cost of Ignoring This:

When financial data is corrupted, the consequences aren't just "bugs"—they're:

  • Lost revenue (duplicate charges, missing commissions)
  • User trust erosion (payment inconsistencies)
  • Compliance violations (audit trail corruption)
  • Emergency on-call escalations (fixing production data)

Part 3: The Winning Strategy—A Three-Phase Workflow

Stop treating LLMs like a magic black box. Start treating them like a brilliant but naive junior engineer who needs context, constraints, and a rigid quality gate.

Phase 1: Pre-Code (Your Most Important Job)

Don't: "Build a wallet system."
Do: Give it the "Why" and the "What If."

The Constraint Matrix Framework

Before you ask Claude to write a single line of code, organize your requirements into a structured matrix. This forces you to think clearly, and gives Claude the context it needs.

  • Atomicity — Ask yourself: Can two of these operations happen at the exact same millisecond? Constraint for Claude: "Balance updates MUST use atomic ADD operations. Read-modify-write is forbidden."
  • Idempotency — Ask yourself: Can the external caller send this request more than once? Constraint for Claude: "Razorpay sends webhooks at least once. Use payment_id as the idempotency key. If the key exists, return 200 OK without processing."
  • Consistency — Ask yourself: What happens if we write to one database but not the other? Constraint for Claude: "DynamoDB is the source of truth. MongoDB is a read replica. Never dual-write; use the transactional outbox pattern."
  • Ordering — Ask yourself: Does the sequence of events matter? Constraint for Claude: "Webhooks can arrive out of order. Use a processed_at timestamp and ignore events older than the last processed."
  • Durability — Ask yourself: What if the process crashes mid-operation? Constraint for Claude: "Jobs must be persisted to a queue before processing. If a worker crashes, it can resume without data loss."
  • Visibility — Ask yourself: Can you observe what's happening? Constraint for Claude: "Log every payment state transition. On error, emit a structured log with enough info to replay the transaction."

Your Pre-Code Prompt Should Look Like This:

"I need you to build a payment wallet system. Before you write any code, answer these questions so I know you understand the domain:

1. Failure Modes Analysis

  • List 5 ways this system could fail in production.
  • For each, explain how it would manifest (e.g., "Race condition on balance: two concurrent deposits both read balance=100, both add 50, both write back 150 instead of 200").
  • Propose the specific design pattern or database feature that prevents it.

2. Critical Constraints

  • Atomicity: All balance updates MUST use atomic database operations. Read-modify-write is forbidden.
  • Idempotency: Razorpay webhooks can arrive 3+ times. Design so duplicate webhooks result in 200 OK with no state change.
  • Multi-step operations: If we debit a wallet but fail to create a ledger entry, the retry must be safe.

3. Edge Cases

  • What happens if a webhook arrives out of order?
  • What happens if a webhook for a deleted user arrives?
  • What happens if the same webhook arrives while the first one is still processing?

Only after you've thoroughly analyzed these, write the code."

Why This Matters:

This prompt forces Claude to think like an architect before it codes like a builder. The output is a detailed analysis you can review and debate, instead of 3,000 lines of code that need rewriting.


Phase 2: Code (Claude's Execution with Constraints)

With clear constraints, Claude's output transforms dramatically. Instead of a naive read-modify-write, it might say:

"I was going to use a simple read-modify-write pattern for the wallet balance, but given the high concurrency constraint and the requirement for atomicity, I will use DynamoDB's UpdateCommand with a SET operation instead. This ensures that all increments are atomic at the database level, preventing race conditions."

Claude can produce production-grade code. But it needs you to point toward production-grade thinking.


Phase 3: Post-Code (The Production Readiness Checklist)

Don't just merge the PR. Run it through a Safety Checklist before touching production.

The Production Readiness Checklist:

Atomicity

  • All monetary operations use atomic DB commands (no read-modify-write). Verify using UpdateCommand, conditional writes, or transactions.
  • Concurrency tests (e.g., 100 simultaneous deposits to the same wallet) pass without data loss.
  • No in-memory state mutations that could be corrupted by concurrent requests.

Idempotency

  • External IDs (like payment_id) are used as unique keys or idempotency keys.
  • Sending the same webhook twice results in only one state change (same response, no duplicate side effects).
  • The idempotency check and state mutation are performed in a single atomic operation (or idempotency check happens before any side effects).

Durability

  • Jobs are stored in a persistent queue (not in-memory). If a worker crashes, it can resume without data loss.
  • Multi-step operations are logged at each step. If a step fails, the retry is safe (either idempotent or compensating).
  • No temporary state in memory. All state transitions are persisted.

Testing

  • Failure mode tests (network timeouts, duplicate webhooks, out-of-order events) are written and passing.
  • Load tests (100-1000 concurrent requests) are written and passing.
  • Data consistency tests (comparing primary DB vs. secondary DB) are automated.

Observability

  • Critical state transitions are logged (payment received, balance updated, commission queued).
  • Errors are tracked with enough context to replay the transaction (user ID, payment ID, timestamp, error reason).
  • Alerting is configured for anomalies (e.g., more than 1% of webhooks failing).

If a box isn't checked, you don't deploy.


Part 4: Six Essential Strategies for LLM Success

Here are tactical, immediately applicable strategies you can use right now.

Strategy 1: Explicit Constraints

Replace vague requests with constraint-heavy ones.

Instead of:

"Migrate the webhook handler."

Use:

"Migrate the webhook handler with these constraints:
- Razorpay sends duplicate webhooks for the same payment_id. Must be idempotent.
- Cannot update balance unless payment status is confirmed. Use optimistic locking.
- If the webhook processing fails after recording the payment, the retry must detect this and return 200 OK."

Strategy 2: Force Failure Mode Thinking

Before Claude writes code, ask it to brainstorm failure modes. This shifts its focus from construction to defense.

Ask Claude:

"Before you write the code for the wallet, conduct a pre-mortem. List 5 specific technical failures that could happen in production and how you'd prevent them."

Expected output: Claude lists race conditions, duplicate processing, schema mismatches, etc. You review and debate. Only then does it write code.

Strategy 3: The Self-Review

After code is written, ask Claude to critique its own work.

Ask Claude:

"Review the WalletService code you just wrote. Critique it with these lenses:

The Race Condition Hunter: Find every instance of read-modify-write. For each, either replace it with an atomic operation or explain why it's safe.

The Idempotency Auditor: Trace a duplicate webhook. Show exactly where the code checks the idempotency key. Is the check and mutation atomic? If not, highlight the race window.

The Failure Injector: Simulate a crash mid-operation. Show me exactly what state gets left behind and how the retry handles it.

Provide findings ranked by severity (Critical, High, Medium)."

Claude will often catch its own mistakes and fix them unprompted.

Strategy 4: Tests for Failure, Not Just Function

Don't ask for tests that prove the code works. Ask for tests that prove the code survives.

Instead of:

"Write unit tests for the payment function."

Use:

"Write a test that sends the same webhook 3 times concurrently (not sequentially).
It should:
1. Only process the payment once
2. Return 200 OK for all 3 requests
3. Only charge the user once
4. Log exactly once to the audit trail"

This forces Claude to think about concurrency, idempotency, and side effects—the hard problems.
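The prompt above asks for a Jest test; a minimal runnable sketch of the same assertions looks like this, with an in-memory idempotent handler standing in for the real service (handleWebhook and store are illustrative names, not part of any real codebase):

```javascript
// In-memory stand-in for the payments table and audit trail
const store = { payments: new Map(), auditLog: [] };

async function handleWebhook(event) {
  if (store.payments.has(event.id)) {
    // Duplicate delivery: acknowledge, but cause no side effects
    return { status: 'ok', idempotent: true };
  }
  store.payments.set(event.id, { amount: event.amount });
  store.auditLog.push(event.id);
  return { status: 'ok', idempotent: false };
}

async function main() {
  const event = { id: 'pay_123', amount: 500 };
  // Fire the same webhook 3 times concurrently, not sequentially
  const responses = await Promise.all([
    handleWebhook(event), handleWebhook(event), handleWebhook(event)
  ]);
  console.log(responses.every(r => r.status === 'ok')); // true — all get 200 OK
  console.log(store.payments.size);   // 1 — payment processed once
  console.log(store.auditLog.length); // 1 — logged exactly once
}

main();
```

In a real Jest suite, the `console.log` checks become `expect(...)` assertions against your actual handler and datastore.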

Strategy 5: Test-First Specification

Instead of asking for code, ask for the tests first. This defines correctness before implementation.

Prompt:

"I need a processPayment function. Before you write the function, write the complete test suite that would prove it's production-ready.

The test suite must include:

  1. Happy Path: Single payment processes successfully.
  2. Concurrency Test: 100 concurrent requests for the same payment ID. Assert only one succeeds, balance increases exactly once.
  3. Idempotency Test: Call twice with the same idempotency key. Second call returns same result without mutating state again.
  4. Failure Handling: Mock a database failure during the commission step. Assert payment is marked pending/failed, and retry doesn't create duplicates.
  5. Timeout Handling: Simulate a timeout. Assert the operation either completes or fails cleanly (no partial state).

Write these tests using Jest. I will review the tests. Once approved, write the code to make them pass."

This technique flips the interaction: the tests become the specification, ensuring rigor before any business logic is generated.

Strategy 6: Architectural Decision Record (ADR)

Before Claude implements a complex architectural choice, ask it to defend the design.

Prompt:

"We're considering DynamoDB (primary) + MongoDB (secondary read store). Before you code, write an Architectural Decision Record that covers:

  • Context: Why we need two databases
  • Decision: Use transactional outbox pattern to sync DynamoDB → MongoDB
  • Consequences:
    • Positive: Read scalability
    • Negative: Eventual consistency, ~1 second lag
    • Risks: If outbox poller crashes, data diverges
  • Failure Scenarios: What if outbox processor crashes mid-flight? (Requires idempotent replay)

Based on this ADR, is this the right choice? If so, generate the code."

This ensures architectural decisions are explicit and understood before they're baked into code.


Part 5: Case Study Deep-Dives

Case Study 1: The Double-Donation Bug

The Bug: A webhook from Razorpay arrived, was processed, but the TCP connection dropped before 200 OK was sent. Razorpay, thinking it failed, resent the webhook. Our handler, seeing what looked like a new request, processed it again, doubling the donation.

The Faulty Code (Claude's First Attempt):

async function handlePaymentWebhook(event) {
  const existing = await db.get({ paymentId: event.id });
  if (!existing) {
    await db.create({ paymentId: event.id, amount: event.amount });
    await queueCommission(event.userId);
  }
  return { status: 'ok' };
}

Why It Failed: The check for existing and the create are not atomic. The handler looks correct when the retry arrives after processing has finished:

  1. Webhook 1 arrives, checks for existing (none found)
  2. Webhook 1 creates the record
  3. Webhook 1 queues the commission
  4. Network drops, 200 OK not sent
  5. Webhook 2 (retry) arrives
  6. Webhook 2 checks for existing (NOW it's found from Webhook 1)
  7. Webhook 2 returns 200 OK (looks successful)

That sequence works. The failure happens when the retry interleaves with the original request, landing in the gap between the check and the create:

  1. Webhook 1 arrives, checks for existing (none found)
  2. Webhook 2 (retry) arrives while Webhook 1 is still processing, checks for existing (still none)
  3. Webhook 1 creates the record and queues the commission
  4. Webhook 2 also creates the record and queues a second commission → the donation is counted twice

The Fix (With Proper Constraint Prompt):

"Razorpay uses an 'at-least-once' delivery model. Your handler must be idempotent. Use payment_id as the primary key. If you try to insert a record with a key that already exists, DynamoDB will reject it. This gives you atomic idempotency for free—no check-then-act, just atomic insert."

Claude's Fixed Code:

async function handlePaymentWebhook(event) {
  try {
    // This put will fail if the payment_id already exists
    await dynamodb.putItem({
      TableName: 'payments',
      Item: {
        paymentId: { S: event.id },
        amount: { N: event.amount.toString() },
        status: { S: 'PROCESSING' },
        createdAt: { S: new Date().toISOString() }
      },
      ConditionExpression: 'attribute_not_exists(paymentId)'
    }).promise();

    // If we reach here, we're the only one processing this payment
    await queueCommission(event.userId);

    // Update status to COMPLETED
    await dynamodb.updateItem({
      TableName: 'payments',
      Key: { paymentId: { S: event.id } },
      UpdateExpression: 'SET #status = :completed',
      ExpressionAttributeNames: { '#status': 'status' },
      ExpressionAttributeValues: { ':completed': { S: 'COMPLETED' } }
    }).promise();

    return { status: 'ok', idempotent: false };

  } catch (error) {
    if (error.code === 'ConditionalCheckFailedException') {
      // This is a duplicate, just return success
      console.log(`Webhook ${event.id} already processed, returning 200 OK`);
      return { status: 'ok', idempotent: true };
    }
    throw error;
  }
}

The Lesson: The constraint "use atomic operations" led Claude to a fundamentally different design. Without that constraint, it wrote the obvious pattern (check-then-act) that's broken in distributed systems.


Case Study 2: The Race Condition on Commission Calculation

The Bug: Two users donated concurrently. The commission calculation read the total donations count, added 1, and wrote it back. Both read 100, both wrote 101. Lost one commission.

The Faulty Code:

async function recordDonation(userId, amount) {
  const stats = await db.get('donation_stats');
  stats.totalCount += 1;
  stats.totalAmount += amount;
  await db.put('donation_stats', stats);

  await queueCommission(userId, calculateCommission(stats.totalCount));
}

Why It Failed: Read-modify-write on shared state.

Timeline:

  1. User A: Read stats (totalCount=100)
  2. User B: Read stats (totalCount=100)
  3. User A: Increment to 101, write back
  4. User B: Increment to 101, write back ← Lost the increment from User A

The Fix (With Atomic Constraint):

"Commission calculation must use atomic database operations. Never read-modify-write shared state. Use database-native increment operations like DynamoDB's ADD or Postgres' UPDATE ... SET counter = counter + 1."

Claude's Fixed Code:

async function recordDonation(userId, amount) {
  // Atomic increment
  const result = await dynamodb.updateItem({
    TableName: 'donation_stats',
    Key: { statsId: { S: 'global' } },
    UpdateExpression: 'ADD totalCount :inc, totalAmount :amt',
    ExpressionAttributeValues: {
      ':inc': { N: '1' },
      ':amt': { N: amount.toString() }
    },
    ReturnValues: 'ALL_NEW'
  }).promise();

  const newCount = parseInt(result.Attributes.totalCount.N);
  await queueCommission(userId, calculateCommission(newCount));
}

The Lesson: "Atomic operations" is a constraint that Claude can understand and apply. It leads to fundamentally safer code under concurrency.


Case Study 3: The Eventual Consistency Data Drift

The Bug: Payments are written to DynamoDB (primary), then async-synced to MongoDB (search index). A payment sync job crashed. For 2 hours, searches returned stale results. Users couldn't find their recent donations in the public leaderboard.

The Faulty Code:

async function recordPayment(payment) {
  // Write to primary
  await dynamodb.put(payment);

  // Async sync to secondary
  eventBus.emit('payment_recorded', payment);

  // Return immediately
  return { status: 'recorded' };
}

// Async consumer (in a background service)
eventBus.on('payment_recorded', async (payment) => {
  await mongodb.insert(payment);
  // No retry logic, no circuit breaker
});

Why It Failed: The async consumer had no retry logic. If MongoDB was down for 5 minutes, that batch of payments never synced. The event was lost.

The Fix (With Durability Constraint):

"All multi-step operations must persist state at each step. Use the transactional outbox pattern: write the primary record and a log entry in the same transaction. Then, a persistent queue worker processes the log, syncs to secondaries, and marks entries complete. If the worker crashes, it resumes from the last checkpoint."

Claude's Fixed Code:

async function recordPayment(payment) {
  // Use a transaction: write the payment and its outbox entry atomically
  await dynamodb.transactWriteItems({
    TransactItems: [
      {
        Put: {
          TableName: 'payments',
          Item: marshalDynamoItem(payment)
        }
      },
      {
        Put: {
          TableName: 'outbox',
          Item: {
            id: { S: `${payment.id}-sync` },
            event: { S: 'payment_recorded' },
            payload: { S: JSON.stringify(payment) },
            status: { S: 'PENDING' },
            createdAt: { S: new Date().toISOString() }
          }
        }
      }
    ]
  }).promise();

  return { status: 'recorded' };
}

// Outbox processor (persistent, with retries)
async function processOutbox() {
  // Assumes a GSI (e.g. 'status-index') on the status attribute,
  // since status is not part of the table's primary key
  const pending = await dynamodb.query({
    TableName: 'outbox',
    IndexName: 'status-index',
    KeyConditionExpression: '#status = :pending',
    ExpressionAttributeNames: { '#status': 'status' },
    ExpressionAttributeValues: { ':pending': { S: 'PENDING' } }
  }).promise();

  for (const item of pending.Items) {
    try {
      const payload = JSON.parse(item.payload.S);
      await mongodb.insert(payload);

      // Mark as processed
      await dynamodb.updateItem({
        TableName: 'outbox',
        Key: { id: item.id },
        UpdateExpression: 'SET #status = :done',
        ExpressionAttributeNames: { '#status': 'status' },
        ExpressionAttributeValues: { ':done': { S: 'PROCESSED' } }
      }).promise();
    } catch (error) {
      // Leave the entry PENDING so the next run retries it;
      // add exponential backoff and alerting in production
      console.error(`Failed to sync ${item.id}`, error);
    }
  }
}

// Run processor every 10 seconds
setInterval(processOutbox, 10000);

The Lesson: "Durability" as a constraint led Claude to the transactional outbox pattern—a sophisticated design that ensures no writes are lost, even if secondary systems fail.


Part 6: Failure Mode Taxonomy

Before you even talk to Claude, you need to know what kinds of failures exist in your domain. Here's a taxonomy of the top 10.

  1. Race Condition — Example: Two concurrent deposits both read balance=100, both write 150 (losing one deposit). Prevention: atomic database operations, optimistic locking, or pessimistic locking.
  2. Duplicate Webhook Processing — Example: Razorpay sends a webhook 3 times due to retries; we process all 3 and charge the user 3x. Prevention: idempotency keys, unique constraints on external IDs.
  3. Partial Failure — Example: We write to DynamoDB but fail to enqueue the commission; the user sees the payment recorded but no commission. Prevention: transactional outbox, two-phase commit, or compensating transactions.
  4. Out-of-Order Events — Example: The commission webhook arrives before the donation webhook; the system crashes and the commission is lost. Prevention: event versioning, causal ordering, or replaying from a log.
  5. Lost In-Flight Request — Example: A client sends a payment request, gets no response (network timeout), doesn't know if it was processed, and retries; we charge twice. Prevention: client-side idempotency keys, server-side request deduplication.
  6. Data Divergence — Example: The primary DB (DynamoDB) and secondary DB (MongoDB) drift out of sync after failed syncs; search results go stale. Prevention: transactional outbox, event sourcing, or periodic reconciliation.
  7. Cascading Failure — Example: The payment service goes down; the commission service waits on it; requests queue up in both until OOM. Prevention: circuit breakers, timeouts, graceful degradation.
  8. Phantom Read — Example: A process checks that a user exists and proceeds; meanwhile the user is deleted, and the process crashes with a foreign key error. Prevention: serializable isolation level, row-level locks, or application-level validation.
  9. Silent Data Loss — Example: A job queue consumer crashes and messages are lost without retry. Prevention: persistent queues, acknowledgment-based processing, dead letter queues.
  10. Invisible State Inconsistency — Example: A donation appears in the audit log, but the user's balance is never updated; the system looks healthy. Prevention: data consistency checks, reconciliation jobs, alerts on divergence.

Your job: For each feature Claude builds, identify which of these apply. Then explicitly tell Claude about them.


Part 7: The Pre-Code Checklist

Before you even open a chat with Claude, print this out and fill it in:

Feature: [What are you building?]

Failure Modes That Apply (from the taxonomy above):

  • Race Condition
  • Duplicate Processing
  • Partial Failure
  • Out-of-Order Events
  • Lost Requests
  • Data Divergence
  • Cascading Failure
  • Phantom Reads
  • Silent Data Loss
  • State Inconsistency

Constraints I Will Explicitly State:

  • Atomic operations required
  • Idempotency required (key: _______)
  • Durability required (persistent queue: yes/no)
  • Ordering required (yes/no)
  • Consistency level (strong/eventual, SLA: ______)
  • Concurrency level (peak: ______ requests/sec)

Tests I Will Demand:

  • Happy path test
  • Concurrency test (N concurrent requests)
  • Failure mode test (e.g., crash mid-operation)
  • Idempotency test (duplicate request)
  • Load test (peak concurrency)

Production Readiness Checklist Items (from Part 3):

  • Atomicity
  • Idempotency
  • Durability
  • Testing
  • Observability

Once all checkboxes are ticked, you're ready to ask Claude.


Part 8: Integration with Your Development Workflow

Step 1: Pre-Code Phase (30 minutes)

  1. Print the Pre-Code Checklist
  2. Identify failure modes from the taxonomy
  3. Write the Constraint Matrix
  4. Draft the pre-code prompt for Claude

Step 2: Claude's Analysis Phase (15 minutes)

  1. Paste the pre-code prompt
  2. Claude returns failure mode analysis
  3. You review and iterate

Step 3: Code Generation (30 minutes)

  1. Ask Claude to write code with constraints
  2. Ask Claude to self-review
  3. Get the test suite

Step 4: Post-Code Review (1 hour)

  1. Run through the Production Readiness Checklist
  2. Run the concurrency and failure mode tests
  3. Check observability/logging

Step 5: Merge & Deploy

Only deploy once every checklist item is checked.


Part 9: Common Mistakes to Avoid

Mistake 1: Trusting the Happy Path

"The code compiles and basic tests pass, so it's ready."

Reality: The happy path is the 1% of scenarios your code encounters. 99% of production is failures, retries, and edge cases.

Fix: Always write failure mode tests before deployment.


Mistake 2: Vague Constraints

"Build a payment system that's scalable."

Reality: "Scalable" means nothing to Claude. It could mean "fast" or "handles 1000 users" or "handles race conditions." Be specific.

Fix: Use the Constraint Matrix. Name the specific failure modes and the specific database operations required.


Mistake 3: Reviewing Code, Not Design

"I'll review the code and make sure it looks good."

Reality: If the design is wrong (e.g., read-modify-write for shared state), no amount of code review fixes it.

Fix: Review the design before the code. Use the Pre-Mortem prompt.


Mistake 4: Skipping Concurrency Tests

"The unit tests pass, so I'm shipping."

Reality: Race conditions only appear under concurrency. Unit tests don't run concurrently by default.

Fix: Demand concurrency tests. Require tests that fire 100 concurrent requests to the same resource.
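A sketch of the shape such a test takes, against an in-memory wallet whose updates are serialized through a promise chain (a stand-in for a database's atomic update; all names are illustrative):

```javascript
const wallet = { balance: 0 };
const ledger = [];
let chain = Promise.resolve();

function atomicDeposit(amount) {
  // Each update runs only after the previous one committed — no lost updates
  chain = chain.then(() => {
    wallet.balance += amount;
    ledger.push(amount);
  });
  return chain;
}

async function main() {
  // Fire 100 deposits concurrently, not in a sequential loop
  await Promise.all(Array.from({ length: 100 }, () => atomicDeposit(5)));
  console.log(wallet.balance); // 500 — exactly 100 deposits of 5
  console.log(ledger.length);  // 100 — one ledger entry per deposit
}

main();
```

Swap the naive read-modify-write version into `atomicDeposit` and this test fails immediately, which is exactly the kind of signal unit tests never give you.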


Mistake 5: Over-Trusting Claude

"Claude's a senior engineer, I don't need to review this."

Reality: Claude is a pattern matcher, not a domain expert. It doesn't know your business, your users, or your failure modes.

Fix: Claude is a tool, not a peer. Use the constraints and checklists to guide it.


Conclusion: The New Skill

The developers getting the most out of LLMs aren't the ones who can write the most complex prompts. They're the ones who understand systems design.

They know that the code is just the final output of a process that requires:

  1. Thinking about failure modes
  2. Specifying hard constraints
  3. Verifying against a strict checklist
  4. Testing failure scenarios
  5. Observing the system in production

Claude can generate 3,800 lines of code in an hour. But it can't know that money is special unless you tell it.

The real skill isn't asking Claude to code. It's knowing what to ask for.


Quick Reference: The 30-Second Summary

  1. BEFORE CODING: Give Claude explicit constraints and ask it to identify failure modes.
  2. DURING CODING: Have Claude self-review against those constraints.
  3. AFTER CODING: Run the code through a rigorous Production Safety Checklist.
  4. BEFORE DEPLOYING: Verify every single item on that checklist.

Do this, and you'll ship production-ready code. Ignore this, and you'll find the bugs yourself—in production.


Appendix: Templates You Can Copy/Paste

The Pre-Mortem Prompt

Claude, before you write any code for [FEATURE], conduct a project pre-mortem.

Imagine it is 3 months from now. Our system has just caused a major production incident.

1. List 5 specific technical failures that could have caused this incident.
2. For each, explain how it would happen in the code.
3. For each, propose a specific design pattern or database feature that prevents it.

Only after you've given me this analysis, start coding.

The Self-Review Prompt

Review the code you just wrote. Critique it with these lenses:

**The Race Condition Hunter:**
Identify every instance where you read a value, modify it in memory, and write it back.
Flag any loops that contain database writes.

**The Idempotency Auditor:**
Trace the path of a duplicate request. Show exactly where the code checks for idempotency.
Is the check and state change atomic? If not, highlight the race window.

**The Failure Injector:**
Look at the multi-step operation. Simulate crashes at each step.
If the process crashes mid-way, will retry logic handle it correctly?

Provide a bulleted list of findings, ranked by severity (Critical, High, Medium).

The Concurrency Test Prompt

Write a test that proves this system handles concurrency correctly.

The test must:
1. Fire off N concurrent requests to [OPERATION]
2. Assert that the result is identical to N sequential requests
3. Assert that shared state (balance, count, etc.) is correct

For example:
- 100 concurrent deposits to the same wallet
- Assert: balance increased by exactly 100x the deposit amount
- Assert: ledger has exactly 100 entries

Write using Jest.

The Production Readiness Checklist

[ ] All monetary/critical operations use atomic DB commands
[ ] Concurrency tests (100+ simultaneous operations) pass
[ ] Idempotency tests (duplicate requests) pass
[ ] Failure mode tests (timeouts, crashes) pass
[ ] Load tests (peak concurrency) pass
[ ] Observability: Critical operations are logged
[ ] Observability: Errors include enough context to replay
[ ] Alerting: Configured for anomalies
[ ] Data consistency: Reconciliation job exists and runs
[ ] Every item checked: Ready to deploy

Published: March 12, 2026 · Author: Mathematicon Engineering · Tags: #LLM #AI-Assisted-Code #Production #Systems-Design #Database #Concurrency #Best-Practices
