I used to dread Monday morning code reviews. A stack of 6-8 PRs from the weekend, each one a context switch. By PR #4, I was skimming. By PR #6, I was approving things I shouldn’t have been. We all know the feeling.

Three months ago, I started using Claude Code as a first-pass reviewer before I look at any PR myself. Not to replace human review — but to handle the mechanical parts so I can focus on the parts that actually need a human brain. The results have been surprising, in both directions.

Here’s an honest breakdown of what Claude Code catches, what it completely misses, and the exact prompts I use to get senior-level feedback.


The Setup: How I Review with Claude Code

My workflow is simple. Before opening a PR in my browser, I run Claude Code on the branch:

# Checkout the PR branch
git checkout feature/user-preferences
# Ask Claude Code to review the changes
claude -p "Review the changes in this branch compared to main.
Focus on:
1. Bugs and logic errors
2. Security issues
3. Performance problems
4. Code style inconsistencies
5. Missing edge cases
Be specific. Reference file names and line numbers.
For each issue, explain WHY it's a problem and suggest a fix.
Rate severity: CRITICAL / WARNING / SUGGESTION."

This takes 30-60 seconds and produces a structured review. I read Claude’s feedback, then open the PR knowing exactly where to focus my attention.


What Claude Code Catches Brilliantly

1. Null Safety and Edge Cases

This is Claude Code’s superpower in reviews. It’s relentless about checking every code path for potential null values, empty arrays, and boundary conditions.

In one PR, a developer added a payment processing function:

fun processRefund(orderId: String) {
    val order = orderRepository.findById(orderId)
    val payment = order.payment
    paymentGateway.refund(payment.transactionId, payment.amount)
}

Claude flagged three issues in four lines:

  • findById can return null — no null check
  • order.payment can be null if order was free (zero amount)
  • No check for already-refunded payments

A human reviewer might catch the first one. The second and third require knowing the domain model. Claude caught all three because it read the Order data class and saw that payment was nullable.
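
For reference, here's a sketch of code that addresses all three findings, written in TypeScript with hypothetical stand-in types for the Kotlin domain model:

import { strict as assert } from "assert";

// Hypothetical stand-ins for the Kotlin domain model above.
interface Payment {
  transactionId: string;
  amount: number;
  refunded: boolean;
}
interface Order {
  payment: Payment | null;
}
interface OrderRepository {
  findById(orderId: string): Order | null;
}
interface PaymentGateway {
  refund(transactionId: string, amount: number): void;
}

function processRefund(
  orders: OrderRepository,
  gateway: PaymentGateway,
  orderId: string
): void {
  const order = orders.findById(orderId);
  if (order === null) {
    throw new Error(`Order ${orderId} not found`); // issue 1: findById can return null
  }
  const payment = order.payment;
  if (payment === null) {
    return; // issue 2: free orders have no payment, nothing to refund
  }
  if (payment.refunded) {
    return; // issue 3: never refund the same payment twice
  }
  gateway.refund(payment.transactionId, payment.amount);
}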

2. Security Vulnerabilities

Claude Code is genuinely good at catching OWASP-style vulnerabilities. It consistently flags:

  • SQL injection: String concatenation in queries (sketched after this list)
  • XSS: Unescaped user input in templates
  • Path traversal: User-controlled file paths
  • Missing auth checks: Endpoints without authentication middleware
  • Exposed secrets: API keys, tokens, connection strings in code
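
The first item is the most frequent catch. Here's the pattern Claude flags and the parameterized form it suggests, sketched with node-postgres (the findUser helpers are hypothetical):

import { Pool } from "pg";

const db = new Pool();

// The pattern Claude flags: user input concatenated into the query string.
async function findUserUnsafe(email: string) {
  return db.query(`SELECT * FROM users WHERE email = '${email}'`); // injectable
}

// The fix it suggests: a parameterized query; the driver handles escaping.
async function findUserSafe(email: string) {
  return db.query("SELECT * FROM users WHERE email = $1", [email]);
}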

In one review, it caught a path traversal vulnerability that had been in our codebase for months:

app.get('/download/:filename', (req, res) => {
  const file = path.join(uploadsDir, req.params.filename);
  res.sendFile(file);
});

Claude’s review: “CRITICAL: Path traversal vulnerability. A request to /download/../../etc/passwd would escape the uploads directory. Validate that the resolved path starts with uploadsDir after path.resolve().”

That’s exactly the kind of thing that slips past tired human reviewers.
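
Applying that suggestion takes a few lines. A minimal sketch, assuming an Express app:

import path from "path";
import express from "express";

const app = express();
const uploadsDir = path.resolve(__dirname, "uploads");

app.get("/download/:filename", (req, res) => {
  // Resolve the requested name against uploadsDir, then verify the
  // result is still inside it -- this defeats ../ sequences.
  const resolved = path.resolve(uploadsDir, req.params.filename);
  if (!resolved.startsWith(uploadsDir + path.sep)) {
    return res.status(400).send("Invalid filename");
  }
  res.sendFile(resolved);
});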

3. Inconsistent Patterns

When your codebase has established patterns, Claude Code notices when a PR breaks them. It catches:

  • Using callbacks when the rest of the codebase uses async/await
  • Direct database access when other code uses the repository pattern
  • Inline error handling when there’s a global error handler
  • Different naming conventions in the same module

A typical comment:
SUGGESTION: This function uses try/catch with manual error responses,
but all other endpoints in this module use the asyncHandler wrapper
(see src/middleware/asyncHandler.ts). Using asyncHandler would maintain
consistency and ensure errors are logged through the central handler.

This is the kind of feedback that takes a human reviewer deep familiarity with the codebase. Claude gets it from reading the surrounding files.
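
For context, a wrapper like the asyncHandler referenced in that comment is usually just a few lines of Express plumbing. A sketch of the typical shape (not necessarily what our file contains):

import { Request, Response, NextFunction, RequestHandler } from "express";

type AsyncRoute = (
  req: Request,
  res: Response,
  next: NextFunction
) => Promise<unknown>;

// Forwards rejected promises to Express's error middleware, so every
// async route reports failures through the central error handler.
export function asyncHandler(fn: AsyncRoute): RequestHandler {
  return (req, res, next) => {
    Promise.resolve(fn(req, res, next)).catch(next);
  };
}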

4. Missing Test Coverage

Claude Code reliably spots when new code paths lack tests. It’s especially good at noticing:

  • New branches (if/else) without corresponding test cases
  • Error paths that are never tested
  • New public methods without any test
  • Modified logic where existing tests don’t cover the change

For example:
WARNING: The new validateEmail() function has 4 distinct code paths
(valid email, empty string, missing @, domain without TLD) but
the test file only covers the valid case and empty string. Missing
tests for: malformed emails without @, and domains without TLD.
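
Adding the flagged cases takes a minute. A sketch, assuming a Jest-style runner, a validateEmail that returns a boolean, and a hypothetical import path:

import { validateEmail } from "./validateEmail"; // hypothetical path

describe("validateEmail", () => {
  // Already covered in the PR:
  it("accepts a valid address", () => {
    expect(validateEmail("user@example.com")).toBe(true);
  });
  it("rejects an empty string", () => {
    expect(validateEmail("")).toBe(false);
  });
  // The two paths Claude flagged as untested:
  it("rejects an address without @", () => {
    expect(validateEmail("user.example.com")).toBe(false);
  });
  it("rejects a domain without a TLD", () => {
    expect(validateEmail("user@localhost")).toBe(false);
  });
});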

What Claude Code Misses Completely

1. Business Logic Correctness

Claude Code can tell you if code will crash. It cannot tell you if code does the right thing. This is the biggest gap.

A developer implemented a discount calculation:

def calculate_discount(user, cart_total):
    if user.is_premium:
        return cart_total * 0.15
    if cart_total > 100:
        return cart_total * 0.10
    return 0

Claude reviewed this and found no issues: no crashes, no style problems, clean control flow. But the actual requirement lived in a Jira ticket, not in the code: the PM wanted the discounts to stack, so a premium user with a cart over $100 should get 25% (15% + 10%), not a flat 15%. Claude couldn’t know that, because nothing in the repository says so. This is the fundamental limitation: Claude reviews code against code, not code against requirements.
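
For the record, the stacking behavior the ticket described would look like this (a sketch in TypeScript; the field names are assumptions):

interface User {
  isPremium: boolean;
}

// What the ticket actually asked for: the two discounts stack, so a
// premium user with a cart over $100 gets 25% off, not 15%.
function calculateDiscount(user: User, cartTotal: number): number {
  let rate = 0;
  if (user.isPremium) rate += 0.15;
  if (cartTotal > 100) rate += 0.1;
  return cartTotal * rate;
}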

2. Architecture and Design Decisions

Claude can tell you if a function is well-written. It struggles to tell you if a function should exist at all.

In one PR, a developer created a new NotificationService that duplicated functionality from our existing EventBus. Claude reviewed the code quality of NotificationService and found it clean. But the right review feedback was: “We already have EventBus for this. Don’t create a parallel system.”

Architecture reviews require understanding the system’s history and direction. Why did we choose EventBus? What problems does it solve? Where is the system heading? Claude doesn’t have that context unless you explicitly provide it.

3. Performance at Scale

Claude catches obvious performance issues — N+1 queries, missing indexes, O(n²) loops on large collections. But it misses performance problems that only manifest at scale.
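
To make that concrete: it will flag the loop below every time and suggest the batched rewrite; whether the batched query survives a million rows is the part it can't see. A sketch assuming node-postgres:

import { Pool } from "pg";

const db = new Pool();

// The N+1 pattern Claude reliably flags: one query per order in a loop.
async function loadItemsOneByOne(orderIds: string[]) {
  const results = [];
  for (const id of orderIds) {
    results.push(
      await db.query("SELECT * FROM items WHERE order_id = $1", [id])
    );
  }
  return results;
}

// The batched rewrite it suggests. Whether this plan holds up at
// production row counts is a question for someone who knows the data.
async function loadItemsBatched(orderIds: string[]) {
  return db.query("SELECT * FROM items WHERE order_id = ANY($1)", [orderIds]);
}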

It won’t flag:

  • A database query that works fine with 1,000 rows but kills the DB at 1 million
  • A caching strategy that works in single-server but fails in distributed deployments
  • Memory allocation patterns that cause GC pressure under high throughput
  • API response sizes that are fine in dev but massive in production

These require production experience and system knowledge that no AI currently has.

4. Team and Project Context

Claude doesn’t know:

  • That a developer is junior and needs more detailed feedback
  • That this code area has caused production incidents before (and needs extra scrutiny)
  • That the team agreed last week to stop using a certain library
  • That a module is being deprecated and new code shouldn’t extend it
  • Political/team dynamics around code ownership

These are human judgment calls that can’t be delegated.


The Prompt Patterns That Get Senior-Level Feedback

Pattern 1: The Focused Review

Instead of “review this code,” give Claude a specific lens:

"Review src/api/payments.ts focusing ONLY on error handling.
For each function:
- What errors can occur?
- Are they all caught?
- Are error messages helpful for debugging?
- Could any error leave the system in an inconsistent state?"

In my experience, a focused review produces noticeably more specific, actionable feedback than a generic “review this code” prompt.

Pattern 2: The Adversarial Review

Ask Claude to think like an attacker:

"You are a security auditor. Review these changes assuming that all
user inputs are malicious. For each endpoint:
- Can inputs be crafted to break the system?
- Are there any data leaks in error messages or responses?
- Can the rate limiting be bypassed?
- Are there any IDOR vulnerabilities?"

This consistently surfaces security issues that generic reviews miss.

Pattern 3: The Regression Check

Ask Claude to think about what could break:

"These changes modify the user authentication flow. Analyze:
- What existing features depend on this auth flow?
- Could any of them break with these changes?
- Are there any backwards compatibility issues?
- What should the QA team test after this merge?"

This mimics what experienced developers do mentally — trace the impact of changes through the system.

Pattern 4: The Teaching Review

For junior developers’ PRs, ask Claude to explain:

"Review this PR as if the author is a junior developer.
For each issue:
- Explain WHY it's a problem (not just WHAT to change)
- Link it to a general principle or best practice
- Show the corrected code
- Rate: must-fix vs nice-to-have"

This produces review comments that teach, not just critique.


My Workflow: The Two-Pass System

Here’s how human and AI review work together:

Pass 1 — Claude Code (5 minutes):

  • Run the review prompt above
  • Catches: bugs, security issues, style, missing tests, edge cases
  • Output: structured list of issues with severity ratings

Pass 2 — Human (15 minutes, focused):

  • Read Claude’s findings, verify each one
  • Then focus on what Claude can’t do:
    • Does this solve the right problem?
    • Does the approach fit our architecture?
    • Will this scale?
    • Is this the right abstraction level?
    • How does this affect the team’s velocity?

Total time: 20 minutes (down from 40 minutes of unfocused human review)

The quality is higher because my 15 minutes of human review are focused on the hard parts, not wasted on catching missing null checks.


The Numbers After 3 Months

Metric                        Before    After
Average review time           40 min    20 min
Bugs caught before merge      ~60%      ~85%
Security issues caught        ~40%      ~90%
Style/consistency issues      ~50%      ~95%
Architecture issues caught    ~70%      ~70%
PRs reviewed per day          4-5       8-10

The biggest wins are in security and consistency — areas where tiredness and familiarity blindness hurt human reviewers the most. Architecture review quality stayed the same because that’s purely a human skill.


When NOT to Use Claude Code for Review

  • Highly sensitive code (crypto implementations, auth systems) — these need specialist human review
  • Architecture decisions — use Claude for analysis, but decisions need human judgment
  • First PRs from new team members — they need human mentorship, not AI feedback
  • Controversial changes — code that touches team dynamics needs human EQ

Key Takeaways

  1. Use Claude Code as first-pass, not final-pass. It handles mechanical review so you can focus on judgment.
  2. Be specific in your prompts. Focused reviews consistently beat general ones.
  3. Claude excels at: null safety, security, consistency, test coverage. Trust it here.
  4. Claude misses: business logic, architecture, scale, team context. Don’t delegate these.
  5. The two-pass system saves 50% of review time while improving quality.
  6. Security reviews are Claude’s killer feature. It never gets tired of checking for injection attacks.

Code review isn’t going away. But the tedious parts — the parts where humans are worst — are exactly where AI is best. Let Claude handle those. Keep the human parts human.


Code review with Claude Code is covered in Phase 10: Team Collaboration of the Claude Code Mastery course. Phases 1-3 are free.