TL;DR — Claude Code excels at null safety, security vulnerabilities, pattern consistency, and missing test coverage — catching ~85% of bugs and ~90% of security issues before merge. It misses business logic correctness, architecture decisions, and team context. Use it as a first-pass reviewer, not a final one. Jump to the two-pass workflow →
I used to dread Monday morning code reviews. A stack of 6-8 PRs from the weekend, each one a context switch. By PR #4, I was skimming. By PR #6, I was approving things I shouldn’t have been. We all know the feeling.
Three months ago, I started using Claude Code as a first-pass reviewer before I look at any PR myself. Not to replace human review, but to handle the mechanical parts so I can focus on the parts that actually need a human brain. The results have been surprising, in both directions.
Here’s an honest breakdown of what Claude Code catches, what it completely misses, and the exact prompts I use to get senior-level feedback. This post zooms in on code review; for everything else Claude Code does — setup, hooks, subagents, plugins, production patterns — see our complete Claude Code guide.
How Do You Set Up Claude Code for Code Review?
My workflow is simple. Before opening a PR in my browser, I run Claude Code on the branch:
```bash
# Check out the PR branch
git checkout feature/user-preferences

# Ask Claude Code to review the changes
claude -p "Review the changes in this branch compared to main.
Focus on:
1. Bugs and logic errors
2. Security issues
3. Performance problems
4. Code style inconsistencies
5. Missing edge cases

Be specific. Reference file names and line numbers.
For each issue, explain WHY it's a problem and suggest a fix.
Rate severity: CRITICAL / WARNING / SUGGESTION."
```

This takes 30-60 seconds and produces a structured review. I read Claude’s feedback, then open the PR knowing exactly where to focus my attention.
Key insight: A Claude Code code review prompt takes 30-60 seconds to run and produces structured output with severity ratings (CRITICAL / WARNING / SUGGESTION), file names, and line numbers. The key is specificity: asking for a focused review on one dimension (error handling, security, test coverage) produces 3x better feedback than a generic “review this code” request.
What Does Claude Code Catch Brilliantly in Code Review?
1. Null Safety and Edge Cases
This is Claude Code’s superpower in reviews. It’s relentless about checking every code path for potential null values, empty arrays, and boundary conditions.
In one PR, a developer added a payment processing function:
```kotlin
fun processRefund(orderId: String) {
    val order = orderRepository.findById(orderId)
    val payment = order.payment
    paymentGateway.refund(payment.transactionId, payment.amount)
}
```

Claude flagged three issues in four lines:
- `findById` can return null (no null check)
- `order.payment` can be null if the order was free (zero amount)
- No check for already-refunded payments
A human reviewer might catch the first one. The second and third require knowing the domain model. Claude caught all three because it read the Order data class and saw that payment was nullable.
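For illustration, here is the guarded version sketched in Python rather than Kotlin (the repository, gateway, and `refunded` flag are hypothetical stand-ins for the domain model described above):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Payment:
    transaction_id: str
    amount: float
    refunded: bool = False

@dataclass
class Order:
    payment: Optional[Payment]  # None for free orders

def process_refund(order: Optional[Order]) -> str:
    # Guard 1: the lookup can fail
    if order is None:
        return "order-not-found"
    # Guard 2: free orders have no payment to refund
    if order.payment is None:
        return "nothing-to-refund"
    # Guard 3: don't refund the same payment twice
    if order.payment.refunded:
        return "already-refunded"
    order.payment.refunded = True  # stand-in for the gateway call
    return "refunded"
```

Each guard corresponds to one of the three issues Claude flagged: the nullable lookup, the nullable payment, and the missing double-refund check.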
2. Security Vulnerabilities
Claude Code is genuinely good at catching OWASP-style vulnerabilities. It consistently flags:
- SQL injection: String concatenation in queries
- XSS: Unescaped user input in templates
- Path traversal: User-controlled file paths
- Missing auth checks: Endpoints without authentication middleware
- Exposed secrets: API keys, tokens, connection strings in code
In one review, it caught a path traversal vulnerability that had been in our codebase for months:
```javascript
app.get('/download/:filename', (req, res) => {
  const file = path.join(uploadsDir, req.params.filename);
  res.sendFile(file);
});
```

Claude’s review: “CRITICAL: Path traversal vulnerability. A request to `/download/../../etc/passwd` would escape the uploads directory. Validate that the resolved path starts with `uploadsDir` after `path.resolve()`.”
That’s exactly the kind of thing that slips past tired human reviewers.
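The guard Claude suggested, sketched in Python with `pathlib` (a hedged translation of the fix, not the code from that PR; `uploads_dir` stands in for `uploadsDir`): resolve the requested path and refuse anything that escapes the uploads directory.

```python
from pathlib import Path
from typing import Optional

def safe_download_path(uploads_dir: str, filename: str) -> Optional[Path]:
    """Return the resolved file path, or None if it escapes uploads_dir."""
    base = Path(uploads_dir).resolve()
    target = (base / filename).resolve()
    # Reject ../../etc/passwd-style escapes: after resolution, the target
    # must still live under the resolved uploads directory.
    if not target.is_relative_to(base):
        return None
    return target
```

The key detail is comparing the paths after resolution; a prefix check on the raw string would still be fooled by `..` segments.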
3. Inconsistent Patterns
When your codebase has established patterns, Claude Code notices when a PR breaks them. It catches:
- Using callbacks when the rest of the codebase uses async/await
- Direct database access when other code uses the repository pattern
- Inline error handling when there’s a global error handler
- Different naming conventions in the same module
```text
SUGGESTION: This function uses try/catch with manual error responses,
but all other endpoints in this module use the asyncHandler wrapper
(see src/middleware/asyncHandler.ts). Using asyncHandler would maintain
consistency and ensure errors are logged through the central handler.
```

This is the kind of feedback that would require deep familiarity with the codebase from a human reviewer. Claude gets it from reading the surrounding files.
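For readers unfamiliar with the pattern, an asyncHandler-style wrapper centralizes error handling so individual endpoints stop repeating try/catch. Here is a simplified synchronous analog as a hypothetical Python decorator (the names and the in-memory error list are illustrative, not from any real codebase):

```python
import functools

def handled(func):
    """Route errors to one central place instead of per-endpoint try/except."""
    errors = []  # stand-in for the central error logger

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            errors.append(exc)       # the central handler records it
            return {"status": 500}   # uniform error response
    wrapper.errors = errors
    return wrapper

@handled
def get_user(user_id):
    if user_id < 0:
        raise ValueError("bad id")
    return {"status": 200, "id": user_id}
```

Every endpoint wrapped this way gets the same logging and the same error response shape, which is exactly the consistency the review comment is asking for.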
4. Missing Test Coverage
Claude Code reliably spots when new code paths lack tests. It’s especially good at noticing:
- New branches (if/else) without corresponding test cases
- Error paths that are never tested
- New public methods without any test
- Modified logic where existing tests don’t cover the change
```text
WARNING: The new validateEmail() function has 4 distinct code paths
(valid email, empty string, missing @, domain without TLD) but
the test file only covers the valid case and empty string. Missing
tests for: malformed emails without @, and domains without TLD.
```

Key insight: Claude Code’s code review strengths cluster around four areas: null safety and edge cases (catching nullable paths a human reviewer might miss), security vulnerabilities at OWASP severity (SQL injection, path traversal, missing auth checks), pattern inconsistencies relative to the existing codebase, and untested code branches. After 3 months of production use, bug detection rates improved from ~60% to ~85% before merge.
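Returning to the `validateEmail()` warning: "cover all four paths" means one assertion per branch, so a change to any branch fails a test. A toy sketch in Python (this `validate_email` is hypothetical, not the function from the PR):

```python
import re

def validate_email(value: str) -> bool:
    """Toy validator with the four code paths from the warning above."""
    if value == "":                      # path 1: empty string
        return False
    if "@" not in value:                 # path 2: missing @
        return False
    domain = value.rsplit("@", 1)[1]
    if "." not in domain:                # path 3: domain without TLD
        return False
    # path 4: plausibly valid email
    return bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value))

# One test per code path:
assert validate_email("a@b.com") is True        # valid
assert validate_email("") is False              # empty string
assert validate_email("nobody.example.com") is False  # missing @
assert validate_email("user@localhost") is False      # domain without TLD
```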
What Does Claude Code Miss Completely in Code Review?
1. Business Logic Correctness
Claude Code can tell you if code will crash. It cannot tell you if code does the right thing. This is the biggest gap.
A developer implemented a discount calculation:
```python
def calculate_discount(user, cart_total):
    if user.is_premium:
        return cart_total * 0.15
    if cart_total > 100:
        return cart_total * 0.10
    return 0
```

Claude reviewed this and found no issues: no crashes, no unhandled inputs, clean code. But the business requirement, spelled out in a Jira ticket, was that the discounts stack: a premium user with a cart over $100 should get 15% + 10% = 25%, not a flat 15%. Claude couldn’t know that, because the requirement lived in the ticket, not in the code. This is the fundamental limitation: Claude reviews code against code, not code against requirements.
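Under the stacking requirement, the corrected logic might look like this (a sketch; the rates and threshold come from the example, not from any real codebase):

```python
PREMIUM_RATE = 0.15
BULK_RATE = 0.10
BULK_THRESHOLD = 100

def calculate_discount(is_premium: bool, cart_total: float) -> float:
    """Stack the premium and bulk discounts, per the PM's requirement."""
    rate = 0.0
    if is_premium:
        rate += PREMIUM_RATE          # 15% for premium users
    if cart_total > BULK_THRESHOLD:
        rate += BULK_RATE             # plus 10% for carts over $100
    return cart_total * rate

# A premium user with a $200 cart now gets 25% off, not 15%.
```

Note that no reviewer, human or AI, could derive this version from the original code alone; the stacking rule has to come from the requirements.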
2. Architecture and Design Decisions
Claude can tell you if a function is well-written. It struggles to tell you if a function should exist at all.
In one PR, a developer created a new NotificationService that duplicated functionality from our existing EventBus. Claude reviewed the code quality of NotificationService and found it clean. But the right review feedback was: “We already have EventBus for this. Don’t create a parallel system.”
Architecture reviews require understanding the system’s history and direction. Why did we choose EventBus? What problems does it solve? Where is the system heading? Claude doesn’t have that context unless you explicitly provide it.
3. Performance at Scale
Claude catches obvious performance issues: N+1 queries, missing indexes, O(n²) loops on large collections. But it misses performance problems that only manifest at scale.
It won’t flag:
- A database query that works fine with 1,000 rows but kills the DB at 1 million
- A caching strategy that works in single-server but fails in distributed deployments
- Memory allocation patterns that cause GC pressure under high throughput
- API response sizes that are fine in dev but massive in production
These require production experience and system knowledge that no AI currently has.
4. Team and Project Context
Claude doesn’t know:
- That a developer is junior and needs more detailed feedback
- That this code area has caused production incidents before (and needs extra scrutiny)
- That the team agreed last week to stop using a certain library
- That a module is being deprecated and new code shouldn’t extend it
- Political/team dynamics around code ownership
These are human judgment calls that can’t be delegated.
Key insight: Claude Code reviews code against code — not code against requirements. It cannot know that a PM specified discounts should stack, that a module is being deprecated, or that a team agreed last week to stop using a certain library. The four categories it misses entirely are business logic correctness, architecture decisions, performance at scale, and team/project context.
Which Prompt Patterns Get Senior-Level Code Review Feedback?
Pattern 1: The Focused Review
Instead of “review this code,” give Claude a specific lens:
```text
"Review src/api/payments.ts focusing ONLY on error handling.
For each function:
- What errors can occur?
- Are they all caught?
- Are error messages helpful for debugging?
- Could any error leave the system in an inconsistent state?"
```

Focused reviews produce 3x better feedback than general reviews.
Pattern 2: The Adversarial Review
Ask Claude to think like an attacker:
```text
"You are a security auditor. Review these changes assuming that all
user inputs are malicious. For each endpoint:
- Can inputs be crafted to break the system?
- Are there any data leaks in error messages or responses?
- Can the rate limiting be bypassed?
- Are there any IDOR vulnerabilities?"
```

This consistently surfaces security issues that generic reviews miss.
Pattern 3: The Regression Check
Ask Claude to think about what could break:
```text
"These changes modify the user authentication flow. Analyze:
- What existing features depend on this auth flow?
- Could any of them break with these changes?
- Are there any backwards compatibility issues?
- What should the QA team test after this merge?"
```

This mimics what experienced developers do mentally: trace the impact of changes through the system.
Pattern 4: The Teaching Review
For junior developers’ PRs, ask Claude to explain:
```text
"Review this PR as if the author is a junior developer.
For each issue:
- Explain WHY it's a problem (not just WHAT to change)
- Link it to a general principle or best practice
- Show the corrected code
- Rate: must-fix vs nice-to-have"
```

This produces review comments that teach, not just critique.
Key insight: Four prompt patterns unlock senior-level Claude Code review feedback: the Focused Review (one specific lens like error handling), the Adversarial Review (think like an attacker), the Regression Check (what existing features could break), and the Teaching Review (explain why, not just what). Focused reviews consistently produce 3x better findings than generic “review this code” prompts.
Try it now: Take any PR you’re currently reviewing and run this prompt: “Review [filename] focusing ONLY on security. Assume all user inputs are malicious. List every endpoint and whether inputs are validated, escaped, and rate-limited.” Compare it to your own review findings.
What Is the Two-Pass Code Review Workflow?
Here’s how human and AI review work together:
Pass 1 - Claude Code (5 minutes):
- Run the review prompt above
- Catches: bugs, security issues, style, missing tests, edge cases
- Output: structured list of issues with severity ratings
Pass 2 - Human (15 minutes, focused):
- Read Claude’s findings, verify each one
- Then focus on what Claude can’t do:
- Does this solve the right problem?
- Does the approach fit our architecture?
- Will this scale?
- Is this the right abstraction level?
- How does this affect the team’s velocity?
Total time: 20 minutes (down from 40 minutes of unfocused human review)
The quality is higher because my 15 minutes of human review are focused on the hard parts, not wasted on catching missing null checks.
Key insight: The two-pass code review system — 5 minutes of Claude Code first-pass (bugs, security, style, edge cases), then 15 minutes of focused human review (business logic, architecture, scale, team context) — reduces total review time from 40 to 20 minutes while improving quality. Human attention is preserved for the decisions that actually require human judgment.
What Do the Numbers Look Like After 3 Months of AI Code Review?
| Metric | Before | After |
|---|---|---|
| Average review time | 40 min | 20 min |
| Bugs caught before merge | ~60% | ~85% |
| Security issues caught | ~40% | ~90% |
| Style/consistency issues | ~50% | ~95% |
| Architecture issues caught | ~70% | ~70% |
| PRs reviewed per day | 4-5 | 8-10 |
The biggest wins are in security and consistency, the areas where tiredness and familiarity blindness hurt human reviewers the most. Architecture review quality stayed the same because that’s purely a human skill.
When Should You NOT Use Claude Code for Code Review?
- Highly sensitive code (crypto implementations, auth systems): these need specialist human review
- Architecture decisions: use Claude for analysis, but decisions need human judgment
- First PRs from new team members: they need human mentorship, not AI feedback
- Controversial changes: code that touches team dynamics needs human EQ
What Are the Key Takeaways for AI-Assisted Code Review?
- Use Claude Code as first-pass, not final-pass. It handles mechanical review so you can focus on judgment.
- Be specific in your prompts. Focused reviews beat general reviews 3x.
- Claude excels at: null safety, security, consistency, test coverage. Trust it here.
- Claude misses: business logic, architecture, scale, team context. Don’t delegate these.
- The two-pass system saves 50% of review time while improving quality.
- Security reviews are Claude’s killer feature. It never gets tired of checking for injection attacks.
Code review isn’t going away. But the tedious parts (the parts where humans are worst) are exactly where AI is best. Let Claude handle those. Keep the human parts human. Automating that first-pass review with hooks and verification loops is exactly the kind of system-level thinking covered in harness engineering for Claude Code — building the scaffolding that makes AI output reliable before it reaches a human reviewer.
Code review with Claude Code is covered in Phase 10: Team Collaboration of the Claude Code Mastery course. Phases 1-3 are free.
What to Read Next
- 5 Claude Code Mistakes Every Developer Makes — Mistake #3 (accepting first output without review) is the code review habit that changes results overnight
- Claude Code Security: Real Projects — Claude’s security review capability in depth, with real vulnerability examples from production codebases
- Claude Code Hooks Guide — Automate the review workflow itself: hooks that trigger Claude review on every git commit or file edit