TL;DR — Claude Code excels at null safety, security vulnerabilities, pattern consistency, and missing test coverage — catching ~85% of bugs and ~90% of security issues before merge. It misses business logic correctness, architecture decisions, and team context. Use it as a first-pass reviewer, not a final one. Jump to the two-pass workflow →
I used to dread Monday morning code reviews. A stack of 6-8 PRs from the weekend, each one a context switch. By PR #4, I was skimming. By PR #6, I was approving things I shouldn’t have been. We all know the feeling.
Three months ago, I started using Claude Code as a first-pass reviewer before I look at any PR myself. Not to replace human review, but to handle the mechanical parts so I can focus on the parts that actually need a human brain. The results have been surprising, in both directions.
Here’s an honest breakdown of what Claude Code catches, what it completely misses, and the exact prompts I use to get senior-level feedback. This post zooms in on code review; for everything else Claude Code does — setup, hooks, subagents, plugins, production patterns — see our complete Claude Code guide.
How Do You Set Up Claude Code for Code Review?
My workflow is simple. Before opening a PR in my browser, I run Claude Code on the branch:
```bash
# Check out the PR branch
git checkout feature/user-preferences

# Ask Claude Code to review the changes
claude -p "Review the changes in this branch compared to main.
Focus on:
1. Bugs and logic errors
2. Security issues
3. Performance problems
4. Code style inconsistencies
5. Missing edge cases

Be specific. Reference file names and line numbers.
For each issue, explain WHY it's a problem and suggest a fix.
Rate severity: CRITICAL / WARNING / SUGGESTION."
```

This takes 30-60 seconds and produces a structured review. I read Claude’s feedback, then open the PR knowing exactly where to focus my attention.
Key insight: A Claude Code code review prompt takes 30-60 seconds to run and produces structured output with severity ratings (CRITICAL / WARNING / SUGGESTION), file names, and line numbers. The key is specificity: asking for a focused review on one dimension (error handling, security, test coverage) produces 3x better feedback than a generic “review this code” request.
What Does Claude Code Catch Brilliantly in Code Review?
1. Null Safety and Edge Cases
This is Claude Code’s superpower in reviews. It’s relentless about checking every code path for potential null values, empty arrays, and boundary conditions.
In one PR, a developer added a payment processing function:
```kotlin
fun processRefund(orderId: String) {
    val order = orderRepository.findById(orderId)
    val payment = order.payment
    paymentGateway.refund(payment.transactionId, payment.amount)
}
```

Claude flagged three issues in four lines:
- `findById` can return null (no null check)
- `order.payment` can be null if the order was free (zero amount)
- No check for already-refunded payments
A human reviewer might catch the first one. The second and third require knowing the domain model. Claude caught all three because it read the Order data class and saw that payment was nullable.
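For illustration, here is the guarded version sketched in Python rather than Kotlin (the repository, gateway, and `refunded` flag are hypothetical stand-ins for the domain model described above):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Payment:
    transaction_id: str
    amount: float
    refunded: bool = False

@dataclass
class Order:
    payment: Optional[Payment]  # None for free orders

def process_refund(order: Optional[Order]) -> str:
    # Guard 1: the lookup can fail
    if order is None:
        return "order-not-found"
    # Guard 2: free orders have no payment to refund
    if order.payment is None:
        return "nothing-to-refund"
    # Guard 3: don't refund the same payment twice
    if order.payment.refunded:
        return "already-refunded"
    order.payment.refunded = True  # stand-in for the gateway call
    return "refunded"
```

Each guard corresponds to one of the three issues Claude flagged: the nullable lookup, the nullable payment, and the missing double-refund check.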
2. Security Vulnerabilities
Claude Code is genuinely good at catching OWASP-style vulnerabilities. It consistently flags:
- SQL injection: String concatenation in queries
- XSS: Unescaped user input in templates
- Path traversal: User-controlled file paths
- Missing auth checks: Endpoints without authentication middleware
- Exposed secrets: API keys, tokens, connection strings in code
In one review, it caught a path traversal vulnerability that had been in our codebase for months:
```javascript
app.get('/download/:filename', (req, res) => {
  const file = path.join(uploadsDir, req.params.filename);
  res.sendFile(file);
});
```

Claude’s review: “CRITICAL: Path traversal vulnerability. A request to `/download/../../etc/passwd` would escape the uploads directory. Validate that the resolved path starts with `uploadsDir` after `path.resolve()`.”
That’s exactly the kind of thing that slips past tired human reviewers.
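The guard Claude suggested, sketched in Python with `pathlib` (a hedged translation of the fix, not the code from that PR; `uploads_dir` stands in for `uploadsDir`): resolve the requested path and refuse anything that escapes the uploads directory.

```python
from pathlib import Path
from typing import Optional

def safe_download_path(uploads_dir: str, filename: str) -> Optional[Path]:
    """Return the resolved file path, or None if it escapes uploads_dir."""
    base = Path(uploads_dir).resolve()
    target = (base / filename).resolve()
    # Reject ../../etc/passwd-style escapes: after resolution, the target
    # must still live under the resolved uploads directory.
    if not target.is_relative_to(base):
        return None
    return target
```

The key detail is comparing the paths after resolution; a prefix check on the raw string would still be fooled by `..` segments.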
3. Inconsistent Patterns
When your codebase has established patterns, Claude Code notices when a PR breaks them. It catches:
- Using callbacks when the rest of the codebase uses async/await
- Direct database access when other code uses the repository pattern
- Inline error handling when there’s a global error handler
- Different naming conventions in the same module
```text
SUGGESTION: This function uses try/catch with manual error responses,
but all other endpoints in this module use the asyncHandler wrapper
(see src/middleware/asyncHandler.ts). Using asyncHandler would maintain
consistency and ensure errors are logged through the central handler.
```

This is the kind of feedback that would require deep familiarity with the codebase from a human reviewer. Claude gets it from reading the surrounding files.
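For readers unfamiliar with the pattern, an asyncHandler-style wrapper centralizes error handling so individual endpoints stop repeating try/catch. Here is a simplified synchronous analog as a hypothetical Python decorator (the names and the in-memory error list are illustrative, not from any real codebase):

```python
import functools

def handled(func):
    """Route errors to one central place instead of per-endpoint try/except."""
    errors = []  # stand-in for the central error logger

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            errors.append(exc)       # the central handler records it
            return {"status": 500}   # uniform error response
    wrapper.errors = errors
    return wrapper

@handled
def get_user(user_id):
    if user_id < 0:
        raise ValueError("bad id")
    return {"status": 200, "id": user_id}
```

Every endpoint wrapped this way gets the same logging and the same error response shape, which is exactly the consistency the review comment is asking for.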
4. Missing Test Coverage
Claude Code reliably spots when new code paths lack tests. It’s especially good at noticing:
- New branches (if/else) without corresponding test cases
- Error paths that are never tested
- New public methods without any test
- Modified logic where existing tests don’t cover the change
```text
WARNING: The new validateEmail() function has 4 distinct code paths
(valid email, empty string, missing @, domain without TLD) but
the test file only covers the valid case and empty string. Missing
tests for: malformed emails without @, and domains without TLD.
```

Key insight: Claude Code’s code review strengths cluster around four areas: null safety and edge cases (catching nullable paths a human reviewer might miss), security vulnerabilities at OWASP severity (SQL injection, path traversal, missing auth checks), pattern inconsistencies relative to the existing codebase, and untested code branches. After 3 months of production use, bug detection rates improved from ~60% to ~85% before merge.
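Returning to the `validateEmail()` warning: "cover all four paths" means one assertion per branch, so a change to any branch fails a test. A toy sketch in Python (this `validate_email` is hypothetical, not the function from the PR):

```python
import re

def validate_email(value: str) -> bool:
    """Toy validator with the four code paths from the warning above."""
    if value == "":                      # path 1: empty string
        return False
    if "@" not in value:                 # path 2: missing @
        return False
    domain = value.rsplit("@", 1)[1]
    if "." not in domain:                # path 3: domain without TLD
        return False
    # path 4: plausibly valid email
    return bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value))

# One test per code path:
assert validate_email("a@b.com") is True        # valid
assert validate_email("") is False              # empty string
assert validate_email("nobody.example.com") is False  # missing @
assert validate_email("user@localhost") is False      # domain without TLD
```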
What Does Claude Code Miss Completely in Code Review?
1. Business Logic Correctness
Claude Code can tell you if code will crash. It cannot tell you if code does the right thing. This is the biggest gap.
A developer implemented a discount calculation:
```python
def calculate_discount(user, cart_total):
    if user.is_premium:
        return cart_total * 0.15
    if cart_total > 100:
        return cart_total * 0.10
    return 0
```

Claude reviewed this and found no issues: no crashes, no unhandled inputs, clean code. But the business requirement, spelled out in a Jira ticket, was that the discounts stack: a premium user with a cart over $100 should get 15% + 10% = 25%, not a flat 15%. Claude couldn’t know that, because the requirement lived in the ticket, not in the code. This is the fundamental limitation: Claude reviews code against code, not code against requirements.
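Under the stacking requirement, the corrected logic might look like this (a sketch; the rates and threshold come from the example, not from any real codebase):

```python
PREMIUM_RATE = 0.15
BULK_RATE = 0.10
BULK_THRESHOLD = 100

def calculate_discount(is_premium: bool, cart_total: float) -> float:
    """Stack the premium and bulk discounts, per the PM's requirement."""
    rate = 0.0
    if is_premium:
        rate += PREMIUM_RATE          # 15% for premium users
    if cart_total > BULK_THRESHOLD:
        rate += BULK_RATE             # plus 10% for carts over $100
    return cart_total * rate

# A premium user with a $200 cart now gets 25% off, not 15%.
```

Note that no reviewer, human or AI, could derive this version from the original code alone; the stacking rule has to come from the requirements.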
2. Architecture and Design Decisions
Claude can tell you if a function is well-written. It struggles to tell you if a function should exist at all.
In one PR, a developer created a new NotificationService that duplicated functionality from our existing EventBus. Claude reviewed the code quality of NotificationService and found it clean. But the right review feedback was: “We already have EventBus for this. Don’t create a parallel system.”
Architecture reviews require understanding the system’s history and direction. Why did we choose EventBus? What problems does it solve? Where is the system heading? Claude doesn’t have that context unless you explicitly provide it.
3. Performance at Scale
Claude catches obvious performance issues: N+1 queries, missing indexes, O(n²) loops on large collections. But it misses performance problems that only manifest at scale.
It won’t flag:
- A database query that works fine with 1,000 rows but kills the DB at 1 million
- A caching strategy that works in single-server but fails in distributed deployments
- Memory allocation patterns that cause GC pressure under high throughput
- API response sizes that are fine in dev but massive in production
These require production experience and system knowledge that no AI currently has.
4. Team and Project Context
Claude doesn’t know:
- That a developer is junior and needs more detailed feedback
- That this code area has caused production incidents before (and needs extra scrutiny)
- That the team agreed last week to stop using a certain library
- That a module is being deprecated and new code shouldn’t extend it
- Political/team dynamics around code ownership
These are human judgment calls that can’t be delegated.
Key insight: Claude Code reviews code against code — not code against requirements. It cannot know that a PM specified discounts should stack, that a module is being deprecated, or that a team agreed last week to stop using a certain library. The four categories it misses entirely are business logic correctness, architecture decisions, performance at scale, and team/project context.
Which Prompt Patterns Get Senior-Level Code Review Feedback?
Pattern 1: The Focused Review
Instead of “review this code,” give Claude a specific lens:
```text
"Review src/api/payments.ts focusing ONLY on error handling.
For each function:
- What errors can occur?
- Are they all caught?
- Are error messages helpful for debugging?
- Could any error leave the system in an inconsistent state?"
```

Focused reviews produce 3x better feedback than general reviews.
Pattern 2: The Adversarial Review
Ask Claude to think like an attacker:
```text
"You are a security auditor. Review these changes assuming that all
user inputs are malicious. For each endpoint:
- Can inputs be crafted to break the system?
- Are there any data leaks in error messages or responses?
- Can the rate limiting be bypassed?
- Are there any IDOR vulnerabilities?"
```

This consistently surfaces security issues that generic reviews miss.
Pattern 3: The Regression Check
Ask Claude to think about what could break:
```text
"These changes modify the user authentication flow. Analyze:
- What existing features depend on this auth flow?
- Could any of them break with these changes?
- Are there any backwards compatibility issues?
- What should the QA team test after this merge?"
```

This mimics what experienced developers do mentally: trace the impact of changes through the system.
Pattern 4: The Teaching Review
For junior developers’ PRs, ask Claude to explain:
```text
"Review this PR as if the author is a junior developer.
For each issue:
- Explain WHY it's a problem (not just WHAT to change)
- Link it to a general principle or best practice
- Show the corrected code
- Rate: must-fix vs nice-to-have"
```

This produces review comments that teach, not just critique.
Key insight: Four prompt patterns unlock senior-level Claude Code review feedback: the Focused Review (one specific lens like error handling), the Adversarial Review (think like an attacker), the Regression Check (what existing features could break), and the Teaching Review (explain why, not just what). Focused reviews consistently produce 3x better findings than generic “review this code” prompts.
Try it now: Take any PR you’re currently reviewing and run this prompt: “Review [filename] focusing ONLY on security. Assume all user inputs are malicious. List every endpoint and whether inputs are validated, escaped, and rate-limited.” Compare it to your own review findings.
What Is the Two-Pass Code Review Workflow?
Here’s how human and AI review work together:
Pass 1 - Claude Code (5 minutes):
- Run the review prompt above
- Catches: bugs, security issues, style, missing tests, edge cases
- Output: structured list of issues with severity ratings
Pass 2 - Human (15 minutes, focused):
- Read Claude’s findings, verify each one
- Then focus on what Claude can’t do:
- Does this solve the right problem?
- Does the approach fit our architecture?
- Will this scale?
- Is this the right abstraction level?
- How does this affect the team’s velocity?
Total time: 20 minutes (down from 40 minutes of unfocused human review)
The quality is higher because my 15 minutes of human review are focused on the hard parts, not wasted on catching missing null checks.
Key insight: The two-pass code review system — 5 minutes of Claude Code first-pass (bugs, security, style, edge cases), then 15 minutes of focused human review (business logic, architecture, scale, team context) — reduces total review time from 40 to 20 minutes while improving quality. Human attention is preserved for the decisions that actually require human judgment.
What Do the Numbers Look Like After 3 Months of AI Code Review?
| Metric | Before | After |
|---|---|---|
| Average review time | 40 min | 20 min |
| Bugs caught before merge | ~60% | ~85% |
| Security issues caught | ~40% | ~90% |
| Style/consistency issues | ~50% | ~95% |
| Architecture issues caught | ~70% | ~70% |
| PRs reviewed per day | 4-5 | 8-10 |
The biggest wins are in security and consistency, the areas where tiredness and familiarity blindness hurt human reviewers the most. Architecture review quality stayed the same because that’s purely a human skill.
When Should You NOT Use Claude Code for Code Review?
- Highly sensitive code (crypto implementations, auth systems): these need specialist human review
- Architecture decisions: use Claude for analysis, but decisions need human judgment
- First PRs from new team members: they need human mentorship, not AI feedback
- Controversial changes: code that touches team dynamics needs human EQ
What Are the Key Takeaways for AI-Assisted Code Review?
- Use Claude Code as first-pass, not final-pass. It handles mechanical review so you can focus on judgment.
- Be specific in your prompts. Focused reviews beat general reviews 3x.
- Claude excels at: null safety, security, consistency, test coverage. Trust it here.
- Claude misses: business logic, architecture, scale, team context. Don’t delegate these.
- The two-pass system saves 50% of review time while improving quality.
- Security reviews are Claude’s killer feature. It never gets tired of checking for injection attacks.
Code review isn’t going away. But the tedious parts (the parts where humans are worst) are exactly where AI is best. Let Claude handle those. Keep the human parts human. Automating that first-pass review with hooks and verification loops is exactly the kind of system-level thinking covered in harness engineering for Claude Code — building the scaffolding that makes AI output reliable before it reaches a human reviewer.
Code review with Claude Code is covered in Phase 10: Team Collaboration of the Claude Code Mastery course. Phases 1-3 are free.
What to Read Next
- 5 Claude Code Mistakes Every Developer Makes — Mistake #3 (accepting first output without review) is the code review habit that changes results overnight
- Claude Code Security: Real Projects — Claude’s security review capability in depth, with real vulnerability examples from production codebases
- Claude Code Hooks Guide — Automate the review workflow itself: hooks that trigger Claude review on every git commit or file edit