
Legacy Test Generation

Estimated time: ~35 minutes

Prerequisite: Module 9.2 (Incremental Refactoring)

Outcome: After this module, you will know how to use Claude Code to generate characterization tests for legacy code, understand what to test and what to skip, and have a workflow for adding tests before refactoring.


You want to refactor that 500-line function. No tests exist. “I’ll just be careful and manually test.” Famous last words.

You refactor. It seems to work. You deploy. Next morning: production incident. A rarely-used code path you didn’t test was broken. Customer data corrupted.

Tests are the safety net for refactoring. No tests = no safety net = high risk. Claude Code can generate tests for legacy code faster than you can write them — giving you that safety net before you start changing things.


| Type | Question It Answers |
| --- | --- |
| Unit Test | Does the code do what it SHOULD? |
| Characterization Test | What does the code CURRENTLY do? |

Legacy code is “correct” by virtue of being in production. Users depend on its current behavior. Characterization tests capture that behavior — even if it seems wrong. The goal is detecting changes, not verifying correctness.
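For example, a characterization test locks in behavior even when it looks like a bug. A minimal sketch, assuming Jest and a hypothetical formatPrice() helper with a truncation quirk:

```js
// Hypothetical characterization test: the truncation below looks like
// a bug, but production users depend on it, so the test pins it down.
const { formatPrice } = require('./pricingService');

test('formatPrice truncates instead of rounding (current behavior)', () => {
  // 19.999 arguably SHOULD round to "20.00", but the code truncates.
  expect(formatPrice(19.999)).toBe('19.99');
});
```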

```mermaid
graph LR
  A[Read Code] --> B[Identify Behaviors]
  B --> C[Generate Tests]
  C --> D[Run & Fix Tests]
  D --> E[Verify Coverage]
  E --> F[Safe to Refactor]
```

Key insight: When tests fail, fix the TEST, not the code. The code is “correct” — it’s what’s running in production.

| Priority | What | Why |
| --- | --- | --- |
| High | Public interface / API | Used by other code |
| High | Happy path flows | Most common usage |
| Medium | Edge cases in code | Explicit handling exists |
| Medium | Error handling | Failure modes matter |
| Low | Internal helpers | Changed during refactor anyway |

Don’t aim for 100%. Aim for “enough to refactor safely”:

  • 70-80% coverage of the specific function is usually enough (see the config sketch below)
  • Focus on code you’re about to change
  • Test behaviors, not lines
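If you want the test runner to enforce that bar, one option is a per-file coverage threshold. A minimal sketch, assuming Jest; the src/pricingService.js path is an assumption:

```js
// jest.config.js: fail the run if the file you're about to refactor
// drops below ~70% coverage. Adjust the path to your project.
module.exports = {
  collectCoverageFrom: ['src/pricingService.js'],
  coverageThreshold: {
    './src/pricingService.js': { branches: 70, lines: 70 },
  },
};
```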

Scenario: Legacy function calculateDiscount() — no tests, about to refactor.

You: Read calculateDiscount() in pricingService.js.
List all the behaviors/paths through this function.
Claude:
1. Returns 0 if user is not logged in
2. Returns 10% for regular users
3. Returns 15% for premium users
4. Returns 30% for VIP users
5. Caps discount at 50% regardless of user type
6. Throws error if amount is negative
You: Generate Jest tests for calculateDiscount() that capture ALL those behaviors.
These are characterization tests — capture what it DOES, not what it SHOULD do.
Claude: [Generates test file with 6 test cases]
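The generated file might look roughly like this. This is a sketch, not Claude's verbatim output: the calculateDiscount(user, amount) signature, the user shapes, and the assumption that it returns a percentage number are all inferred, since the transcript only lists behaviors:

```js
// pricingService.test.js
// Characterization tests: assert what the code DOES today,
// not what it should do.
const { calculateDiscount } = require('./pricingService');

describe('calculateDiscount', () => {
  test('returns 0 for non-logged-in user', () => {
    expect(calculateDiscount(null, 100)).toBe(0);
  });

  test('returns 10% for regular user', () => {
    expect(calculateDiscount({ type: 'regular' }, 100)).toBe(10);
  });

  test('returns 15% for premium user', () => {
    expect(calculateDiscount({ type: 'premium' }, 100)).toBe(15);
  });

  test('returns 30% for VIP user', () => {
    expect(calculateDiscount({ type: 'vip' }, 100)).toBe(30);
  });

  test('caps at 50% max discount', () => {
    // The path that exceeds 50% isn't shown in the transcript;
    // a stacked promo is assumed here for illustration.
    expect(calculateDiscount({ type: 'vip', promo: 'STACKED' }, 100))
      .toBeLessThanOrEqual(50);
  });

  test('throws on negative amount', () => {
    expect(() => calculateDiscount({ type: 'regular' }, -5)).toThrow();
  });
});
```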
```sh
$ npm test pricingService.test.js
```

Output:

```
PASS pricingService.test.js
  calculateDiscount
    ✓ returns 0 for non-logged-in user
    ✓ returns 10% for regular user
    ✓ returns 15% for premium user
    ✓ returns 30% for VIP user
    ✓ caps at 50% max discount
    ✓ throws on negative amount

6 tests passed
```

Suppose one test fails because Claude assumed the wrong behavior:

```
FAIL: expected 20% for premium, got 15%
```
You: The test is failing. The CODE is correct — it's in production.
The actual discount for premium users is 15%, not 20%.
Fix the test to match actual behavior.
Claude: [Fixes test assertion from 20% to 15%]
```sh
$ npm run test:coverage -- --collectCoverageFrom="**/pricingService.js"
```

Output:

```
pricingService.js | 85% coverage
```

Good enough to refactor safely.

You: We have tests. Now refactor calculateDiscount() to use
a strategy pattern instead of if-else chain.
Any refactoring that changes behavior will be caught by tests.
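For reference, the refactored shape might look something like this. A sketch under the same assumed signature; the rates come from the behaviors listed earlier, and a lookup table stands in for full strategy objects:

```js
// Sketch: table-driven strategy lookup replacing the if-else chain.
// Behavior must stay identical; the characterization tests verify that.
const DISCOUNT_STRATEGIES = {
  regular: () => 10,
  premium: () => 15,
  vip: () => 30,
};

function calculateDiscount(user, amount) {
  if (amount < 0) throw new Error('amount must be non-negative');
  if (!user) return 0; // not logged in

  const strategy = DISCOUNT_STRATEGIES[user.type];
  const discount = strategy ? strategy(user, amount) : 0;
  return Math.min(discount, 50); // cap at 50% regardless of user type
}

module.exports = { calculateDiscount };
```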

Exercise 1

Goal: Generate characterization tests for existing code.

Instructions:

  1. Find a function without tests in any project
  2. Ask Claude to list all behaviors/paths
  3. Generate tests for each behavior
  4. Run tests — all should pass (if not, fix tests)
  5. Check coverage
💡 Hint
"Read [function]. What are all the possible execution paths?
Generate a test case for each path."

Exercise 2

Goal: Capture complex output as a regression baseline.

Instructions:

  1. Pick a function with complex output (formatting, calculations)
  2. Run it with 10 different inputs, capture outputs
  3. Ask Claude to generate tests asserting those exact outputs (see the snapshot sketch below)
  4. Now you have regression detection
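In Jest, steps 2 and 3 can be collapsed into snapshot assertions: the first run records the current output as the baseline, and later runs fail on any change. A sketch; formatReport and the sample inputs are hypothetical:

```js
// Golden-master style: record current outputs once, then any change
// to them fails the test and flags a potential regression.
const { formatReport } = require('./reporting'); // hypothetical module

const SAMPLE_INPUTS = [
  { total: 0 },
  { total: 99.99 },
  { total: -5 },
  // ...more representative inputs, aiming for ~10
];

test.each(SAMPLE_INPUTS)('formatReport(%o) output is stable', (input) => {
  expect(formatReport(input)).toMatchSnapshot();
});
```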

Exercise 3

Goal: Practice the full workflow.

Instructions:

  1. Pick a function you want to refactor
  2. Generate characterization tests
  3. Achieve 70%+ coverage
  4. Do a small refactor
  5. Run tests — did they catch anything?
✅ Solution

Workflow:

  1. "List all behaviors in [function]."
  2. "Generate tests for each behavior."
  3. Run tests, fix any that fail (fix TEST, not code)
  4. Check coverage, add more tests if needed
  5. Refactor with confidence

Quick reference:

  1. Read code, list behaviors
  2. Generate tests for each behavior
  3. Run tests (expect all pass)
  4. If fail: fix TEST, not code
  5. Check coverage
  6. Now safe to refactor
"List all behaviors/paths in [function]."
"Generate characterization tests capturing current behavior."
"Test is failing but CODE is correct. Fix the test."
"What edge cases does this code handle?"
| Goal | Target |
| --- | --- |
| Minimum | 70% of function-to-refactor |
| Good | 80% with edge cases |
| Overkill | 100% (not worth the effort) |

| Characterization | Unit |
| --- | --- |
| What does it DO? | What SHOULD it do? |
| Fix test on failure | Fix code on failure |
| Before refactoring | During development |

| ❌ Mistake | ✅ Correct Approach |
| --- | --- |
| Fixing code when tests fail | Fix TESTS. Code is “correct” (it’s in production). |
| Aiming for 100% coverage | 70-80% of the code-to-refactor is enough. |
| Testing internal helpers | Focus on the public interface. Helpers will change. |
| Verifying “correct” behavior | Verify CURRENT behavior, even if it’s a bug. |
| Generating tests without running them | ALWAYS run them. Claude may misunderstand behavior. |
| Skipping tests (“I’ll be careful”) | Tests are the safety net. Always add them before refactoring. |
| Complex mocking for legacy code | Start with integration-level tests. Mock less. |
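On that last point, a characterization test can exercise the real code path end to end and stub only a true external boundary. A sketch, assuming Jest; generateInvoice and its module are hypothetical, and the clock is the only thing faked:

```js
// Integration-level characterization: run the real code, mock nothing
// internal. Pin time-dependent output by faking only the system clock.
const { generateInvoice } = require('./billing'); // hypothetical module

test('invoice output is stable for a known order', () => {
  jest.useFakeTimers();
  jest.setSystemTime(new Date('2024-01-15'));

  const order = { items: [{ sku: 'A1', qty: 2, price: 50 }] };
  expect(generateInvoice(order)).toMatchSnapshot();

  jest.useRealTimers();
});
```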

Scenario: A Vietnamese fintech’s legacy loan-calculation module: 2,000 lines, zero tests, 8 years old. The business wants a new loan type added. The team is afraid to touch it.

Old approach: “We’ll be careful” → Added new loan type → Broke existing calculation for edge case → ₫500M miscalculation discovered after 2 weeks → Painful fix and customer complaints.

New approach with Claude:

  1. Claude analyzed code, identified 15 distinct calculation paths
  2. Generated 45 characterization tests in 3 hours
  3. Tests revealed 3 undocumented behaviors (not bugs — features no one remembered)
  4. Achieved 78% coverage on loan calculation core
  5. Added new loan type, tests caught 2 regressions during development
  6. Zero production issues

Investment: 3 hours generating tests.
Saved: Weeks of debugging, potential ₫ millions in miscalculations.

Quote: “The tests weren’t about proving correctness. They were about proving we didn’t break the thing that’s been working for 8 years.”


Next: Module 9.4: Tech Debt Analysis