TL;DR

  • Tag every incoming AI PR with a blast radius tier (T0/T1/T2/T3) before reading any diff. A 30-second decision based on paths plus CODEOWNERS, not line count.
  • Allocate your read budget by tier: T0 gets the full six-item checklist from Part 1, T1 gets a trimmed checklist, T2 gets a behavioral spot-check, T3 gates on CI green plus a sample.
  • When three AI PRs land in the same hour, triage takes under two minutes and you still ship review quality on the dangerous one.

📊 Result proof. Tried on a real PR queue (private repo, May 2026, ~14 AI PRs/week): triage averaged 41 seconds per PR; total review time dropped from “couldn’t finish” to ~70 minutes/day with zero T0 misses over six weeks. Sample size is small. Treat the numbers as a baseline, not a benchmark.

Part 1 gave you a six-item checklist that works when you can read a whole PR end-to-end. The catch is the one I left you with: three AI PRs landed in the same hour, and you can’t read all of them line-by-line. The checklist isn’t wrong. Your read budget is finite, and the queue isn’t.

This tutorial gives you the triage layer that sits in front of Part 1. You’ll learn to assign a blast radius tier to each AI PR in roughly 30 seconds without opening the Files Changed tab, allocate read time per tier, and run a behavioral spot-check on the low tiers instead of reading diffs you can’t afford to read. By the end you’ll have a workflow that survives a four-PR hour.

Prerequisites

  • You’ve read Part 1 and the six-item checklist is muscle memory.
  • You’re a senior dev or tech lead with approve/reject authority, not a first-reviewer.
  • Your repo has at least one ownership signal: CODEOWNERS, path-based labels, or service ownership in a manifest. Triage uses this; it does not build it.

Three AI PRs in the same hour: why the Part 1 checklist breaks

The checklist doesn’t break because it’s wrong. It breaks because it assumes you have time to read a whole PR, and when input triples, your read budget does not. I’ve watched senior devs react three ways, and all three are wrong: read serially and merge the last one in a hurry, read in parallel and lose context between them, or defer all three and grow a queue you’ll never catch up on.

Define the term once: read budget is the maximum attention you can spend reading carefully in one review shift. It’s wall time times focus, not lines per minute. Two hours of meetings drops it to zero even if your calendar says “free.”

Key insight: Read budget is finite and shift-bound. Treat it like an SRE error budget, not like a queue depth.

| Wrong reaction | What it costs you | How you spot yourself doing it |
| --- | --- | --- |
| Read serially, merge the third in a hurry | The dangerous one usually arrives last | You’re skimming the diff at minute 45 |
| Read all three in parallel | Context bleed: you approve PR A’s pattern in PR B | You can’t remember which PR raised a question |
| Defer everything until tomorrow | T0 stays unreviewed; trust decays in the team | Your queue is six PRs deep on Friday |

Triage isn’t a productivity hack. It’s how you stop spending the same budget on a docs PR and a Terraform PR.

Triage AI pull requests by blast radius: what T0/T1/T2/T3 mean

A blast radius tier classifies a PR by how much damage it can do after it merges, not by how clever the diff is. T0 is highest impact, T3 is lowest. Size doesn’t enter the model. A three-line IAM policy change is T0; a 300-line test file is usually T3.

Key insight: Blast radius is “what breaks if this is wrong in production,” not “how hard was this to write.”

| Tier | What it covers | Example | Decision authority |
| --- | --- | --- | --- |
| T0 | Infra, security, data integrity | IAM policy, Dockerfile, schema migration, auth middleware | Senior + second senior, or tech lead |
| T1 | Core business logic | Refactor of the checkout service, pricing rules, payment retry | Senior, solo OK |
| T2 | Isolated feature behind a flag or boundary | New endpoint gated by feature flag, new admin page | Senior, solo OK |
| T3 | Docs, tests against existing coverage, fixtures | README updates, new unit tests on covered code, formatting | Any reviewer |

Two cautions. First, T0 is the only tier I won’t approve alone if I can avoid it; the failure mode is too cheap for the agent to produce and too expensive to recover from. Second, “behind a feature flag” only counts as T2 if the flag has been used to roll back before. A flag nobody has flipped under load is theatre.

Tier an AI PR in 30 seconds: do this before opening the diff

What. Open the PR page. Don’t click Files Changed. Look at three things: the paths touched, the CODEOWNERS match, and the labels the agent already self-applied.

Why. Coding agents (Claude Code, Cursor, Codex) routinely mis-label scope. They see a .md file in the diff and tag the PR “docs-only” while the same PR also touches a migration script. Checking the paths yourself is faster than recovering from a label you trusted.

Terminal window
# Production: run against the real PR queue. Read-only.
gh pr view 4821 --json files,labels,additions,deletions \
  | jq '{
      paths: [.files[].path],
      ai_labels: .labels | map(.name),
      size: (.additions + .deletions)
    }'

Expected output (real PR, lightly redacted):

{
  "paths": [
    "services/checkout/pricing.ts",
    "services/checkout/__tests__/pricing.test.ts",
    "infra/terraform/iam_checkout.tf"
  ],
  "ai_labels": ["enhancement", "tests"],
  "size": 287
}

Look at the paths. infra/terraform/iam_checkout.tf is in the diff. The agent labeled this “tests” and “enhancement.” It’s T0. Assign the tier, add a tier/T0 label, move on.

Verify. You wrote down T0/T1/T2/T3 (label, comment, or a sticky note) within 30 seconds of opening the PR, and you haven’t read a line of the diff yet.

A failure mode I’ve lived through: I tiered a PR T3 because the agent labeled it “test only.” The “tests” touched a shared fixture file that a pending migration depended on. It should have been T1. Now I treat agent-applied scope labels as untrusted input. Read the paths.
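Reading the paths can be partly mechanised. Here is a minimal sketch, assuming a repo laid out like the example output above. The function names `tier_for_path` and `tier_for_pr` and every path pattern are hypothetical; they are not taken from a real config, so swap in your own layout. It suggests a tier, and you still make the call.

```shell
# Hypothetical path rules. Replace the patterns with your repo's own layout.
tier_for_path() {
  case "$1" in
    infra/*|migrations/*|*/migrations/*|auth/*|*/auth/*|Dockerfile|*/Dockerfile|*.tf)
      echo T0 ;;
    */__tests__/*|docs/*|*.md)
      echo T3 ;;
    services/*|payments/*|*/payments/*)
      echo T1 ;;
    *)
      # Assumption: unmatched paths land in T2. Tighten this if core logic
      # lives outside services/ in your repo.
      echo T2 ;;
  esac
}

# Reads one path per line on stdin and prints the highest-impact tier seen.
tier_for_pr() {
  worst=T3
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    tier=$(tier_for_path "$path")
    # Compare the digit after "T": a lower number means higher impact.
    if [ "${tier#T}" -lt "${worst#T}" ]; then
      worst=$tier
    fi
  done
  echo "$worst"
}

printf '%s\n' \
  services/checkout/pricing.ts \
  services/checkout/__tests__/pricing.test.ts \
  infra/terraform/iam_checkout.tf | tier_for_pr
# prints T0
```

Against the real queue, you could feed it with `gh pr view 4821 --json files --jq '.files[].path' | tier_for_pr` and then apply the label with `gh pr edit 4821 --add-label tier/T0`, assuming that label already exists in your repo. Note what the sketch can’t see: a shared fixture under a test directory still comes out as T3 here, which is exactly the failure mode above.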

Read budget: how to allocate time across tiers

Read budget is allocated inversely to tier number. T0 gets full attention, T3 gets the least. The numbers below are the baseline I run; tune them to your team’s velocity and incident history.

| Tier | Treatment | Wall time per PR | Trade-off you’re accepting |
| --- | --- | --- | --- |
| T0 | Full Part 1 checklist, line-by-line, paired review when possible | 20–30 min | Slow. You will not merge two T0s in an hour. |
| T1 | Part 1 checklist, allowed to skip item 6 (commit message) if CI is green | 10–15 min | You miss commit-message lies; you keep checklist coverage on the dangerous items. |
| T2 | Behavioral spot-check (next section) instead of full checklist | ~5 min | You miss bugs only reachable through diff reading, not through behavior. |
| T3 | Sample one or two spots + gate on CI green; reject on red | ~2 min | You will miss trivial bugs. You traded that for finishing T0 properly. |

You can argue with the times. You can’t argue with the trade-off shape: you don’t escape risk by reading every line of every PR, because your read budget runs out before the queue does. You only get to choose where the risk lives.

Key insight: Don’t allocate read budget by diff size. Size doesn’t correlate with blast radius. A 3-line IAM change can wreck a region; a 300-line new test file usually can’t.

A quick aside on T1: if your team’s checklist item 6 (commit-message lies) has caught real bugs, don’t skip it on T1. The rule is “earn your skips from postmortems,” not “skip what feels safe.”
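To see what a queue costs before you start, multiply the tier counts by the per-PR times. This is a rough sketch that uses the upper bound of each range in the table (30, 15, 5, and 2 minutes); the function name `queue_minutes` is made up for illustration, and the numbers are my baseline, not a standard.

```shell
# Worst-case review minutes for a queue, given counts of T0, T1, T2, T3 PRs.
# The per-tier minutes are the upper bounds from the read-budget table.
queue_minutes() {
  t0=$1 t1=$2 t2=$3 t3=$4
  echo $(( t0 * 30 + t1 * 15 + t2 * 5 + t3 * 2 ))
}

queue_minutes 1 2 3 2
# prints 79
```

If the result is larger than the careful-reading time you actually have in the shift, something low-tier gets deferred. The T0 term is the one you don’t negotiate.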

Behavioral spot-check for T2 and T3: what to do instead of reading the diff

A behavioral spot-check is the protocol I run instead of reading every line on T2 and T3. The definition (behavioral diff) was introduced in Part 1; here I’ll show the operational protocol.

Four steps. Run them in order. A red test run in Step 1 rejects the PR outright. Any other failure escalates the PR one tier toward T0 (T2 becomes T1) and sends you back to the full checklist.

Step 1 — Pull the branch and run tests locally.

Terminal window
# Local sandbox. Don't run on a production-connected machine.
gh pr checkout 4821
make test # or `pnpm test`, `cargo test`, whichever your repo uses

Expected output: green. If red, the PR is rejected; you don’t owe it a behavioral diff.

Step 2 — Behavioral diff: compare one important code path before and after.

Terminal window
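# Same sandbox, clean working tree (`git status --short` prints nothing).
# Capture behavior on the PR branch, then on main, then compare.
# `make stop` is a placeholder for however your repo stops the background server.
make build && make run-bg
curl -s 'localhost:3000/api/pricing?sku=ABC' > /tmp/after.json
make stop
git checkout main && make build && make run-bg
curl -s 'localhost:3000/api/pricing?sku=ABC' > /tmp/before.json
make stop
git checkout -
diff /tmp/before.json /tmp/after.json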

Expected output for a “refactor only” PR: empty diff. Anything non-empty deserves a second look, even if the agent’s PR description claims no behavior change.
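One source of false alarms is JSON key order: two responses can carry the same data and still differ byte for byte. A small sketch of a comparison that ignores key order when `jq` is installed follows. The helper name `behavior_matches` is hypothetical, and it assumes both files contain valid JSON.

```shell
# Hypothetical helper: returns 0 when two captured JSON responses match,
# 1 when they differ, 2 when jq cannot parse one of them.
behavior_matches() {
  before=$1
  after=$2
  if command -v jq >/dev/null 2>&1; then
    # jq -S sorts object keys, so key order alone never counts as a change.
    jq -S . "$before" > "$before.norm" && jq -S . "$after" > "$after.norm" || return 2
    cmp -s "$before.norm" "$after.norm"
  else
    # Without jq, fall back to a byte-for-byte comparison.
    cmp -s "$before" "$after"
  fi
}

# Demo with two throwaway files that differ only in key order.
tmp=$(mktemp -d)
printf '{"sku":"ABC","price":10}\n' > "$tmp/before.json"
printf '{"price":10,"sku":"ABC"}\n' > "$tmp/after.json"
if behavior_matches "$tmp/before.json" "$tmp/after.json"; then
  echo "no behavior change"
else
  echo "behavior changed: take a second look"
fi
rm -rf "$tmp"
```

What the demo prints depends on whether `jq` is available: with it, the two files count as the same behavior; without it, the key-order difference is reported as a change. Erring toward “take a second look” is the safer default for a spot-check.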

Step 3 — Spot-check one place in the diff.

Pick one file the agent’s description doesn’t mention (not the file it highlights). Apply Part 1 item 2 only: “API used incorrectly but plausibly.” Five minutes maximum.

Step 4 — Gate.

Approve only if all three hold: CI is green, the behavioral diff matches the PR’s stated intent, and the spot-checked file passes item 2. Any failure escalates the tier and you read the full checklist.
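The gate is a plain three-way AND, which makes it easy to write down so nobody approves on two out of three. A minimal sketch, with a made-up function name and “pass”/“fail” strings chosen only for illustration:

```shell
# Each argument is "pass" or "fail": CI status, behavioral diff, item 2 check.
spot_check_gate() {
  ci=$1 behavior=$2 item2=$3
  if [ "$ci" = pass ] && [ "$behavior" = pass ] && [ "$item2" = pass ]; then
    echo approve
  else
    # Any single failure means the PR moves one tier toward T0.
    echo escalate
  fi
}

spot_check_gate pass pass pass   # prints approve
spot_check_gate pass fail pass   # prints escalate
```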

Verify. You spent under 10 minutes per T2 PR and you can name the one behavior you confirmed and the one spot you checked.

Key insight: Behavioral diff covers the code paths your tests cover. It does not cover error paths without tests. You accepted that trade-off when you tiered the PR T2.

Scaling triage when N AI PRs land per hour

The triage protocol scales linearly because each tiering step takes roughly 30 to 40 seconds. The bottleneck is T0, which still demands serial line-by-line review. When the queue grows from three to eight per hour, you don’t review faster; you defer more T3.

Concrete scenario: eight AI PRs land between 09:00 and 10:00, and you start on them at 10:00. At 30 to 40 seconds each, triage takes about five minutes total. The distribution from my own logs (six weeks, ~80 PRs) tends to look like: 1 T0, 2 T1, 3 T2, 2 T3. That’s not a universal claim, it’s an observed shape; your distribution depends on what your agents are tasked with.

Schedule:

10:00 Triage all 8. T0 labeled, T1/T2/T3 queued. (~5 min)
10:05 Read T0 in full. (~25 min)
10:30 Batch T1 (2 PRs). (~25 min)
14:00 Batch T2 (3 PRs) with behavioral spot-checks. (~15 min)
17:00 T3 cleanup (2 PRs). (~5 min)

Two cautions worth saying loud. First, if a T0 PR depends on a T3 PR (rare but real: a Terraform PR can depend on a docs-as-data file), unblock the T3 first; defer order is not always tier order. Second, deferred T3 has a cap: 24 hours. Past that, the backlog erodes team trust in the review process, which is worse than the bugs you’d catch by reading the diff.
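If you keep the tiered queue as plain text, the default review order and the T3 cap are each a single line of shell. This is a sketch only: the input format (`<pr-number> <tier>`), the function names, and every PR number except 4821 are invented for illustration.

```shell
# Sort "<pr-number> <tier>" lines so T0 comes first, then by PR number.
review_order() {
  sort -k2,2 -k1,1n
}

# Succeeds when a deferred T3 PR has passed the 24-hour cap.
# $1 is whole hours since the PR was deferred.
t3_overdue() {
  [ "$1" -gt 24 ]
}

printf '%s\n' '4830 T3' '4821 T0' '4825 T1' '4827 T2' | review_order
# prints:
# 4821 T0
# 4825 T1
# 4827 T2
# 4830 T3

if t3_overdue 30; then echo "T3 past cap: review it now"; fi
# prints: T3 past cap: review it now
```

The sort gives you tier order only. It knows nothing about dependencies between PRs, so the first caution still needs a human: if a T0 waits on a T3, move that T3 up by hand.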

Key insight: Triage scales because it’s bounded per PR. The bottleneck moves to serial T0 review. Hire (or train) accordingly.

FAQ

Q: What if I tiered a PR wrong?

A: Re-tier when the spot-check reveals broader scope. Always move toward T0 (T2 becomes T1, T1 becomes T0), never away from it. Treat tiering as a one-way ratchet; the cost of accidentally lowering a tier is a missed T0, which is exactly what triage is built to prevent.

Q: Two senior devs tier the same PR differently. How do we resolve it?

A: Default to the higher-impact tier (the lower number). Argue async; don’t block review. After five disagreements of the same shape, write the tier criteria into a team doc and call it done.

Q: Is a behavioral spot-check really enough for T2?

A: It’s enough for the code paths your tests cover. It is not enough for error paths without tests. That’s the trade-off you accepted when you assigned T2. If the PR materially changes error handling, escalate to T1 even if the diff looks small.

Q: My repo has no CODEOWNERS. How do I tier?

A: Use path-based heuristics. A regex match on infra/, migrations/, auth/, or payments/ is good enough for T0/T1 detection. Building CODEOWNERS is its own tech-debt ticket; it does not block triage.

Q: Should I let the AI agent self-tier its PRs?

A: Not in 2026. The agents I’ve tested mis-label scope often enough that their tier becomes noise. A senior dev tiers manually. Part 3 will automate this with CI rules (path globs and ownership lookups), not with an LLM judging itself.
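The first two answers share one rule: when two tiers compete, the higher-impact one wins. A minimal sketch of that ratchet, with a hypothetical function name:

```shell
# Prints whichever tier is closer to T0. A PR can move toward T0, never away.
retier() {
  current=$1 proposed=$2
  if [ "${proposed#T}" -lt "${current#T}" ]; then
    echo "$proposed"
  else
    echo "$current"
  fi
}

retier T2 T1   # prints T1 (escalation accepted)
retier T1 T3   # prints T1 (downgrade ignored)
```

The same function settles a disagreement between two reviewers: pass both opinions and take the output.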

What you have now — and what Part 3 picks up

You’ve got a triage protocol that runs before you read any diff: a 30-second tier assignment per PR, a read budget allocated by tier, a behavioral spot-check that replaces line-by-line reading on T2 and T3, and a defer policy that holds when the queue spikes. The model shift is the one to keep: blast radius is decided by ownership and impact, not by diff size.

Three things that can still go wrong, with fixes:

  • You tiered a PR wrong because you trusted the agent’s label. Re-tier on spot-check; treat agent labels as untrusted input.
  • T3 deferred too long and the team noticed. Cap defer at 24 hours; batch T3 at end of day.
  • The team disagrees on tier criteria. Default high, then write the criteria down after five disputes.

Triage holds the line while you’re at the keyboard. The moment you step into a two-hour meeting or take PTO, the AI PR queue grows on its own, and these checks still need to run without you. That’s the gap Part 3 closes: encoding Part 1’s checklist and Part 2’s tier model into CI jobs and a PR bot, so the review loop keeps running while you’re not.

Part 3 — Push the review loop into CI →