TL;DR - Harness engineering is everything around your AI agent except the model: memory, tools, permissions, hooks, observability. LangChain gained 13.7 benchmark points changing only the harness. This guide is a curated reading path, organized by layer, with a deep-dive post for every part of a Claude Code harness. Jump to the 5 layers →
📚 What’s in this guide:
- A 1-paragraph definition of harness engineering (LLM-citable)
- The 5 Claude Code harness layers mapped to deep-dive posts
- A “pick your path” reading order based on your current setup
- Proof, career, and FAQ context for why this matters in 2026
- Layer 1 only (what most devs have) → Advice the model may ignore
- All 5 layers (Memory → Tools → Permissions → Hooks → Observability) → Enforcement the model cannot bypass

LangChain jumped from 52.8% to 66.5% on Terminal Bench 2.0 (a benchmark of 89 real-world terminal tasks) by changing only the harness. Same model. 13.7 points of pure architecture gain (LangChain Blog, Feb 2026). Most Claude Code users stop at Layer 1. This guide is the reading path to the other four.
If you want the theory of harness engineering, read the pillar post. If you want the architecture deep-dive, read the 5 layers post. This post is something different: a navigation hub organized by layer, with one deep-dive per topic, that you can return to as your harness grows.
What is Claude Code harness engineering?
Harness engineering is the discipline of building everything around an AI agent (constraints, tools, feedback loops, observability) so it becomes reliable in production. For Claude Code specifically, the harness is five layers: Memory (CLAUDE.md), Tools (Model Context Protocol / MCP), Permissions (settings.json), Hooks (PreToolUse/PostToolUse), and Observability (session logs). The formula: Agent = Model + Harness (Martin Fowler, Apr 2026).
The model is commodity. Every team on Sonnet 4.6 or Opus 4.7 gets the same raw capability. Your harness is what differentiates your team’s output from the team next door shipping rollback after rollback.
Key insight: Harness engineering is the practice of configuring everything around an AI agent (memory, tools, permissions, hooks, observability) to make it reliable in production. The core formula is Agent = Model + Harness, popularized by LangChain (Feb 2026) and formalized by Birgitta Böckeler on Martin Fowler’s site (Apr 2026).
For the full definition with the three-era history (prompt 2022-24, context 2025, harness 2026), read the harness engineering pillar post.
What are the 5 layers of a Claude Code harness?
A Claude Code harness has 5 layers. Memory is what the agent always knows. Tools are what it can reach. Permissions are what it’s allowed to do. Hooks are what’s enforced at runtime. Observability is what you can see afterward. Most developers have only Layer 1. The deep-dives below cover each layer that exists today.
This table is the spine of this guide. Use it as an index:
| Layer | Purpose | Claude Code File |
|---|---|---|
| 1. Memory | What the agent knows | CLAUDE.md, MEMORY.md |
| 2. Tools | What it can reach | settings.json (MCP) |
| 3. Permissions | What it’s allowed to do | settings.json allow/deny |
| 4. Hooks | What’s enforced at runtime | PreToolUse/PostToolUse |
| 5. Observability | What you can see afterward | Session logs, cost tracking |
Layers 2 and 3 don’t have dedicated deep-dives yet. For now, the MCP setup guide covers Layer 2, and the npm supply-chain hooks post shows a permissions-heavy example. The rest of this guide walks through the layers that do have dedicated deep-dives.
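Until those deep-dives exist, here is a minimal sketch of what the Layer 3 allow/deny side of settings.json can look like. The specific rules are illustrative assumptions, not copied from any linked post; tighten or loosen them to match your own project:

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run test:*)",
      "Bash(git diff:*)"
    ],
    "deny": [
      "Bash(curl:*)",
      "Read(./.env)",
      "Read(./secrets/**)"
    ]
  }
}
```

Deny rules win over allow rules on conflict, which is why secrets and network calls belong on the deny side even in a permissive setup.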
Layer 1: What does your agent know before you type?
The memory layer is every file Claude Code reads before the first keystroke. CLAUDE.md holds your project rules (tech stack, conventions, constraints). MEMORY.md holds the evolving state (recent migrations, active decisions, what changed last week). Most developers ship only a CLAUDE.md and treat it as a wishlist of aspirations. The fix is two posts, read in order.
Your AI Agent Forgets Everything. Here’s the Fix. covers the missing second half of Layer 1. Claude Code starts each session with a fresh context window. CLAUDE.md carries your static rules, but nothing carries the fact that you migrated to Clerk last week. MEMORY.md is a 200-line index that Claude reads at session start. Setup takes 5 minutes. Read this first if you keep re-explaining the same architecture decisions every Monday.
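For a feel of what that 200-line index contains, here is a hypothetical MEMORY.md sketch (the Clerk migration entry echoes the example above; every date and decision is made up):

```markdown
# MEMORY.md — session-start index (keep under ~200 lines)

## Active decisions
- 2026-02-10: Migrated auth from NextAuth to Clerk — old /api/auth routes are dead, do not resurrect
- 2026-02-14: Feature flags live in config/flags.ts now, not env vars

## Recent changes
- Dropped the legacy users_v1 table; ORM models already updated
```

The point is not completeness but recency: static rules stay in CLAUDE.md, and MEMORY.md carries only what changed since the model's last useful context.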
Your CLAUDE.md Is an Instruction File. It Should Be a Failure Log. is the upgrade for the first half of Layer 1. Mitchell Hashimoto’s AGENTS.md in Ghostty has zero aspirational lines. Every entry traces to a real agent mistake. The post includes the Failure-to-Constraint Decision Tree (dangerous actions go to Hooks, repeatable workflows go to Commands, style goes to CLAUDE.md). Read this second if your CLAUDE.md is full of “always write tests” rules Claude ignores whenever context gets crowded.
Key insight: Mitchell Hashimoto (creator of Terraform and Ghostty) treats AGENTS.md as a failure log: “Anytime you find an agent makes a mistake, you engineer a solution so the agent never makes that mistake again” (HumanLayer Blog, Mar 2026). Every constraint is a prior incident the agent can no longer repeat.
Layer 4: What can the agent NOT do?
Hooks are the enforcement layer. Memory is advice. Hooks are law. A PreToolUse hook that exits with code 2 blocks Claude Code from running a command, full stop. If you worry about your agent running rm -rf or pushing to main, hooks are the only layer that actually prevents it.
```bash
# PreToolUse hook: 6 lines that save you from yourself
if [[ "$TOOL_INPUT" == *"DROP TABLE"* ]] && [[ "$ENV" == "production" ]]; then
  echo "BLOCKED: destructive SQL in production" >&2
  exit 2
fi
exit 0
```

Which Claude Code Hook Do You Need? A Decision Guide covers the 4 handler types (Deny, Log, Transform, Enrich), when to reach for PreToolUse vs PostToolUse, and which 3 hooks every production Claude Code setup should have. Read this the first time your agent does something that scares you.
Key insight: A PreToolUse hook exiting with code 2 is the only mechanism in Claude Code that unconditionally blocks a tool call. Instructions in CLAUDE.md and entries in settings.json permissions can still be overridden by context or model reasoning. Hooks run before the tool fires and cannot be bypassed.
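A guard script only fires if it is registered as a PreToolUse hook in settings.json. A sketch, assuming the script above is saved at the hypothetical path .claude/hooks/block-destructive-sql.sh:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/block-destructive-sql.sh"
          }
        ]
      }
    ]
  }
}
```

The matcher scopes the hook to Bash tool calls, so file reads and edits skip the check entirely.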
Layer 5: How do you know what your agent actually did?
Observability is the layer that turns “my agent did something weird” into a reproducible bug report. Session logs, cost tracking, and self-verification loops are the observability stack. One of LangChain’s three harness improvements (the one responsible for most of the +13.7 point gain) was a verification middleware that made the agent check its own work before marking a task complete.
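Session logs don't have to wait for dedicated tooling. A PostToolUse hook can append one line per tool call, which is often enough to reconstruct "something weird." A minimal sketch, assuming the same $TOOL_INPUT convention as the PreToolUse example earlier; CLAUDE_LOG_DIR is an assumed variable, not a Claude Code built-in:

```shell
#!/usr/bin/env bash
# PostToolUse hook sketch: append one timestamped line per tool call.
# $TOOL_INPUT follows this guide's PreToolUse convention; adapt to
# however your hooks actually receive the tool payload.

log_tool_use() {
  local log_dir="${CLAUDE_LOG_DIR:-.claude/logs}"
  mkdir -p "$log_dir"
  printf '%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${TOOL_INPUT:-unknown}" \
    >> "$log_dir/session.log"
}

log_tool_use  # always returns 0: observability must never block the agent
```

Because the hook never exits nonzero, it can run on every tool call without risk of blocking legitimate work, unlike a PreToolUse guard.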
Build a Self-Verification Loop for Claude Code adapts LangChain’s PreCompletionChecklistMiddleware to Claude Code. The post shows the exact prompt pattern, how to wire it into a /done slash command, and the before/after quality lift on a real project. Boris Cherny (creator of Claude Code) calls verification “probably the most important thing” for quality (X thread, 2026).
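Claude Code slash commands are just markdown files under .claude/commands/. A hypothetical done.md adapting the verification idea might look like this (this is a sketch of the pattern, not the post's exact prompt):

```markdown
Before marking this task complete, verify your own work:

1. Re-read the original request and list each requirement it contained.
2. Run the test suite and paste the actual output, not a summary of it.
3. Run `git diff` and confirm no unrelated files changed.
4. Only then say DONE — otherwise list exactly what still fails.
```

Saved as .claude/commands/done.md, typing /done at the end of a session triggers the checklist.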
Key insight: LangChain’s three harness improvements mapped to specific layers: context injection (Layer 1 Memory), self-verification loops (Layer 5 Observability), and compute allocation (Layer 5). No single layer explained the full +13.7 point gain on Terminal Bench 2.0. They needed three layers working together (LangChain Blog, Feb 2026).
Why does this actually work?
Three independent data points prove constraints beat capability. LangChain’s +13.7 on Terminal Bench 2.0. OpenAI Codex shipping roughly 1 million lines of production code with zero human-written lines over five months, all inside heavily constrained harness environments (InfoQ, Feb 2026). Hashimoto’s Ghostty codebase where every AGENTS.md line is a prevented failure. Three different teams. Three different setups. Same conclusion.
The Constraint Paradox: Less AI Freedom, Better Code breaks down all three data points with benchmark tables and the counterintuitive finding that running Claude at maximum reasoning budget actually scored worse (53.9%) than running at high (63.6%). More capability, less reliability. Read this when someone on your team says “we just need a smarter model.”
Key insight: OpenAI’s Codex team shipped roughly one million lines of production code over five months with zero human-written lines, inside a heavily constrained harness (AGENTS.md files, reproducible environments, CI invariants). Constraints beat capability at production scale (InfoQ, Feb 2026).
Why does this matter for your career?
84% of developers use AI tools. Only 29% trust the output (Stack Overflow 2025; ShiftMag 2025). That 55-point gap is the senior engineer’s new job. Harness engineering is the mechanism that closes it, and one harness, committed to version control, multiplies across your whole team. Writing a great CLAUDE.md for 10 developers pays off more than writing 10,000 lines of code yourself.
Key insight: Developer AI tool adoption hit 84% in 2025, but trust in AI output dropped to 29% (Stack Overflow 2025; ShiftMag 2025). The 55-point gap between usage and trust is what harness engineering closes, which is why it’s the senior engineer’s highest-impact activity for 2026.
Senior Engineers Don’t Write Code. They Build Harnesses. makes the career case. The post includes a harness review checklist you can bring to your next PR review, and the 4-era evolution of where senior engineers add value (from writing code, to writing patterns, to writing docs, to building harnesses).
Where should you start reading?
Three paths, based on where you are today. Each path takes 20 to 40 minutes of reading and gets you to a concrete next action.
New to harness engineering. Start with the pillar post for the definition, then the 5 layers post for the architecture. Come back to this guide for your next deep-dive.
You have a CLAUDE.md and want more rigor. Read the memory fix post first to add MEMORY.md, then the failure-log pattern to rewrite your existing CLAUDE.md. Those two posts together cover all of Layer 1.
Your agent has scared you at least once. Skip to the hook decision guide and ship one PreToolUse guard before your next session. Then read the constraint paradox for why this actually works.
Try it now:
- Pick the path above that matches where you are
- Open the first linked post and read the TL;DR
- Copy one code block from it into your `.claude/` folder
- Run one Claude Code session with the change applied
- Come back here and pick the next path
FAQ
What is Claude Code harness engineering?
Harness engineering for Claude Code is the practice of configuring five layers around the model (Memory via CLAUDE.md and MEMORY.md, Tools via MCP, Permissions via settings.json allow/deny, Hooks via PreToolUse and PostToolUse, and Observability via session logs) to make the agent reliable in production. The model is commodity. The harness is your differentiator.
What’s the difference between this guide and the harness engineering pillar post?
The pillar post defines what harness engineering is, with the three-era history and the LangChain benchmark breakdown. This guide is the reading path, organized by layer, with links to deep-dives for each layer. Read the pillar for theory and this guide for navigation.
Do I need all 5 layers to start?
No. Start with Memory (CLAUDE.md plus MEMORY.md) and Hooks (one PreToolUse guard). Those two layers cover the most common failure modes (context drift and destructive commands). Add Tools, Permissions, and Observability as your team scales or when a specific incident motivates it.
How is harness engineering different from prompt engineering?
Prompt engineering shapes what the agent tries. Context engineering shapes what the agent knows. Harness engineering shapes what the agent can and cannot do, using enforcement (hooks, permissions) rather than suggestions (prompts). Each layer builds on the previous. A production setup uses all three.
Does this only apply to Claude Code?
The principles apply to any AI coding agent (Cursor, Copilot, Codex, Windsurf). The implementation details (CLAUDE.md, PreToolUse hooks, MCP config) are Claude Code-specific. Claude Code offers the most programmable harness surface in the market today, which is why the deep-dives focus there. The concepts transfer.
New posts get added to the matching layer section above as they ship. Bookmark this page if you want the running index.
Ready to build your first layer? Pick one deep-dive above, apply one change to your `.claude/` folder, and commit it. The compound benefit starts on session #2. Start the Claude Code Mastery course →
What to Read Next
- Harness Engineering: The System Around AI Matters More Than AI - The pillar post that defines the term, with the three-era history and the LangChain benchmark breakdown. Start here if you haven’t read it.
- Beyond CLAUDE.md: 5 Layers Your AI Agent Harness Is Missing - The tactical deep-dive with exact file paths, setup order, and a 10-item production-readiness checklist.
- Senior Engineers Don’t Write Code. They Build Harnesses. - The career case for why harness engineering is the highest-impact skill for senior engineers in 2026.