Image & Visual Context
Module 5.3: Image & Visual Context
Section titled “Module 5.3: Image & Visual Context”Estimated time: ~25 minutes
Prerequisite: Module 5.2 (Context Optimization)
Outcome: After this module, you will know how to use visual context with Claude Code — providing screenshots, UI mockups, diagrams, and error screenshots to get more precise, visually-informed responses.
1. WHY — Why This Matters
Section titled “1. WHY — Why This Matters”You’re trying to describe a UI bug to Claude Code. “The button overlaps the text when the screen is narrow. The margin seems wrong. Maybe the flex container?” You type five sentences describing the layout. Claude suggests a CSS fix, but it doesn’t match your actual issue — because it couldn’t SEE what you’re seeing. One screenshot would have been worth a thousand tokens. Visual context is the most underused capability in Claude Code. Most developers don’t realize Claude can process images. When you show instead of tell, accuracy jumps dramatically. For UI bugs, architecture diagrams, or error screenshots, visual context is not optional — it’s essential.
2. CONCEPT — Core Ideas
Section titled “2. CONCEPT — Core Ideas”What Claude Code Can “See”
Section titled “What Claude Code Can “See””Claude Code’s vision capabilities allow it to analyze:
- UI/UX screenshots — layout bugs, styling issues, responsive design problems
- Architecture diagrams — system design, data flow, component relationships
- Error screenshots — terminal errors, browser dev tools, stack traces with context
- Wireframes and mockups — design-to-code conversion, implementation guidance
- Database schema diagrams — ERDs, table relationships, migration planning
- Whiteboard photos — hand-drawn architecture, brainstorming diagrams
How to Provide Images
Section titled “How to Provide Images”⚠️ Needs verification — exact mechanisms vary by Claude Code version and platform:
The typical methods include:
- File path reference — specify the image path in your prompt
- Copy-paste from clipboard — some terminals support image paste
- Drag-and-drop — depends on terminal emulator support
Recommendation: Test image input in your specific environment before relying on it. The most reliable method is typically saving the image to your project directory and referencing its path.
When Visual Beats Text
Section titled “When Visual Beats Text”graph TD A[Problem Type] --> B{Visual Component?} B -->|Yes| C[UI/Layout Issue?] B -->|No| D[Use Text Context]
C -->|Yes| E[Screenshot + Description] C -->|No| F{Architecture/Design?}
F -->|Yes| G[Diagram + Explanation] F -->|No| H{Error with UI State?}
H -->|Yes| I[Error Screenshot] H -->|No| D
E --> J[High Visual Value] G --> J I --> J D --> K[Low Visual Value]
style J fill:#90EE90 style K fill:#FFB6C1Use images when:
- The problem has a visual component you can show
- Describing it in text would take >100 words
- Exact visual state matters (UI bugs, layout issues)
- Architecture is complex with many relationships
Use text when:
- The problem is purely logical (algorithm bugs)
- Performance issues (logs and metrics are better)
- Configuration problems (show config files as text)
- The visual would be unclear or ambiguous
Image Context Cost
Section titled “Image Context Cost”Images consume significant tokens. A typical screenshot can use 500-2000 tokens depending on resolution and complexity. Be strategic:
- Crop tightly — show only the relevant area
- Reduce resolution for diagrams and wireframes
- Use text for code — don’t screenshot code, paste it as text
- Monitor with /cost — track your context budget
3. DEMO — Step by Step
Section titled “3. DEMO — Step by Step”Step 1: UI Bug with Screenshot
Section titled “Step 1: UI Bug with Screenshot”Scenario: Button overlaps text on mobile viewport.
⚠️ Note: The exact method to provide images may vary. This example assumes file path reference works:
# Save screenshot as mobile-bug.png in your project$ claudeYour prompt:
See mobile-bug.png — the "Submit" button overlapsthe form text when viewport is <400px wide.Fix the CSS to prevent this overlap.What Claude sees: The exact visual state — button position, text flow, container dimensions.
Expected response pattern:
I can see the button is using position: absolutewithout accounting for the form height. Here's the fix:
[Provides targeted CSS that addresses the specificvisual issue, not a generic solution]Why it works: Claude sees the EXACT problem, not your interpretation of it.
Step 2: Architecture Diagram Analysis
Section titled “Step 2: Architecture Diagram Analysis”Scenario: You have a system architecture diagram showing microservices.
$ claudeYour prompt:
See architecture.png — we're adding a new"Notification Service". Where should it sit inthis architecture? Should it be behind the APIGateway or talk to services directly?Expected response pattern:
Looking at your diagram, I see:- API Gateway → Auth Service → User Service- Payment Service talks directly to external APIs- ...
For Notification Service, I recommend:[Specific architectural guidance based onwhat it actually sees in your diagram]Step 3: Error Screenshot Diagnosis
Section titled “Step 3: Error Screenshot Diagnosis”Scenario: Browser console shows cryptic React error.
$ claudeYour prompt:
See console-error.png — this React error appearswhen I click "Delete". The stack trace is cut off.What's the likely cause and how do I debug it?What Claude sees: The error message, component names in the stack, React DevTools state, network tab status.
Expected response pattern:
I can see this is a "Cannot read property 'id' of undefined"error in your DeleteButton component. Looking at thestack trace visible in your screenshot...
[Specific debugging steps based on the actual error]Step 4: Wireframe to Code
Section titled “Step 4: Wireframe to Code”Scenario: Designer provides a wireframe for a new dashboard.
$ claudeYour prompt:
See dashboard-wireframe.png — implement thisin React with Tailwind CSS. Use the exact layoutshown: sidebar left, main content area with3 cards across, header with search.Expected response pattern:
Based on your wireframe, here's the component structure:
[Provides implementation that matches the visuallayout — correct grid, spacing, hierarchy]Accuracy improvement: Instead of guessing layout details, Claude sees the exact design intent.
Step 5: Cost Impact Measurement
Section titled “Step 5: Cost Impact Measurement”After using visual context:
/costExpected output:
Total tokens: 8,234├─ Input: 6,891 ($0.021)│ ├─ Text: 4,123 tokens│ └─ Images: 2,768 tokens ← Track this└─ Output: 1,343 ($0.004)Analysis: The image used ~40% of input tokens. Was it worth it? If it prevented 3 rounds of back-and-forth (300+ tokens each), then yes.
4. PRACTICE — Try It Yourself
Section titled “4. PRACTICE — Try It Yourself”Exercise 1: Screenshot vs Description
Section titled “Exercise 1: Screenshot vs Description”Goal: Compare accuracy and efficiency between text description and screenshot.
Instructions:
- Find a UI bug in any app (or create a simple HTML page with an obvious layout issue)
- Round 1 — Text only:
- Describe the bug to Claude Code in text (don’t mention you have a screenshot)
- Note the solution it provides
- Note the token cost with
/cost
- Round 2 — With screenshot:
- Start a new session (
/clearor restart) - Provide the screenshot with a brief description
- Note the solution it provides
- Note the token cost with
/cost
- Start a new session (
- Compare:
- Which solution was more accurate?
- Which required fewer clarifications?
- Token cost difference
- Time saved
Expected result: The screenshot version should provide a more accurate solution with fewer follow-up questions, even if token cost is slightly higher.
💡 Hint
For the text description, try to be detailed but not exhaustive. Describe the bug as you normally would to a colleague. For the screenshot, keep your text prompt brief — let the image do the talking.
✅ Solution Approach
Round 1 — Text-only example:
Prompt: "My navigation menu items are too close togetheron mobile. They're hard to tap. I think the padding iswrong. It's a flex container with space-between."
Tokens: ~150 input + likely follow-up questionsAccuracy: Depends on how well you described itRound 2 — Screenshot example:
Prompt: "See nav-bug.png — menu items overlap on mobile.Fix the spacing."
Tokens: ~50 text + 800 image = 850 inputAccuracy: High — Claude sees exact spacing issueFollow-ups: Usually none neededVerdict: Screenshot costs more tokens (~5x) but typically saves 2-3 rounds of clarification. Net saving in most cases.
Exercise 2: Wireframe to Component
Section titled “Exercise 2: Wireframe to Component”Goal: Use a visual mockup to generate accurate component code.
Instructions:
- Create a simple wireframe (or find one online) — can be hand-drawn or from Figma/Sketch
- Should show: layout structure, component hierarchy, basic content
- Save as PNG/JPG
- Prompt Claude Code:
See wireframe.png — implement this componentin [your framework: React/Vue/Svelte].Use [your styling: Tailwind/CSS-in-JS/plain CSS].Match the layout exactly.
- Check the generated code against your wireframe:
- Layout structure correct?
- Spacing roughly matched?
- Component hierarchy logical?
- Run
/costto see token impact
Expected result: Generated component should structurally match the wireframe. Exact styling may need refinement, but the foundation should be solid.
💡 Hint
If the wireframe is hand-drawn or low-fidelity, add clarification in your prompt about what each box/section represents. For example: “The top box is the header, middle section is a card grid (3 across), bottom is pagination.”
✅ Solution Pattern
Good wireframe prompt structure:
See [filename] — this is a [component type] for [purpose].
Layout details:- [Top section description]- [Middle section description]- [Bottom section description]
Implement in [framework] with [styling approach].Use semantic HTML. Make it responsive.What Claude delivers:
- Component structure matching visual hierarchy
- Reasonable styling approximations
- Semantic HTML elements
- Basic responsiveness
What you’ll still need to adjust:
- Exact spacing/margins
- Colors and fonts (unless specified)
- Animations/interactions
- Accessibility details
Token cost: Expect 600-1500 tokens for a wireframe, depending on complexity. Worth it if it saves 15+ minutes of describing layout in text.
5. CHEAT SHEET
Section titled “5. CHEAT SHEET”Visual Context Use Cases
Section titled “Visual Context Use Cases”| Use Case | What to Provide | Example Prompt | Token Cost |
|---|---|---|---|
| UI Bug | Screenshot of issue | ”See bug.png — button overlaps on mobile. Fix CSS.” | 500-1000 |
| Architecture Review | System diagram | ”See arch.png — should Payment Service use the API Gateway?“ | 800-1500 |
| Error Diagnosis | Console/terminal screenshot | ”See error.png — this crashes on submit. Why?“ | 600-1200 |
| Wireframe to Code | Mockup/wireframe | ”See mockup.png — implement in React + Tailwind” | 700-2000 |
| Database Design | ERD or schema diagram | ”See schema.png — optimize for read-heavy queries” | 800-1500 |
| Code Review Visual | Screenshot with annotations | ”See review.png — red circles show performance issues” | 600-1200 |
How to Provide Images
Section titled “How to Provide Images”⚠️ Verify in your environment — methods may vary:
| Method | Reliability | Notes |
|---|---|---|
| File path reference | High (usually) | Save image in project, reference in prompt |
| Clipboard paste | Medium | Terminal-dependent |
| Drag-and-drop | Medium | Terminal-dependent |
Most reliable approach: Save images to your project directory and reference them by path in your prompt.
When NOT to Use Images
Section titled “When NOT to Use Images”| Scenario | Why Text is Better |
|---|---|
| Code snippets | Text is searchable, copy-pasteable, uses fewer tokens |
| Log files | Text format preserves structure, easier to grep |
| JSON/Config | Text allows precise editing |
| Performance data | Numbers in text are easier to analyze |
| Stack traces | Text stack traces are fully copy-pasteable |
Token Cost Guidelines
Section titled “Token Cost Guidelines”- Simple screenshot: 500-800 tokens
- Complex UI screenshot: 1000-1500 tokens
- Detailed diagram: 1200-2000 tokens
- High-resolution image: 2000+ tokens (avoid unless necessary)
Budget rule: If an image uses >2000 tokens, consider if a text description + smaller/cropped image would work.
6. PITFALLS — Common Mistakes
Section titled “6. PITFALLS — Common Mistakes”| ❌ Mistake | ✅ Correct Approach |
|---|---|
| Describing UI bugs verbosely — “The button is 20px from the left, overlaps the text at 380px viewport, the container has flex…” | Show, don’t tell — Screenshot + “Button overlaps text on mobile. Fix CSS.” Uses fewer tokens, higher accuracy. |
| Huge high-res screenshots — Providing 4K screenshots that consume 3000+ tokens | Crop and resize — Show only the relevant area. Resize to 1200px max width for web UIs. Diagrams can be even smaller. |
| Using images for text-based problems — Screenshot of code instead of pasting text | Use text for code — Code as text is searchable, editable, uses 5-10x fewer tokens. Images only for visual context. |
| Screenshot without context prompt — Just “See this image” with no explanation | Always add text context — “See mobile-bug.png — this happens when viewport <400px. Fix the button overlap.” Guides Claude’s focus. |
| Multiple screenshots without cost monitoring — Adding 5-6 screenshots in one prompt | Track with /cost — Each image is 500-1500 tokens. 5 images = 2500-7500 tokens = significant budget. Use sparingly. |
| Assuming Claude can read tiny text — Screenshot with 8pt font or compressed terminal output | Ensure readability — If text in the image is important, make sure it’s legible at normal viewing size. Zoom in before screenshot if needed. |
7. REAL CASE — Production Story
Section titled “7. REAL CASE — Production Story”Scenario: Building a mobile banking app for a Vietnamese bank using Kotlin Multiplatform (KMP). The design team uses Figma for all screen mockups. Previously, developers would manually translate designs to code, often missing spacing and layout details.
Problem: Each screen took 10-15 minutes to describe to Claude Code in text, and the first implementation was only about 70% accurate to the design. This led to back-and-forth with designers and rework.
Solution: New workflow adopted:
- Designer exports PNG from Figma at 2x resolution
- Developer crops to show one screen at a time
- Provides to Claude Code with framework context:
See transaction-screen.png — implement this inCompose Multiplatform. Match the layout exactly.Use our design tokens from theme/tokens.kt.
Implementation:
# Example session$ claude
> See screens/transaction-detail.png>> Implement this screen in Compose Multiplatform.> Layout: Top app bar with back button, transaction> amount (large text), detail rows below, action> buttons at bottom.>> Use our design system: MaterialTheme + custom tokens> in ui/theme/Tokens.kt
[Claude Code generates 90-95% accurate implementation]Results:
- Speed: 3x faster per component (15 min → 5 min per screen)
- Design accuracy: 70% → 95% on first implementation
- Designer-developer back-and-forth: Reduced by 60%
- Context cost: ~5% of project token budget per screen
- Quality: Designers reported implementation “feels more faithful to design intent”
Key insight: The 5% token cost (typically 800-1200 tokens per screen) was easily worth the time saved and accuracy gained. One round of designer-developer back-and-forth costs more in time than the token cost of the screenshot.
Team adoption: Within 2 weeks, the entire 6-person team adopted this workflow. The practice spread to other visual tasks: error diagnosis (screenshots of crash logs), architecture discussions (whiteboard photos), and code reviews (annotated screenshots).