---
name: systematic-debugging
description: Four-phase debugging methodology. Always find root cause before attempting fixes.
source: obra/superpowers (pinned 2026-03-19)
---

# Systematic Debugging

## Overview

Random fixes waste time and create new bugs. Quick patches mask underlying issues.

**Core principle:** ALWAYS find root cause before attempting fixes. Symptom fixes are failure.

## The Iron Law

```
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
```

If you haven't completed Phase 1, you cannot propose fixes.

## When to Use

Use for ANY technical issue:

- Test failures
- Bugs in production
- Unexpected behavior
- Performance problems
- Build failures
- Integration issues

**Use this ESPECIALLY when:**

- Under time pressure (emergencies make guessing tempting)
- "Just one quick fix" seems obvious
- You've already tried multiple fixes
- Previous fix didn't work
- You don't fully understand the issue

## The Four Phases

You MUST complete each phase before proceeding to the next.

### Phase 1: Root Cause Investigation

**BEFORE attempting ANY fix:**

1. **Read Error Messages Carefully**
   - Don't skip past errors or warnings
   - They often contain the exact solution
   - Read stack traces completely
   - Note line numbers, file paths, error codes

2. **Reproduce Consistently**
   - Can you trigger it reliably?
   - What are the exact steps?
   - Does it happen every time?
   - If not reproducible → gather more data, don't guess

3. **Check Recent Changes**
   - What changed that could cause this?
   - Git diff, recent commits
   - New dependencies, config changes
   - Environmental differences

4. **Gather Evidence in Multi-Component Systems**

   For EACH component boundary:
   - Log what data enters the component
   - Log what data exits the component
   - Verify environment/config propagation
   - Check state at each layer

   Run once to gather evidence showing WHERE it breaks.

5. **Trace Data Flow**
   - Where does the bad value originate?
   - What called this with the bad value?
   - Keep tracing up until you find the source
   - Fix at source, not at symptom

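The boundary logging in step 4 can be sketched in Python as a decorator that records what enters and exits each component. This is a minimal illustration, not a prescribed implementation: the component names (`load_config`, `build_request`), the sample data, and the bug itself are hypothetical.

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("boundary")


def log_boundary(func):
    """Log the arguments entering and the value leaving one component."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("ENTER %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        log.debug("EXIT  %s result=%r", func.__name__, result)
        return result
    return wrapper


# Hypothetical two-layer pipeline, used only to illustrate the technique.
@log_boundary
def load_config(env):
    # Bug under investigation: the key is looked up with the wrong case.
    return {"api_url": env.get("api_url")}


@log_boundary
def build_request(config):
    return f"GET {config['api_url']}/status"


if __name__ == "__main__":
    # A single run shows load_config exiting with api_url=None, so the bad
    # value originates there, not in build_request where the symptom appears.
    build_request(load_config({"API_URL": "https://example.invalid"}))
```

Reading the log from the symptom back toward the first boundary that emits a bad value is the tracing described in step 5: the fix belongs in `load_config`, not in the function that happened to surface the problem.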
### Phase 2: Pattern Analysis

**Find the pattern before fixing:**

1. **Find Working Examples:** Locate similar working code in same codebase
2. **Compare Against References:** Read reference implementation COMPLETELY
3. **Identify Differences:** List every difference, however small
4. **Understand Dependencies:** Settings, config, environment, assumptions

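Listing every difference (step 3) is easier to do mechanically than by eye. The sketch below compares two settings dictionaries and reports each key that differs or exists on only one side. The function name `diff_settings` and the sample settings are assumptions made up for this example.

```python
def diff_settings(working, broken):
    """Return every key whose value differs, including keys missing on one side."""
    missing = object()  # sentinel, so a genuine None value is still compared
    differences = {}
    for key in sorted(working.keys() | broken.keys()):
        left = working.get(key, missing)
        right = broken.get(key, missing)
        if left != right:
            differences[key] = (
                "<absent>" if left is missing else left,
                "<absent>" if right is missing else right,
            )
    return differences


# Hypothetical settings for a working and a failing service.
working = {"timeout": 30, "retries": 3, "tls": True}
broken = {"timeout": 30, "retries": 0, "region": "eu"}

for key, (good, bad) in diff_settings(working, broken).items():
    print(f"{key}: working={good!r} broken={bad!r}")
```

The output includes keys that look irrelevant at first glance (here `region` and `tls`), which is the point of the step: no difference is dismissed before it has been examined.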
### Phase 3: Hypothesis and Testing

**Scientific method:**

1. **Form Single Hypothesis:** "I think X is the root cause because Y"
2. **Test Minimally:** Make the SMALLEST possible change to test
3. **Verify Before Continuing:** Did it work? Yes → Phase 4. No → NEW hypothesis
4. **When You Don't Know:** Say "I don't understand X". Don't pretend.

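One way to keep a test minimal is to change exactly one variable against a known-failing baseline and compare the two outcomes. The following is a rough sketch under that assumption. `try_single_change` and `request_succeeds` are hypothetical names, and `request_succeeds` stands in for whatever reproduction steps were established in Phase 1.

```python
def try_single_change(check, baseline, key, value):
    """Run `check` on the baseline and on a copy with exactly one setting changed."""
    candidate = {**baseline, key: value}
    baseline_ok = check(baseline)
    candidate_ok = check(candidate)
    # The hypothesis is supported only if the baseline fails and the
    # single change makes the failure go away.
    return (not baseline_ok) and candidate_ok


# Hypothetical stand-in for "run the reproduction steps and report success".
def request_succeeds(settings):
    return settings["retries"] > 0


failing = {"timeout": 30, "retries": 0}

# Hypothesis: "retries=0 is the root cause because the first attempt always times out."
print(try_single_change(request_succeeds, failing, "retries", 3))   # True: supported
print(try_single_change(request_succeeds, failing, "timeout", 60))  # False: form a new hypothesis
```

Because only one key changes per run, a positive result points at that key alone, which is what makes the result usable as evidence.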
### Phase 4: Implementation

**Fix the root cause, not the symptom:**

1. **Create Failing Test Case:** Simplest possible reproduction
2. **Implement Single Fix:** ONE change at a time, no "while I'm here" improvements
3. **Verify Fix:** Test passes? No other tests broken? Issue resolved?
4. **If Fix Doesn't Work:** After fewer than 3 failed attempts, return to Phase 1. After 3 or more, STOP and question the architecture
5. **If 3+ Fixes Failed, Question Architecture:** Is this pattern fundamentally sound? Discuss with a human before more fixes.

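Steps 1 to 3 might look like the following in Python. The function `parse_price` and its bug are invented for illustration: the assumed root cause from Phase 1 is that thousands separators were never stripped. The test is written first and fails against the old code, then the single fix is applied, then both tests are run.

```python
# Hypothetical function under repair. Before the fix the body was
# `return float(text)`, which raised ValueError on input such as "1,234.50".
def parse_price(text):
    return float(text.replace(",", ""))  # the ONE change made in step 2


def test_thousands_separator():
    # Step 1: simplest input that reproduced the bug before the fix.
    assert parse_price("1,234.50") == 1234.5


def test_plain_number_still_works():
    # Step 3: confirm the fix did not break a case that already worked.
    assert parse_price("99.99") == 99.99


if __name__ == "__main__":
    test_thousands_separator()
    test_plain_number_still_works()
    print("ok")
```

The functions follow the `test_` naming convention so a runner such as pytest can collect them, and the regression test stays in the suite so the bug cannot silently return.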
## Red Flags - STOP and Follow Process

- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "Add multiple changes, run tests"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- Each fix reveals new problem in different place

**ALL of these mean: STOP. Return to Phase 1.**

## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first" | First fix sets the pattern. Do it right from the start. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |

## Quick Reference

| Phase | Key Activities | Success Criteria |
|-------|---------------|------------------|
| **1. Root Cause** | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY |
| **2. Pattern** | Find working examples, compare | Identify differences |
| **3. Hypothesis** | Form theory, test minimally | Confirmed or new hypothesis |
| **4. Implementation** | Create test, fix, verify | Bug resolved, tests pass |

## Real-World Impact

- Systematic approach: 15-30 minutes to fix
- Random-fix approach: 2-3 hours of thrashing
- First-time fix rate: 95% (systematic) vs 40% (random fixes)
- New bugs introduced: near zero (systematic) vs common (random fixes)