Agent Workflows That Stick: Short Threads, External Memory, Staged Diffs

TL;DR:

Start new threads around 50–100k tokens; beyond ~100k quality degrades
External memory (context.md) carries decisions across threads more reliably than one long thread
Git staging area + tight feedback loops = quality at speed

📖 Quick Start

Who it’s for: Developers who have a first win behind them and are ready to level up

Time to complete: 20 minutes reading + practice

Prerequisites: First win completed

Expected outcome: Repeatable workflow patterns that avoid context rot

Next step: Master power patterns

📋 Workflow Checklist

Start new threads for each distinct task

Write decisions to .agents/context/ files

Use feedback loops (tests, builds, screenshots)

Stage good changes, discard bad ones

Fork threads to explore alternatives

Add tests to verify fixes/features

Monitor token usage; start a fresh thread around 50–100k tokens

The Problem with Long Threads

You: “Fix the authentication bug”
Agent: ✅ Done

You: “Now add logging”
Agent: ✅ Done

You: “Refactor the API client”
Agent: ✅ Done

You: “Add tests”
Agent: 🤔 Forgets the auth fix, breaks the logger, introduces new bugs

What happened? Context sprawl. Beyond ~100k tokens, quality degrades—the model:

Forgets early instructions
Loses track of file state
Enters “doom loops” (tries the same failed fix repeatedly)
Produces lower-quality code

The Desk Analogy

Think of your thread as a desk:

Clean desk: One task, relevant files, clear goal—productive
Messy desk: 10 tasks, 50 files, vague goals—chaos

The fix: Clear your desk often. Start new threads.

Start New Threads Often

When to start a new thread:

Every distinct task (bug fix, feature, refactor)
After 5-10 agent turns in a complex conversation
When switching to a different part of the codebase
When the agent starts “forgetting” earlier context
When you want to try a different approach

How to carry context forward:

Use Handoff to extract only relevant context into a fresh thread
Reference external memory (context.md files)
Link to previous threads if needed

Example:

Thread 1: Fix authentication bug → Commit
Thread 2: Add logging to auth flow → Commit
Thread 3: Refactor API client → Commit
Thread 4: Add tests for all of the above → Commit

Each thread: focused, verifiable, committable.

External Memory: The context.md Pattern

Instead of: Keeping everything in thread context
Do this: Write it down, reference it later

Pattern:

You: "Summarize the key decisions and constraints from this
      conversation to .agents/context/auth-refactor.md"

[Agent writes context.md]

[Start new thread]

You: "Read @.agents/context/auth-refactor.md and continue
     adding the JWT middleware"

What to store:

Architectural decisions
Constraints and requirements
API contracts and schemas
Command sequences that work
Known pitfalls and workarounds

Naming convention: Use predictable paths of the form .agents/context/<topic>.md (e.g., auth-refactor.md, api-migration.md)
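If you want every context file to start from the same skeleton, a small shell helper can scaffold it. This is a minimal sketch, not part of any tool: the function name new_context_file and the section headings (copied from the "What to store" list) are assumptions you should adapt.

```shell
# Hypothetical helper: create .agents/context/<topic>.md with empty sections.
# Section names mirror the "What to store" list; rename them to fit your project.
new_context_file() {
  topic="$1"
  if [ -z "$topic" ]; then
    echo "usage: new_context_file <topic>" >&2
    return 2
  fi

  dir=".agents/context"
  file="$dir/$topic.md"
  mkdir -p "$dir"

  # Never overwrite notes that already exist.
  if [ -e "$file" ]; then
    echo "exists: $file" >&2
    return 1
  fi

  cat > "$file" <<EOF
# Context: $topic

## Architectural decisions

## Constraints and requirements

## API contracts and schemas

## Command sequences that work

## Known pitfalls and workarounds
EOF
  # Print the path so it can be pasted into a prompt.
  echo "$file"
}
```

Run new_context_file auth-refactor from the repository root, then ask the agent to fill in the sections before you close the thread.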

Benefits:

Fresh context per thread
Searchable, version-controlled knowledge
Faster onboarding for future threads
Clear separation between tasks

Feedback Loops: Reproducible Scripts

The pattern:

goal → action → verify → adjust → repeat

Make verification automatic:

For tests:

npm test -- tests/auth.test.ts

For builds:

npm run build && npm run check

For UI:

npm run storybook
# Agent looks at localhost:6006 and screenshots

For databases:

psql -d mydb -c "SELECT * FROM users LIMIT 5;"
# Agent verifies schema changes

Why this matters:

Agent can self-correct
You get high confidence in results
No more “looks good, ship it” followed by discovering it is broken
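One way to make verification automatic is to wrap the checks in a single function that the agent (or you) can run after every change. The sketch below is an illustration only: run_checks is a hypothetical name, and the example commands assume your package.json defines build and check scripts as in the snippets above.

```shell
# run_checks: run each command string in order and stop at the first failure.
# Each argument is one full command line, executed with `sh -c`.
run_checks() {
  for check in "$@"; do
    echo "==> $check"
    if ! sh -c "$check"; then
      echo "FAILED: $check" >&2
      return 1
    fi
  done
  echo "All checks passed"
}

# Example invocation (adapt the commands to your project):
# run_checks "npm run build" "npm run check" "npm test"
```

Stopping at the first failure keeps the output short, which matters when the agent has to read it back into its context.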

Git Discipline: Your Quality Gate

The workflow:

Agent makes changes
Review diff (git diff or VS Code)
Stage good changes: git add <file>
Discard bad changes: git restore <file>
Repeat until satisfied
Commit staged changes

Why this is powerful:

No fear of bad code surviving
Easy to cherry-pick good changes
Version control is your undo/redo
Forces you to review carefully

Pro tip: Use git staging interactively:

git add -p  # Review changes chunk by chunk
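The last two steps of the workflow can be sketched as one helper that commits only what you staged and then shows what is left over. This is a hedged sketch: commit_reviewed is a hypothetical function name, while the Git commands it calls (git diff --cached --quiet, git commit, git status --short) are standard.

```shell
# commit_reviewed: commit only the staged changes, then list what remains.
commit_reviewed() {
  message="$1"

  # `git diff --cached --quiet` exits 0 when nothing is staged.
  if git diff --cached --quiet; then
    echo "nothing staged; review with 'git diff' and stage with 'git add -p'" >&2
    return 1
  fi

  git commit --quiet -m "$message" || return 1

  # Anything still listed here was not staged: keep iterating on it,
  # or discard it with `git restore <file>`.
  git status --short
}
```

Because the function never runs git add itself, nothing the agent wrote reaches the commit unless you staged it by hand.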

Forking Threads: Try Different Approaches

Scenario: Agent suggests approach A, but you want to try B.

Don’t: Keep arguing in the same thread
Do: Fork the thread

How to fork: Use the Fork action in VS Code (right-click message → “Fork from here”) or the CLI

Benefits:

Explore alternatives without losing context
Compare approaches side by side
Easy to discard the one that doesn’t work

Confirm the Fix: Add/Modify Tests

Pattern:

You: "Fix the authentication bug AND add a test that
verifies the fix. Show me the test passing."

Why:

Proves the fix works
Prevents regressions
Forces the agent to understand the problem deeply
Gives you confidence to ship

For bug fixes:

"Write a test that reproduces the bug, verify it fails,
then fix the bug and verify the test passes."

For features:

"Implement the feature and add tests that cover the
happy path and two error cases."
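The "verify it fails, then verify it passes" steps in the bug-fix prompt can be made explicit with two tiny wrappers. This is a sketch under assumptions: expect_fail and expect_pass are hypothetical names, and the command you pass in is whatever runs your reproducing test.

```shell
# expect_fail: succeed only if the given command fails (the bug is reproduced).
expect_fail() {
  if "$@"; then
    echo "expected failure, but it passed: $*" >&2
    return 1
  fi
  echo "red: failed as expected"
}

# expect_pass: succeed only if the given command succeeds (the fix works).
expect_pass() {
  if ! "$@"; then
    echo "expected success, but it failed: $*" >&2
    return 1
  fi
  echo "green: passed"
}

# Example (adapt the test command to your project):
# expect_fail npm test    # before the fix
# expect_pass npm test    # after the fix
```

A test that passes before the fix does not reproduce the bug, so expect_fail reporting an error is a signal to rewrite the test, not the code.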

💡 Token Management: Quality degrades once a thread sprawls beyond ~100k tokens. Cap exploration, focus reads, prune logs, and start new threads often. For comprehensive token hygiene strategies and mode selection, see Power Patterns.
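Before pointing the agent at a large log or a pile of files, it can help to estimate how much of the budget they would consume. The sketch below assumes roughly 4 bytes per token, a common rule of thumb for English text that real tokenizers only approximate; estimate_tokens and within_budget are hypothetical helper names, and your tool's own token counter remains the source of truth.

```shell
# estimate_tokens: rough token count for one or more files.
# ASSUMPTION: about 4 bytes per token; treat the result as a ballpark only.
estimate_tokens() {
  bytes="$(cat "$@" | wc -c | tr -d ' ')"
  # Round up so small files never report 0 tokens.
  echo $(( ($bytes + 3) / 4 ))
}

# within_budget: exit 0 if the estimate is at or below the limit (first argument).
within_budget() {
  limit="$1"
  shift
  [ "$(estimate_tokens "$@")" -le "$limit" ]
}

# Example: warn before feeding a big log to the agent.
# within_budget 20000 build.log || echo "prune build.log first"
```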

Real Example: Refactoring a Component

Bad approach (one long thread):

You: Refactor UserProfile to use hooks
Agent: [makes changes]
You: Add loading states
Agent: [makes changes]
You: Add error boundaries
Agent: [makes changes, context at 120k tokens]
You: Add tests
Agent: [breaks the refactor, forgets loading states]

Good approach (multiple focused threads):

Thread 1:

You: Refactor @components/UserProfile.tsx to use hooks instead of class
     syntax. Verify existing tests still pass.
Agent: [refactor + verify]
You: [Review, stage, commit]

Thread 2:

You: Add loading and error states to @components/UserProfile.tsx.
     Write tests for both states.
Agent: [implement + test]
You: [Review, stage, commit]

Thread 3:

You: Add error boundary around @components/UserProfile.tsx.
     Add a test that verifies it catches render errors.
Agent: [implement + test]
You: [Review, stage, commit]

Result: Three clean commits, each verifiable, no context rot.

Takeaways

Patterns that work:

Start new threads for each distinct task
Write decisions to context.md files
Use feedback loops (tests, builds, screenshots)
Stage good changes, discard bad ones
Fork threads to explore alternatives
Always add tests to verify fixes/features

Patterns that fail:

Reusing threads until context sprawls
Relying on thread memory for critical decisions
No verification step in prompts
Committing without reviewing diffs
Letting token count climb unchecked

This Week: Practice These 3 Workflows

🔨 Try It Now: Three Workflow Exercises

Exercise 1: External Memory Extraction

Prompt:
Extract the key architectural decisions from this conversation
to .agents/context/architecture.md. Include constraints,
patterns we're using, and what to avoid.
Verification: File exists, readable by a fresh thread
Success criteria: Start a new thread, reference the file, agent understands context

Exercise 2: Fork and Compare

Prompt in Thread A:
Refactor this component using approach A (hooks)
Then fork thread and prompt in Thread B:
Refactor this component using approach B (composition)
Verification: Compare diffs side by side
Success criteria: Pick the better approach, discard the other

Exercise 3: Test-First Fix

Prompt:
Write a test that reproduces the bug in [component].
Verify it fails. Then fix the bug and verify the test passes.
Verification: See test fail, then pass
Success criteria: Bug fixed with proof via passing test

Expected outcome: These patterns become muscle memory for your daily workflow.

Exercise 1: Short Thread Discipline

Task: Fix a bug or implement a small feature in a new thread, commit, close

Prompt:

"Fix [specific bug] in a fresh thread. Run tests to verify the fix, then show me the diff."

Success criteria:

Thread stays under 50k tokens
Clean, focused commit
No leftover uncommitted changes
Tests pass
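The "no leftover uncommitted changes" criterion is easy to check mechanically. A minimal sketch, assuming you run it inside the repository; tree_is_clean is a hypothetical name, and git status --porcelain is the standard script-friendly status output.

```shell
# tree_is_clean: exit 0 when there are no staged, unstaged, or untracked changes.
tree_is_clean() {
  [ -z "$(git status --porcelain)" ]
}

# Example: run before closing the thread.
# tree_is_clean && echo "safe to close" || git status --short
```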

Exercise 2: External Memory

Task: Create .agents/context/decisions.md for your main project

Prompt:

"Review the architecture of [your project] and create .agents/context/decisions.md
documenting 3-5 key architectural decisions with rationale. Include: tech choices,
API patterns, and known constraints."

Success criteria:

File contains 3-5 documented decisions
Each decision includes rationale
File is committed and version-controlled
You can reference it in future threads

Exercise 3: Test-Driven Fix

Task: Fix a bug AND verify it with a test

Prompt:

"Fix [specific bug]. First write a test that reproduces the bug and verify it fails.
Then fix the bug and verify the test passes. Commit both together."

Success criteria:

Test fails before fix
Test passes after fix
Both test and fix committed together
No other changes in the commit

Next: Amp Power Patterns — When to use subagents, Oracle, Librarian, and Rush mode for maximum leverage.

Related:

Coding with Agents in 2025 — The complete overview