A friend who builds with Claude Code messaged me at 2am. He runs a small DTC tooling shop, was mid-build on a three-day feature, and had just watched his context gauge climb past 70 percent with two more days of work left on the table. His note: "what do you actually do when the gauge is red and you're not done?"
Here is the exchange that followed, cleaned up a little.
The question a friend asked me at 2am
We went back and forth six times over about an hour. I am transcribing from memory, so the wording is tighter than the original, but the beats are honest. I have used this pattern on my own site through three rounds of a 188-article content shipping project, across three Claude Code sessions in the same day, and I have watched it fail in the specific ways I will describe below.
This is a working pattern, not a finished one. It still costs overhead. That overhead is worth it.
Tokens are a real budget: the Claude Code context window as RAM
The mental model that fixed it for me was simple: tokens are RAM, not disk.
You have a 1M context window in Claude Code. That sounds like a lot. It is a lot, relative to what Claude Sonnet on the API exposes. But a live session at 400K tokens is already halfway to the wall, and once you cross 700K your turn latency climbs, your answers get less precise, and your instinct to /compact becomes the expensive instinct instead of the cheap one.
The second fact that matters is the Anthropic prompt cache has a five-minute TTL. When a session sits idle past that window, the next message reads your full conversation uncached. You pay full tokens to re-hydrate everything you already paid for once. This is not a bug. It is the pricing model. But it means a long-running session that you keep coming back to over a day is literally the worst shape a session can take.
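The economics of that wake-up can be sketched with rough numbers. This is a toy model, not Anthropic's pricing: the `0.1` cache-read multiplier and the `$3/MTok` base rate are placeholders you should replace with the current published figures.

```python
def rehydrate_cost(context_tokens: int, base_rate_per_mtok: float,
                   cache_read_multiplier: float = 0.1) -> dict:
    """Compare the next-turn cost when the prompt cache is warm vs cold.

    The multiplier is a placeholder; check Anthropic's current pricing.
    """
    cold = context_tokens / 1_000_000 * base_rate_per_mtok  # full re-read, cache miss
    warm = cold * cache_read_multiplier                     # cached re-read
    return {"cold_miss": cold, "warm_hit": warm, "penalty": cold - warm}

# A 700K-token session that idles past the 5-minute TTL pays the cold
# price on its next message, every single time it wakes up.
costs = rehydrate_cost(700_000, base_rate_per_mtok=3.0)
```

The exact dollar figures do not matter; the shape does. The penalty scales linearly with accumulated context, which is why the long-idle, long-context session is the worst of both axes.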
So the rule I settled on: plan your compact boundaries the same way you plan your git commits. Proactive, at seams in the work, not reactive when the gauge turns red.
SESSION_STATE.md: the doc you write before you hit the wall
One markdown file. Predictable path. Updated at natural phase seams.
I keep mine at the root of whatever repo the work lives in, named SESSION_STATE.md or SESSION_STATE_ROUND_N.md if I know I am doing multiple rounds. It gets committed and pushed alongside the last commit of the session. The next agent pulls, the file is there, the context is reconstructable.
The structure that has held up across every round:
- What shipped this session. Commits in order, PR numbers, artifacts produced. The git log is not enough because commit messages alone do not reconstruct intent.
- Bugs caught and fixed. Especially the ones that were not caught by the audit script. Future-you will repeat them otherwise.
- Lessons to promote to memory. Stuff worth turning into a feedback file in the ~/.claude/projects/... tree so it persists beyond this repo.
- What is pending or deferred. Including the reason it was deferred, not just the fact.
- Proposed next steps with recommendation. Not "here is a list of possibilities." One clean "do this next."
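A skeleton of those five sections, with hypothetical entries; the commit hash, PR number, and task names are invented for illustration:

```markdown
# SESSION_STATE_ROUND_2.md

## Shipped this session
- abc1234 — migrate article frontmatter to new schema (PR #41, hypothetical)

## Bugs caught and fixed
- Audit script missed empty description fields; fixed by hand, add a check next round.

## Lessons to promote to memory
- Always run the audit before dispatching a batch, not after.

## Pending / deferred
- Hero image regeneration — deferred because the queue was backed up, not because it failed.

## Next step (recommendation)
Do X first. Here is why, in one paragraph.
```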
The discipline is writing it before you feel stuck, not after. By the time you are panic-compacting, you have already lost the detail you would have put in the doc.
The file is working memory that survives across compacts.
RESUME-PASTE.md: the baton you hand the next session
A second file. I call it RESUME-PASTE.md and I keep it in the project root but outside the repo boundary, so it never gets tracked, never gets deployed, and never leaks into the model's working context until I want it to.
What goes in the paste:
- Repo path (absolute)
- Branch name and last commit hash
- Dev server port, if one is running
- First three files the agent should read (the SESSION_STATE doc, any relevant feedback memory file, the skill's SKILL.md if the work is skill-driven)
- One paragraph on where we left off
- The next task stated cleanly, like a one-sentence ticket
- The hard rules that must carry over (account-wide feedback that matters for this work)
Length target: 300 to 800 words. Long enough to land the agent on the exact task without it needing to re-read the whole repo. Short enough that you are not burning tokens on a wall of text before work begins.
The first paste handoff I wrote was 1,200 words. It tried to rehydrate the entire prior conversation. I cut it in half, then by another third, and the compressed version worked better because the agent had fewer decoys to chase. The paste is a task briefing, not a transcript.
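For shape, here is a compressed sketch of one. Every path, port, hash, and article number below is hypothetical:

```markdown
Repo: /Users/me/work/site  (branch: round-3, last commit: abc1234)
Dev server: already running on localhost:4321.

Read these first, in order:
1. SESSION_STATE_ROUND_2.md
2. ~/.claude/projects/.../feedback.md
3. skills/content-ship/SKILL.md

Where we left off: Round 2 shipped articles 64 through 126 through the
audited pipeline. The audit doc flagged two recurring frontmatter bugs,
both fixed; details in the session state doc.

Next task: ship articles 127 through 188 using the same pipeline.

Hard rules: never push directly to main; run the audit script before
every batch; do not regenerate hero images this round.
```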
When to /compact, when to start fresh, when to keep going
These are three different moves, and the difference matters.
/compact keeps conversation memory but loses precision. The agent remembers the shape of what you did but can lose the specifics of a decision three hours ago. Useful mid-phase when the next step genuinely depends on what you just discussed.
A fresh session via paste handoff drops everything except what lives in the repo and what you put in the paste. Useful between phases when there is no shared working memory you need to preserve. Cleaner. No lingering confusion from a bad earlier turn.
Keeping going without either only works when you are under about 40 percent of the context cap with a clear stopping point in sight. Past that threshold, pick one of the other two.
| Move | When it fits | Cost |
|---|---|---|
| Keep going | Under 40 percent of cap, clear stopping point | Free, if you finish in time |
| /compact | Mid-phase, next step needs recent working memory | Some precision loss, conversation shape intact |
| Fresh + paste | Between phases, no shared working memory needed | Cost of writing the paste, cache miss on first message |
The rule: proactive at seams, reactive only if you missed a seam.
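The table reduces to a small decision function. This encodes my working rule of thumb, not anything Claude Code enforces; the 40 percent threshold is the article's heuristic, not a published limit.

```python
def next_move(context_pct: float, at_phase_seam: bool,
              needs_recent_memory: bool) -> str:
    """Pick a context-management move. Thresholds are working rules,
    not platform limits."""
    if context_pct < 0.40 and not at_phase_seam:
        return "keep going"          # cheap while there is headroom
    if at_phase_seam and not needs_recent_memory:
        return "fresh session + paste handoff"
    return "/compact"                # next step depends on recent working memory

next_move(0.25, at_phase_seam=False, needs_recent_memory=True)   # keep going
next_move(0.55, at_phase_seam=True,  needs_recent_memory=False)  # fresh + paste
```

Note the asymmetry: a seam triggers a handoff even at low context, because the handoff is cheap at a seam and expensive everywhere else.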
Why intermediate plan docs are cheaper than re-deriving
Plan docs, decision logs, checkpoint notes. All cheap once written, brutal to reconstruct.
The math is the part most people miss. If you have to regenerate an hour-long derivation in a fresh session, it costs you the same tokens as the first pass, plus the context to even ask the right starting question. The planning artifact you wrote the first time is a one-time cost that pays out across every future session that touches the problem.
I have a separate project where I generate Flux hero images locally. The agent maintains a ledger file recording every verdict on every generation: what prompt, what result, the review note, what cues to preserve for the next round. Different problem, same shape. Without the ledger, every new session would re-ask "did we already try this prompt?" With the ledger, the agent reads 200 rows of prior decisions in under 10 seconds and carries on.
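The ledger shape is trivial to replicate. A sketch using JSONL with hypothetical field names; the real file's schema is whatever your review loop actually produces:

```python
import json
from pathlib import Path

LEDGER = Path("flux_ledger.jsonl")  # hypothetical filename

def record_verdict(prompt: str, verdict: str, note: str, keep_cues: list[str]) -> None:
    """Append one generation's verdict. Append-only, so no session
    ever clobbers another session's history."""
    row = {"prompt": prompt, "verdict": verdict, "note": note, "keep": keep_cues}
    with LEDGER.open("a") as f:
        f.write(json.dumps(row) + "\n")

def already_tried(prompt: str) -> bool:
    """The first question every fresh session asks."""
    if not LEDGER.exists():
        return False
    with LEDGER.open() as f:
        return any(json.loads(line)["prompt"] == prompt for line in f)
```

Append-only is the design choice that matters: the ledger is a log of decisions, not a mutable state file, so reading it is always safe and writing it never needs coordination.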
Plan docs, session state, resume pastes, verdict ledgers. Variants of the same pattern. Externalize the decisions, so the model does not need to rebuild them.
What this pattern actually cost to build
The first SESSION_STATE.md I wrote took about 15 minutes at the end of a round. The skill that automates most of the scaffolding now ships as part of the Claude Code Skills Pack, but that's hindsight speaking. I was tired, I did not want to do it, and I knew I had to. The second took 5 minutes. By round three it was muscle memory, and the doc basically wrote itself during the last commit anyway because I had been keeping running notes.
The first paste handoff I wrote was too long. The second I cut in half. The third was 400 words and landed the fresh agent cleanly on the next task, with no tokens wasted on rehydration.
Payoff: round 2 and round 3 of a 188-article content build ran end-to-end on fresh Claude Code sessions via paste handoff. Zero mid-task compact panics. I did not lose the thread once across three rounds, same day, same branch, same repo.
What it taught me: context discipline is a first-class skill, same tier as git discipline or test discipline. The people who ship three-day agent builds without collapsing are the ones who treat their context window like a physical resource. Everyone else spends their afternoons arguing with an agent that forgot why it was here.
If you want the full playbook around running agent-assisted builds as a solo operator, I keep the handbook I work from updated. The math on what a single agent task actually burns is in the token budget piece, and the cache TTL economics are in the prompt caching article. The full tooling bundle is packaged as the Operator's Stack, and the parallel-dispatch tradeoffs that pair with this pattern are covered in the parallel versus sequential piece. If the session you hand off will dispatch a writer fleet, check your MDX component props before the fleet fans out, so the audit doc carries into the fresh context intact. The handoff pattern sits inside that stack.
FAQ
How big should SESSION_STATE.md actually be?
Mine run 200 to 500 lines. Longer than that and I am over-documenting. Shorter than 100 lines and I probably missed something the next session needs. The five-section structure (shipped, bugs, lessons, pending, next) self-limits the length.
Does the paste handoff work across different models or only inside Claude Code?
I only use it inside Claude Code. The paste depends on the agent being able to read files at absolute paths and run bash, which is the Claude Code harness, not a generic chat interface. If you are building an agent on the API yourself, the same pattern applies, you just load the paste as a system or first-user message.
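On the raw API there is no prompt box to paste into, so the equivalent move is loading the paste file as the first user message of a new conversation. A minimal sketch; the file path is hypothetical, and the commented-out call is the Anthropic SDK shape, not a tested invocation:

```python
from pathlib import Path

def paste_as_messages(paste_path: str) -> list[dict]:
    """Load a RESUME-PASTE file as the opening user turn of a fresh
    API-side conversation."""
    paste = Path(paste_path).read_text()
    return [{"role": "user", "content": paste}]

# messages = paste_as_messages("../RESUME-PASTE.md")
# client.messages.create(model="...", max_tokens=..., messages=messages)
```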
Why not just run /compact and keep going in the same session?
Compact keeps the conversation shape but loses precision on earlier decisions. That is fine mid-phase. Between independent phases, a fresh session with a paste is cleaner, because the old conversation had noise and bad turns you do not want the agent to carry forward. The trick is knowing which move fits the seam you are at.
Where should RESUME-PASTE.md live?
Project root, outside the repo. I keep mine one directory up from the tracked codebase, so it never gets committed and never leaks into deploys. It is local working-memory, not source. When I pick up work, I open it, copy, paste into fresh Claude Code, and go.
How do you decide when to trigger the compact pattern proactively?
Four signals: after a major multi-agent batch completes, when context crosses ~40 percent of cap and more work is clearly coming, between planning and execution of a large feature, and between independent clusters of work that do not need shared context. I do not trigger it mid-agent-batch or mid-checklist where losing working memory costs more than the compact saves.
Sources and specifics
- The 188-article content build shipped across three Claude Code sessions on michaeldishmon.com over 2026-04-23 and 2026-04-24 using the SESSION_STATE + paste handoff pattern.
- Anthropic's prompt cache TTL is 5 minutes per published platform behavior, which is why a long-idle session pays a full cache miss on wake-up.
- The 40 percent proactive-compact threshold is a working rule on a 1M cap, not a public benchmark, and is based on the paste handoff being a cheap operation and the cache-miss cost climbing non-linearly past halfway.
- Paste handoff length target of 300 to 800 words is derived from three rounds of iteration. The first was 1,200 words and landed worse than the 400-word version because it gave the agent more decoys to chase.
- All code references are inside Claude Code's harness, not the generic API. The same pattern applies on the API but the load mechanism differs (system message or first-user message instead of paste-at-prompt).
