Claude Code Token Cost Optimization — Managing Context with /clear and Agent Teams
Overview
A note on managing Claude Code's token costs. Because every LLM request includes the entire conversation history, even a simple commit/push after an implementation wastes tokens. This is a record of a conversation in which I asked Claude Code how the /clear pattern and context isolation via agent teams help.
It continues from the previous article (Claude Code token savings and the big-plan strategy), with a more concrete comparison of patterns.
me:
I’ve heard that LLM requests contain the entire history of the conversation.
If so, is it better to avoid this?
=== Pattern A ===
- me: implement something big
- cc: ok working… done.
- me: /commits push
- cc: ok… done.
in this case, is this better?
=== Pattern B ===
- me: implement something big, /commits push
- cc: ok working… done.
Or
=== Pattern C ===
- me: implement something big
- cc: ok working… done.
- /clear
- me: /commits push
- cc: ok… done.
Or, can Claude Code arrange the LLM requests itself, so that this kind of /commits push doesn’t need to be handled manually?
Claude Code:
Yes, each LLM request includes the conversation history. Here’s how the three patterns compare:
Pattern A (separate turns) - Most expensive for the commit step
Turn 1: [system prompt] + your request → implementation (large output)
Turn 2: [system prompt] + your request + ALL of turn 1 + "/commits push"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This entire implementation history is sent again
just to run git commands. Wasteful.
Pattern B (combined instruction) - Slightly better
Turn 1: [system prompt] + "implement X, /commits push"
→ implementation happens, then commit/push in same turn
Fewer round-trips, but the commit/push phase still carries the full turn context internally. Slightly cheaper than A because there’s one fewer request.
Pattern C (after /clear) - Most token-efficient
Turn 1: [system prompt] + your request → implementation
/clear (wipes history)
Turn 2: [system prompt] + "/commits push" → only sees git diff, clean context
The commit step starts fresh. It reads git diff and git log to understand what to commit — it doesn’t need the implementation history at all.
Recommendation: Pattern C is the best practice for token savings. /commits only needs git status / git diff / git log — it never needs the implementation conversation.
Does Claude Code optimize this automatically?
No. Claude Code does auto-compress old messages when approaching context limits, but it doesn’t know that a follow-up like /commits is context-independent. The compression is a safety net, not an optimization — it still sends more than necessary.
So /clear before context-independent operations is a good habit:
me: implement something big
cc: done.
/clear
me: /commits push
cc: done.
This is especially impactful when the implementation conversation was long (many tool calls, file reads, edits, etc.), since all of that would otherwise be re-sent just for a few git commands.
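The cost asymmetry above can be sketched with rough numbers. All constants here are illustrative assumptions taken from the ranges in this conversation, not measurements:

```python
# Rough, illustrative estimate of input tokens for the *commit* turn under
# each pattern. All constants are assumptions, not measurements.
SYSTEM_PROMPT = 10_000   # re-sent with every request
IMPL_HISTORY = 100_000   # a moderate implementation conversation
GIT_CONTEXT = 15_000     # git status / diff / log output

pattern_a = SYSTEM_PROMPT + IMPL_HISTORY + GIT_CONTEXT  # history re-sent
pattern_c = SYSTEM_PROMPT + GIT_CONTEXT                 # history wiped by /clear

print(pattern_a)               # 125000
print(pattern_c)               # 25000
print(pattern_a / pattern_c)   # 5.0
```

Even with these conservative numbers, the commit turn in Pattern A costs 5x more input; with a longer implementation history the gap only widens.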
me:
wow, that’s significant. If so, with Pattern C the refreshed session has no idea about the changes, so Claude Code needs to re-read files and guess the context for the commit message. Even with that extra work, is the token cost still low?
Claude Code:
Yes. Even with the “fresh discovery” work, it’s still much cheaper. Here are the rough numbers:
Pattern A (no /clear) — commit step:
| Content | Tokens (rough) |
|---|---|
| System prompt | ~10K |
| Implementation history (reads, edits, tool outputs, errors, retries…) | 50K - 500K+ |
| “/commits push” | tiny |
| git status/diff/log | ~5-20K |
| Total input for commit turn | ~65K - 520K+ |
Pattern C (after /clear) — commit step:
| Content | Tokens (rough) |
|---|---|
| System prompt | ~10K |
| “/commits push” | tiny |
| git status/diff/log | ~5-20K |
| Total input for commit turn | ~15-30K |
The “discovery work” (git diff, git log) is far smaller than the implementation history it replaces. Even a large diff is typically 5-20K tokens. A moderate implementation conversation is easily 100K+.
What about commit message quality?
git diff tells Claude what changed. git log tells it the project’s commit style. That’s usually enough for a good commit message.
What’s lost is the “why” — your original intent. But Claude Code’s /commits skill already relies heavily on git diff anyway. The implementation history is mostly noise for the commit step (file reads, failed attempts, retries, test outputs…).
If you want the best of both worlds:
(long implementation session)
/clear
me: /commits push. Context: added retry logic to the API client for resilience
cc: done.
One line of context is ~20 tokens. The implementation history it replaces is 100K+. That’s a ~5000x efficiency gain for the “why” information.
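The arithmetic behind the ~5000x figure, using the assumed token counts from above:

```python
# Rough arithmetic behind "~5000x": one short line of "why" context vs. the
# implementation history it replaces (token counts assumed, as above).
context_line = 20         # one-line intent statement after /clear
impl_history = 100_000    # a typical long implementation conversation
print(impl_history // context_line)   # 5000
```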
me:
Good, next question. On Pattern B, you told me:
me: implement something big, /commits push
this is better, but /commits still needs a follow-up operation, so it still costs tokens. Does a long multi-step prompt cost like a chained request? And is that recognized automatically by the Claude Code software itself, not by the LLM? In other words, maybe Claude Code recognizes:
- OK, let’s separate this prompt into…
- do implementation
- /commits push
Claude Code:
No, Claude Code does not automatically split or chain your prompt. The entire thing is sent to the LLM as one request, and the LLM decides what to do.
So with Pattern B:
me: implement something big, /commits push
What actually happens:
- The LLM receives your full prompt
- It starts implementing (reading files, editing, running tests…)
- Each tool call is a round-trip — the growing history is sent back to the LLM each time
- After implementation, the LLM then does the commit/push
- By the time it reaches the commit step, the context is already full of implementation history
There’s no “smart separation” happening at the Claude Code software layer. Claude Code is essentially a loop:
while not done:
    send(system_prompt + full_history) → LLM
    LLM returns tool calls or text
    execute tools, append results to history
Every round-trip carries the full (or compressed) history. Claude Code doesn’t analyze your prompt upfront and split it into independent stages.
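That loop can be sketched in runnable Python with the LLM call stubbed out (`fake_llm`, `run_tool`, and `agent_loop` are illustrative names, not Claude Code internals):

```python
# Minimal sketch of the agentic loop described above. The LLM is stubbed out;
# the point is that the FULL history is re-sent on every round-trip.
def fake_llm(messages):
    # Pretend the model requests one tool call, then finishes.
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "tool": "run_tests"}
    return {"type": "text", "text": "done"}

def run_tool(name):
    return f"{name}: ok"  # stand-in for real tool execution

def agent_loop(system_prompt, user_request):
    history = [{"role": "user", "content": user_request}]
    chars_sent = 0  # character count as a crude proxy for tokens
    while True:
        # Every request carries the system prompt plus the ENTIRE history.
        payload = [{"role": "system", "content": system_prompt}] + history
        chars_sent += sum(len(m["content"]) for m in payload)
        reply = fake_llm(payload)
        if reply["type"] == "text":
            return reply["text"], chars_sent
        history.append({"role": "tool", "content": run_tool(reply["tool"])})
```

Note how `chars_sent` grows superlinearly with the number of tool calls: each round-trip re-pays for everything that came before.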
So Pattern B saves one user-turn round-trip compared to Pattern A, but the commit step still carries all the implementation context. The savings are minimal.
Pattern C (/clear) is the only way to actually drop the history. It’s a user-initiated action — Claude Code won’t do it for you.
Summary:
| Pattern | Who optimizes? | Savings |
|---|---|---|
| A (separate turns) | nobody | worst |
| B (combined prompt) | nobody (LLM just does both sequentially) | marginal |
| C (/clear between) | you | best |
The takeaway: token optimization is your responsibility as the user. Claude Code doesn’t optimize conversation structure for you.
me:
hmm, interesting. But with too many /clear calls, the memory is wiped completely. So in the end only the human can decide when to clear, and in short, is that the only way to manage this token cost?
Claude Code:
Yes, exactly. You’re the only one who knows which context is still needed and which can be dropped.
/clear is a blunt tool — it’s all or nothing. There’s no “clear the implementation details but keep the architectural decisions” option.
What you can do in practice:
- /clear at natural breakpoints
implement feature A → done → /clear
implement feature B → done → /clear
/commits push
Each implementation starts fresh. Good.
- Re-state what matters after /clear
(long implementation session)
/clear
me: We just added retry logic to src/api-client.ts. /commits push
Cheap “memory” in ~20 tokens.
- Auto-compression exists, but it’s passive
Claude Code does compress old messages when context gets large, but:
- You can’t control when or what gets compressed
- It’s a safety net, not a strategy
- It still carries more than necessary
The fundamental limitation:
Human brain: knows what context matters → /clear at the right time
Claude Code: has no concept of "this context is no longer useful"
LLM: sees everything it's given, can't ask for less
So yes — in short, you are the context manager. /clear at natural task boundaries is the only real lever you have today for token cost reduction.
me:
well, but we have agent teams. With agent teams, the manager holds the overall memory (though we can route reports into text files or something like that). That’s effective for long-running operations at lower cost, right? For example, if one session holds everything, it will burn through tokens like crazy.
Claude Code:
Exactly right. Agent teams are essentially architectural context isolation.
Single session — everything accumulates:
[system prompt] + [task1 reads/edits/tests] + [task2 reads/edits/tests] + [task3...]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This grows and grows. Every round-trip re-sends it all.
Agent team — each agent has its own isolated context:
Manager context:
[system prompt] + [plan] + [summary from agent1] + [summary from agent2] + ...
~100 tokens ~100 tokens
Agent 1 context (temporary):
[system prompt] + [task1 instructions] + [task1 reads/edits/tests] → dies
Agent 2 context (temporary):
[system prompt] + [task2 instructions] + [task2 reads/edits/tests] → dies
Each child agent is like an automatic /clear boundary. The heavy implementation context (file reads, diffs, test outputs, retries) lives and dies within that agent. The manager only receives a short summary.
Concrete token comparison for a 3-feature implementation:
| Approach | Manager context at end | Total tokens consumed |
|---|---|---|
| Single session | ~300-500K+ (everything) | Very high (re-sent every turn) |
| Agent team | ~30-50K (plan + summaries) | Lower (heavy work is isolated) |
The key insight:
Single session: YOU must /clear to manage context
Agent team: Agent boundaries ARE the /clear
Each agent spawn is a natural context boundary that you don’t have to manage manually. The manager stays lean.
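This isolation can be modeled in a few lines. `run_subagent` and the task names are hypothetical; the point is which context survives:

```python
# Toy model of agent-team context isolation. Each sub-agent accumulates its
# own heavy, throwaway context and returns only a short summary; the manager
# never sees the implementation details. Names are illustrative.
def run_subagent(task):
    local_context = [          # the sub-agent's private, disposable history
        f"read files for {task}",
        f"edits for {task}",
        f"test output for {task}",
    ]
    # The agent "dies" here; only the summary escapes its context.
    return f"{task}: done ({len(local_context)} steps)"

manager_context = ["plan: 3 features"]
for task in ["feature-1", "feature-2", "feature-3"]:
    manager_context.append(run_subagent(task))

# The manager ends with 4 short lines, not 3 full implementation histories.
print(manager_context)
```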
Tradeoffs — it’s not free:
- Each agent pays the system prompt cost (~10K) on spawn
- If the manager micromanages (many back-and-forth messages), its context still grows
- Agent spawning has latency overhead
So it’s most effective when tasks are independent and substantial — worth the spawn cost, and the isolation saves far more than the overhead.
This is exactly why the “big-plan” strategy (break into issues, assign to agents) works well both organizationally and economically.