
Claude Code Token Cost Optimization — Context Management with /clear and Agent Teams

Overview

A discussion of how to manage Claude Code's token costs. Since every LLM request includes the entire conversation history, even just committing and pushing after an implementation wastes tokens. This is a record of a conversation with Claude Code about how the /clear pattern and context isolation via agent teams help.

This is a follow-up to the previous article (Claude Code token savings and the big-plan strategy), with more concrete pattern comparisons.


me:

I’ve heard that LLM requests contain the entire history of the conversation:

Claude Code token savings and the big-plan strategy

If so, is this pattern better to avoid?

=== Pattern A ===

  • me: implement something big
  • cc: ok working… done.
  • me: /commits push
  • cc: ok… done.

in this case, is this better?

=== Pattern B ===

  • me: implement something big, /commits push
  • cc: ok working… done.

Or

=== Pattern C ===

  • me: implement something big
  • cc: ok working… done.
  • /clear
  • me: /commits push
  • cc: ok… done.

Or, can Claude Code rearrange the request to the LLM so that this kind of /commits push doesn’t need to be taken care of manually?


Claude Code:

Yes, each LLM request includes the conversation history. Here’s how the three patterns compare:

Pattern A (separate turns) - Most expensive for the commit step

Turn 1: [system prompt] + your request → implementation (large output)
Turn 2: [system prompt] + your request + ALL of turn 1 + "/commits push"
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                          This entire implementation history is sent again
                          just to run git commands. Wasteful.

Pattern B (combined instruction) - Slightly better

Turn 1: [system prompt] + "implement X, /commits push"
        → implementation happens, then commit/push in same turn

Fewer round-trips, but the commit/push phase still carries the full turn context internally. Slightly cheaper than A because there’s one fewer request.

Pattern C (after /clear) - Most token-efficient

Turn 1: [system prompt] + your request → implementation
/clear  (wipes history)
Turn 2: [system prompt] + "/commits push" → only sees git diff, clean context

The commit step starts fresh. It reads git diff and git log to understand what to commit — it doesn’t need the implementation history at all.

Recommendation: Pattern C is the best practice for token savings. /commits only needs git status / git diff / git log — it never needs the implementation conversation.

Does Claude Code optimize this automatically?

No. Claude Code does auto-compress old messages when approaching context limits, but it doesn’t know that a follow-up like /commits is context-independent. The compression is a safety net, not an optimization — it still sends more than necessary.

So /clear before context-independent operations is a good habit:

me: implement something big
cc: done.
/clear
me: /commits push
cc: done.

This is especially impactful when the implementation conversation was long (many tool calls, file reads, edits, etc.), since all of that would otherwise be re-sent just for a few git commands.
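As a rough sketch of why the commit turn differs so much between Patterns A and C (all numbers here are illustrative assumptions, not measured Claude Code figures):

```python
# Rough model of the commit turn's input: every request re-sends the full history.
# All token counts are illustrative assumptions, not measured Claude Code figures.

SYSTEM = 10_000         # assumed system-prompt size
IMPL_HISTORY = 100_000  # assumed implementation history (reads, edits, tool output)
GIT_CONTEXT = 10_000    # assumed git status/diff/log output read for the commit

def commit_turn_input(carried_history: int) -> int:
    """Input tokens for the commit turn: system prompt + carried history + git context."""
    return SYSTEM + carried_history + GIT_CONTEXT

pattern_a = commit_turn_input(IMPL_HISTORY)  # history re-sent after implementation
pattern_c = commit_turn_input(0)             # /clear wiped the history first

print(pattern_a)  # 120000
print(pattern_c)  # 20000
```

Under these assumptions the commit turn alone is 6x cheaper after /clear, and the gap widens with longer implementation sessions.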


me:

wow, that’s a big deal. if so… with Pattern C, in the refreshed session Claude Code has no idea about the changes, so it needs to read the files and also guess the context for the commit message. Even with that extra work, is it still low-cost in tokens?


Claude Code:

Yes. Even with the “fresh discovery” work, it’s still much cheaper. Here are the rough numbers:

Pattern A (no /clear) — commit step:

Content                                                                 Tokens (rough)
System prompt                                                           ~10K
Implementation history (reads, edits, tool outputs, errors, retries…)   50K - 500K+
“/commits push”                                                         tiny
git status/diff/log                                                     ~5-20K
Total input for commit turn                                             ~65K - 520K+

Pattern C (after /clear) — commit step:

Content                                                                 Tokens (rough)
System prompt                                                           ~10K
“/commits push”                                                         tiny
git status/diff/log                                                     ~5-20K
Total input for commit turn                                             ~15-30K

The “discovery work” (git diff, git log) is far smaller than the implementation history it replaces. Even a large diff is typically 5-20K tokens. A moderate implementation conversation is easily 100K+.

What about commit message quality?

git diff tells Claude what changed. git log tells it the project’s commit style. That’s usually enough for a good commit message.

What’s lost is the “why” — your original intent. But Claude Code’s /commits skill already relies heavily on git diff anyway. The implementation history is mostly noise for the commit step (file reads, failed attempts, retries, test outputs…).

If you want the best of both worlds:

(long implementation session)
/clear
me: /commits push. Context: added retry logic to the API client for resilience
cc: done.

One line of context is ~20 tokens. The implementation history it replaces is 100K+. That’s a ~5000x efficiency gain for the “why” information.
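The arithmetic behind that estimate, using the illustrative figures from the discussion above:

```python
# Illustrative figures from the discussion above, not measured values.
context_line = 20        # one sentence of "why" context after /clear
impl_history = 100_000   # a moderate-to-long implementation conversation

# Ratio of what the one-line context replaces -> the ~5000x claim.
print(impl_history / context_line)  # 5000.0
```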


me:

Good. next question. on Pattern B, you told me

me: implement something big, /commits push

this is better, but /commits still needs a follow-up operation, so it still costs tokens. So does a long multi-step prompt cost like a chained request? And is that chaining recognized automatically by the Claude Code software itself, not by the LLM? So… in this case, maybe Claude Code recognizes:

  • OK, let’s separate this prompt to…
      1. do implementation
      2. /commits push

Claude Code:

No, Claude Code does not automatically split or chain your prompt. The entire thing is sent to the LLM as one request, and the LLM decides what to do.

So with Pattern B:

me: implement something big, /commits push

What actually happens:

  1. The LLM receives your full prompt
  2. It starts implementing (reading files, editing, running tests…)
  3. Each tool call is a round-trip — the growing history is sent back to the LLM each time
  4. After implementation, the LLM then does the commit/push
  5. By the time it reaches the commit step, the context is already full of implementation history

There’s no “smart separation” happening at the Claude Code software layer. Claude Code is essentially a loop:

while not done:
    send(system_prompt + full_history) → LLM
    LLM returns tool calls or text
    execute tools, append results to history

Every round-trip carries the full (or compressed) history. Claude Code doesn’t analyze your prompt upfront and split it into independent stages.
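The loop above can be turned into a runnable toy model (simulated token counts, not Claude Code’s actual code) that shows how the re-sent history grows with each round-trip:

```python
# Toy model of the agentic loop: every round-trip re-sends the full history.
# All token counts are simulated assumptions, not real Claude Code internals.

def fake_llm(input_tokens: int, step: int) -> tuple[str, int]:
    """Stand-in for the LLM call: returns an action and the input tokens it was sent."""
    actions = ["read files", "edit code", "run tests", "git commit && git push"]
    return actions[step], input_tokens

SYSTEM_PROMPT = 10_000  # assumed system-prompt size
TOOL_RESULT = 20_000    # assumed tokens appended to history per tool result

history = 0
total_input = 0
for step in range(4):
    action, sent = fake_llm(SYSTEM_PROMPT + history, step)
    total_input += sent
    history += TOOL_RESULT  # tool output is appended and re-sent on every later turn

print(total_input)  # 160000
```

By the final round-trip (the commit step), the model is billed for all the accumulated implementation history, which is exactly the cost /clear avoids.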

So Pattern B saves one user-turn round-trip compared to Pattern A, but the commit step still carries all the implementation context. The savings are minimal.

Pattern C (/clear) is the only way to actually drop the history. It’s a user-initiated action — Claude Code won’t do it for you.

Summary:

Pattern               Who optimizes?                              Savings
A (separate turns)    nobody                                      worst
B (combined prompt)   nobody (LLM just does both sequentially)    marginal
C (/clear between)    you                                         best

The takeaway: token optimization is your responsibility as the user. Claude Code doesn’t optimize conversation structure for you.


me:

hum, interesting. but with too many /clears, the memory is wiped completely. so in the end only the human can decide when to clear. In short, is that the only way to manage this token cost?


Claude Code:

Yes, exactly. You’re the only one who knows which context is still needed and which can be dropped.

/clear is a blunt tool — it’s all or nothing. There’s no “clear the implementation details but keep the architectural decisions” option.

What you can do in practice:

  1. /clear at natural breakpoints
implement feature A → done → /clear
implement feature B → done → /clear
/commits push

Each implementation starts fresh. Good.

  2. Re-state what matters after /clear
(long implementation session)
/clear
me: We just added retry logic to src/api-client.ts. /commits push

Cheap “memory” in ~20 tokens.

  3. Auto-compression exists, but it’s passive

Claude Code does compress old messages when context gets large, but:

  • You can’t control when or what gets compressed
  • It’s a safety net, not a strategy
  • It still carries more than necessary

The fundamental limitation:

Human brain:  knows what context matters → /clear at the right time
Claude Code:  has no concept of "this context is no longer useful"
LLM:          sees everything it's given, can't ask for less

So yes — in short, you are the context manager. /clear at natural task boundaries is the only real lever you have today for token cost reduction.


me:

well, but we have agent teams. with agent teams, the manager holds the whole memory (although we can offload the reports to text files or something like that), but it’s effective for long-running operations at lower cost, right? for example, if one session holds everything, it will easily consume tokens like hell


Claude Code:

Exactly right. Agent teams are essentially architectural context isolation.

Single session — everything accumulates:

[system prompt] + [task1 reads/edits/tests] + [task2 reads/edits/tests] + [task3...]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                   This grows and grows. Every round-trip re-sends it all.

Agent team — each agent has its own isolated context:

Manager context:
  [system prompt] + [plan] + [summary from agent1] + [summary from agent2] + ...
                              ~100 tokens            ~100 tokens

Agent 1 context (temporary):
  [system prompt] + [task1 instructions] + [task1 reads/edits/tests] → dies

Agent 2 context (temporary):
  [system prompt] + [task2 instructions] + [task2 reads/edits/tests] → dies

Each child agent is like an automatic /clear boundary. The heavy implementation context (file reads, diffs, test outputs, retries) lives and dies within that agent. The manager only receives a short summary.

Concrete token comparison for a 3-feature implementation:

Approach         Manager context at end         Total tokens consumed
Single session   ~300-500K+ (everything)        Very high (re-sent every turn)
Agent team       ~30-50K (plan + summaries)     Lower (heavy work is isolated)

The key insight:

Single session:  YOU must /clear to manage context
Agent team:      Agent boundaries ARE the /clear

Each agent spawn is a natural context boundary that you don’t have to manage manually. The manager stays lean.

Tradeoffs — it’s not free:

  • Each agent pays the system prompt cost (~10K) on spawn
  • If the manager micromanages (many back-and-forth messages), its context still grows
  • Agent spawning has latency overhead

So it’s most effective when tasks are independent and substantial — worth the spawn cost, and the isolation saves far more than the overhead.
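Under illustrative assumptions, the tradeoff works out roughly like this (a sketch, not real Claude Code accounting):

```python
# Rough comparison for 3 independent tasks. All numbers are assumptions.
SYSTEM = 10_000          # system prompt, paid once per session or per agent spawn
TASK_HISTORY = 100_000   # heavy per-task context (reads, edits, test output)
SUMMARY = 100            # short report each agent returns to the manager
tasks = 3

# Single session: the final turn carries every task's full history.
single_final_context = SYSTEM + tasks * TASK_HISTORY

# Agent team: the manager keeps only summaries; each agent pays its own
# system prompt (the spawn overhead), then its context dies with it.
manager_final_context = SYSTEM + tasks * SUMMARY
spawn_overhead = tasks * SYSTEM

print(single_final_context)   # 310000
print(manager_final_context)  # 10300
```

The spawn overhead (3 × ~10K here) is real, but it is dwarfed by the per-task history that never reaches the manager; that is why the isolation only pays off for substantial, independent tasks.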

This is exactly why the “big-plan” strategy (break into issues, assign to agents) works well both organizationally and economically.