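Saving Tokens in Claude Code and the Big-Plan Strategy

Overview

This is about understanding why token usage balloons in long Claude Code sessions, and about the /big-plan skill I built to deal with it.

I've been tinkering away on something like a text editor of my own, and the feature I wanted to add was so huge that I had a single Claude Code session explore the codebase deeply and create an epic issue plus child issues, then used those as the basis for the implementation.

I did it that way partly on a vague sense that it was the right approach, and partly because that's how you'd do it in real-world work anyway. Anyone who has been developing for a while will probably find it obvious. But then it occurred to me to ask Claude Code and ChatGPT about it, and I learned something that made me go "wait, that's how it works!?", so this is a record of that.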

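Note

The skills introduced in this article are published in the claude-resources repository. Links in the article are pinned to the commit at the time of writing, so they may differ from the latest version.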

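A question after the planning session

So, right after having it create a pile of issues, a question popped up. I've abstracted the whole flow of having agent teams blast through an implementation into a /x-wt-teams skill. Should I run it right there, in the session where I'd discussed this huge implementation, or should I start a new session at this point?

I think there are two ways to look at this:

The session I consulted has already read deeply into this implementation and the existing code. Child agents will handle the detailed implementation from here anyway, so it's better to keep going as is.
The implementation plan is already split into issues. The current session is loaded with investigation and deliberation that are no longer needed, so it's better to start a new session.

Which do you think is better? I figured it was six of one, half a dozen of the other, but to give away the answer: in terms of token usage, the latter was overwhelmingly better.

To go a step further, the best approach seems to be having it write those issues in considerable detail and then starting a new session based on them.

...Anyway, back to the story. Right after creating the issues, I remembered the following post I'd seen on X.

I stopped hitting Claude’s usage limits - 10 things I changed

So I asked Claude Code about it (in a separate session).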

Takazudo

see this issue: (private repo, URL omitted)

This is what I ordered a Claude Code session for — big planning. I let the session make a plan through deep exploration, and had it create:

epic issue
child issues

Recently, I read this article. Tell me if you can’t access it.

https://x.com/0x_kaize/status/2038286026284667239

From this article, we should complete the session soon with small topics, because we’ll consume tokens more and more if we continue the long conversation. Is this… maybe on every prompt input, we’ll use the full context we created before it?

If this is true, I think we should create tiny issues and do each with agent teams. Is this correct?

Claude Code

Can’t access the X post (402 - requires authentication). Please paste the article content if you want me to reference it directly.

That said, I can answer your question clearly based on how LLMs work.

Yes, your understanding is correct.

Every time you send a prompt in a Claude Code session, the entire conversation history up to that point is sent to the model. So in a long planning session:

Prompt 1 costs: system prompt + message 1
Prompt 2 costs: system prompt + message 1 + response 1 + message 2
Prompt N costs: everything accumulated so far + message N

This compounds quickly. A deep-exploration planning session that uses 100k tokens of context means every subsequent prompt in that session costs 100k+ tokens as a baseline.

Your proposed strategy is correct and is the right approach:

One dedicated planning session — do the deep exploration, produce the epic + child issues. This session can be long and expensive, but it’s a one-time cost.
Fresh session per child issue — each child issue gets its own session starting from zero context.
Agent teams for parallelizable issues — issues with no dependencies can run in parallel worktrees simultaneously using /x-wt-teams.

The dependency graph already maps this well. For example:

Schema issue → must go first, solo
Team CRUD + Dashboard scaffold → can run in parallel after schema
Dependent issues → can run in parallel after Team CRUD

So yes: small issues + fresh sessions per issue + agent teams for independent ones = the right architecture.
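
The ordering described in this reply (schema first, then whatever is unblocked, in parallel) can be derived mechanically from a dependency graph. The following is a minimal sketch, not how /x-wt-teams actually schedules work: the issue names are hypothetical placeholders mirroring the example above, and `parallel_waves` is an illustrative helper.

```python
def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group issues into waves; each wave depends only on earlier waves."""
    remaining = {issue: set(d) for issue, d in deps.items()}
    waves: list[list[str]] = []
    while remaining:
        # Issues with no unmet dependencies can run in parallel right now.
        ready = sorted(issue for issue, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        for issue in ready:
            del remaining[issue]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


# Hypothetical issue names, assumed for illustration only.
DEPS = {
    "schema": set(),
    "team-crud": {"schema"},
    "dashboard-scaffold": {"schema"},
    "dependent-a": {"team-crud"},
    "dependent-b": {"team-crud"},
}

WAVES = parallel_waves(DEPS)
for number, wave in enumerate(WAVES, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Each wave is a set of issues that could be handed to separate fresh sessions or worktrees at the same time.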

The content of the X post

I shared the X post with Claude Code.

Takazudo

nice. here is the post of x

(full text of @0x_kaize's post pasted here)

Claude Code

This applies directly to Claude Code. Every tool call (file read, grep, bash) + its result gets added to history. A deep planning session that reads 30 files and runs 20 commands burns context extremely fast — and each subsequent prompt re-reads all of it.

Your strategy is correct. Here’s the key formula from the post:

Token cost per message = all previous messages + new message
Total = S × N(N+1)/2

At 500 tokens per exchange:

10 messages → 27.5k tokens
20 messages → 105k tokens
30 messages → 232k tokens

Practical rules for each child issue session:

Start with just the issue spec pasted in — don’t recap the whole planning history
Let the agent read only the files it actually needs (not a broad exploration)
If the session gets long mid-issue, /commits → new session → paste a short summary of where you left off
Use Haiku for quick tasks (linting, formatting, simple edits) — save Sonnet/Opus for architecture decisions

The child issues are already scoped small enough that each should be completable in well under 20 exchanges. That’s the right granularity.
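
The figures quoted from the post can be checked by summing the context exchange by exchange. This is a minimal sketch under a simplifying assumption: every exchange adds the same number of tokens and the whole history is re-sent each time, with no caching or compaction. The function names are illustrative.

```python
def cumulative_input_tokens(exchanges: int, tokens_per_exchange: int) -> int:
    """Sum the context sent on every exchange, one exchange at a time."""
    total = 0
    context = 0
    for _ in range(exchanges):
        context += tokens_per_exchange  # history grows by one exchange
        total += context                # the whole history is sent again
    return total


def closed_form(exchanges: int, tokens_per_exchange: int) -> int:
    """The same total via S * N(N+1)/2."""
    return tokens_per_exchange * exchanges * (exchanges + 1) // 2


for n in (10, 20, 30):
    print(n, cumulative_input_tokens(n, 500), closed_form(n, 500))
```

At 500 tokens per exchange the loop gives 27,500, 105,000 and 232,500 tokens, matching the rounded 27.5k, 105k and 232k above.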

Confirming that the full context is re-read

I understood the explanation so far, but I checked once more.

Takazudo

so… if we continue the conversation super long, the model will consume “All what we talked before that” on every prompt, right?

Claude Code

Yes, exactly.

Every prompt you send includes the entire conversation history — not just your new message. The model re-reads everything from message 1 to now, every single time.

So in this very conversation right now:

The GitHub issue content we fetched
The full X post you pasted
All my explanations
All your questions

…all of that gets re-sent on every new prompt. This conversation is already fairly long.

For Claude Code it’s even worse because tool results (file contents, grep output, bash output) are also part of the history. One Read on a 500-line file = 500 lines added to every future prompt in this session.

Practical takeaway for your workflow:

Once you’ve finished planning in a session → close it. The value of the session was the output (the issues), not the conversation itself. The issues are your “saved state.” Each new implementation session starts fresh with just: the issue spec + the files it needs to touch. Nothing more.

The session is a working memory, not a log. When the work is captured externally (in git, in issues), the session has no more value — let it go.
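
To put rough numbers on "close the planning session", here is a minimal sketch comparing 20 implementation exchanges run on top of a 100k-token planning context with the same exchanges run in a fresh session. The 100k baseline comes from the earlier reply; the 5,000-token fresh baseline (issue spec plus a few files) and the 500 tokens per exchange are assumptions for illustration, and caching or compaction is ignored.

```python
def session_input_tokens(baseline: int, exchanges: int, tokens_per_exchange: int) -> int:
    """Total input tokens when every exchange re-sends baseline + history."""
    total = 0
    context = baseline
    for _ in range(exchanges):
        context += tokens_per_exchange
        total += context
    return total


# Assumed numbers, for illustration only.
CONTINUED = session_input_tokens(baseline=100_000, exchanges=20, tokens_per_exchange=500)
FRESH = session_input_tokens(baseline=5_000, exchanges=20, tokens_per_exchange=500)

print(f"continue in planning session: {CONTINUED:,} tokens")
print(f"fresh session from the issue: {FRESH:,} tokens")
```

Under these assumptions the carried-over baseline dominates: it is paid again on every one of the 20 exchanges.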

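Creating the /big-plan skill

"The session is a working memory, not a log." That one really clicked for me.

The value of a session lies in its output (issues and commits), not in the conversation inside it. Close the planning session once it's done. Start implementation sessions clean. I wanted a mechanism that enforces this separation, so I decided to build a /big-plan skill.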

Takazudo

wow that’s very important info. so can you make a skill ONLY for planning? /big-plan does exactly this job — it should separate the big implementation into a collection of tiny pieces. then make one epic issue and children issues. like this:

[{impl-title}][Epic] Foo bar feature adding
[{impl-title}][Sub] create top page

for naming, sub issue needs to start with:

- https://github.com/org/repo-name/issues/245

---

linked to the epic issue. and describe it is a part of the epic issue.

on the epic issue side, describe that this is an epic issue, make sure to close sub issues when each implementation is done.

this epic issue should be done with /x or /x-wt-teams most of the time. in this flow, make a base branch like base/foo-bar-implementation, and each sub issue’s implementation should target this base branch. the manager session merges everything into the base branch at last, and makes one big PR against the original base branch (mainly main). this detailed flow is written in /x-wt-teams. so /big-plan should do only the planning, make issues, and suggest the user to start a clean session with /x-wt-teams. what do you think?

Claude Code

Great idea. The concept is solid. Let me build it.

(after the skill was created)

Done. /big-plan is live.

What it does:

Explore — reads relevant code deeply (this is the expensive one-time cost)
Draft breakdown — small, parallelizable sub-tasks
Confirm with user — shows the plan before touching GitHub
Create epic issue — [{Impl Title}][Epic] ...
Create sub-issues — [{Impl Title}][Sub] ..., each starting with the epic link + ---
Update epic — fills in all sub-issue URLs
End the session — prints summary and says “start fresh with /x-wt-teams {epic-url}”

Usage:

/big-plan add team management and workspace sharing

The skill explicitly enforces the token discipline we discussed — it names the fresh session handoff as a required step, not a suggestion.
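
The naming convention requested above (an `[Epic]` or `[Sub]` tag after the implementation title, and every sub-issue body opening with the epic link followed by `---`) can be written down as a few string helpers. This is a minimal sketch of the convention only, not the skill's actual implementation; the function names and the sample description are hypothetical, and the URL is the placeholder from the request.

```python
def epic_title(impl_title: str, summary: str) -> str:
    """Title for the epic issue."""
    return f"[{impl_title}][Epic] {summary}"


def sub_title(impl_title: str, summary: str) -> str:
    """Title for a sub-issue belonging to the epic."""
    return f"[{impl_title}][Sub] {summary}"


def sub_body(epic_url: str, description: str) -> str:
    """Sub-issue body: epic link first, then a separator, then the spec."""
    return f"- {epic_url}\n\n---\n\n{description}"


# Placeholder values taken from the request above.
EPIC_URL = "https://github.com/org/repo-name/issues/245"
print(epic_title("impl-title", "Foo bar feature adding"))
print(sub_title("impl-title", "create top page"))
print(sub_body(EPIC_URL, "Part of the epic above. (hypothetical description)"))
```

Keeping the epic link as the first line of every sub-issue means a fresh session that receives only the sub-issue can still find its way back to the overall plan.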

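Summary

So that's how it went. I kept tidying up the surrounding skills bit by bit after this, and later I asked ChatGPT the same thing. Skipping the details, it gave a similar answer, as follows.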

Takazudo

About the (S × N(N+1)/2) thing: setting aside whatever optimizations are going on, does this mean that on the third prompt, the first and second exchanges are re-read in full as well?

ChatGPT

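Yes, that understanding is basically correct.

To state just the conclusion clearly:

On the third prompt, the full text (context) of "1st + 2nd + 3rd" is read again, all together.

To be a bit more precise, the model does not "remember the history"; each time, it processes the entire text it is given in one pass.

What actually happens on the third turn

When you send your third message, internally it looks roughly like this:

[System instructions]
[Turn 1: user]
[Turn 1: assistant]
[Turn 2: user]
[Turn 2: assistant]
[Turn 3: user] ← you are here

This whole thing is processed as a single input.

Why it works this way

LLMs are fundamentally stateless and do not keep a memory of previous turns internally, so all the necessary information has to be passed in again every time.

What happens as a result

This is exactly where that formula comes from: turn 1 is small, turn 2 grows a little, turn 3 grows further (including everything before it), and summed up, the total grows roughly like N².

Important clarification (an easy point to misunderstand)

❌ "It remembers the previous conversation" → not the case

✅ "The history is re-entered every time" → this is correct

An intuitive analogy

To put it rather bluntly: it's the same as copy-pasting the entire conversation log so far before asking each question.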
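
The "copy-paste the whole log every time" picture can be sketched as a tiny chat loop. This is a minimal illustration of the stateless pattern, not any vendor's real client: `fake_model` is a stand-in that only counts what it was given, and the names are hypothetical.

```python
Message = tuple[str, str]  # (role, text)


def build_payload(system: str, history: list[Message], new_user_text: str) -> list[Message]:
    """Everything the model sees for one turn: system + full history + new message."""
    return [("system", system), *history, ("user", new_user_text)]


def fake_model(payload: list[Message]) -> str:
    """Stand-in for a model call; it has no memory beyond the payload."""
    return f"reply after reading {len(payload)} messages"


history: list[Message] = []
payload_sizes: list[int] = []
for text in ["first question", "second question", "third question"]:
    payload = build_payload("system instructions", history, text)
    payload_sizes.append(len(payload))
    reply = fake_model(payload)
    # The client, not the model, keeps the log and re-sends it next turn.
    history += [("user", text), ("assistant", reply)]

print(payload_sizes)
```

On the third turn the payload holds six entries, the same six shown in the layout above.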

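After all this, my reaction was: whoa, so that's how it works...

The thing is, I had heard this before, in the context of calling the Claude API directly. I'd asked Claude Code about an implementation that sends the entire conversation so far, and thought: hmm, so that's how it is. Oh, I see, with the API you can't keep a conversation going otherwise. Inefficient, but it can't be helped. And Claude Code remembers things for me, so clever, good boy, good boy. That's how I understood it, and it turns out I had it completely wrong.

In other words, even when chatting with Claude Code looks like a light back-and-forth of a sentence or two, behind the scenes the whole conversation so far is being sent... Quite a surprise. So saying something like "thanks!" at the end is pretty pointless: you squeeze out the largest token spend of the whole conversation just to send "thanks!".

Thinking about it that way, it seems better to pack multiple orders into a single prompt: do this, do that, leave a comment on the issue, and by the way what does this mean, plus three questions, all thrown in as one prompt.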
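
The saving from batching can be estimated with the same re-send model. This is a minimal sketch with assumed numbers (a 50,000-token context, 100-token requests, 400-token replies) that ignores caching and compaction; the function names are illustrative.

```python
def separate_prompts_cost(context: int, requests: int, prompt_tokens: int, reply_tokens: int) -> int:
    """Input tokens when each request is sent as its own prompt."""
    total = 0
    for _ in range(requests):
        context += prompt_tokens  # the new request joins the history
        total += context          # the full history is sent as input
        context += reply_tokens   # the reply joins the history too
    return total


def batched_prompt_cost(context: int, requests: int, prompt_tokens: int) -> int:
    """Input tokens when all requests are packed into one prompt."""
    return context + requests * prompt_tokens


# Assumed numbers, for illustration only.
SEPARATE = separate_prompts_cost(context=50_000, requests=3, prompt_tokens=100, reply_tokens=400)
BATCHED = batched_prompt_cost(context=50_000, requests=3, prompt_tokens=100)
print(f"three separate prompts: {SEPARATE:,} input tokens")
print(f"one batched prompt:     {BATCHED:,} input tokens")
```

With these numbers the existing context is paid for three times in the first case and once in the second, which is most of the difference.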

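In practice, apparently not everything is sent at full size every time: duplicated parts get trimmed, things get summarized, and the Claude Code and ChatGPT we use are the result of that kind of constant software-side effort (so a greeting like "thanks!" may well be handled cheaply). It also reminded me of the time I kept feeding ChatGPT a long stream of thinking-out-loud notes, later told it to write them all out as text, was told it couldn't, and got angry about it. I simply didn't understand how it worked.

Still, I was honestly a little impressed by the background here: despite a structure with this much waste built in, they made the call to present it as a conversation. I suppose that's the answer they arrived at for making AI easy for people to use.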