How to Save Tokens on Claude in 2026
60 field-tested tips across Claude Chat, Cowork, and Claude Code – from quick wins any user can apply in under a minute to the context-engineering moves power users run daily.
Every few weeks another wave of posts hits X and Reddit claiming someone figured out how to save tokens on Claude. Some of the tips are real. Some are noise. A lot of them only matter if you are running Claude Code inside a giant monorepo – which is not how most people use Claude. (Token efficiency is also where Codex sweeps Claude in the data: see Claude Code or Codex: 10,000 Reddit posts analyzed for the comparison.)
This guide sorts through all of it. I pulled tips from Anthropic’s own documentation, developer blogs, GitHub gists, Reddit threads, and X posts, then organized them by who they actually help. Beginner Chat users. Cowork and Teams users working across an organization. Claude Code developers. And the community secrets that are genuinely clever once you understand what they are doing.
The framework below is simple. Match the tip to your setup. Apply the top three in your category. Watch your usage drop.
Why Saving Tokens on Claude Matters More in 2026
Usage is the single most common complaint I see from people who upgraded to Pro or Max. They opened a long planning chat in the morning, got to work, and by lunchtime Claude told them they were rate limited until 4pm. This is not a bug. It is the direct math of how large language models work.
Every message you send to Claude, the model re-reads the entire conversation up to that point before replying. A chat that has been running for two hours is not spending tokens the way a brand-new chat is. It is paying the cost of every single previous turn, every attached document, every bit of tool output, every time.
That math compounds. A 20-turn conversation where each turn adds 2,000 tokens to context is not paying 40,000 tokens to get to turn 20. It is paying something closer to 20 x (20 + 1) / 2 x 2,000 = 420,000 tokens worth of reads by the time you finish. This is why long sessions feel like they hit a wall: because they do.
Saving tokens is not about being cheap. It is about staying in flow for longer. The people who get the most out of Claude in 2026 are not the ones with the biggest plans. They are the ones who understand where tokens go and have built habits to avoid burning them in the places that do not matter.
Usage Limits vs. Context Limits: The Distinction That Matters
Before any of the 60 tips make sense, you need to separate two things that get conflated constantly online.
| Concept | What It Is | Where You Hit It |
|---|---|---|
| Usage limit | Your time-based budget across all Claude surfaces combined. A rolling window measured in tokens. | The “You have reached your limit” banner after hours of work. Reset on a timer (typically 5 hours for Pro). |
| Context window | The maximum size of a single conversation. How much Claude can hold in working memory at once. | The “Long chat: things may get slower” warning inside a single session. |
Most of the tips in this guide help with both, but some are specifically aimed at one or the other. When you see a tip that talks about /clear or starting a fresh chat, that is a context tip. When you see a tip about model selection or subagents, that is a usage tip.
I run Claude against the same codebase every day. My instinct used to be: keep one long chat going because it “knows the project.” That habit cost me at least an hour a day in hitting usage limits by 3pm. I moved to a single Project with instructions that summarize the project, and fresh chats for each new task. Usage dropped by roughly 40 percent in a week. Same output quality.
The 60 Tips, Organized by Your Setup
The filters below let you narrow the 60 tips to your specific setup. Click Beginner if you use Claude Chat. Click Intermediate if you are on Teams, Work, or Cowork. Click Advanced if you use Claude Code. Click Secrets if you want the aggressive community hacks.
The filter bar stays pinned to the top of your screen as you scroll. If you land in a section whose tips are filtered out, a reveal banner will offer to show them anyway.
Jump to a section
- Why saving tokens on Claude matters in 2026
- Usage limits vs. context limits
- The compounding cost of long conversations (chart)
- 15 Claude Chat tips for everyday users
- 15 Cowork and Teams tips for business users
- 15 Claude Code tips for developers
- 15 community secrets from Reddit, GitHub, and X
- The 10 highest-impact actions, ranked
- Frequently asked questions
- Related Claude reading on this site
- Sources and further reading
The Compounding Cost of Long Conversations
The chart below shows the real cost of adding more turns to the same chat. Each turn assumes 2,000 tokens of new context. The “cumulative” bar is what Claude actually processes, not what you typed.
How cumulative token cost grows with chat length
Per-turn adds 2,000 tokens. Cumulative is N x (N+1) / 2 x per-turn – what Claude processes, not what you typed.
The takeaway: a 60-turn conversation does not cost 60 times a one-turn conversation. It costs roughly 120 times. This is the single biggest reason /clear is the most valuable command in Claude Code and “start a fresh chat” is the most valuable habit in Claude Chat.
15 Claude Chat Tips for Everyday Users (Beginner)
If you use Claude through the web app, mobile app, or desktop app, this section is for you. None of the tips below require a developer setup. The biggest gains come from just three of them – planning your prompt, editing instead of replying, and starting fresh chats. The other twelve compound from there.
Plan the whole ask before you send
High impactMost token waste in Chat comes from incomplete first prompts. You ask for a draft, Claude misunderstands the audience, you correct it, Claude rewrites it, you ask for a different format, and now five turns deep you have spent maybe ten times the tokens you would have spent if you had said it all in one prompt.
How to apply: Before hitting send, ask yourself: audience, tone, format, length, must-include, must-avoid. Put all of those in the first prompt.
Example: Instead of “write me a LinkedIn post about AI agents,” write: “Write a 180-word LinkedIn post about AI agents for a technical audience. Tone: matter-of-fact, no hype words. Open with a specific failure case from production. End with a one-line takeaway. No emoji.”
Be specific and concise in every prompt
Medium impactVague prompts produce wandering, generic, long-winded outputs – and you pay output tokens for every word. A specific prompt produces a specific reply.
How to apply: Replace open phrases like “help me with” or “tell me about” with verbs like “write,” “list,” “compare,” “convert,” or “extract.”
Example: “Help me with Python” becomes “Write a Python function that parses a CSV and returns rows where column X is greater than 100.”
Batch related questions into one turn
High impactEvery new turn pays the cost of re-reading everything that came before. Three separate questions about the same document is three full re-reads. One numbered question is one re-read.
How to apply: When you have multiple questions about the same context, number them inside one prompt: “Three things on this draft – 1) is the opening hook strong, 2) which paragraph should be cut, 3) what is the best title.”
Edit your prompt instead of typing a follow-up
High impactThis is the single most underused feature in Claude Chat. The pencil-edit icon next to your message lets you change what you sent and re-run from that point – without keeping the failed attempt in context.
How to apply: When Claude misunderstands or you spot a typo in your own prompt, edit and resubmit instead of typing “no, I meant X.”
“No, I meant X” loads three new things into context: your follow-up, Claude’s apology turn, and Claude’s second attempt. Editing replaces all three with one corrected attempt.
Start a fresh chat the moment your topic shifts
High impactIf you finished writing an email and now you want to debug a Python script, do not keep going in the same chat. The email context is now dead weight that Claude re-reads on every Python turn.
How to apply: Treat new topics like new browser tabs. The keyboard shortcut for a new chat is faster than scrolling up to remind yourself where you are.
Use chat search and memory before re-asking
Medium impactIf Claude already generated a snippet, command, or summary you need again, search your chat history for it instead of asking Claude to regenerate it. Output tokens cost the same regardless of whether you have seen them before.
How to apply: Use the search field in the chat sidebar. Most browsers also support Ctrl+F or Cmd+F inside an open chat.
Use Projects for any recurring topic
High impactProjects give you a dedicated space with persistent instructions and uploaded knowledge that every chat in that project inherits. Anthropic’s caching means project knowledge is read more efficiently than pasted text.
How to apply: If you have asked Claude the same kind of question more than three times, you should have a Project for it. Marketing, coding, parenting logistics – whatever it is.
Example: A “Brand Voice” project with your style guide uploaded once means every blog draft you ever write inherits the same voice without you pasting the guide.
Upload core documents into Project Knowledge once
High impactPasting the same PDF into every new chat is one of the most expensive habits in Claude Chat. Project Knowledge is built to be referenced cheaply across every chat in that project.
How to apply: Add anything you reference more than twice – brand guidelines, API specs, a long contract, a research paper – into the Project Knowledge section.
I worked with a content team that pasted their 14-page brand voice guide into every chat. After moving it into Project Knowledge, their average chat length dropped by 60 percent and the team stopped hitting weekly limits. Same output. Same workflow. One upload.
Put recurring rules into project instructions
Medium impactIf you find yourself adding “respond in markdown, no preamble” or “always show your reasoning” to every prompt, put it in the project’s instruction field once and stop typing it.
How to apply: Open the project, click Instructions, and add your standing rules. Keep them under 500 words – this text is loaded into every chat.
Use clear filenames so Claude can find the right doc
Low-medium impact“doc1.pdf” forces Claude to pull the file in to identify it. “Q3_Financial_Report.pdf” is identifiable from the filename alone.
How to apply: Rename uploads using the pattern topic_period_version.pdf before adding them to a project.
Group related files; split unrelated work into separate projects
Medium impactOne project with your job’s docs, your novel draft, and your kid’s school forms is a mess. Each new chat in that project may pull from any of those documents whether you want it to or not.
How to apply: One project per domain. Pay the small cost of clicking between them. The token savings show up immediately.
Ask for terse output: “bullets only,” “no preamble,” “decision only”
High impactOutput tokens are usually the bigger spender, not input. A reply that opens with “Of course! I’d be happy to help with that. Here is a thoughtful answer…” costs you tokens for content you do not want.
How to apply: Append output rules to your prompts. The shorter, the better.
Output rules: no preamble, no closing pleasantries, no caveats. Direct answer only. Use bullets. Cap response at 200 words.Ask for targeted edits, not full rewrites
High impactIf you have a 2,000-word essay and one paragraph is off, asking Claude to “rewrite the whole thing with a tighter intro” makes Claude regenerate 2,000 words. Asking for “a tighter version of paragraph 2 only, returned alone” makes Claude generate 80.
How to apply: Specify exactly what you want returned. “Return only the changed paragraph” is the magic phrase.
Pick a lighter model when the task is simple
High impactOpus is the right tool for hard reasoning, code architecture, or nuanced writing. It is the wrong tool for “summarize this email,” “convert these dates to ISO format,” or “give me three subject line options.” Sonnet and Haiku do those tasks faster, cheaper, and just as well.
How to apply: Use the model picker in the chat input. Default to Sonnet 4.6. Drop to Haiku for trivial transformations. Reserve Opus for when you genuinely need its reasoning. If you want Opus-quality answers on a Sonnet budget, see our writeup on Claude’s Advisor Strategy.
Watch your usage settings; spend extra usage intentionally
High impactPro and Max plans have a Usage panel that shows your current rolling consumption. If you can see you are at 80 percent and you have a four-hour stretch ahead, that is the moment to switch to Sonnet, clear chats, and queue smaller asks.
How to apply: Bookmark the usage settings page. Glance at it before starting a heavy task. If you hit the wall anyway, our guides on what to do when you hit the message limit and how to fix the “conversation too long” error cover the recovery paths.
Across maybe a dozen people I have helped tune their Claude habits, the three tips that move the needle hardest are #1 (plan the prompt), #4 (edit instead of follow up), and #5 (fresh chat for new topics). If you only adopted those three this week, you would feel the difference by Friday.
15 Cowork and Teams Tips for Business Users (Intermediate)
Once Claude is being used by more than one person inside an organization, the math changes. Now you are not just managing your own usage – you are managing the team’s combined draw on a shared pool. The tips below are from the Anthropic Teams documentation, the Cowork documentation, and a few patterns I have seen work cleanly in marketing and ops teams.
Create one shared project per workstream, not per person
Medium impactIf five people each have their own “Marketing” project with overlapping uploads, your team is paying five times the storage and zero times the cache benefit. One shared project per initiative consolidates context and lets the cache do its job.
How to apply: Pick the workstream (a campaign, a product launch, a quarterly report). Create one Project. Invite the whole team. Make it the canonical workspace.
Reuse a shared knowledge base instead of re-uploading the same files
High impactThe most common Cowork waste pattern: every team member has their own copy of the brand guidelines, the API spec, the org chart. Each upload is a separate token cost. A shared project with one canonical version solves this.
How to apply: Designate an owner per shared knowledge base. Their job is to keep one authoritative version uploaded.
Store team conventions in project instructions once
Medium impactEvery team has tribal knowledge – “we use Jest not Mocha,” “we always reply in markdown,” “we never use hype words in client comms.” Putting that into project instructions means nobody has to retype it.
How to apply: Run a 30-minute meeting where the team agrees on 5-10 standing rules. Lead writes them into project instructions.
Use public projects for reusable org-wide knowledge
Medium impactHR docs, brand guidelines, the AI usage policy itself – any document that anyone in the org might need – belongs in a public project. This stops fifteen people from each uploading the employee handbook.
How to apply: Set Project Visibility to Public for general resources. Use a clear naming convention so people can find them.
Use private projects for sensitive work
Low impact (security)Not strictly a token tip – but private projects keep the right context with the right people. Token waste from accidentally over-broad sharing is real but usually small.
How to apply: Default to Private for legal, executive, or HR-sensitive material. Add only the people who need it.
Share project access instead of pasting long chat transcripts
Medium impactPasting a full 30-message debugging session to a coworker so they can “catch up” sends them a wall of text and forces Claude to re-read all of it on their first reply. Sharing the actual chat or the project skips both costs.
How to apply: Use the Share Chat or Share Project link instead of copy-paste.
Remember chats in shared projects are still private by default
Low impactThis trips people up. A shared Project does not mean shared chats. Each person’s chats inside the project are private until explicitly shared. This is mostly a discoverability tip, but it prevents teammates from accidentally pulling someone else’s full chat history into their own context.
How to apply: Train team members to share specific chats only when relevant.
Use descriptive filenames so RAG can find the right file
Medium impactProject Knowledge in Cowork uses retrieval-augmented generation (RAG) to pull only the relevant chunks of large documents. The retrieval system works much better with descriptive filenames than with “scan_001.pdf.”
How to apply: Enforce a naming convention before upload. Pattern: area_topic_period.ext. Example: Auth_Architecture_v2.pdf.
Group related docs together; split noisy or stale content apart
High impactRAG retrieval gets noisier as a project’s knowledge base grows. If half the documents are outdated, irrelevant retrievals burn tokens and waste replies.
How to apply: Audit project files monthly. Remove deprecated docs. Move stale-but-historically-useful docs into a separate “archive” project.
Teams that “just upload everything” think they are being thorough. They are actually degrading retrieval. RAG works on signal-to-noise. More docs is not better. The right docs is better.
Let RAG handle large knowledge instead of forcing giant prompts
High impact (up to 10x capacity)Trying to paste a 200-page manual into a chat will fail at the context window. Putting the same manual into Project Knowledge lets RAG retrieve only the few chunks Claude needs for a given question – effectively giving you ~10x the practical capacity.
How to apply: If a document is over ~30 pages or 50 KB of text, it belongs in Project Knowledge, not the prompt body.
Assign Premium seats only to true power users
High impact (org cost)Most orgs over-provision Premium seats. The marketing director who uses Claude twice a week does not need the same seat as the engineer running it eight hours a day. Premium economics work in your favor only when the user actually consumes the entitlement.
How to apply: Pull usage analytics from the Admin Console quarterly. Move low-consumption users to Standard. Save the Premium budget for the people who genuinely need it.
Enable extra usage with an org-wide spend cap
High impactExtra usage is the safety valve – it lets work continue past the plan limit. Without a cap, that safety valve becomes a surprise invoice. With a cap, it is exactly what it should be: a controlled overflow.
How to apply: In the Admin Console, set an org-level extra-usage limit. Pick a number you can defend in a budget review.
Add seat-level and user-level spend limits, then review MTD spend regularly
High impactOrg-wide caps are the floor. Per-seat caps are how you catch one user with a runaway Cowork task burning through the whole pool.
How to apply: Set a per-seat extra-usage limit. Build a recurring monthly review of spend by user. Flag anomalies the same week they happen, not at quarter end.
A small ops team had a single Cowork user who set up a recurring scheduled task to “summarize all their Asana tickets” – and never noticed the task was iterating across 4,000+ tickets every six hours. The team’s monthly extra-usage budget burned in three days. Per-seat caps would have surfaced this on day one.
Use Cowork only for complex multi-step tasks; keep simple work in normal chat
High impactCowork is built for agentic work – it spawns subagents, runs scheduled tasks, and orchestrates multi-step actions. All of that is more compute-intensive than a normal chat reply. Using Cowork to ask “what is a regex for an email” is using a freight train to deliver a postcard.
How to apply: Cowork for: file orchestration, multi-document analysis, scheduled reports, anything that needs to take real action across systems. Normal chat for: drafting, reading, single-question answers.
In Cowork, rely on persistent folders, instructions, and scheduled tasks instead of re-briefing
High impactCowork’s folder instructions and global instructions persist across runs. Scheduled tasks let you set “do X every Monday” instead of typing the brief each week. Both reduce repetitive input tokens drastically.
How to apply: For any recurring Cowork job, define folder instructions once. For anything that runs on a schedule, configure the scheduled task instead of running it manually.
One Cowork folder per recurring deliverable – “Weekly Marketing Report,” “Monthly Competitor Scan,” “Daily Inbox Triage.” Each folder has its own instructions, its own schedule, and a strict cap on how much it can spend. Setup takes an afternoon. The savings are visible in the first week. For a deeper breakdown of which Cowork automation mode to pick for which job, see our guide to Claude Routines vs Desktop Scheduled Tasks vs /loop.
15 Claude Code Tips for Developers (Advanced)
Claude Code is where token consumption becomes a first-class engineering concern. The CLI gives you direct visibility into context, model selection, and tooling – which means it also gives you direct ways to cut costs. The 15 tips below are the ones I run on a daily basis. They are pulled from the official Claude Code documentation and the better community write-ups. If you want to forecast a session’s cost before you start it, check our Claude Code token calculator.
Keep /cost and /context in your daily loop
High impactYou cannot manage what you cannot see. /cost shows you what your session has consumed so far. /context shows you what is currently loaded into the window – including how much each MCP server, skill, and file is contributing.
How to apply: Run /context once at session start, after you load any new files, and any time something feels slow. If you want a forecast before the session begins, the Claude Code token calculator estimates consumption based on repo size and task type.
/contextUse /clear between unrelated tasks
High impactThis is the single most valuable command in Claude Code. The compounding cost from earlier in the article is exactly what /clear fixes. New task = new context.
How to apply: When you finish a task or pivot to a new file, run /clear before starting the next one. The friction of re-loading the relevant files is much smaller than the cost of dragging the old context along.
Compact proactively at logical breakpoints
High impact/compact summarizes the conversation history into a compressed form, freeing up context while preserving the important decisions. The trick is to run it at logical milestones – not when the bar is already red.
How to apply: Compact after finishing a feature, before switching to tests, after closing out a PR. Not in the middle of a single thought.
Pass focus instructions to /compact and add custom rules in CLAUDE.md
Medium impactDefault compaction is generic. Telling /compact what to preserve gives you control over what survives the summary.
How to apply: Use /compact Focus on architectural decisions and unresolved bugs. For repos where compaction matters, add a “compact preferences” block to CLAUDE.md.
/compact Focus on the API changes, the failing tests, and any TODOs we left in the codeKeep CLAUDE.md lean – only broad, always-relevant rules
High impactCLAUDE.md is read on every single turn. Every line in it is a recurring tax. A 10,000-token CLAUDE.md is paying that tax 50 times in a 50-turn session.
How to apply: Audit CLAUDE.md quarterly. Anything that is not “must apply on every turn” gets moved to a skill, a doc, or a sub-CLAUDE.md.
Move specialized instructions out of CLAUDE.md and into skills
High impactSkills load on demand. CLAUDE.md loads always. Instructions that only apply to “running a database migration” or “writing a release note” do not need to be in your context for every code review.
How to apply: For each section of CLAUDE.md, ask: would I be okay if Claude needed to actively pull this in when relevant? If yes, it is a skill candidate.
Use skills for on-demand domain knowledge and repeatable workflows
Medium impactSkills are how you make Claude Code feel like a team member who knows your specific repo without forcing every chat to load that knowledge by default.
How to apply: Create a .claude/skills/ directory. One skill per workflow. PR descriptions, release notes, migration runbooks, deployment scripts – each gets its own skill.
Default to Sonnet; escalate to Opus only when the problem truly needs it
High impactSonnet 4.6 is shockingly capable. For most engineering tasks – bug fixing, test writing, refactoring, code review – it is the right model. Opus is for the architectural calls and the genuinely hard reasoning.
How to apply: Set Sonnet as your default with /model sonnet. Switch up to Opus only when you hit something Sonnet visibly struggles with.
Lower effort or thinking budget for simpler work
High impactExtended thinking is excellent for hard problems and expensive for easy ones. If you are running lint fixes, the model does not need to plan deeply.
How to apply: Use /effort low or set MAX_THINKING_TOKENS=8000 for routine work. Bump it up for design or debugging.
Run subagents on cheaper models when possible
High impactThe subagent model can be set independently from the main model. For exploration tasks – “find all references to X” or “summarize what these 30 files do” – Haiku is plenty.
How to apply: Set CLAUDE_CODE_SUBAGENT_MODEL=haiku. Your main session keeps Sonnet or Opus quality. Subagents get done cheap and fast.
export CLAUDE_CODE_SUBAGENT_MODEL=haikuDelegate verbose investigations, tests, and log processing to subagents
High impactThe biggest context killer is dumping a 10,000-line log into the main session. Subagents read the noise, return a summary, and the main context never sees the raw mess.
How to apply: When you have a long log, a flaky test run, or a giant grep result, instruct Claude to use a subagent to process it.
Use a subagent to read the test output and tell me which tests failed and why - do not paste the raw output.Prefer CLI tools over MCP servers when a CLI already exists
Medium impactMCP servers load their full schema into context. A typical MCP server can cost 5,000+ tokens of schema before you have used it for anything. gh or aws in a bash call costs the bash tool definition – which is already loaded – and nothing more.
How to apply: If a CLI exists for the job, use the CLI. Reserve MCP servers for tools that genuinely need the structured tool-use interface.
Disable unused MCP servers and keep the active set small
High impactEvery connected MCP server adds its tool schemas to your prompt every turn. Five MCP servers you are not using today is five tool schema costs you are paying for nothing.
How to apply: Use /mcp to see what is loaded. Disable anything you are not actively using in this session.
Use hooks to pre-filter logs, test output, and machine noise before Claude reads them
High impactHooks let you intercept tool inputs and outputs. A PreToolUse hook that runs grep -A 5 ERROR on test output before Claude sees it can turn a 50,000-token log into a 500-token summary.
How to apply: Write hooks for the noisiest tools in your stack – test runners, deploy scripts, anything that produces a wall of output.
Use plan mode, verification targets, and early rewinds to prevent expensive wrong turns
High impactThe most expensive thing Claude Code does is build the wrong thing for an hour. Plan mode forces alignment before code is written. Verification targets (a failing test, a clear acceptance criterion) give Claude a checkpoint. Esc+Esc rewinds to before the mistake started.
How to apply: Use /plan for anything non-trivial. Always provide a test case. Rewind early instead of trying to “patch” a wrong direction.
1) CLAUDE_CODE_SUBAGENT_MODEL=haiku in my shell init. 2) Default model: Sonnet. 3) MCP servers: only the two I am actively using today, the rest disabled. 4) /context after every meaningful change. 5) /clear the moment I am moving on. None of this is exotic. All of it together is the difference between hitting limits at 11am and finishing the day with budget to spare.
15 Community Secrets From Reddit, GitHub, and X
These tips are aggregated from Reddit’s r/ClaudeCode, GitHub gists, and X posts from people running Claude at scale. Some are elegant. Some are aggressive. All of them should be benchmarked on your workflow before you adopt them permanently.
Ultra-terse CLAUDE.md – only if output verbosity is your real bottleneck
Medium impactAdding strict output rules (“no pleasantries, bullets only, terse”) to CLAUDE.md is the fast way to cut output tokens. But it comes at a cost – you pay for those rules on every input turn.
How to apply: Only adopt if your bottleneck is clearly output (long responses), not input (long context). Benchmark before committing.
Benchmark your own terse rules – community saw 63% shorter output
Medium impactA popular GitHub benchmark showed that a custom “terse” CLAUDE.md cut output tokens by 63 percent on the same prompts. That is real savings – but it also changed the character of the responses. Make sure your tasks can tolerate the change.
How to apply: Run the same representative prompts with and without your terse rules. Measure output tokens. Decide whether the quality trade-off is worth it for your workflow.
A tiny root CLAUDE.md can outperform pasting rules every session
Medium impact (~17.4% cost-to-green)The same community benchmark showed that a focused 7-line root CLAUDE.md reduced “cost-to-green” (total tokens until the task was done) by about 17.4 percent. Small rules, big impact.
How to apply: Keep your root CLAUDE.md to 7 lines of high-impact rules. Push everything else to sub-CLAUDE.md files or skills.
Keep CLAUDE.md under ~5k tokens; spill cold knowledge into docs/
High impactThe pattern I see across the best-performing repos: main CLAUDE.md is the “daily driver” rules. Archive, historical context, and future plans live in a docs/ folder that Claude can pull from on demand but does not carry in context by default.
How to apply: Run wc -w CLAUDE.md periodically. If it is over ~3,500 words, start pruning or splitting.
Layer global, project, and subdirectory CLAUDE.md files
Medium impactGlobal preferences (your tone, your style) go in ~/.claude/CLAUDE.md. Project-wide rules go in the repo root. Frontend-specific rules go in frontend/CLAUDE.md. Claude Code automatically layers these and only loads what is relevant to the directory you are working in.
How to apply: Use the three-level hierarchy: global, project, sub-project. Do not duplicate rules across levels.
Compact every ~40 messages and write a session_summary.md
High impact (~45k tokens reclaimed)One of the better Reddit writeups showed a user regaining ~45,000 tokens of headroom after running /compact at a 40-message checkpoint and dumping the summary to a file. The file becomes a “memory” Claude can re-load cleanly later.
How to apply: Set a mental counter – every 40 turns, compact and save. Treat compaction as a routine, not an emergency.
End each session with /compact, append to docs/progress.md
High impactEnd-of-day routine: compact, pipe the summary to docs/progress.md, clear. Tomorrow starts fresh with a one-file summary of where you left off. No more “let me scroll up to remember.”
How to apply: Build a bash alias or a skill for this. Run it at the end of every session.
Start new sessions by loading only the files you truly need
High impactLetting Claude do a directory scan “to get oriented” is fine in an exploratory chat. It is wasteful in a targeted fix. If you know the bug is in src/auth/middleware.ts, start with @src/auth/middleware.ts and nothing else.
How to apply: Use @file on prompt 1. Only expand if the session genuinely needs more files.
Store state in lightweight YAML frontmatter / mini metadata blocks
High impactIf Claude needs to know the status of a module (“migration applied: yes, tests passing: no, last deployed: 2026-04-18”), a 10-line YAML block is dramatically cheaper than expecting Claude to infer this by reading code.
How to apply: Add YAML frontmatter to any file where status matters. Treat it like the file’s vital signs.
Don’t preload every MCP server or skill; bash and file discovery are cheaper
High impactThe “more tools = more capability” instinct is backwards at scale. Every dormant tool is paying schema cost on every turn. Bash and grep already handle most discovery cases at zero extra schema cost.
How to apply: Start sessions with the minimum tool set. Add tools only when you hit a specific need.
Offload huge scans to cheap sidecars; feed Claude the summary
High impact“Summarize this 100-file repo” is a prompt that will eat your context. A 20-line Python script using a cheap model (Haiku via the API) that scans the repo and returns a structured summary is ~1/30 the cost.
How to apply: For heavy scanning jobs, script them outside Claude Code. Run Haiku to scan. Pipe the structured summary back in.
Put explicit model and effort values in skill / agent frontmatter
Medium impactIf you have a formatting skill, it does not need Opus. Pin the model in the skill’s frontmatter and it will run on Haiku regardless of what the rest of your session uses.
How to apply: Add model: haiku and effort: low to the YAML frontmatter of utility skills.
Treat long sessions as compounding cost
Very high impactThe math from earlier in this guide – N x (N+1) / 2 – is not a metaphor. It is the actual shape of how history cost grows. The correct response is aggressive resets, not clever mid-conversation pivots.
How to apply: When you feel a session “slowing down,” that is the model telling you it is overloaded with history. /clear is cheaper than “powering through.”
Telegraphic / “caveman” output can save output tokens
Low-medium impactOne genuinely odd trick that showed up in community benchmarks: adding “respond in telegraphic/caveman style” to CLAUDE.md for specific automation tasks. The output loses all prose – which is exactly what you want when you are piping Claude’s output to a downstream script.
How to apply: Use only for structured data / pipeline tasks where prose is waste. Not for human-facing replies.
Trim giant JSON / API payloads before Claude sees them
High impactAPIs return huge nested JSON. Most of it is noise you do not need. jq (or a similar selector) filters to only the fields that matter before the payload reaches Claude’s context.
How to apply: Learn three jq patterns. Use them in any hook that handles API output.
curl -s https://api.example.com/users | jq '.[] | {id, email, status}'About 40 percent of the “token saving tips” I see on social media are real. Another 40 percent are situational – they help if you are running a specific kind of workflow. The last 20 percent are placebo. Benchmark before you adopt anything permanently. The savings that come from habits (tips 1 to 45) will always matter more than any one trick.
The 10 Highest-Impact Actions (If You Only Do These)
- Clear aggressively. Use
/clearor start a fresh chat the moment your topic shifts. History is a compounding tax. - Compact proactively. Do not wait for auto-compaction. Run
/compactat every logical milestone. - Delegate to subagents. Offload heavy file scanning, log reading, and testing to Haiku-powered subagents.
- Pre-filter inputs. Never feed Claude raw system logs or full JSON payloads. Use bash,
jq, or hooks. - Keep CLAUDE.md lean. Under 5k tokens. Move domain-specific knowledge to on-demand skills.
- Model down. Sonnet default, Haiku helper, Opus for real reasoning only.
- Kill dormant MCP servers. Their schemas load into every prompt, whether you use them or not.
- Modular projects. One project per knowledge domain. Do not dump the entire company drive into one space.
- Plan and verify. Use
/planmode and provide explicit test cases up front. The cheapest bug is the one Claude never wrote. - End-of-day summary. Compact, save progress.md, clear. Tomorrow starts clean.
Frequently Asked Questions About Saving Claude Tokens
What is the single fastest way to save tokens on Claude?
Start fresh chats when topics change. This one habit reduces your compounding history cost more than any other single action. A 20-turn chat is not 20x more expensive than a one-turn chat – it is closer to 210x. Clearing breaks that curve.
Does Claude bill based on tokens or conversations?
On consumer plans (Pro, Max), you have a rolling usage budget measured in tokens, not messages. A long, context-heavy message costs more than a short one. On API / developer usage, you pay literal per-token input and output rates.
How is a token different from a word?
A token is roughly three-quarters of a word in English. The word “tokenization” is about three tokens. Common words are one token, rare words or code can be multiple tokens. Rough rule: 100 tokens is about 75 English words.
Does using Projects actually save tokens?
Yes, for two reasons. First, project knowledge is cached by Anthropic, so repeated reads are cheaper than pasting the same document every time. Second, project instructions eliminate the need to restate recurring rules, which means shorter prompts.
When should I use Opus vs. Sonnet vs. Haiku?
Use Haiku for trivial transformations, summaries, and helper tasks. Use Sonnet for most engineering, writing, and analysis work. Use Opus for architectural design, nuanced legal or financial reasoning, or when Sonnet visibly struggles with the problem. Defaulting to Sonnet and escalating as needed is the right posture for almost everyone.
What is the difference between a usage limit and a context window?
Usage limits are your time-based budget across all Claude surfaces. Hit it and Claude tells you to wait until reset. Context window is the maximum size of a single conversation. Fill it and a single chat degrades or refuses new turns. Different problems, different fixes.
Does the “caveman” output trick actually work?
For specific automation workflows where Claude’s output is being piped to another script, yes. It strips prose and reduces output tokens. For human-facing tasks, it makes the output unusable. Test before adopting.
How often should I run /compact in Claude Code?
Proactively, at logical milestones – after finishing a feature, before switching to tests, after closing a PR. Some community writeups suggest every 40 messages as a rule of thumb. Do not wait until the context bar turns red.
Do disabled MCP servers still consume tokens?
No. Disabling an MCP server removes its schema from your prompt. The cost is paid only when a server is actively connected. This is why keeping the active set small is one of the highest-impact Claude Code tips.
Is it worth upgrading to Max for token headroom?
It depends on your usage pattern, not your volume. If you are hitting limits because of habits (long sessions, unused MCP servers, Opus for everything), Max will delay the problem but not fix it. Fix the habits first. If you are still hitting limits after applying the top 10 tips in this guide, then Max is the right upgrade.
Related Claude Reading
These companion articles go deeper on specific points raised in this guide. Each one is written in the same tested, operator-grade style.
Sources and Further Reading
The 60 tips above are drawn from Anthropic’s official documentation, developer write-ups, and community posts. Official docs are authoritative. Community tips are worth testing on your own workflow before adopting permanently.
Official Anthropic documentation
- Usage limit best practices
- How do usage and length limits work?
- What are projects?
- Retrieval augmented generation (RAG) for projects
- Manage project visibility and sharing
- Get started with Claude Cowork
- Organize your tasks with projects in Claude Cowork
- What is the Team plan?
- Manage extra usage for Team and Enterprise plans
- Manage extra usage for paid Claude plans
- Manage costs effectively (Claude Code)
- Claude Code commands reference
- Best practices for Claude Code
Community write-ups
- Ryan Doser – 10 Claude Code usage tips that actually save tokens
- MindStudio – 18 Claude Code token management hacks
- Medium – 10 tips to stop burning your tokens in Claude Code
- GitHub Gist – How to save context tokens when using Claude Code
- everything-claude-code – token optimization notes
- drona23 – claude-token-efficient benchmark repo
- Reddit – reduced token use, what helped most
- Reddit – context usage lowering tips
- Reddit – compacting every two requests
Final Word: Pick Three, Start This Week
Sixty tips is a lot. Do not try to adopt all of them. Pick the three in your category that match how you actually use Claude today, run them for a week, and see what changes. For most people in Chat, that is plan-the-prompt, edit-instead-of-reply, and fresh-chat-on-topic-shift. For most Cowork users, that is shared projects, RAG over giant documents, and scheduled tasks instead of re-briefs. For most Claude Code users, that is /clear, subagents on Haiku, and a lean CLAUDE.md.
The rest of the 60 compound on top of those. Start small. Watch the usage meter. Adjust.

