## Why Token Efficiency Matters
In the context of Large Language Models (LLMs) and CLI agents, tokens are the basic units of text processing. Every input prompt and output response consumes tokens. Efficient token usage is critical for three main reasons: cost, latency, and context window limits.
## Minimizing Output with Quiet Flags
Shell commands often produce verbose output. When using `run_shell_command`, this output is fed back into the agent's context. To avoid wasting tokens on progress bars, warnings, or extensive logs, use quiet flags or redirection.
- **Quiet Flags:** Use `-q`, `--quiet`, or `--silent` where available (e.g., `npm install --silent`).
- **Redirection:** Redirect stdout or stderr to `/dev/null` if the output is not needed (e.g., `command > /dev/null 2>&1`), or to a temporary file if you only need to inspect part of it later.
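The redirection pattern above can be sketched as follows. The `build` function here is a hypothetical stand-in for any noisy command whose output you don't need, only its side effect:

```shell
# Hypothetical noisy command: prints progress to stdout, a warning
# to stderr, and writes its actual result to a file.
build() {
  echo "progress: step 1/3..."
  echo "warning: deprecated flag used" >&2
  echo "ok" > /tmp/build_result.txt
}

# Discard both stdout and stderr; zero tokens of chatter reach the
# agent's context, but the side effect (the result file) survives.
build > /dev/null 2>&1

# Read back only the part that matters.
cat /tmp/build_result.txt
```

If you might need the logs later, redirect to a temporary file instead of `/dev/null` (e.g., `build > /tmp/build.log 2>&1`) and inspect it selectively.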
## Efficient Tool Selection
Choosing the right tool can drastically reduce token consumption.
- **`glob` vs. Shell:** Instead of running `find` or `ls -R` and parsing thousands of lines, use the `glob` tool to get a precise list of matching files.
- **ripgrep (`search_file_content`) vs. `grep`:** The `search_file_content` tool is optimized to limit output lines, whereas a raw `grep` command might return thousands of matches.
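When only raw shell tools are available, the same output-limiting idea can be approximated by capping match counts yourself. A minimal sketch, using a throwaway fixture directory (`/tmp/tokdemo` is an assumption for illustration):

```shell
# Illustrative fixture: a small tree with several matches.
mkdir -p /tmp/tokdemo
printf 'TODO: refactor\nTODO: add tests\nTODO: document\n' > /tmp/tokdemo/notes.txt

# An unbounded recursive grep returns every match; piping through
# head caps how many lines reach the agent's context.
grep -rn "TODO" /tmp/tokdemo | head -n 2
```

The same effect is available natively in some tools (e.g., ripgrep's per-file `--max-count` flag), which avoids scanning more than necessary in the first place.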
## Managing Context
The context window is the amount of text the model can "remember" at once. If it fills up with irrelevant command outputs or repetitive errors, the agent may "forget" earlier instructions. Keep prompts concise and ensure tool outputs are strictly relevant to the task at hand.
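One practical pattern for keeping outputs relevant: capture a command's full log to a temporary file, then surface only the few lines that matter. The `noisy` function below is a hypothetical stand-in for any long-running command, and the log path is an assumption:

```shell
# Hypothetical noisy command: hundreds of log lines, verdict at the end.
noisy() {
  for i in $(seq 1 500); do echo "log line $i"; done
  echo "RESULT: pass"
}

# Capture everything to a temp file so nothing is lost...
noisy > /tmp/noisy.log 2>&1

# ...but let only the last few lines enter the context window
# (~3 lines instead of 501).
tail -n 3 /tmp/noisy.log
```

If the tail shows a failure, the full log is still on disk for targeted follow-up searches (e.g., `grep -n "error" /tmp/noisy.log | head`).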