## Why Token Efficiency Matters
In the context of Large Language Models (LLMs) and CLI agents, tokens are the basic units of text processing. Every input prompt and output response consumes tokens. Efficient token usage is critical for three main reasons: cost, latency, and context window limits.
## Minimizing Output with Quiet Flags
Shell commands often produce verbose output. When using `run_shell_command`, this output is fed back into the agent's context. To avoid wasting tokens on progress bars, warnings, or extensive logs, use quiet flags or redirection.
- **Quiet Flags:** Use `-q`, `--quiet`, or `--silent` where available (e.g., `npm install --silent`).
- **Redirection:** Redirect stdout or stderr to `/dev/null` if the output is not needed (e.g., `command > /dev/null 2>&1`), or to a temporary file if you only need to inspect part of it later.
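The redirection pattern above can be sketched as follows. The `build` function here is a hypothetical stand-in for any noisy command whose output you don't need, only its side effect:

```shell
# Hypothetical noisy command: prints progress to stdout, a warning
# to stderr, and writes its actual result to a file.
build() {
  echo "progress: step 1/3..."
  echo "warning: deprecated flag used" >&2
  echo "ok" > /tmp/build_result.txt
}

# Discard both stdout and stderr; zero tokens of chatter reach the
# agent's context, but the side effect (the result file) survives.
build > /dev/null 2>&1

# Read back only the part that matters.
cat /tmp/build_result.txt
```

If you might need the logs later, redirect to a temporary file instead of `/dev/null` (e.g., `build > /tmp/build.log 2>&1`) and inspect it selectively.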
## Efficient Tool Selection
Choosing the right tool can drastically reduce token consumption.
- **`glob` vs. Shell:** Instead of running `find` or `ls -R` and parsing thousands of lines, use the `glob` tool to get a precise list of matching files.
- **ripgrep (`search_file_content`) vs. `grep`:** The `search_file_content` tool is optimized to limit output lines, whereas a raw `grep` command might return thousands of matches.
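When only raw shell tools are available, the same output-limiting idea can be approximated by capping match counts yourself. A minimal sketch, using a throwaway fixture directory (`/tmp/tokdemo` is an assumption for illustration):

```shell
# Illustrative fixture: a small tree with several matches.
mkdir -p /tmp/tokdemo
printf 'TODO: refactor\nTODO: add tests\nTODO: document\n' > /tmp/tokdemo/notes.txt

# An unbounded recursive grep returns every match; piping through
# head caps how many lines reach the agent's context.
grep -rn "TODO" /tmp/tokdemo | head -n 2
```

The same effect is available natively in some tools (e.g., ripgrep's per-file `--max-count` flag), which avoids scanning more than necessary in the first place.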
## Managing Context
The context window is the amount of text the model can "remember" at once. If it fills up with irrelevant command outputs or repetitive errors, the agent may "forget" earlier instructions. Keep prompts concise and ensure tool outputs are strictly relevant to the task at hand.
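One practical pattern for keeping outputs relevant: capture a command's full log to a temporary file, then surface only the few lines that matter. The `noisy` function below is a hypothetical stand-in for any long-running command, and the log path is an assumption:

```shell
# Hypothetical noisy command: hundreds of log lines, verdict at the end.
noisy() {
  for i in $(seq 1 500); do echo "log line $i"; done
  echo "RESULT: pass"
}

# Capture everything to a temp file so nothing is lost...
noisy > /tmp/noisy.log 2>&1

# ...but let only the last few lines enter the context window
# (~3 lines instead of 501).
tail -n 3 /tmp/noisy.log
```

If the tail shows a failure, the full log is still on disk for targeted follow-up searches (e.g., `grep -n "error" /tmp/noisy.log | head`).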