Effective interaction with Large Language Models (LLMs) requires understanding Context and Memory.
The Context Window
Every LLM has a Context Window: the maximum number of tokens (chunks of text) it can process at one time. This includes:
- System instructions
- The current conversation history
- Content of files you have asked the agent to read
- The agent's own responses
If this limit is exceeded, the model may 'forget' earlier parts of the conversation or fail to process new information.
Managing Conversation History
As you chat, the history grows. To prevent overflow:
- Be specific: Only read files relevant to the immediate task.
- Use /compact: This command summarizes the current conversation history, keeping essential details while freeing up token space.
- Start fresh: For unrelated tasks, it is often better to start a new session.
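The compaction idea above can be sketched as follows. This is an illustrative simplification, not the actual /compact implementation: a real agent would ask the model to write the summary, whereas here a placeholder string stands in for it.

```python
def compact(messages: list[str], keep_last: int = 2) -> list[str]:
    """Replace all but the most recent turns with a single summary line.

    In a real system, the summary would be generated by the LLM itself;
    the placeholder below just marks where that summary would go.
    """
    if len(messages) <= keep_last:
        return list(messages)
    summary = f"[summary of {len(messages) - keep_last} earlier messages]"
    return [summary] + messages[-keep_last:]
```

The recent turns are kept verbatim because they are most likely to matter for the immediate task, while older turns are collapsed into a much smaller summary.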
Persistent Memory
To avoid repeating information in every session (which consumes context space), use the save_memory tool. It lets the agent store specific facts or preferences (e.g., 'I prefer TypeScript over JavaScript', 'My API keys are in .env.local') in long-term storage that persists across sessions.
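One plausible way such a tool could persist facts is a simple JSON file on disk. The function names and storage path below are assumptions for illustration, not the tool's actual implementation:

```python
import json
from pathlib import Path

# Hypothetical storage location; the real tool may store memories elsewhere.
MEMORY_FILE = Path("agent_memory.json")

def save_memory(fact: str, path: Path = MEMORY_FILE) -> None:
    """Append a fact to persistent storage, skipping duplicates."""
    facts = json.loads(path.read_text()) if path.exists() else []
    if fact not in facts:
        facts.append(fact)
    path.write_text(json.dumps(facts, indent=2))

def load_memories(path: Path = MEMORY_FILE) -> list[str]:
    """Read back all stored facts at the start of a new session."""
    return json.loads(path.read_text()) if path.exists() else []
```

Because the file outlives any single conversation, the stored facts can be injected into a fresh session's context without you having to restate them.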