Effective interaction with Large Language Models (LLMs) requires understanding Context and Memory.
The Context Window
Every LLM has a Context Window: the maximum number of tokens (chunks of text) it can process at one time. This includes:
- System instructions
- The current conversation history
- Content of files you have asked the agent to read
- The agent's own responses
If this limit is exceeded, the model may 'forget' earlier parts of the conversation or fail to process new information.
Managing Conversation History
As you chat, the history grows. To prevent overflow:
- Be specific: Only read files relevant to the immediate task.
- Use /compact: This command summarizes the current conversation history, keeping essential details while freeing up token space.
- Start fresh: For unrelated tasks, it is often better to start a new session.
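The compaction idea above can be sketched as follows. This is an illustrative simplification, not the actual /compact implementation: a real agent would ask the model to write the summary, whereas here a placeholder string stands in for it.

```python
def compact(messages: list[str], keep_last: int = 2) -> list[str]:
    """Replace all but the most recent turns with a single summary line.

    In a real system, the summary would be generated by the LLM itself;
    the placeholder below just marks where that summary would go.
    """
    if len(messages) <= keep_last:
        return list(messages)
    summary = f"[summary of {len(messages) - keep_last} earlier messages]"
    return [summary] + messages[-keep_last:]
```

The recent turns are kept verbatim because they are most likely to matter for the immediate task, while older turns are collapsed into a much smaller summary.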
Persistent Memory
To avoid repeating information in every session (which consumes context space), use the save_memory tool. It lets the agent store specific facts or preferences (e.g., 'I prefer TypeScript over JavaScript', 'My API keys are in .env.local') in long-term storage that persists across sessions.
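One plausible way such a tool could persist facts is a simple JSON file on disk. The function names and storage path below are assumptions for illustration, not the tool's actual implementation:

```python
import json
from pathlib import Path

# Hypothetical storage location; the real tool may store memories elsewhere.
MEMORY_FILE = Path("agent_memory.json")

def save_memory(fact: str, path: Path = MEMORY_FILE) -> None:
    """Append a fact to persistent storage, skipping duplicates."""
    facts = json.loads(path.read_text()) if path.exists() else []
    if fact not in facts:
        facts.append(fact)
    path.write_text(json.dumps(facts, indent=2))

def load_memories(path: Path = MEMORY_FILE) -> list[str]:
    """Read back all stored facts at the start of a new session."""
    return json.loads(path.read_text()) if path.exists() else []
```

Because the file outlives any single conversation, the stored facts can be injected into a fresh session's context without you having to restate them.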