What it is
The context window is the LLM's working memory for a single request: the system and user messages, tool results, and whatever the model generates, all measured in tokens rather than characters.
Why it matters
- It limits how much text you can send in one go.
- Longer contexts can increase cost and latency.
- Not everything in the context gets equal attention; models often recall the beginning and end of a long prompt better than the middle.
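One practical consequence of the limit is that a long conversation must eventually be trimmed to fit. Below is a minimal sketch of keeping a message list under a token budget by dropping the oldest non-system turns first. The token count here is a rough characters-per-token heuristic, not a real tokenizer, and the function names are invented for illustration:

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: roughly 1 token per 4 characters of English text.
    # Real models use their own tokenizers; this is only an estimate.
    return max(1, len(text) // 4)

def trim_to_budget(messages: list[dict], budget: int) -> list[dict]:
    """Drop the oldest non-system messages until the total fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    total = sum(approx_tokens(m["content"]) for m in system + rest)
    while rest and total > budget:
        dropped = rest.pop(0)  # remove the oldest turn first
        total -= approx_tokens(dropped["content"])
    return system + rest
```

Keeping the system message pinned while evicting old turns is a common choice, since instructions usually matter for the whole conversation while early chit-chat does not.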
Practical tips
- Summarize and compress old context.
- Retrieve only the most relevant chunks (RAG) instead of dumping everything.
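The retrieval tip above can be illustrated with a toy example: score each chunk by word overlap with the query and keep only the top few, instead of stuffing every chunk into the prompt. Real RAG systems use embeddings and a vector index; the scoring here and the function names are simplified assumptions:

```python
def overlap_score(query: str, chunk: str) -> int:
    # Count chunk words that also appear in the query (case-insensitive).
    query_words = set(query.lower().split())
    return sum(1 for word in chunk.lower().split() if word in query_words)

def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most relevant to the query, best first."""
    return sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)[:k]
```

Only the winning chunks go into the prompt, which keeps the context small and focused regardless of how large the full document store grows.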
