Context window

The maximum number of tokens an LLM can consider at once (input + output).

What it is

The context window is the LLM's working memory for a single request: the system prompt, user messages, tool results, and whatever the model generates in response.
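
To get a rough sense of how much of the window a request consumes, you can count tokens before sending. A minimal sketch in Python, assuming the tiktoken library; cl100k_base is one of its built-in encodings, and the right encoding depends on the model you actually call:

    import tiktoken

    # Token counts are model-specific; cl100k_base is used here
    # purely for illustration.
    enc = tiktoken.get_encoding("cl100k_base")

    prompt = "You are a helpful assistant.\n\nSummarize the attached report."
    print(f"Prompt uses {len(enc.encode(prompt))} tokens")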

Why it matters

  • It limits how much text you can send in one go; input and output share the same window (see the budget check after this list).
  • Longer contexts can increase cost and latency, since most providers price per token and processing time grows with sequence length.
  • Not everything in the context gets equal attention: models often recall the start and end of a long prompt more reliably than the middle.
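
Because input and output share one window, a pre-flight check has to budget for both. A minimal sketch, assuming a hypothetical 8,192-token window and a fixed reservation for the reply:

    CONTEXT_WINDOW = 8_192   # assumed limit; check your model's documentation
    RESERVED_OUTPUT = 1_024  # tokens set aside for the model's reply

    def fits(prompt_tokens: int) -> bool:
        # Input and output share one window, so both must fit together.
        return prompt_tokens + RESERVED_OUTPUT <= CONTEXT_WINDOW

    print(fits(7_000))  # True:  7,000 + 1,024 <= 8,192
    print(fits(7_500))  # False: 7,500 + 1,024 >  8,192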

Practical tips

  • Summarize and compress old context so long-running conversations stay within budget (a trimming sketch follows this list).
  • Retrieve only the most relevant chunks (RAG) instead of dumping everything into the prompt.
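
One common way to compress history is to keep the system message, drop the oldest turns first, and retain the most recent ones. A minimal sketch, assuming a rough 4-characters-per-token estimate and a made-up 6,000-token budget; real systems would summarize dropped turns rather than discard them:

    from typing import TypedDict

    class Message(TypedDict):
        role: str
        content: str

    def estimate_tokens(text: str) -> int:
        # Rough heuristic: about 4 characters per token for English text.
        return max(1, len(text) // 4)

    def trim_history(messages: list[Message], budget: int = 6_000) -> list[Message]:
        system, turns = messages[0], messages[1:]
        total = estimate_tokens(system["content"]) + sum(
            estimate_tokens(m["content"]) for m in turns
        )
        # Drop the oldest non-system turns first; recent context is
        # usually the most relevant to the next reply.
        while turns and total > budget:
            dropped = turns.pop(0)
            total -= estimate_tokens(dropped["content"])
        return [system] + turns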