Tokens

The chunks of text an LLM reads and generates; cost and limits are usually token-based.

Tags: Basics · LLMs · Subword tokens

What it is

Tokens are the units in which most LLMs represent text internally. A token is often a subword piece, not necessarily a whole word.

Why it matters

  • Pricing is typically charged per input token and per output token.
  • Model limits (the context window) are counted in tokens, not characters or words.
  • Tokenization differs between models, so the same text can yield different token counts.
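
Because pricing is per token in each direction, request cost is a simple weighted sum. A minimal sketch, using made-up prices (check your provider's pricing page; these numbers are assumptions for illustration):

```python
# Hypothetical per-token prices -- NOT real rates, purely illustrative.
PRICE_PER_INPUT_TOKEN = 0.000003   # $ per input token (assumed)
PRICE_PER_OUTPUT_TOKEN = 0.000015  # $ per output token (assumed)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough request cost: tokens in each direction times their rate."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

# A 1,200-token prompt with a 400-token reply:
print(f"${estimate_cost(1200, 400):.6f}")  # prints "$0.009600"
```

Note that output tokens are often priced several times higher than input tokens, so long generations dominate cost even for short prompts.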

Rules of thumb

  • Short, common words may be 1 token; long or rare strings may split into many tokens.
  • Code and structured text often tokenize differently than prose, since whitespace, punctuation, and identifiers can each add tokens.
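
The splitting behavior above can be illustrated with a toy greedy longest-match tokenizer. Real tokenizers (BPE, WordPiece, etc.) learn their vocabularies from data; the vocabulary here is invented purely for illustration:

```python
# Made-up subword vocabulary -- real models learn theirs from a corpus.
VOCAB = {"un", "believ", "able", "token", "ization", "the", "cat", " "}

def tokenize(text: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position,
    falling back to a single character when nothing matches."""
    tokens, i = [], 0
    while i < len(text):
        match = None
        for j in range(len(text), i, -1):  # try longest substring first
            if text[i:j] in VOCAB:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # unknown character becomes its own token
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("unbelievable tokenization"))
# prints ['un', 'believ', 'able', ' ', 'token', 'ization']
```

Two English words become six tokens here, which is why character or word counts are only a loose proxy for token counts.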