Latency (Time to First Token)

The delay between sending a request and seeing the first word of the response.

What it means

Latency in AI is the wait time between sending a prompt and getting a response. For streaming models it's usually measured as Time to First Token (TTFT): how long you stare at a spinning circle before the text starts streaming.
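
A minimal sketch of measuring TTFT yourself, assuming your client library returns the response as an iterator of streamed chunks. The send_request callable and the commented-out client call are placeholders for illustration, not a specific vendor's API:

```python
import time

def measure_ttft(send_request):
    """Time to First Token: seconds from issuing the request
    until the first streamed chunk arrives.

    `send_request` is a zero-argument callable that sends the
    prompt and returns an iterator of response chunks.
    """
    start = time.perf_counter()
    stream = send_request()        # send the prompt
    first_chunk = next(stream)     # block until the first token/chunk arrives
    ttft = time.perf_counter() - start
    return ttft, first_chunk, stream

# Hypothetical usage with a streaming client:
# ttft, _, stream = measure_ttft(
#     lambda: client.stream_completion(prompt="Hello")
# )
# print(f"TTFT: {ttft * 1000:.0f} ms")
```

The key detail is that the timer starts before the request is sent, not when the stream object is handed back, so network and queueing delays are counted as part of the wait the user actually experiences.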

Why it matters

Speed is user experience. For a chatbot, high latency feels sluggish and annoying. For a voice assistant, even a second of delay ruins the illusion of a natural conversation.