Knowledge Distillation

The process of transferring knowledge from a large, complex model (the teacher) to a smaller, faster one (the student).

What it means

Distillation is like a teacher (the large model) creating a condensed study guide for a student (the small model). The student is trained to mimic the teacher's outputs, typically its full probability distribution over possible answers rather than just the final answer, learning to give the same responses without needing the same massive brainpower.
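
To make the mimicry concrete, here is a minimal PyTorch sketch of one common recipe: the student matches the teacher's temperature-softened probabilities while still learning the ground-truth labels (the soft-target approach popularized by Hinton et al.). The toy model sizes, the temperature, and the alpha blending weight are illustrative assumptions, not anything this glossary prescribes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend two objectives: match the teacher's softened distribution
    (KL divergence) and still predict the true label (cross-entropy)."""
    # Soften both distributions with the temperature, then compare them.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2

    # Ordinary supervised loss against the hard labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kd + (1 - alpha) * ce

# Toy setup (illustrative sizes): a big teacher, a small student, 20 classes.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 20))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 20))

x = torch.randn(8, 32)               # a batch of 8 examples
labels = torch.randint(0, 20, (8,))  # ground-truth classes

with torch.no_grad():                # the teacher is frozen; only the student learns
    teacher_logits = teacher(x)
student_logits = student(x)

loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()                      # gradients flow only into the student
```

The key design choice is training on the teacher's soft probabilities: they reveal how the teacher weighs the wrong answers too, which carries far more signal per example than a single correct label.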

Why it matters

It lets us have our cake and eat it too: most of the smarts of a giant model packed into a fast, cheap model that can run almost anywhere.