Quantization

Compressing an AI model to make it smaller and faster with minimal loss in quality.

What it means

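Quantization is like lowering the resolution of an image to save space. It reduces the precision of the model's parameters (e.g., from 16-bit numbers to 4-bit numbers), so each parameter takes fewer bits to store. Going from 16 bits to 4 bits cuts the memory needed for the weights to roughly a quarter: a 7-billion-parameter model drops from about 14 GB to about 3.5 GB. The model becomes much lighter and can run on consumer laptops instead of massive servers.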
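To make the idea concrete, here is a minimal sketch of one simple scheme: symmetric quantization with a single scale factor. It is an illustration only, not how any particular library does it. The function names `quantize_4bit` and `dequantize` and the sample weights are made up for this example, and real tools typically use more elaborate methods, such as a separate scale for each small group of weights.

```python
def quantize_4bit(weights):
    """Map floats to integer codes in [-7, 7], which fit in 4 bits.

    Hypothetical helper for illustration: one shared scale for all weights.
    """
    max_abs = max(abs(w) for w in weights)
    # The largest weight maps to +/-7; guard against an all-zero input.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Made-up sample weights, standing in for a tiny slice of a model.
weights = [0.12, -0.53, 0.98, -1.40, 0.05, 0.71]

codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("4-bit codes:", codes)
print("restored:   ", [round(r, 2) for r in restored])
print("max error:  ", round(max_error, 3))
```

Only the small integer codes and the one scale value need to be stored. The restored values are close to the originals but not identical, and that gap is the loss in quality that quantization trades for size.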

Why it matters

It is often what makes running AI locally on your phone or computer practical. It allows powerful models to run offline, without needing an internet connection to a cloud provider.