quantization

The process of reducing the numerical precision of model weights (e.g., from float32 to int8) to decrease memory footprint and improve inference speed, often with minimal accuracy loss.
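A minimal sketch of the float32-to-int8 case described above, assuming symmetric per-tensor quantization with NumPy (the function names `quantize_int8` and `dequantize` are illustrative, not from any particular library):

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: map [-max|w|, +max|w|] onto [-127, 127].
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float32 values; the rounding error is the accuracy loss.
    return q.astype(np.float32) * scale

weights = np.random.randn(256).astype(np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# int8 storage is 4x smaller than float32; per-element error is at most scale/2.
```

Real frameworks add refinements (per-channel scales, zero points for asymmetric ranges, calibration data to pick clipping thresholds), but the core idea is this round-and-rescale mapping.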