How I Made BIELIK 11B v2.3 Run on Half the Memory: Quantized Models
February 12, 2025
What are quantized models?
Quantized models are models whose weights have been converted to a lower-precision numerical format through a process called quantization.
In simple terms, quantization reduces memory and compute usage with minimal accuracy loss.
How does it work?
Instead of storing each weight as a 16-bit floating-point number, quantization stores weights in a lower-precision format, typically small integers plus scale factors, so the same model takes up less memory and needs less compute.
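As a rough illustration, here is a minimal sketch (assuming NumPy) of a simplified Q8_0-style scheme, where each block of 32 weights is replaced by 8-bit integers sharing a single scale factor:

```python
# Simplified Q8_0-style quantization sketch (assumes NumPy).
# Each block of 32 weights is stored as int8 values plus one float scale.
import numpy as np

BLOCK = 32  # weights per block

def quantize_blocks(weights: np.ndarray):
    """Map FP32 weights to int8 values with one scale per block of 32."""
    blocks = weights.reshape(-1, BLOCK)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero on all-zero blocks
    q = np.round(blocks / scales).astype(np.int8)  # 8-bit integers in [-127, 127]
    return q, scales

def dequantize_blocks(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate FP32 weights from int8 values and scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(64).astype(np.float32)   # pretend these are model weights
q, s = quantize_blocks(w)
w_hat = dequantize_blocks(q, s)
print("max reconstruction error:", np.abs(w - w_hat).max())
```

The int8 values take 1 byte each instead of 2 bytes for FP16, which is where the memory savings come from; the small rounding error is why accuracy stays nearly identical.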
How does this apply to Bielik 11B v2.3?
I chose Q8_0 quantization, which:
- Uses 8-bit integer precision instead of 16-bit floating point (FP16).
- Requires roughly 12 GB of VRAM, about half of the ~24 GB needed by the full FP16 model (11 billion parameters × 1 byte per weight ≈ 11 GB, versus × 2 bytes ≈ 22 GB, plus overhead for activations and the KV cache).
- Maintains nearly identical accuracy, making it an efficient alternative.
For most tasks, Q8_0 works almost as well as the full FP16 model but uses far fewer resources.
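To make this concrete, here is a minimal sketch of running the quantized model locally, assuming you use the llama-cpp-python package and have already downloaded a Q8_0 GGUF file; the file name below is illustrative.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The model path is illustrative -- point it at your Q8_0 GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="./Bielik-11B-v2.3-Instruct.Q8_0.gguf",  # assumed local file name
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload all layers to the GPU (fits in ~12 GB of VRAM)
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}]
)
print(response["choices"][0]["message"]["content"])
```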
What does this mean for you?
If you don’t have enough server resources to run the full version of BIELIK AI v2.3, you can use the quantized version of the model.
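If you want to try it, below is a hedged sketch of fetching a Q8_0 GGUF file with huggingface_hub; the repository and file names are assumptions, so substitute the ones from the Bielik release you actually use.

```python
# Hedged sketch: download a Q8_0 GGUF file with huggingface_hub.
# The repo_id and filename are assumptions -- check the official Bielik release.
from huggingface_hub import hf_hub_download

model_file = hf_hub_download(
    repo_id="speakleash/Bielik-11B-v2.3-Instruct-GGUF",   # assumed repository id
    filename="Bielik-11B-v2.3-Instruct.Q8_0.gguf",        # assumed Q8_0 file name
)
print("Model downloaded to:", model_file)
```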
You can also check out posts about:
- How to Set Up BIELIK AI with vLLM and GGUF (Easy Guide)
- Why BIELIK AI Chooses FP16—The Secret to Faster Models!
Reach out to me! Find me on LinkedIn!
Want to stay updated? Join my newsletter and get a weekly report on the most exciting industry news! 🚀