There are many quantization methods to reduce the size of large language models (LLMs). Recently, better low-bit quantization methods have been proposed. For instance, AQLM achieves 2-bit quantization while preserving most of the model's accuracy.
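
To make this concrete, below is a minimal sketch of loading a 2-bit AQLM model for inference, assuming the Hugging Face `transformers` integration (which requires the `aqlm` package) and an example checkpoint name from the ISTA-DASLab hub organization; the exact model ID here is illustrative, not taken from the text.

```python
# Minimal sketch: loading and running an AQLM 2-bit quantized model.
# Assumes: `pip install transformers aqlm[gpu]` and the example checkpoint below.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf"  # illustrative AQLM checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # weights are stored in AQLM's compressed codebook format
    device_map="auto",    # dispatch layers across available GPU/CPU memory
)

# Quick generation test with the quantized model.
inputs = tokenizer("Quantization reduces model size by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```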