HFLeoArthurYounes

AI & ML interests: None defined yet.

Recent Activity

ArthurZ posted an update about 1 month ago

ybelkada posted an update 4 months ago

ybelkada posted an update 4 months ago

ybelkada posted an update 10 months ago:
Check out quantized weights from ISTA-DASLab directly on their organisation page: https://huggingface.co/ISTA-DASLab, with official weights for AQLM (2-bit quantization) and QMoE (sub-1-bit MoE quantization)!

Read more about these techniques below:

AQLM paper: Extreme Compression of Large Language Models via Additive Quantization (2401.06118)
QMoE paper: QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models (2310.16795)

Some useful links below:

AQLM repo: https://github.com/Vahe1994/AQLM
How to use AQLM & transformers: https://huggingface.co/docs/transformers/quantization#aqlm
How to use AQLM & PEFT: https://huggingface.co/docs/peft/developer_guides/quantization#aqlm-quantizaion

Great work from @BlackSamorez and team!
ArthurZ posted an update 10 months ago

ArthurZ posted an update 10 months ago:
Just when I was about to go to bed....... Here we go again
ybelkada posted an update 10 months ago:
Try out Mixtral 2-bit on a free-tier Google Colab notebook right now!

https://colab.research.google.com/drive/1-xZmBRXT5Fm3Ghn4Mwa2KRypORXb855X?usp=sharing

The AQLM method has recently been introduced on the transformers main branch.

The 2-bit model can be found here: BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf-test-dispatch

And you can read more about the method here: https://huggingface.co/docs/transformers/main/en/quantization#aqlm

Great work @BlackSamorez and team!
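The Colab workflow above can be sketched as a plain Python script. This is a minimal example, assuming `aqlm` is installed and a GPU is available (the free-tier Colab T4 the post mentions); the prompt and generation settings are illustrative:

```python
# Hedged sketch of the Colab workflow: generate text with the 2-bit AQLM
# Mixtral checkpoint. Requires the `aqlm` package and a GPU runtime.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf-test-dispatch"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # 2-bit weights shrink Mixtral-8x7B enough to fit on a single
    # free-tier Colab GPU; device_map places the layers automatically.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Usage (on a GPU runtime):
#   print(generate("The theory of relativity states that"))
```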