Finetune with QLoRA please

#14
by supercharge19 - opened

Quantizing the large model (40B, or even 7B) to 4-bit would help the community a lot. Please also fine-tune it on a large code dataset, and on Wizard-Vicuna, Mega, and other big chat datasets as well, so it can even produce code during chat.

I successfully quantized it with QLoRA using the bitsandbytes package.

I activated it with a bitsandbytes config that selects the "nf4" quant type + load_in_4bit + bfloat16. That lets it run on a single A100 with 40 GB of VRAM. The quantized version runs on the fly in Jupyter Lab, without manually exporting/saving it as a new model.
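
For reference, here is a minimal sketch of that setup (assuming the `tiiuae/falcon-40b` checkpoint and recent `transformers`/`bitsandbytes` versions; the exact flags may differ from the ones used above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-40b"  # swap in "tiiuae/falcon-7b" for smaller GPUs

# 4-bit NF4 quantization with bfloat16 compute, as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",       # place layers across available GPUs automatically
    trust_remote_code=True,  # Falcon shipped custom modeling code at release
)
```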

@Ichsan2895 How is its performance?

@Ichsan2895 : Please share the finetuning and evaluation code if possible.

How many tokens/sec, approx?

Technology Innovation Institute org

You can check out the FalconTune package from the community as well :).

> How many tokens/sec, approx?

I ran it in a cloud environment with a single A6000 (48 GB VRAM). Falcon-40B ran at 1-2 tokens/sec.
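
For anyone who wants to reproduce that number, here is a rough sketch of how one could measure it (this assumes the `model` and `tokenizer` loaded in the snippet above; the prompt is just an example):

```python
import time

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128)
elapsed = time.perf_counter() - start

# Count only newly generated tokens, not the prompt tokens
new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.2f} tokens/sec")
```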

> @Ichsan2895 : Please share the finetuning and evaluation code if possible.

Sorry, I have never done fine-tuning with a new dataset. I just ran inference with questions to see the answers :)

> Quantizing the large model (40B, or even 7B) to 4-bit would help the community a lot. Please also fine-tune it on a large code dataset, and on Wizard-Vicuna, Mega, and other big chat datasets as well, so it can even produce code during chat.

Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA
https://huggingface.co/blog/falcon

Colab Falcon fine tuning with QLoRA and Guanaco Dataset
https://colab.research.google.com/drive/1BiQiw31DT7-cDp1-0ySXvvhzqomTdI-o?usp=sharing
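
In the same spirit as that Colab, the core QLoRA step is wrapping the 4-bit model with LoRA adapters via `peft`. A minimal sketch (the rank/alpha values are illustrative, not taken from the notebook; `query_key_value` is Falcon's fused attention projection):

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Make the frozen 4-bit base model ready for adapter training
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                # illustrative rank
    lora_alpha=32,
    target_modules=["query_key_value"],  # Falcon's fused QKV projection
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters train
```

From there, any standard training loop over the Guanaco dataset applies; since only the adapter weights are updated, fine-tuning the 40B model fits on a single GPU.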
