Can you post the script that was used to quantize this model please?

by ctranslate2-4you - opened

Can you post the script that was used to quantize this model please?

Unsloth AI org


You can quantize automatically using our Unsloth GitHub package.
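
For readers looking for a concrete starting point, here is a minimal sketch of that flow, based on the save options shown in Unsloth's notebooks. The model name, output directory, and `save_method` value are illustrative assumptions, not the exact script used for this repo:

```python
# A minimal sketch, assuming the Unsloth API as shown in their notebooks.
# The model name and output directory are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b",  # any Unsloth-supported base model
    max_seq_length=2048,
    load_in_4bit=True,  # load the weights 4-bit quantized via bitsandbytes
)

# ... optionally fine-tune here ...

# Merge and save the weights directly in 4-bit:
model.save_pretrained_merged(
    "model-bnb-4bit",
    tokenizer,
    save_method="merged_4bit_forced",  # "merged_4bit" asks you to confirm via the forced variant
)
```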


Can you please add documentation on -HOW- to do that to your website? There is nothing about this on your docs site: https://docs.unsloth.ai/
This was brought up on GitHub, and the response was effectively "go to Discord": https://github.com/unslothai/unsloth/issues/972

None of the process is properly described anywhere in your docs. Some of your quantized models are one safetensors file; others are two or more.
Why is that the case? We don't know, because there is no explanation of what you are doing, how you are doing it, or why you are doing it that way.

Unsloth AI org


If you use Unsloth you can quantize the models by saving directly to 4-bit. We will add a section on this to the docs, but it is already shown in our Google Colab notebooks.

Some quantized models are one safetensors file and others are two or more because a single file would be too large to download, so the weights are split into shards. This is what Hugging Face does as well.
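
To illustrate the sharding point, here is a sketch using the plain Hugging Face transformers API (the model name is a placeholder; this is standard `save_pretrained` behaviour, not an Unsloth-specific script):

```python
# Sketch of checkpoint sharding with standard Hugging Face transformers.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("some-org/some-model")  # placeholder

# Checkpoints larger than max_shard_size (default "5GB") are split into
# files like model-00001-of-00002.safetensors, plus an index file
# (model.safetensors.index.json) mapping each weight to its shard.
model.save_pretrained("out", max_shard_size="5GB", safe_serialization=True)
```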

Unsloth AI org


As an example, see our Google Colab notebook for Llama 3.2 here, which lets you quantize your model: https://colab.research.google.com/drive/1T5-zKWM_5OD21QHwXHiV9ixTRR7k3iB9?usp=sharing


Thanks!

@BallisticAI did you figure out how to load the model and get it working?

@shimmyshimmer, could you guide me on how to save a fine-tuned LoRA model for Llama-3.2-11B-Vision-Instruct in 4-bit precision for optimized inference, similar to the repository unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit?

Unsloth AI org


When you use Unsloth for fine-tuning, there's a section in the notebook that lets you do this. Select the Llama vision notebook here: https://docs.unsloth.ai/get-started/unsloth-notebooks
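
For illustration, the save step in that notebook looks roughly like the sketch below. This is based on Unsloth's vision notebooks; treat the exact model name and save arguments as assumptions:

```python
# A rough sketch of the vision notebook's save step; details are assumptions.
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit=True,
)

# ... attach a LoRA adapter with FastVisionModel.get_peft_model and train ...

# The saving cell merges the LoRA adapter into the base weights:
model.save_pretrained_merged(
    "llama-3.2-11b-vision-finetuned",
    tokenizer,
    save_method="merged_4bit_forced",
)
```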

@shimmyshimmer, I tried the code below to save the model in 4-bit format, but it still saves 16-bit files. When I checked model-00001-of-00002.safetensors (5 GB) and model-00002-of-00002.safetensors (2.18 GB) in the unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit repo, they are saved in 4-bit format and are much smaller.

However, when I load the trained LoRA adapter and save it with merged_4bit or merged_4bit_forced, the weights are written as 16-bit instead of 4-bit. I want to save the model in 4-bit format so I can run it on a 15 GB GPU in production. Could you please help me solve this issue?

I have also raised a similar issue on GitHub.
https://github.com/unslothai/unsloth/issues/1422

(Screenshot attached: 2024-12-20 162157.png)
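
One possible workaround, not confirmed by the Unsloth team in this thread: save the merged model in 16-bit first, then re-quantize it with bitsandbytes through plain transformers, which can serialize 4-bit checkpoints in recent versions. A hedged sketch with placeholder paths:

```python
# Hedged sketch: re-quantize a merged 16-bit checkpoint to bnb 4-bit.
# Paths are placeholders; requires a transformers/bitsandbytes combination
# that supports saving 4-bit checkpoints.
import torch
from transformers import (
    AutoProcessor,
    BitsAndBytesConfig,
    MllamaForConditionalGeneration,
)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the merged_16bit output with on-the-fly 4-bit quantization:
model = MllamaForConditionalGeneration.from_pretrained(
    "path/to/merged-16bit-model",
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("path/to/merged-16bit-model")

# save_pretrained writes the quantized 4-bit weights, giving the smaller
# safetensors files seen in the unsloth ...-bnb-4bit repos.
model.save_pretrained("Llama-3.2-11B-Vision-Instruct-bnb-4bit-local")
processor.save_pretrained("Llama-3.2-11B-Vision-Instruct-bnb-4bit-local")
```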
