Can you post the script that was used to quantize this model please?

by ctranslate2-4you - opened

Can you post the script that was used to quantize this model please?

Unsloth AI org


You can quantize automatically using our Unsloth GitHub package.
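
For readers looking for a concrete starting point, here is a minimal sketch of that flow, based on the save options shown in Unsloth's notebooks. The model name, output directory, and `save_method` value are illustrative assumptions, not the exact script used for this repo:

```python
# A minimal sketch, assuming the Unsloth API as shown in their notebooks.
# The model name and output directory are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b",  # any Unsloth-supported base model
    max_seq_length=2048,
    load_in_4bit=True,  # load the weights 4-bit quantized via bitsandbytes
)

# ... optionally fine-tune here ...

# Merge and save the weights directly in 4-bit:
model.save_pretrained_merged(
    "model-bnb-4bit",
    tokenizer,
    save_method="merged_4bit_forced",  # "merged_4bit" asks you to confirm via the forced variant
)
```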


Can you please add documentation on -HOW- to do that to your website? There is nothing about this on your docs site: https://docs.unsloth.ai/
This was brought up on GitHub, and the response was effectively "go to Discord": https://github.com/unslothai/unsloth/issues/972

None of the process is properly described anywhere in your docs. Some of your quantized models are one safetensors file; others are two or more.
Why is that the case? We don't know, because there is no explanation of what you are doing, how you are doing it, or why you are doing it that way.

Unsloth AI org


If you use Unsloth you can quantize the models by saving directly to 4-bit. We will add a section on this to the docs, but it is already shown in our Google Colab notebooks.

Some quantized models are one safetensors file and others are two or more because a single file would be too large to download, so the weights are split into shards. This is what Hugging Face does as well.
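
To illustrate the sharding point, here is a sketch using the plain Hugging Face transformers API (the model name is a placeholder; this is standard `save_pretrained` behaviour, not an Unsloth-specific script):

```python
# Sketch of checkpoint sharding with standard Hugging Face transformers.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("some-org/some-model")  # placeholder

# Checkpoints larger than max_shard_size (default "5GB") are split into
# files like model-00001-of-00002.safetensors, plus an index file
# (model.safetensors.index.json) mapping each weight to its shard.
model.save_pretrained("out", max_shard_size="5GB", safe_serialization=True)
```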

Unsloth AI org


As an example, see our Google Colab notebook for Llama 3.2 here, which lets you quantize your model: https://colab.research.google.com/drive/1T5-zKWM_5OD21QHwXHiV9ixTRR7k3iB9?usp=sharing


Thanks!

@BallisticAI did you figure out how to load the model and get it working?

@shimmyshimmer, could you guide me on how to save a fine-tuned LoRA model for Llama-3.2-11B-Vision-Instruct in 4-bit precision for optimized inference, similar to the repository unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit?

Unsloth AI org


When you use Unsloth for fine-tuning, there's a section in the notebook that lets you do this. Select the Llama vision notebook here: https://docs.unsloth.ai/get-started/unsloth-notebooks
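
For illustration, the save step in that notebook looks roughly like the sketch below. This is based on Unsloth's vision notebooks; treat the exact model name and save arguments as assumptions:

```python
# A rough sketch of the vision notebook's save step; details are assumptions.
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit=True,
)

# ... attach a LoRA adapter with FastVisionModel.get_peft_model and train ...

# The saving cell merges the LoRA adapter into the base weights:
model.save_pretrained_merged(
    "llama-3.2-11b-vision-finetuned",
    tokenizer,
    save_method="merged_4bit_forced",
)
```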

@shimmyshimmer, I tried the code below to save the model in 4-bit format, but it still saves 16-bit files. When I checked model-00001-of-00002.safetensors (5 GB) and model-00002-of-00002.safetensors (2.18 GB) in the unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit repo, they are saved in 4-bit format and are much smaller.

However, when I load the trained LoRA adapter and save it with merged_4bit or merged_4bit_forced, the weights are written as 16-bit instead of 4-bit. I want to save the model in 4-bit format so I can run it on a 15 GB GPU in production. Could you please help me solve this issue?

I have also raised a similar issue on GitHub.
https://github.com/unslothai/unsloth/issues/1422

(Screenshot attached: 2024-12-20 162157.png)
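
One possible workaround, not confirmed by the Unsloth team in this thread: save the merged model in 16-bit first, then re-quantize it with bitsandbytes through plain transformers, which can serialize 4-bit checkpoints in recent versions. A hedged sketch with placeholder paths:

```python
# Hedged sketch: re-quantize a merged 16-bit checkpoint to bnb 4-bit.
# Paths are placeholders; requires a transformers/bitsandbytes combination
# that supports saving 4-bit checkpoints.
import torch
from transformers import (
    AutoProcessor,
    BitsAndBytesConfig,
    MllamaForConditionalGeneration,
)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the merged_16bit output with on-the-fly 4-bit quantization:
model = MllamaForConditionalGeneration.from_pretrained(
    "path/to/merged-16bit-model",
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("path/to/merged-16bit-model")

# save_pretrained writes the quantized 4-bit weights, giving the smaller
# safetensors files seen in the unsloth ...-bnb-4bit repos.
model.save_pretrained("Llama-3.2-11B-Vision-Instruct-bnb-4bit-local")
processor.save_pretrained("Llama-3.2-11B-Vision-Instruct-bnb-4bit-local")
```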
