Guilherme34/Samantha-roleplay-ptbrv2-model quantize that please

#159
by Guilherme34 - opened

Queued and on the way (hopefully :). Should be done in no time.

mradermacher changed discussion status to closed

Ah, it already is quantized; I don't think that will work with llama.cpp.

Yes, unfortunately llama.cpp does not support integer tensors.

ohhh, ok

But you can merge the LoRA with the base model and then quantize the result, right? Can you do this one: Guilherme34/Samantha-roleplay-ptbr-v2. This is the LoRA; it's written without "-model".

I probably could, but it's currently outside my area of expertise. If you want to do it yourself, or find somebody else to do so, I'd happily quantize the resulting model.

I have a Colab to do this; I'm going to send it to you. I just can't do it myself because I don't have the necessary amount of RAM.

I promise to look into what you send, but I can't promise anything else.

I have prepared everything for you. You literally just execute everything in order. The last thing: you need to put in your own HF API token.

Ok, I have never worked with Google Colab, but I gather it's just Python code. I'll see how far I get.

I'm also running it at the moment. Works very well so far. I just downloaded the Colab as a Python file, commented out the !pip lines, and manually installed all the required dependencies.

Says out of GPU memory here. How much GPU memory is needed? Maybe you seriously overestimate the hardware I have at my disposal :)

However, I was under the impression you wouldn't have to load the complete model to do a LoRA merge.

I'm currently experiencing the same issue:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB. GPU

But it's only using my first RTX 4090. If you could make it use all available GPUs, it would have way more GPU memory available. With all GPUs combined I would have 66 GB of GPU memory, while a single GPU has only 24 GB, which apparently isn't enough.

I believe I managed to successfully run it by setting device_map="auto" so it uses all available GPUs:

root@AI:~/merge# venv/bin/python merge.py 
Loading checkpoint shards: 100%|██████████████████████████████| 3/3 [00:06<00:00,  2.06s/it]
adapter_config.json: 100%|██████████████████████████████| 743/743 [00:00<00:00, 13.9MB/s]
adapter_model.safetensors: 100%|██████████████████████████████| 1.16G/1.16G [00:20<00:00, 55.9MB/s]
tokenizer_config.json: 100%|██████████████████████████████| 713/713 [00:00<00:00, 15.0MB/s]
tokenizer.model: 100%|██████████████████████████████| 500k/500k [00:00<00:00, 1.37MB/s]
tokenizer.json: 100%|██████████████████████████████| 1.84M/1.84M [00:00<00:00, 3.98MB/s]
special_tokens_map.json: 100%|██████████████████████████████| 411/411 [00:00<00:00, 8.80MB/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
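
For reference, the merge step presumably boils down to something like the following sketch: a minimal LoRA merge with transformers and peft, where device_map="auto" is what spreads the weights across all available GPUs. The base model ID and the output path are placeholders, not taken from the actual merge.py.

# Minimal LoRA-merge sketch; assumes torch, transformers and peft are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "base-model-id-from-adapter_config.json"  # placeholder
ADAPTER = "Guilherme34/Samantha-roleplay-ptbr-v2"

# device_map="auto" shards the model across all available GPUs,
# which is what got past the single-GPU 24 GB limit above.
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)
model = model.merge_and_unload()  # bake the LoRA deltas into the base weights

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
model.save_pretrained("merged-model")  # placeholder output directory
tokenizer.save_pretrained("merged-model")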

Now I just need to figure out the HuggingFace uploading part, but it seems relatively easy.
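
In case it helps, a minimal sketch of the upload with the huggingface_hub library; the repo ID is the one from the link below, while the token and folder path are placeholders.

# Minimal upload sketch using huggingface_hub.
from huggingface_hub import HfApi

api = HfApi(token="hf_...")  # your own HF API token
api.create_repo("nicoboss/Samantha-roleplay-ptbr-v2", exist_ok=True)
api.upload_folder(
    folder_path="merged-model",  # the merged model directory from above
    repo_id="nicoboss/Samantha-roleplay-ptbr-v2",
)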

It's now uploading to https://huggingface.co/nicoboss/Samantha-roleplay-ptbr-v2. It should be done in approximately half an hour.

ohhh brooo, thanks nico!!!

I'm waiting for this 😊

Maybe we can have a partnership. I have vision models too; you can use my server if you want. Let's talk on Discord. What is your Discord?

It's uploaded! @mradermacher please add https://huggingface.co/nicoboss/Samantha-roleplay-ptbr-v2 to the queue.

Feel free to add me on Discord. My Discord username is "nicobosshard". Now that I've figured out this merging thing, doing it again in the future will be relatively easy for me.

Should be quantized in a few hours. Cheers :)
