Guilherme34/Samantha-roleplay-ptbrv2-model quantize that please

#159
by Guilherme34 - opened

Queued and on the way (hopefully :). Should be done in no time.

mradermacher changed discussion status to closed

Ah, it already is quantized; I don't think that will work with llama.cpp.

Yes, unfortunately llama.cpp does not support integer tensors.

ohhh, ok

But you can merge the LoRA with the base model and then quantize the result, right? Can you do this one: Guilherme34/Samantha-roleplay-ptbr-v2. This is the LoRA; it's written without "-model".

I probably could, but it's currently outside my area of expertise. If you want to do it yourself, or find somebody else to do so, I'd happily quantize the resulting model.

I have a Colab to do this; I'm going to send it to you. I just can't do it myself because I don't have the necessary amount of RAM.

I promise to look into what you send, but I can't promise anything else.

I have prepared everything for you. You literally just execute everything in order. The last thing: you need to put in your own HF API token.

Ok, I have never worked with Google Colab, but I gather it's just Python code. I'll see how far I get.

I'm also running it at the moment. Works very well so far. I just downloaded the Colab as a Python file, commented out the !pip lines, and manually installed all the required dependencies.

Says out of GPU memory here. How much GPU memory is needed? Maybe you seriously overestimate the hardware I have at my disposal :)

However, I was under the impression you wouldn't have to load the complete model to do a LoRA merge.

I'm currently experiencing the same issue:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB. GPU

But it's only using my first RTX 4090. If you could make it use all available GPUs, it would have way more GPU memory available. With all GPUs combined I would have 66 GB of GPU memory, while a single GPU has only 24 GB, which apparently isn't enough.

I believe I managed to successfully run it by setting device_map="auto" so it uses all available GPUs:

root@AI:~/merge# venv/bin/python merge.py 
Loading checkpoint shards: 100%|██████████████████████████████| 3/3 [00:06<00:00,  2.06s/it]
adapter_config.json: 100%|██████████████████████████████| 743/743 [00:00<00:00, 13.9MB/s]
adapter_model.safetensors: 100%|██████████████████████████████| 1.16G/1.16G [00:20<00:00, 55.9MB/s]
tokenizer_config.json: 100%|██████████████████████████████| 713/713 [00:00<00:00, 15.0MB/s]
tokenizer.model: 100%|██████████████████████████████| 500k/500k [00:00<00:00, 1.37MB/s]
tokenizer.json: 100%|██████████████████████████████| 1.84M/1.84M [00:00<00:00, 3.98MB/s]
special_tokens_map.json: 100%|██████████████████████████████| 411/411 [00:00<00:00, 8.80MB/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
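
For reference, the merge step presumably boils down to something like the following sketch: a minimal LoRA merge with transformers and peft, where device_map="auto" is what spreads the weights across all available GPUs. The base model ID and the output path are placeholders, not taken from the actual merge.py.

# Minimal LoRA-merge sketch; assumes torch, transformers and peft are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "base-model-id-from-adapter_config.json"  # placeholder
ADAPTER = "Guilherme34/Samantha-roleplay-ptbr-v2"

# device_map="auto" shards the model across all available GPUs,
# which is what got past the single-GPU 24 GB limit above.
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)
model = model.merge_and_unload()  # bake the LoRA deltas into the base weights

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
model.save_pretrained("merged-model")  # placeholder output directory
tokenizer.save_pretrained("merged-model")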

Now I just need to figure out the HuggingFace uploading part, but it seems relatively easy.
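
In case it helps, a minimal sketch of the upload with the huggingface_hub library; the repo ID is the one from the link below, while the token and folder path are placeholders.

# Minimal upload sketch using huggingface_hub.
from huggingface_hub import HfApi

api = HfApi(token="hf_...")  # your own HF API token
api.create_repo("nicoboss/Samantha-roleplay-ptbr-v2", exist_ok=True)
api.upload_folder(
    folder_path="merged-model",  # the merged model directory from above
    repo_id="nicoboss/Samantha-roleplay-ptbr-v2",
)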

It's now uploading to https://huggingface.co/nicoboss/Samantha-roleplay-ptbr-v2. It should be done in approximately half an hour.

ohhh brooo, thanks nico!!!

I'm waiting for this 😊

Maybe we can have a partnership. I have vision models too; you can use my server if you want. Let's talk on Discord. What is your Discord?

It's uploaded! @mradermacher please add https://huggingface.co/nicoboss/Samantha-roleplay-ptbr-v2 to the queue.

Feel free to add me on Discord. My Discord username is "nicobosshard". Now that I've figured out this merging thing, doing it again in the future will be relatively easy for me.

Should be quantized in a few hours. Cheers :)
