How to convert that to ggml?

#1 by RoversX - opened

Hello, this model looks great. Thanks. I was wondering how you convert the original model to ggml. I am trying to use QLoRA to fine-tune a model based on it and run it on my Mac. Is this possible? Thanks

You have to merge the base model with the fine-tuned LoRA version, then based on that structure you can convert it to ggml format.

Thank you for providing a useful description of merging the base model with the fine-tuned LoRA version and converting it to GGML format. I have solved the problem.

Sorry, I was just in a rush, so the answer is not really detailed at all.

To merge and unload:


    from peft import PeftModel
    # attach the LoRA adapter to the loaded base model, then fold the adapter weights in
    model = PeftModel.from_pretrained(base_model, args.peft_model_path, **device_arg)
    model = model.merge_and_unload()

where base_model is the original model loaded from its path and args.peft_model_path is the directory with your LoRA-tuned weights.
Then you have to save it (together with the tokenizer):

model.save_pretrained(f"{args.output_dir}")
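
If you want the tokenizer saved next to the merged weights as well, a minimal sketch could look like this (assuming the base model path is available as args.base_model_path; that argument name is just illustrative):

    from transformers import AutoTokenizer

    # illustrative: load the base model's tokenizer and save it next to the merged weights
    tokenizer = AutoTokenizer.from_pretrained(args.base_model_path)
    tokenizer.save_pretrained(f"{args.output_dir}")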

and you are ready to convert it to ggml.

To convert it I am using llama.cpp:
https://github.com/ggerganov/llama.cpp

In the simplest case you can use lora_to_ggml.py with properly defined arguments:

python lora_to_ggml.py -m cached_model_path -l llama.cpp_path -i output_path

Thank you. I use basically the same method: convert-pth-to-ggml.py to convert the model to ggml-model-f32.bin, then quantize it, following the instructions provided in llama.cpp. Thanks. 👍

Using Colab

  1. Convert the model to ggml-model-f32.bin:

    !python3 convert-pth-to-ggml.py models/model-path/ 0
    
  2. Quantize the model:

    !./quantize ./models/model-path/ggml-model-f32.bin ./models/model-path/ggml-model-q4_0.bin q4_0
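
  3. Optionally, run the quantized model with llama.cpp's main binary (the prompt and token count here are just example values):

    !./main -m ./models/model-path/ggml-model-q4_0.bin -p "Hello" -n 128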
    

That's great to hear! Cool!
