iocuydi/llama-2-amharic-3784m · How to download and use the model.

Jan 26, 2024

Hello, does anyone have a snippet of Python code showing how to download and use the model? Or anything else that shows the procedure for using the model?

iocuydi

Owner Jan 26, 2024

Hello, you can download the model with git LFS and then run it using the inference script in the github repo.

1. Accept the Llama2 license and download the Llama2 weights.
2. Download the Amharic finetune from this repository as shown here: https://huggingface.co/docs/hub/models-downloading
3. Clone the GitHub repo and put your paths to Llama2 and the PEFT model into the inference script here: https://github.com/iocuydi/amharic-llama-llava/blob/main/inference/run_inf.py

abdimussa87

Jan 26, 2024

What is the peft model?

abdimussa87

Jan 26, 2024

This line doesn't seem to import inside the run_inf.py file:

from model_utils import load_model, load_peft_model

I can't find the model_utils file anywhere in the github repo

iocuydi

Owner Jan 26, 2024

Added that file to the github repo.

PEFT stands for "Parameter-Efficient Fine-Tuning." It allows large models to be finetuned more easily; there is more about it here: https://huggingface.co/blog/peft
With this and most Llama finetunes, you load the original Llama weights and then a smaller set of PEFT weights from the finetune.
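
To make the "original weights plus a smaller set of PEFT weights" idea concrete, here is a toy sketch. It assumes a LoRA-style adapter, which is one common PEFT method; the thread does not say which method this finetune uses, and the matrix sizes and function names below are made up for illustration.

```python
# Toy illustration of a LoRA-style adapter (an assumption: the thread only says "PEFT").
# The frozen base weight W is d x d. The adapter stores two small matrices,
# B (d x r) and A (r x d), and the effective weight is W + B @ A.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def add(x, y):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def effective_weight(base, lora_b, lora_a):
    """Combine the frozen base weight with the low-rank adapter update."""
    return add(base, matmul(lora_b, lora_a))


def param_counts(d, r):
    """Return (base parameters, adapter parameters) for one d x d layer."""
    return d * d, 2 * d * r


base = [[1.0, 0.0], [0.0, 1.0]]   # stands in for the original Llama weights
lora_b = [[0.5], [0.0]]           # 2 x 1
lora_a = [[1.0, 2.0]]             # 1 x 2
merged = effective_weight(base, lora_b, lora_a)
print(merged)

# Hypothetical layer size and rank, only to show the difference in scale.
full, adapter = param_counts(4096, 8)
print(full, adapter)
```

The base weights stay untouched on disk, which is why the finetune download is much smaller than Llama2 itself and why it cannot be used without the base model.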

abdimussa87

Jan 26, 2024

edited Jan 26, 2024

Thank you for doing that. So I did the following as you described:

1. Downloaded the llama-2-7b model using the download.sh script
2. Downloaded this Amharic model using git lfs from Hugging Face
3. Cloned the GitHub repository and put the path to the Llama model in the run_inf.py file

Questions:

1. Where do I use the Amharic model I downloaded from here (step 2 above)?
2. What exactly is the path below?
   peft_model = '/path/to/checkpoint'
3. How do I replace the Llama-2 tokenizer with the Llama-2-Amharic tokenizer?

Thank you.

iocuydi

Owner Jan 27, 2024

I forgot to mention that you need to convert Llama2 to Hugging Face format, for example with this script: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py

1. The "main_path" param should point at the directory with the Llama weights after they are converted to Hugging Face format.
2. The peft model path is the path to the finetuned checkpoint. Without loading a checkpoint, you're just using the original Llama2. This path should point to a directory containing the files downloaded from this HF repository (the finetuned weights).
3. Replace the tokenizer files that come with Llama2 with the tokenizer files from this repository.
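
The tokenizer replacement can be done by hand, but a small script makes it repeatable. This is a minimal sketch using only the standard library. The file names are the usual Hugging Face tokenizer file names; which of them this repository actually ships is an assumption, so the function simply skips any that are missing. The function name and the ".orig" backup suffix are made up for illustration.

```python
import shutil
from pathlib import Path

# Common Hugging Face tokenizer file names. Which of these the finetune
# repository actually contains is an assumption; missing ones are skipped.
TOKENIZER_FILES = (
    "tokenizer.model",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def replace_tokenizer_files(finetune_dir, llama_dir, backup_suffix=".orig"):
    """Copy tokenizer files from the finetune folder over the Llama2 ones.

    Each original file is backed up once (name + backup_suffix) before it
    is overwritten. Returns the list of file names that were copied.
    """
    finetune_dir, llama_dir = Path(finetune_dir), Path(llama_dir)
    copied = []
    for name in TOKENIZER_FILES:
        src = finetune_dir / name
        if not src.is_file():
            continue  # the finetune folder does not provide this file
        dst = llama_dir / name
        backup = dst.with_name(dst.name + backup_suffix)
        if dst.is_file() and not backup.exists():
            shutil.copy2(dst, backup)  # keep the original Llama2 file
        shutil.copy2(src, dst)
        copied.append(name)
    return copied


# Example call (placeholder paths, adjust to your own folders):
# replace_tokenizer_files("/path/to/llama-2-amharic-3784m", "/path/to/Llama-2-7b-hf")
```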

abdimussa87

Jan 27, 2024

edited Jan 27, 2024

Thank you!! Regarding the tokenizer files, would replacing only the tokenizer.model file work? I tried that and it does respond in Amharic, though I'm not sure whether replacing the remaining files would improve its output.

iocuydi

Owner Jan 27, 2024

You should replace all the applicable tokenizer files with ours. A couple of other tips for prompting:
- Try different system prompts (the initial instruction about being an Amharic assistant), but keep the system prompt in English.
- Experiment with different hyperparameters depending on the task. Higher top-k/temperature can give more varied and creative answers, but also a higher chance of hallucinations and wrong answers.
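
To see why higher temperature and top-k give more varied output, here is a toy sketch of how these two settings reshape a next-token distribution. The logits are invented for illustration, and real inference code applies the same idea inside the generation loop rather than through these helper functions.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep only the k most likely tokens and renormalise the rest to zero."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


logits = [4.0, 2.0, 1.0, 0.5]                  # made-up scores for four tokens
cold = softmax_with_temperature(logits, 0.5)   # sharper: top token dominates
hot = softmax_with_temperature(logits, 1.5)    # flatter: more variety, more risk
print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
print([round(p, 3) for p in top_k_filter(hot, 2)])
```

With a low temperature almost all of the probability sits on the top token, so answers are more repeatable; with a high temperature and a large k, unlikely tokens get sampled more often, which is where the extra creativity and the extra mistakes both come from.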

abdimussa87

Jan 28, 2024

Thanks for the tips.
I was thinking of continuing the pre-training with more Amharic data. Unfortunately, I wasn't able to find good resources on how to do that. Can you please recommend some helpful resources?

iocuydi

Owner Jan 30, 2024

The scripts in the GitHub repo can be used for pretraining and finetuning. Unless you have a massive amount of Amharic data (billions of tokens), additional pretraining likely will not help much, and finetuning would be a more effective strategy. You can also check out the Chinese Llama Alpaca paper/repo for more details; much of this work was based on it.

abdimussa87

Jan 30, 2024

Alright, thanks a lot for your support!!

abdimussa87

Feb 2, 2024

One more thing. I tried to finetune the model on top of the gari model loaded with peft. Then, when I run inference by loading both the gari peft and my finetuned peft one after another and ask a question, it no longer gives the answer it previously got right. For example, if I ask "what medicine should I take if I have the flu", it answers well with the gari peft alone, but outputs gibberish when both the gari peft and the newer finetuned peft are loaded.

MAIN_PATH = '/model/Llama-2-7b-hf'
peft_model = '/model/llama-2-amharic-3784m'
# newer finetuned version, trained on top of the garri model
peft_model2 = '/home/user/model/output'

model = load_model(model_name, quantization)
model = load_peft_model(model, peft_model)
model = load_peft_model(model, peft_model2)

Is the way I'm loading both peft models correct?

iocuydi

Owner Feb 7, 2024

Only load one PEFT model. If you load another, you're replacing the weights of the first one; they aren't meant to be mixed. In general, you will load a single base Llama model and, optionally, a single PEFT model.

For your case, it sounds like you should follow these steps:

1. Load Llama2 with my PEFT model, then finetune.
2. After training, load Llama2 with your PEFT model, then perform inference, additional finetuning, etc.

If your model isn't performing as expected, there may be an issue with your dataset or training process. One way to debug is to first try a very simple dataset of a couple thousand identical items (all the same training example) and see whether the model can overfit it, reaching roughly 0 loss and reproducing the expected answer at inference, before moving on to the actual dataset.
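
A sanity-check dataset like that can be generated in a few lines. This is a minimal sketch: the Alpaca-style field names ("instruction", "input", "output") and the JSON-list layout are assumptions, so adapt them to whatever format your training script actually reads. The function name and the example text are made up for illustration.

```python
import json
from pathlib import Path

# Hypothetical training example. The field names follow the Alpaca
# convention, which is an assumption about the training script's format.
EXAMPLE = {
    "instruction": "Translate to English: ሰላም",
    "input": "",
    "output": "Hello",
}


def write_overfit_dataset(path, example, copies=2000):
    """Write `copies` identical records to a JSON file and return the count."""
    records = [dict(example) for _ in range(copies)]
    Path(path).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(records)


# Example call (placeholder file name):
# write_overfit_dataset("overfit_check.json", EXAMPLE)
```

If training on this file does not drive the loss close to zero, or the model cannot reproduce the single answer afterwards, the problem is more likely in the training setup than in the real dataset.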

AbelAI

Nov 18, 2024

edited Dec 22, 2024

Can anyone explain the steps one by one in detail, including the file structure of all folders and files? I have completed steps 1 and 2 (accept the Llama2 license and download the Llama2 weights, and download the Amharic finetune from this repository as shown here: https://huggingface.co/docs/hub/models-downloading). But for the 3rd step, when I try to convert Llama to HF format, tokenizer.model was not available in llama-2-7b, although it was available in the llama folder. I tried copying that file into llama-2-7b, but nothing works.

abdimussa87

Nov 18, 2024

edited Nov 18, 2024

Not sure if a lot has changed, but this method worked for me back in January:

1. Accept the Llama2 license on Hugging Face and download the model like this:
   git lfs install
   git clone https://huggingface.co/meta-llama/Llama-2-7b-hf
2. Download the Amharic finetune from Hugging Face like this:
   git lfs install
   git clone https://huggingface.co/iocuydi/llama-2-amharic-3784m
3. Clone this GitHub repository: https://github.com/iocuydi/amharic-llama-llava
4. Then, inside inference/run_inf.py:
   - Comment out the import safety_utils line.
   - Change MAIN_PATH to the path of the folder you downloaded in step 1.
   - Change peft_model to the path of the folder you cloned in step 2.
   - Go to your Llama2 folder (from step 1) and replace the tokenizer.model file with the one from step 2.
   - Set quantization=True inside the main function, before the load_model function call.
5. Finally, run the inference/run_inf.py file.
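
Since the question was about file structure, a quick check of the two folders before running run_inf.py can save some guessing. This is a minimal sketch: the required file names are assumptions based on the usual Hugging Face and PEFT folder layouts (config.json and tokenizer.model for the base model, adapter_config.json for the adapter), not something confirmed in this thread, and the function names are made up for illustration.

```python
from pathlib import Path

# Assumed file names, based on common Hugging Face / PEFT folder layouts.
BASE_REQUIRED = ("config.json", "tokenizer.model")
ADAPTER_REQUIRED = ("adapter_config.json",)


def missing_files(directory, required):
    """Return the names from `required` that are not present in `directory`."""
    directory = Path(directory)
    return [name for name in required if not (directory / name).is_file()]


def check_layout(main_path, peft_model):
    """Report missing files for the base model folder and the adapter folder."""
    return {
        "main_path": missing_files(main_path, BASE_REQUIRED),
        "peft_model": missing_files(peft_model, ADAPTER_REQUIRED),
    }


# Example call (placeholder paths matching steps 1 and 2):
# print(check_layout("/path/to/Llama-2-7b-hf", "/path/to/llama-2-amharic-3784m"))
```

Two empty lists mean both folders at least look like what MAIN_PATH and peft_model are expected to point at; a missing file usually means an incomplete git lfs download or a path that points one level too high or too low.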

iocuydi

Owner Nov 21, 2024

@AbelAI Is there a specific error message you're getting? Confused about what specifically is failing for you when it doesn't work.

Also note that you only need to convert the original llama2 weights to hf. The amharic models are already in the proper format and won't need conversion.

AbelAI

Nov 22, 2024

edited Nov 22, 2024

@iocuydi I was a little confused following the discussion, which is why I asked about the file structure. But now, thanks to @abdimussa87, it is clear. One more question: is the size of https://huggingface.co/meta-llama/Llama-2-7b-hf more than 14 GiB? I was trying to test the model on Google Colab, but since it is so large I am unable to download it completely. Here is the error I am facing:

And without those files the model does not work.

How much space do I need to run this Amharic model? Is there any alternative way of using this model?
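
As a rough back-of-the-envelope check on the size question, the weights alone take about (number of parameters) x (bytes per parameter). The sketch below assumes roughly 6.74 billion parameters for the 7B model and 2 bytes per parameter for 16-bit weights; both numbers are approximations, and the estimate ignores the adapter files, activations, and any duplicate weight files a repository may ship.

```python
def weight_size_gib(n_params, bytes_per_param):
    """Approximate size of the weights in GiB (1 GiB = 2**30 bytes)."""
    return n_params * bytes_per_param / 2**30


# Approximate parameter count for a "7B" Llama 2 model (an assumption).
LLAMA2_7B_PARAMS = 6_740_000_000

fp16 = weight_size_gib(LLAMA2_7B_PARAMS, 2)   # 16-bit weights
int8 = weight_size_gib(LLAMA2_7B_PARAMS, 1)   # 8-bit weights, if quantization means 8-bit loading
print(round(fp16, 2), round(int8, 2))
```

So one 16-bit copy of the weights is on the order of 12.5 GiB. If a repository stores the same weights in more than one file format, a plain git clone downloads every copy, which can push the download well past that figure.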

ghost2025

20 days ago

edited 20 days ago

Do not waste time. This is entirely pointless.
The output example is as follows: