---
model_creator: Nekochu
quantized_by: Nekochu
model_name: Llama-3.1 8B German ORPO
pretty_name: Llama-3.1 8B German ORPO
model_type: llama3.1
prompt_template: >-
Below is an instruction that describes a task. Write a response that
appropriately completes the request. ### Instruction: {Instruction} {summary} ### input: {category} ### Response: {prompt}
library_name: peft
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
tags:
- llama-factory
- lora
datasets:
- mayflowergmbh/intel_orca_dpo_pairs_de
- LeoLM/OpenSchnabeltier
- LeoLM/German_Songs
- LeoLM/German_Poems
- bjoernp/ultrachat_de
- mayflowergmbh/ultra-chat_de
- mayflowergmbh/airoboros-3.0_de
- mayflowergmbh/booksum_de
- mayflowergmbh/dolphin_de
- mayflowergmbh/evol-instruct_de
- mayflowergmbh/openschnabeltier_de
- mayflowergmbh/alpaca-gpt4_de
- mayflowergmbh/dolly-15k_de
- mayflowergmbh/oasst_de
language:
- de
- en
pipeline_tag: text-generation
task_categories:
- question-answering
- text2text-generation
- conversational
inference: True
model-index:
- name: Llama-3.1-8B-German-ORPO
results: []
---
- Fine-tune of Llama-3.1-8B on German datasets. The datasets are the same ones used for [Nekochu/Llama-2-13B-German-ORPO](https://huggingface.co/Nekochu/Llama-2-13B-German-ORPO).
- I have (as always) kept the LoRA `QLoRA_German-ORPO`, so it can be applied to any other *LLaMA-3.1-8B* fine-tuned model, although doing so may affect performance.
- Quants: exl2 [2.4bpw-h6](https://huggingface.co/Nekochu/Llama-3.1-8B-German-ORPO/tree/2.4bpw-h6), [4.25bpw-h6](https://huggingface.co/Nekochu/Llama-3.1-8B-German-ORPO/tree/4.25bpw-h6), [8.0bpw-h8](https://huggingface.co/Nekochu/Llama-3.1-8B-German-ORPO/tree/8.0bpw-h8) | [GGUF](https://huggingface.co/Nekochu/Llama-3.1-8B-German-ORPO/tree/gguf) Q4_K_M,IQ4_XS...
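
As a rough illustration of applying the kept LoRA to another Llama-3.1-8B fine-tune, here is a minimal sketch using `transformers` and `peft`. The function name `load_with_adapter` and both of its arguments are hypothetical; in particular, `adapter_path` must point to wherever you have the `QLoRA_German-ORPO` adapter files locally, which this card does not specify.

```python
def load_with_adapter(base_model_id: str, adapter_path: str):
    """Load a base model and attach a LoRA adapter on top of it.

    Both arguments are placeholders supplied by the caller:
    base_model_id is any Llama-3.1-8B checkpoint, adapter_path is a
    local folder (or repo id) containing the LoRA adapter files.
    """
    # Imports are kept inside the function so that merely defining it
    # does not require the heavy libraries to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype="auto",   # keep the dtype stored in the checkpoint
        device_map="auto",    # needs the `accelerate` package
    )
    # Wrap the base model with the LoRA weights; the base weights stay untouched.
    model = PeftModel.from_pretrained(base, adapter_path)
    return model, tokenizer
```

Calling `load_with_adapter("meta-llama/Meta-Llama-3.1-8B-Instruct", "<path to adapter>")` would download the base model, so treat this as a starting point rather than a tested recipe.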
Oh, and I am not a German speaker. ^^
This training can be replicated using LLaMA-Factory.
Stage A: SFT
```
set CUDA_VISIBLE_DEVICES=0 && llamafactory-cli train --stage sft --do_train True --model_name_or_path meta-llama/Meta-Llama-3.1-8B-Instruct --preprocessing_num_workers 1 --finetuning_type lora --template alpaca --rope_scaling linear --flash_attn fa2 --dataset_dir data --dataset ultrachat_de,airoboros_de,booksum_de,dolphin_de,evol_instruct_de,openschnabeltier_de,alpaca-gpt4_de,dolly_15k_de,oasst_de,bjoernp_ultrachat_de,German_Poems,German_Songs,OpenSchnabeltier --cutoff_len 8192 --learning_rate 5e-05 --num_train_epochs 3.0 --max_samples 100000 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 100 --save_steps 1000 --warmup_steps 1000 --neftune_noise_alpha 5 --optim adamw_8bit --packing True --neat_packing True --report_to none --output_dir saves\LLaMA3.1-8B-Chat\lora\Llama-3.1-8B-German --bf16 True --plot_loss True --ddp_timeout 180000000 --include_num_input_tokens_seen True --quantization_bit 4 --quantization_method bitsandbytes --lora_rank 32 --lora_alpha 64 --lora_dropout 0.15 --lora_target all --use_adam_mini True --create_new_adapter True
```
Stage B: continued training with `orpo`, starting from the Stage A adapter
```
set CUDA_VISIBLE_DEVICES=0 && llamafactory-cli train --stage dpo --do_train True --model_name_or_path meta-llama/Meta-Llama-3.1-8B-Instruct --preprocessing_num_workers 1 --finetuning_type lora --template alpaca --rope_scaling linear --flash_attn fa2 --dataset_dir data --dataset fix_orca_dpo_de --cutoff_len 4000 --learning_rate 5e-05 --num_train_epochs 1.0 --max_samples 100000 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 10 --save_steps 1000 --warmup_steps 0 --neftune_noise_alpha 5 --optim adamw_8bit --packing True --report_to none --output_dir saves\LLaMA3.1-8B-Chat\lora\Llama-3.1-8B-German-ORPO --bf16 True --plot_loss True --ddp_timeout 180000000 --include_num_input_tokens_seen True --quantization_bit 4 --quantization_method bitsandbytes --lora_rank 32 --lora_alpha 64 --lora_dropout 0.35 --lora_target all --pref_beta 0.1 --pref_ftx 0 --pref_loss orpo --adapter_name_or_path saves\LLaMA3.1-8B-Chat\lora\Llama-3.1-8B-German
```
Average training time: 5 days for the SFT stage, 6 hours for the ORPO stage (`--stage dpo`).
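
The flags in the commands above imply a few derived quantities that are easy to overlook. The sketch below parses a subset of the Stage A flags and computes them. It assumes standard LoRA scaling (`lora_alpha / lora_rank`, without rsLoRA) and a single GPU; `parse_flags` is a hypothetical helper, not part of LLaMA-Factory.

```python
import shlex

# Subset of the Stage A flags, copied from the command above.
STAGE_A_FLAGS = (
    "--cutoff_len 8192 --per_device_train_batch_size 1 "
    "--gradient_accumulation_steps 1 --lora_rank 32 --lora_alpha 64 "
    "--lora_dropout 0.15 --packing True"
)

def parse_flags(command: str) -> dict:
    """Turn '--name value' pairs into a dict of strings (hypothetical helper)."""
    tokens = shlex.split(command)
    flags = {}
    i = 0
    while i < len(tokens):
        if tokens[i].startswith("--"):
            name = tokens[i][2:]
            # A flag followed by another flag (or nothing) is treated as a switch.
            if i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
                flags[name] = tokens[i + 1]
                i += 2
                continue
            flags[name] = "True"
        i += 1
    return flags

flags = parse_flags(STAGE_A_FLAGS)

# Standard LoRA scales the low-rank update by alpha / rank.
lora_scaling = int(flags["lora_alpha"]) / int(flags["lora_rank"])

# Sequences per optimizer step on one GPU.
effective_batch = (
    int(flags["per_device_train_batch_size"])
    * int(flags["gradient_accumulation_steps"])
)

# With packing, each sequence holds at most cutoff_len tokens.
max_tokens_per_step = effective_batch * int(flags["cutoff_len"])

print(lora_scaling, effective_batch, max_tokens_per_step)
```

With rank 32 and alpha 64 the update is scaled by 2, and each optimizer step sees a single packed sequence of at most 8192 tokens.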
`dataset_info.json`:
```json
{
  "oasst_de": {
    "hf_hub_url": "mayflowergmbh/oasst_de"
  },
  "dolly_15k_de": {
    "hf_hub_url": "mayflowergmbh/dolly-15k_de"
  },
  "alpaca-gpt4_de": {
    "hf_hub_url": "mayflowergmbh/alpaca-gpt4_de"
  },
  "openschnabeltier_de": {
    "hf_hub_url": "mayflowergmbh/openschnabeltier_de"
  },
  "evol_instruct_de": {
    "hf_hub_url": "mayflowergmbh/evol-instruct_de"
  },
  "dolphin_de": {
    "hf_hub_url": "mayflowergmbh/dolphin_de"
  },
  "booksum_de": {
    "hf_hub_url": "mayflowergmbh/booksum_de"
  },
  "airoboros_de": {
    "hf_hub_url": "mayflowergmbh/airoboros-3.0_de"
  },
  "ultrachat_de": {
    "hf_hub_url": "mayflowergmbh/ultra-chat_de"
  },
  "German_Songs": {
    "file_name": "German_Songs.json",
    "file_sha1": "3ec36066a19debd1b138020b293e05f21264c352",
    "columns": {
      "prompt": "prompt",
      "query": "analysis_prompt",
      "response": "song",
      "history": "analysis",
      "system": "topic"
    }
  },
  "German_Poems": {
    "file_name": "German_Poems.json",
    "file_sha1": "f0f4bbea3b8cbc378afb640f4ff4dcd11132263c",
    "columns": {
      "prompt": "prompt",
      "query": "topic",
      "response": "poem"
    }
  },
  "bjoernp_ultrachat_de": {
    "file_name": "ultrachat_de.json",
    "file_sha1": "4e2b6dba1c387b3fa439c33ab35281403c39e973",
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations"
    },
    "tags": {
      "role_tag": "from",
      "content_tag": "value",
      "user_tag": "human",
      "assistant_tag": "gpt",
      "system_tag": "system"
    }
  },
  "OpenSchnabeltier": {
    "file_name": "OpenSchnabeltier.json",
    "columns": {
      "prompt": "instruction_de",
      "response": "output_de"
    }
  },
  "fix_orca_dpo_de": {
    "file_name": "fix_intel_orca_dpo_pairs_de.json",
    "ranking": true,
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "chosen": "chosen",
      "rejected": "rejected"
    }
  }
}
```
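
Some entries above carry a `file_sha1` field, which LLaMA-Factory documents as the SHA-1 hash of the dataset file. If you want to confirm that a locally converted file matches the listed hash before training, a check along these lines may help. The helper names `sha1_of_file` and `check_entry` are hypothetical, and the `data` directory is assumed from `--dataset_dir data`.

```python
import hashlib
from pathlib import Path
from typing import Optional

def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_entry(entry: dict, dataset_dir: Path) -> Optional[bool]:
    """Compare one dataset_info.json entry against the file on disk.

    Returns None when the entry has no local file or no hash to compare,
    otherwise True/False depending on whether the hashes match.
    """
    if "file_name" not in entry or "file_sha1" not in entry:
        return None
    return sha1_of_file(Path(dataset_dir) / entry["file_name"]) == entry["file_sha1"]
```

For example, `check_entry(info["German_Songs"], Path("data"))` would return `True` only if `data/German_Songs.json` hashes to the value listed above, where `info` is the parsed `dataset_info.json`.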
The locally converted `.json` datasets are available in the [dataset-reformat](https://huggingface.co/Nekochu/Llama-3.1-8B-German-ORPO/tree/dataset-reformate) branch.
Output Examples
```
#Question:
Wie geht es in diesem Absatz weiter? Dann reibt sie eine Nadel auf einem Wattebausch, schiebt ihn dann auf einen Bleistift und wickelt einen Faden darum. Dann hält sie eine Schachtel mit einem Produkt hoch und gießt dann mehrere Flüssigkeiten in eine Schüssel. sie Wählen Sie Ihre Antwort aus: A. Fügt einen Topf hinzu und schüttelt das Produkt in einer Mühle. B. kneift den Faden, um eine Zigarette zu stylen, und geht dann weg. Dann taucht C. die Nadel in Tinte und zeichnet mit dem Bleistift ein Motiv auf ihr Bein, das sie am Ende mit einem Lappen abreibt. D. beginnt, ihre Haare zu stylen und schneidet sie mehrmals, bevor sie die Spitzen scheitelt, um die Frisur zu zeigen, die sie kreiert hat.
#Llama-3.1 only (wrong) - Llama3 Template:
Die richtige Antwort ist B.
#Model SFT GER (wrong) - Alpaca Template:
Es ist unklar, welche Handlung sie als nächstes kommt, da der Absatz zu Ende geht.
#Model SFT+orpo GER (correct) - Alpaca Template, linear RoPE Scaling:
C. taucht die Nadel in Tinte und zeichnet mit dem Bleistift ein Motiv auf ihr Bein, das sie am Ende mit einem Lappen abreibt.
```
Note: Outputs were generated with [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) inference (and with the exl2 8.0bpw quant). Source question: [mayflowergmbh/intel_orca_dpo_pairs_de](https://huggingface.co/datasets/mayflowergmbh/intel_orca_dpo_pairs_de)
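
Since the fine-tuned outputs above rely on the Alpaca template, a prompt for another inference stack could be assembled roughly as follows. This is a sketch that assumes the common Alpaca layout (system line, `### Instruction:`, `### Response:`); the exact string LLaMA-Factory produces for `--template alpaca` may differ in whitespace, so compare against its template definition. `build_alpaca_prompt` is a hypothetical helper.

```python
# System line taken from the prompt template in this card's metadata.
ALPACA_SYSTEM = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)

def build_alpaca_prompt(instruction: str) -> str:
    """Wrap a user instruction in an Alpaca-style prompt (assumed layout)."""
    return (
        f"{ALPACA_SYSTEM}\n\n"
        f"### Instruction:\n{instruction.strip()}\n\n"
        "### Response:\n"
    )

prompt = build_alpaca_prompt("Wie geht es in diesem Absatz weiter?")
print(prompt)
```

The model's answer is then whatever it generates after the final `### Response:` line.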
Eval English
[MMLU-Pro](https://github.com/chigkim/Ollama-MMLU-Pro)[*](https://pastebin.com/a8xRqXtg) (en):
| Model | Overall Accuracy | biology | business | chemistry | computer science | economics | engineering | health | history | law | math | philosophy | physics | psychology | other |
|----------------------------------|----------------------|---------|----------|-----------|------------------|-----------|-------------|--------|---------|------|-------|------------|---------|------------|-------|
| Llama-3.1-8B-German-ORPO-8.0bpw-h8-exl2 | 38.83 | 60.81 | 37.26 | 32.86 | 38.78 | 46.33 | 23.32 | 45.48 | 39.90 | 21.62 | 38.86 | 34.67 | 28.79 | 50.63 | 44.26 |
| Llama-3.1-8B-Instruct-exl2-8bpw-h8 | 46.16 | 63.74 | 49.68 | 36.93 | 48.29 | 55.81 | 28.59 | 52.81 | 45.67 | 30.79 | 45.08 | 40.48 | 39.03 | 60.90 | 48.38 |
Note: The fine-tuned model scores lower on this **English** benchmark, so English ability appears to have degraded as a trade-off. Occasionally the output repeats sentences (because of the wrong chat template).
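
To put numbers on the English gap, the per-category differences can be computed directly from the table above. The snippet below is only arithmetic on those published scores; the variable names are illustrative.

```python
CATEGORIES = [
    "overall", "biology", "business", "chemistry", "computer science",
    "economics", "engineering", "health", "history", "law", "math",
    "philosophy", "physics", "psychology", "other",
]
# Scores copied from the MMLU-Pro table (same column order as CATEGORIES).
GERMAN_ORPO = [38.83, 60.81, 37.26, 32.86, 38.78, 46.33, 23.32, 45.48,
               39.90, 21.62, 38.86, 34.67, 28.79, 50.63, 44.26]
INSTRUCT = [46.16, 63.74, 49.68, 36.93, 48.29, 55.81, 28.59, 52.81,
            45.67, 30.79, 45.08, 40.48, 39.03, 60.90, 48.38]

# Negative values mean the German ORPO model scores below the Instruct baseline.
deltas = {
    name: round(tuned - base, 2)
    for name, tuned, base in zip(CATEGORIES, GERMAN_ORPO, INSTRUCT)
}

per_category = {k: v for k, v in deltas.items() if k != "overall"}
largest_drop = min(per_category, key=per_category.get)
smallest_drop = max(per_category, key=per_category.get)

print(deltas["overall"], largest_drop, smallest_drop)
```

On these figures the overall gap is 7.33 points, with business showing the largest drop and biology the smallest.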