---
license: other
base_model: meta-llama/Meta-Llama-3-8B
tags:
- ORPO
- llama 3 8B
- conversational
datasets:
- BramVanroy/ultra_feedback_dutch
model-index:
- name: ReBatch/Llama-3-8B-dutch
results: []
language:
- nl
pipeline_tag: text-generation
---
This model is a [QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes) and [ORPO](https://huggingface.co/docs/trl/main/en/orpo_trainer) fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B), trained on the synthetic feedback dataset [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch).
## Model description
This is a Dutch chat model, built on Llama 3 8B and further aligned with [ORPO](https://huggingface.co/docs/trl/main/en/orpo_trainer) on the feedback dataset [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch).
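Below is a minimal inference sketch with Transformers. The chat template, prompt, and generation settings are illustrative assumptions rather than part of the original training setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ReBatch/Llama-3-8B-dutch"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model was trained in bfloat16
    device_map="auto",
)

# Assumes the tokenizer ships with a chat template; adjust the prompt format if not.
messages = [{"role": "user", "content": "Wat zijn de drie grootste steden van Nederland?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```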
## Intended uses & limitations
Although the model has been aligned on gpt-4-turbo output, which is subject to strong content filters, it can still generate incorrect, misleading, and potentially even offensive content. Use at your own risk.
## Training procedure
The model was trained in bfloat16 with QLoRA and Flash Attention 2 on a single H100 80GB SXM5 GPU for around 24 hours on RunPod.
## Evaluation Results
The model was evaluated with [ScandEval](https://scandeval.com/dutch-nlg/).
The model showed mixed results across the benchmarks: slight improvements on some tasks and lower scores on others. This occurred despite the model being trained on only 200,000 samples for a single epoch. We are curious whether its performance could be improved by training on more data or for additional epochs.
| Model | conll_nl | dutch_social | scala_nl | squad_nl | wiki_lingua_nl | mmlu_nl | hellaswag_nl |
|:------|:--------:|:------------:|:--------:|:--------:|:--------------:|:-------:|:------------:|
| meta-llama/Meta-Llama-3-8B-Instruct | 68.72 | 14.67 | 32.91 | 45.36 | 67.62 | 36.18 | 33.91 |
| ReBatch/Llama-3-8B-dutch | 58.85 | 11.14 | 15.58 | 59.96 | 64.51 | 36.27 | 28.34 |
| meta-llama/Meta-Llama-3-8B | 62.26 | 10.45 | 30.30 | 62.99 | 65.17 | 36.38 | 28.33 |
### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch using these values follows the list):
- learning_rate: 8e-06
- train_batch_size: 2
- eval_batch_size: 2
- num_devices: 1
- gradient_accumulation_steps: 4
- optimizer: paged_adamw_8bit
- lr_scheduler_type: linear
- warmup_steps: 10
- num_epochs: 1.0
- r: 16
- lora_alpha: 32
- lora_dropout: 0.05
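For reference, here is a minimal sketch of how this QLoRA + ORPO setup could be reproduced with `trl`, `peft`, and `bitsandbytes` using the hyperparameters above. The quantization settings, LoRA target modules, and dataset preprocessing are assumptions and may differ from the actual training script.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

base_model = "meta-llama/Meta-Llama-3-8B"

# 4-bit quantization for QLoRA (assumed NF4 settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# LoRA settings from the list above; target_modules is an assumption.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# ORPOTrainer expects prompt/chosen/rejected columns; the exact preprocessing
# of ultra_feedback_dutch used for this model is not documented here.
dataset = load_dataset("BramVanroy/ultra_feedback_dutch", split="train")

training_args = ORPOConfig(
    output_dir="llama-3-8b-dutch-orpo",
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    lr_scheduler_type="linear",
    warmup_steps=10,
    num_train_epochs=1.0,
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,  # renamed to `processing_class` in newer trl releases
    peft_config=peft_config,
)
trainer.train()
```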