---
license: other
base_model: meta-llama/Meta-Llama-3-8B
tags:
- ORPO
- llama 3 8B
- conversational
datasets:
- BramVanroy/ultra_feedback_dutch
model-index:
- name: ReBatch/Llama-3-8B-dutch
results: []
language:
- nl
pipeline_tag: text-generation
---
This model is a [QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes) and [ORPO](https://huggingface.co/docs/trl/main/en/orpo_trainer) fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B), trained on the synthetic feedback dataset [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch).
## Model description
This is a Dutch chat model, built on Llama 3 8B and further aligned with [ORPO](https://huggingface.co/docs/trl/main/en/orpo_trainer) on the feedback dataset [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch).
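Below is a minimal inference sketch with Transformers. The chat template, prompt, and generation settings are illustrative assumptions rather than part of the original training setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ReBatch/Llama-3-8B-dutch"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model was trained in bfloat16
    device_map="auto",
)

# Assumes the tokenizer ships with a chat template; adjust the prompt format if not.
messages = [{"role": "user", "content": "Wat zijn de drie grootste steden van Nederland?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```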
## Intended uses & limitations
Although the model has been aligned on gpt-4-turbo output, which is subject to strong content filters, it can still generate incorrect, misleading, and potentially even offensive content. Use at your own risk.
## Training procedure
The model was trained in bfloat16 with QLoRA and Flash Attention 2 on a single H100 80GB SXM5 GPU for around 24 hours on RunPod.
## Evaluation Results
The model was evaluated with [ScandEval](https://scandeval.com/dutch-nlg/).
The model showed mixed results across the benchmarks: slight improvements on some tasks and lower scores on others. This occurred despite the model being trained on only 200,000 samples for a single epoch. We are curious whether its performance could be improved by training on more data or for additional epochs.
| Model | conll_nl | dutch_social | scala_nl | squad_nl | wiki_lingua_nl | mmlu_nl | hellaswag_nl |
|:------|:--------:|:------------:|:--------:|:--------:|:--------------:|:-------:|:------------:|
| meta-llama/Meta-Llama-3-8B-Instruct | 68.72 | 14.67 | 32.91 | 45.36 | 67.62 | 36.18 | 33.91 |
| ReBatch/Llama-3-8B-dutch | 58.85 | 11.14 | 15.58 | 59.96 | 64.51 | 36.27 | 28.34 |
| meta-llama/Meta-Llama-3-8B | 62.26 | 10.45 | 30.30 | 62.99 | 65.17 | 36.38 | 28.33 |
### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch using these values follows the list):
- learning_rate: 8e-06
- train_batch_size: 2
- eval_batch_size: 2
- num_devices: 1
- gradient_accumulation_steps: 4
- optimizer: paged_adamw_8bit
- lr_scheduler_type: linear
- warmup_steps: 10
- num_epochs: 1.0
- r: 16
- lora_alpha: 32
- lora_dropout: 0.05
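For reference, here is a minimal sketch of how this QLoRA + ORPO setup could be reproduced with `trl`, `peft`, and `bitsandbytes` using the hyperparameters above. The quantization settings, LoRA target modules, and dataset preprocessing are assumptions and may differ from the actual training script.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

base_model = "meta-llama/Meta-Llama-3-8B"

# 4-bit quantization for QLoRA (assumed NF4 settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# LoRA settings from the list above; target_modules is an assumption.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# ORPOTrainer expects prompt/chosen/rejected columns; the exact preprocessing
# of ultra_feedback_dutch used for this model is not documented here.
dataset = load_dataset("BramVanroy/ultra_feedback_dutch", split="train")

training_args = ORPOConfig(
    output_dir="llama-3-8b-dutch-orpo",
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    lr_scheduler_type="linear",
    warmup_steps=10,
    num_train_epochs=1.0,
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,  # renamed to `processing_class` in newer trl releases
    peft_config=peft_config,
)
trainer.train()
```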