---
language:
- nl
license: mit
tags:
- trl
- fietje
- alignment-handbook
- dpo
base_model: BramVanroy/fietje-2-instruct
datasets:
- BramVanroy/ultra_feedback_dutch_cleaned
- BramVanroy/orca_dpo_pairs_dutch_cleaned
pipeline_tag: text-generation
inference: false
model-index:
- name: fietje-2-chat
  results: []
---
This is the chat version of Fietje, a DPO-tuned (aligned) continuation of [the instruct version](https://huggingface.co/BramVanroy/fietje-2-instruct). Fietje is an adapted version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2), tailored to Dutch text generation by training on 28B tokens. It is small and efficient, with 2.7 billion parameters, while performing almost on par with more powerful Dutch LLMs more than twice its size, such as [GEITje 7B Ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra).

A thorough description of the creation and evaluation of Fietje, as well as usage examples, is available in [this GitHub repository](https://github.com/BramVanroy/fietje).

## Intended uses & limitations

The same limitations as [phi-2](https://huggingface.co/microsoft/phi-2#limitations-of-phi-2), and LLMs in general, apply here. LLMs hallucinate, make mistakes, and should not be trusted. Use at your own risk!

## Training and evaluation data

Fietje 2 Chat was finetuned from [the instruct model](https://huggingface.co/BramVanroy/fietje-2-instruct) on the following datasets. The number of training samples per dataset is given in parentheses, for a total of 18,653 samples.

- [BramVanroy/ultra_feedback_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned) subset `dpo_hq`: a cleaned version of [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch) (9,186)
- [BramVanroy/orca_dpo_pairs_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/orca_dpo_pairs_dutch_cleaned) subset `dpo_all`: a cleaned version of [BramVanroy/orca_dpo_pairs_dutch](https://huggingface.co/datasets/BramVanroy/orca_dpo_pairs_dutch) (9,467)

Many different learning rates, beta values, and batch sizes were investigated in search of a converging combination. You can find them all in [the W&B runs](https://wandb.ai/bramvanroy/dpo-fietje-2).
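As a sanity check on these numbers, the two dataset sizes add up to the reported total, and combined with the effective batch size of 16 listed under *Training hyperparameters* they give the 1,166 optimizer steps shown in the *Training results* table. The sketch below also evaluates a linear-warmup cosine learning-rate schedule. It assumes a single device (as stated under *Training procedure*), that the schedule uses the same formula as `transformers`' `get_cosine_schedule_with_warmup`, and that warmup steps are rounded up from `warmup_ratio × total_steps`. It is an illustration, not the actual training code.

```python
import math

# Training samples per dataset, as listed in this model card
samples = {
    "ultra_feedback_dutch_cleaned (dpo_hq)": 9186,
    "orca_dpo_pairs_dutch_cleaned (dpo_all)": 9467,
}

# Values from the "Training hyperparameters" section
per_device_batch_size = 8
gradient_accumulation_steps = 2
num_devices = 1  # assumption: one A100 80GB, as stated under "Training procedure"
num_epochs = 1
warmup_ratio = 0.1
peak_lr = 2e-6

total_samples = sum(samples.values())
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(total_samples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs
# assumption: warmup steps are rounded up from the warmup ratio
warmup_steps = math.ceil(warmup_ratio * total_steps)


def cosine_lr(step: int) -> float:
    """Linear warmup to peak_lr, then a half-cosine decay down to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_samples, effective_batch_size, steps_per_epoch, warmup_steps)
# → 18653 16 1166 117
```

The step count matches the `Step` column of the training results, which suggests that one epoch over the combined data was indeed run with an effective batch size of 16.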
## Training procedure

I am thankful to the [Flemish Supercomputer Center](https://www.vscentrum.be/) (VSC) for providing the computational power to accomplish this project. Accounting for time spent waiting for jobs, a single training run took around nine hours on one A100 80GB.

Training was done with the wonderful [alignment-handbook](https://github.com/huggingface/alignment-handbook), using DeepSpeed as a back-end. Exact training recipes and SLURM scripts are given in the [GitHub repository](https://github.com/BramVanroy/fietje).

### Training hyperparameters

The following hyperparameters were used during training:

- beta: 0.2
- learning_rate: 2e-06
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2515 | 1.0 | 1166 | 0.2842 | -1.1549 | -3.6363 | 0.8867 | 2.4815 | -657.6813 | -451.3364 | -1.2868 | -1.3528 |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2

## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BramVanroy__fietje-2-chat).

| Metric             | Value |
|--------------------|------:|
| Avg.               | 10.39 |
| IFEval (0-Shot)    | 29.17 |
| BBH (3-Shot)       | 17.72 |
| MATH Lvl 5 (4-Shot)|  0.53 |
| GPQA (0-shot)      |  0.00 |
| MuSR (0-shot)      |  3.20 |
| MMLU-PRO (5-shot)  | 11.72 |

- 👱‍♀️ Base version
- 🤖 Instruct version
- 💬 Chat version (this one)
- 🚀 GGUF of Chat
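The `Rewards/*` columns in the *Training results* table are implicit DPO rewards. The sketch below assumes TRL's default sigmoid DPO loss without label smoothing: each response's reward is β times the difference between the policy's and the frozen reference model's summed log-probability, the margin is the chosen reward minus the rejected reward, the loss is −log σ(margin), and accuracy is the fraction of pairs with a positive margin. It uses β = 0.2 from the hyperparameters; `PairLogps` and the log-probability values in the example are hypothetical. As a quick consistency check on the table itself, −1.1549 − (−3.6363) = 2.4814, which matches the reported margin of 2.4815 up to rounding.

```python
import math
from dataclasses import dataclass

BETA = 0.2  # beta used for this model


@dataclass
class PairLogps:
    """Hypothetical summed log-probabilities for one preference pair."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(pairs: list[PairLogps], beta: float = BETA) -> dict[str, float]:
    """Batch-averaged DPO loss and reward statistics (sigmoid loss, no label smoothing)."""
    # Implicit reward: beta * (policy log-prob - reference log-prob)
    chosen = [beta * (p.policy_chosen - p.ref_chosen) for p in pairs]
    rejected = [beta * (p.policy_rejected - p.ref_rejected) for p in pairs]
    margins = [c - r for c, r in zip(chosen, rejected)]
    n = len(pairs)
    return {
        "loss": sum(-log_sigmoid(m) for m in margins) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/margins": sum(margins) / n,
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
    }


# Illustrative values only, not taken from the actual training run
pairs = [
    PairLogps(policy_chosen=-450.0, policy_rejected=-650.0, ref_chosen=-445.0, ref_rejected=-635.0),
    PairLogps(policy_chosen=-300.0, policy_rejected=-280.0, ref_chosen=-302.0, ref_rejected=-285.0),
]
print(dpo_metrics(pairs))
```

Because the mean margin equals the mean chosen reward minus the mean rejected reward, the `Rewards/margins` column can always be checked against the two reward columns, while `Rewards/accuracies` (0.8867 here) reports how often the chosen response received the higher reward.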