BramVanroy committed
Commit 767316e
Parent(s): b43ee7d
Update README.md

README.md CHANGED
@@ -1,37 +1,48 @@

Previous version:

---
base_model: meta-llama/Llama-2-13b-hf
tags:
- generated_from_trainer
datasets:
- yhavinga/mc4_nl_cleaned
model-index:
- name:
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

It achieves the following results on the evaluation set:
- Loss: 1.7676

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

Updated version:

---
license: apache-2.0
base_model: meta-llama/Llama-2-13b-hf
tags:
- generated_from_trainer
- llama
- lora
- adapters
datasets:
- yhavinga/mc4_nl_cleaned
language:
- nl
model-index:
- name: llama2-13b-ft-mc4_nl_cleaned_tiny
  results: []
---

# llama2-13b-ft-mc4_nl_cleaned_tiny

This model is a fine-tuned version of [meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf)
on the [yhavinga/mc4_nl_cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned/viewer/tiny/train) dataset (`tiny` partition) with a context length of 4096 tokens.
See the original [meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) for more information, intended use, and biases.

## Intended uses & limitations

While Llama 2 already has some proficiency in Dutch, this finetune is intended to improve the fluency of Dutch (not to increase its knowledge).
It is therefore intended as a generative model for the Dutch language. The biases, shortcomings and intended uses are otherwise the same as those of
the [original model](https://huggingface.co/meta-llama/Llama-2-13b-hf). The model can be used for generative tasks or finetuned further on other tasks
such as instruction or chat finetuning.
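
As a quick illustration of plain generative use, here is a minimal sketch using `transformers`. The repo id and the Dutch prompt are assumptions for illustration only, and the generation settings are not tuned recommendations.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id of the merged model; replace with the actual model id on the Hub.
model_id = "BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Plain causal continuation of a Dutch prompt; the model is not instruction-tuned.
prompt = "Gisteren ging ik naar de markt en"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```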

## Training and evaluation data

Trained on the [yhavinga/mc4_nl_cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned/viewer/tiny/train) dataset (`tiny` partition) for one epoch.
The canonical validation split was not used; instead, 5% of `train` was held out as validation.
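
That split can be recreated along these lines. This is a minimal sketch with the `datasets` library; the seed is an assumption, not necessarily the value used for this training run.

```python
from datasets import load_dataset

# Load the `tiny` configuration of the cleaned Dutch mC4 corpus.
dataset = load_dataset("yhavinga/mc4_nl_cleaned", "tiny", split="train")

# Hold out 5% of `train` as a validation set, as described above.
# The seed is an assumption for reproducibility only.
splits = dataset.train_test_split(test_size=0.05, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```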

## Training procedure

Trained with LoRA targeting `["q_proj", "v_proj"]` in 4-bit, with the adapters merged into the base model before upload. Trained with the Flash Attention patch borrowed from
[here](https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/utils/llama_patch.py).
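
For reference, a sketch of what such a QLoRA-style setup looks like with `peft` and `bitsandbytes`. Only the 4-bit loading and the `q_proj`/`v_proj` target modules follow the description above; the LoRA rank, alpha and dropout are placeholder values, not the ones used for this model.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model in 4-bit, as described above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on the attention query/value projections only.
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
    r=16,               # placeholder
    lora_alpha=32,      # placeholder
    lora_dropout=0.05,  # placeholder
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# After training, the adapters can be merged into the base weights
# (e.g. trained_model.merge_and_unload()) before uploading, as was done here.
```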

The adapters are in the `adapters` branch.
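
To use the unmerged adapters directly, something like the following should work, assuming the adapter files live on that branch of this repository (the repo id is again an assumption).

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf", device_map="auto", torch_dtype="auto"
)

# `revision` selects the `adapters` branch of the (assumed) adapter repository.
model = PeftModel.from_pretrained(
    base, "BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny", revision="adapters"
)
```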

### Training hyperparameters

The following hyperparameters were used during training: