BramVanroy committed
Commit 767316e
1 Parent(s): b43ee7d

Update README.md

Files changed (1)
  1. README.md +28 -17
README.md CHANGED
@@ -1,37 +1,48 @@
  ---
+ license: apache-2.0
  base_model: meta-llama/Llama-2-13b-hf
  tags:
- - generated_from_trainer
+ - generated_from_trainer
+ - llama
+ - lora
+ - adapters
  datasets:
- - yhavinga/mc4_nl_cleaned
+ - yhavinga/mc4_nl_cleaned
+ language:
+ - nl
  model-index:
- - name: tiny-3e-4lr+1152tbs+1ep+0.1wd
-   results: []
+ - name: llama2-13b-ft-mc4_nl_cleaned_tiny
+   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # tiny-3e-4lr+1152tbs+1ep+0.1wd
-
- This model is a fine-tuned version of [meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) on the yhavinga/mc4_nl_cleaned micro dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.7676
-
- ## Model description
-
- More information needed
+
+ # llama2-13b-ft-mc4_nl_cleaned_tiny
+
+ This model is a fine-tuned version of [meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf)
+ on the [yhavinga/mc4_nl_cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned/viewer/tiny/train) dataset (`tiny` partition) with a context length of 4096 tokens.
+ See the original [meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) for more information, intended use, and biases.

  ## Intended uses & limitations

- More information needed
+ While Llama 2 already has some proficiency in Dutch, this finetune is intended to improve its fluency in Dutch, not to add knowledge. It is therefore
+ intended as a generative model for the Dutch language. Its biases, shortcomings and intended uses are otherwise the same as those of
+ the [original model](https://huggingface.co/meta-llama/Llama-2-13b-hf). The model can be used for generative tasks or finetuned further on other tasks
+ such as instruction or chat finetuning.
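For generative use, a minimal sketch with `transformers` (the repository id below is an assumed placeholder for this model and is not stated on this card; adjust it to the actual repository):

```python
# Minimal generation sketch; the repo id is a placeholder assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny"  # hypothetical id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map="auto")

# This is a base-LM finetune, not an instruction/chat model, so prompt it with plain text to continue.
prompt = "Gisteren wandelde ik door Antwerpen en"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.95, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```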

  ## Training and evaluation data

- More information needed
+ Trained on the [yhavinga/mc4_nl_cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned/viewer/tiny/train) dataset (`tiny` partition) for one epoch. The canonical
+ validation split was not used; instead, 5% of `train` was held out as the validation set.
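A sketch of how such a split can be reproduced with the `datasets` library (the 5% fraction is from this card; the seed is an illustrative assumption):

```python
# Load the `tiny` config of the cleaned Dutch mC4 dataset and carve 5% of train off as validation.
from datasets import load_dataset

train_full = load_dataset("yhavinga/mc4_nl_cleaned", "tiny", split="train")
split = train_full.train_test_split(test_size=0.05, seed=42)  # seed is an assumption
train_ds, eval_ds = split["train"], split["test"]
```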

  ## Training procedure

+ Trained with LoRA (targeting `["q_proj", "v_proj"]`) in 4-bit precision, and the adapters were merged into the base model before upload. Trained with the Flash Attention patch borrowed from
+ [here](https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/utils/llama_patch.py).
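A comparable setup can be sketched with `peft` and `bitsandbytes`; only the 4-bit loading and the target modules are taken from this card, while the LoRA rank, alpha, dropout and quantization details are illustrative assumptions (the Flash Attention patch from the linked file would be applied separately):

```python
# Sketch of a QLoRA-style configuration; values marked "assumption" are not from this card.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit loading, as stated on this card
    bnb_4bit_quant_type="nf4",               # assumption
    bnb_4bit_compute_dtype=torch.bfloat16,   # assumption
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                    # assumption
    lora_alpha=32,                           # assumption
    lora_dropout=0.05,                       # assumption
    target_modules=["q_proj", "v_proj"],     # as stated on this card
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# After training, the adapters can be merged into the base weights before uploading,
# e.g. model = model.merge_and_unload() on the trained PeftModel.
```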
+
+ The adapters are in the `adapters` branch.
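The unmerged adapters can then presumably be applied on top of the base model by pointing `peft` at that branch via the `revision` argument (again assuming a placeholder repository id):

```python
# Sketch: load the LoRA adapters from the `adapters` branch; the repo id is a placeholder assumption.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf", device_map="auto")
model = PeftModel.from_pretrained(
    base,
    "BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny",  # hypothetical repo id
    revision="adapters",                             # branch name from this card
)
```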
+
  ### Training hyperparameters

  The following hyperparameters were used during training: