robinsmits committed
Commit d5433c6
1 Parent(s): 83ca45d

Update README.md

Files changed (1)
  1. README.md +11 -10
README.md CHANGED
@@ -19,26 +19,27 @@ pipeline_tag: text-generation
 inference: false
 ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
 # Qwen1.5-7B-Dutch-Chat-Sft
 
- This model is a fine-tuned version of [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.1756
-
 ## Model description
 
- More information needed
+ This finetuned model is an adapter model based on [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat).
+
+ Finetuning was performed on the Dutch [BramVanroy/ultrachat_200k_dutch](https://huggingface.co/datasets/BramVanroy/ultrachat_200k_dutch) dataset.
 
 ## Intended uses & limitations
 
- More information needed
+ As with all LLMs, this model can exhibit bias and hallucinations. Regardless of how you use this model, always perform the necessary testing and validation.
+
+ The dataset used does not allow commercial usage.
 
 ## Training and evaluation data
 
- More information needed
+ The training notebook is available at the following link: [Qwen1_5_7B_Dutch_Chat_SFT](https://github.com/RobinSmits/Dutch-LLMs/blob/main/Qwen1_5_7B_Dutch_Chat_SFT.ipynb)
+
+ Training was performed with Google Colab PRO on an A100 (40 GB).
+
+ As the amount of data was more than would fit within the maximum 24-hour session that Google Colab PRO allows, I split the dataset into 2 equal parts. Training on each part took around 14 hours. For the second part I enabled 'resume_from_checkpoint' to continue the training.
 
 ## Training procedure
 
 
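The card now describes a PEFT-style adapter on top of the Qwen base model. Below is a minimal loading sketch, assuming the adapter is published under the repo id `robinsmits/Qwen1.5-7B-Dutch-Chat-Sft` (inferred from the author and model name, not stated in the diff) and that it is a standard PEFT adapter:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model the card names; bfloat16 keeps a 7B model well
# within a single 40 GB GPU.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B-Chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")

# Apply the finetuned adapter on top of the base model. The repo id is an
# assumption; substitute the actual adapter location.
model = PeftModel.from_pretrained(base_model, "robinsmits/Qwen1.5-7B-Dutch-Chat-Sft")

# Qwen1.5 ships a chat template, so the tokenizer can format a conversation.
messages = [{"role": "user", "content": "Hallo, hoe gaat het met je?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```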
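The two-session workflow described in the last added paragraph maps onto the `transformers` Trainer roughly as follows. This is a sketch, not the notebook's code: the `train_sft` split and `messages` column are assumptions based on the ultrachat dataset format, hyperparameters are omitted, and the plain base model stands in for the PEFT-wrapped model that was actually trained:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Qwen/Qwen1.5-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # the collator needs a pad token

# Render each conversation to text with the chat template, then tokenize.
# Split name "train_sft" and column "messages" follow the ultrachat format.
dataset = load_dataset("BramVanroy/ultrachat_200k_dutch", split="train_sft")

def tokenize(example):
    text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
    return tokenizer(text, truncation=True, max_length=2048)

dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

# Two equal halves, so each run fits inside one 24-hour Colab session.
half = len(dataset) // 2
part1 = dataset.select(range(half))
part2 = dataset.select(range(half, len(dataset)))

args = TrainingArguments(output_dir="qwen1.5-7b-dutch-chat-sft", save_steps=500)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Session 1: train on the first half; checkpoints land in output_dir.
Trainer(model=model, args=args, train_dataset=part1, data_collator=collator).train()

# Session 2 (a fresh runtime): same output_dir, second half.
# resume_from_checkpoint=True restores optimizer and scheduler state from
# the latest checkpoint before continuing.
Trainer(model=model, args=args, train_dataset=part2, data_collator=collator).train(
    resume_from_checkpoint=True
)
```

For the resume to work, the second session must see the checkpoints written by the first, for example by keeping `output_dir` on mounted Google Drive across Colab runtimes.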