nazimali committed on
Commit
b343b24
1 Parent(s): 05676f2

Create README.md

  1. README.md +68 -0
README.md ADDED
@@ -0,0 +1,68 @@
+ ---
+ base_model:
+ - mistralai/Mistral-Nemo-Instruct-2407
+ language:
+ - ku
+ - en
+ license: apache-2.0
+ tags:
+ - text-generation-inference
+ - transformers
+ - unsloth
+ - mistral
+ datasets:
+ - nazimali/kurdish-wikipedia-articles
+ library_name: transformers
+ ---
+
+ Continued pre-training of `mistralai/Mistral-Nemo-Instruct-2407` on the Kurdish Wikipedia dataset (`nazimali/kurdish-wikipedia-articles`) using `unsloth`.
+ The pre-training was only meant to improve Kurdish language understanding, so this model should be fine-tuned further before use.
+ The model is quantized with `bitsandbytes` to reduce memory use. See the [bitsandbytes documentation](https://huggingface.co/docs/transformers/main/en/quantization/bitsandbytes#bitsandbytes).
+
+ I could not find a standard, or even a good, Kurdish metric for evaluating the model.
+ My next project is to create an evaluation so that there is a reproducible baseline for Kurdish.
+
+ I will also look into a multi-GPU training setup to avoid waiting all day for results, and I would like to train the model on both Kurmanji and Sorani.
+
+ ### Use
+
+ This model should be fine-tuned further for a specific task.
+
+ ### Training
+
+ - Transformers `4.44.2`
+ - 1 NVIDIA A100 80GB PCIe
+ - Duration: 6h 31m 4s
+
+ ```json
+ {
+   "total_flos": 4121524790259794000,
+   "train/epoch": 1,
+   "train/global_step": 1960,
+   "train/grad_norm": 3.1958093643188477,
+   "train/learning_rate": 0,
+   "train/loss": 1.2108,
+   "train_loss": 1.256846008738693,
+   "train_runtime": 23227.1752,
+   "train_samples_per_second": 2.7,
+   "train_steps_per_second": 0.084
+ }
+ ```
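As a rough sanity check on the metrics above, the sketch below recomputes the throughput figures from `train_runtime`, `train/global_step`, and the 62,720 training rows listed under the pre-training data. The `format_duration` helper is hypothetical, and the samples-per-step figure it derives is an inference from those numbers (rows divided by steps over one epoch), not a setting stated in this README.

```python
# Values copied from the trainer metrics and the pre-training data section.
train_runtime_s = 23227.1752
global_steps = 1960
rows_used = 62_720


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as 'Hh Mm Ss', dropping fractional seconds."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


samples_per_second = rows_used / train_runtime_s
steps_per_second = global_steps / train_runtime_s
# Inferred effective batch size (assumption: one pass over rows_used in global_steps).
samples_per_step = rows_used / global_steps

print(format_duration(train_runtime_s))  # trainer-reported runtime
print(round(samples_per_second, 1))      # compare with train_samples_per_second
print(round(steps_per_second, 3))        # compare with train_steps_per_second
print(samples_per_step)
```

The trainer-reported runtime comes out a few minutes shorter than the 6h 31m 4s duration listed above, which presumably includes time spent outside the training loop.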
+
+ #### Pre-training data:
+
+ - `nazimali/kurdish-wikipedia-articles`
+ - Number of rows in the dataset: 63,076
+ - Filtered on columns `title` and `text`
+   - Must have at least 1 character
+ - Number of rows used for training: 62,720
+
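The filtering step described above can be sketched in plain Python. The sample rows and the `has_content` helper are hypothetical, since the README does not show the actual filtering code. Whether whitespace-only values were treated as empty is also unknown; this sketch keeps them, because they contain at least 1 character.

```python
# Hypothetical sample rows with the two columns named in the README.
rows = [
    {"title": "Example title", "text": "Example article text."},
    {"title": "", "text": "Article with an empty title."},
    {"title": "Title with empty text", "text": ""},
]


def has_content(row: dict) -> bool:
    """Return True when both `title` and `text` have at least 1 character."""
    return all(len(row.get(col) or "") >= 1 for col in ("title", "text"))


kept = [row for row in rows if has_content(row)]
print(len(rows), len(kept))
```

With the `datasets` library, the same predicate could be passed to `Dataset.filter`. The row counts listed above imply that 356 rows (63,076 minus 62,720) were dropped before training.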
+ #### Training prompt format:
+
+ ```python
+ # Kurmanji labels: "Gotara Wikipedia" = "Wikipedia article", "Sernav" = "Title", "Gotar" = "Article"
+ training_prompt = """Gotara Wikipedia
+ ### Sernav: {}
+
+ ### Gotar:
+ {}"""
+ ```
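To illustrate how the template is filled, the sketch below formats one record with `str.format`. The sample title and text are placeholders and the `format_example` helper is hypothetical. Appending an end-of-sequence token (here `</s>`) is an assumption borrowed from common continued pre-training setups, not something this README states.

```python
training_prompt = """Gotara Wikipedia
### Sernav: {}

### Gotar:
{}"""


def format_example(title: str, text: str, eos_token: str = "</s>") -> str:
    """Fill the template's two positional slots: title first, then article text."""
    # Assumption: an EOS token is appended so the model learns where an article ends.
    return training_prompt.format(title, text) + eos_token


example = format_example("Example title", "Example article text.")
print(example)
```

Each row that passes the filter would produce one such string, with the `title` column in the `Sernav` slot and the `text` column in the `Gotar` slot.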