---
base_model:
- mistralai/Mistral-Nemo-Instruct-2407
language:
- ku
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
datasets:
- nazimali/kurdish-wikipedia-articles
library_name: transformers
---

Continued pre-training on `mistralai/Mistral-Nemo-Instruct-2407` using the Kurdish wiki dataset with `unsloth`.
This model should be fine-tuned further, since the pre-training was done to improve Kurdish language understanding.
It is quantized with `bitsandbytes` so that it uses less memory. See the [bitsandbytes documentation](https://huggingface.co/docs/transformers/main/en/quantization/bitsandbytes#bitsandbytes).

There isn't a standard, or even a good, metric to evaluate the model on Kurdish (that I could find).
My next project will be to create an evaluation so that there is a reproducible baseline for Kurdish.

I will also look into a multi-GPU training setup so I don't have to wait all day for results, and I would like to train with both Kurmanji and Sorani.

### Use

Should be fine-tuned further for a specific task.

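A minimal loading sketch with 4-bit `bitsandbytes` quantization, assuming `transformers`, `bitsandbytes`, and a GPU are available. The model id below is a placeholder, not this repository's actual id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder: replace with this repository's model id
model_id = "your-username/your-model"

# Load the weights in 4-bit to reduce memory use
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```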
### Training

- Transformers `4.44.2`
- 1x NVIDIA A100 80GB PCIe
- Duration: 6h 31m 4s

```json
{
  "total_flos": 4121524790259794000,
  "train/epoch": 1,
  "train/global_step": 1960,
  "train/grad_norm": 3.1958093643188477,
  "train/learning_rate": 0,
  "train/loss": 1.2108,
  "train_loss": 1.256846008738693,
  "train_runtime": 23227.1752,
  "train_samples_per_second": 2.7,
  "train_steps_per_second": 0.084
}
```

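A quick sanity check on these numbers (values copied from the stats above):

```python
# Values from the training stats above
train_runtime = 23227.1752   # seconds
samples_per_second = 2.7
global_step = 1960

# ~6.45 hours of pure training time, matching the reported ~6.5 h duration
hours = train_runtime / 3600

# Samples seen in the single epoch: throughput x runtime
approx_samples = samples_per_second * train_runtime

# Implied effective batch size (samples per optimizer step)
effective_batch = approx_samples / global_step

print(round(hours, 2), round(approx_samples), round(effective_batch))  # → 6.45 62713 32
```

The ~62,713 samples line up with the 62,720 rows used for one epoch, and the implied effective batch size is 32.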
#### Pre-training data:

- `nazimali/kurdish-wikipedia-articles`
- Dataset number of rows: 63,076
- Filtered to the `title` and `text` columns
- Rows must have at least 1 character
- Number of rows used for training: 62,720

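The filtering step can be sketched as follows. This is my reading of the filter (non-empty `title` and `text`), not the exact published preprocessing, and the sample rows are made up:

```python
# Toy rows standing in for dataset records (made-up examples)
rows = [
    {"title": "Kurdistan", "text": "Kurdistan herêmek e li Rojhilata Navîn.", "url": "..."},
    {"title": "", "text": ""},  # would be dropped: empty fields
    {"title": "Zimanê kurdî", "text": "Zimanekî hind-ewropî ye."},
]

def keep(row):
    # Assumed filter: both retained columns need at least 1 character
    return len(row["title"]) >= 1 and len(row["text"]) >= 1

# Keep only the two columns used for training
filtered = [{"title": r["title"], "text": r["text"]} for r in rows if keep(r)]
print(len(filtered))  # → 2
```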
#### Training prompt format:

```python
training_prompt = """Gotara Wikipedia
### Sernav: {}

### Gotar:
{}"""
```