Update README.md
README.md
```python
outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```

## Training Data

The model is fine-tuned on the **Dolly-15k Dutch** dataset, specifically its `train_sft` training split. The dataset is not state of the art; the goal is to demonstrate the capabilities of the setup, and its examples fit within 1,024 tokens.
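
As a rough illustration, the split can be loaded with the `datasets` library; the Hub identifier below is an assumption (only the dataset name and split are stated above), so substitute the actual ID:

```python
from datasets import load_dataset

# Hypothetical Hub ID for the Dutch Dolly-15k dataset; replace with the real identifier.
dolly_nl = load_dataset("BramVanroy/dolly-15k-dutch", split="train_sft")

print(dolly_nl[0])    # inspect one instruction/response example
print(len(dolly_nl))  # number of examples in the training split
```
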
### Fine-tuning with Learning Rate Optimization

Fine-tuning includes an advanced learning rate optimization system, implemented through the `LROptimizerCallback` class; the callback handles the optimization automatically during training. A usage sketch is shown below.
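
The sketch assumes the callback follows the standard Hugging Face `TrainerCallback` interface; the import path and constructor arguments are illustrative, since only the class name is given here:

```python
from transformers import Trainer, TrainingArguments

# Assumed import path -- check the repository for the actual location of the callback.
from lr_optimizer_callback import LROptimizerCallback

training_args = TrainingArguments(
    output_dir="./zamba2-dolly-nl",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=2e-5,  # starting value; the callback adjusts the learning rate during training
)

trainer = Trainer(
    model=model,                        # the Zamba2-1.2B model loaded earlier
    args=training_args,
    train_dataset=dolly_nl,             # the train_sft split loaded above
                                        # (tokenization/collation is omitted here for brevity)
    callbacks=[LROptimizerCallback()],  # hypothetical no-argument construction
)
trainer.train()
```
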
## Notice

Zamba2-1.2B is a pretrained base model and therefore does not have any moderation mechanism and may output toxic or otherwise harmful language. In addition, one should not expect good instruct or chat performance, as this model was not fine-tuned for instruction following or chat.