Commit 85a4ec7 (parent: 69bbf49): Update README.md

README.md (changed)
@@ -112,4 +112,40 @@ for s in generation_output.sequences:

# Training

## Dataset

- Jumtra/oasst1_ja
- Jumtra/jglue_jsquads_with_input
- Jumtra/dolly_oast_jglue_ja
- Aruno/guanaco_jp
- yahma/alpaca-cleaned
- databricks/databricks-dolly-15k

Together these contain about 750k entries. About 2k of them were used for evaluation.
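The README does not say how the six sources were combined or how the 2k evaluation entries were chosen. The sketch below assumes that every source has already been converted to a shared instruction/input/output record format and that the evaluation entries are a random hold-out. `combine_and_split`, the seed, and the toy data are all hypothetical. On real data, the Hugging Face `datasets` library covers the same two steps with `concatenate_datasets` (for datasets with matching columns) followed by `Dataset.train_test_split(test_size=2000)`.

```python
import random

# Assumption: every record already uses the same instruction/input/output keys.
Record = dict[str, str]


def combine_and_split(
    sources: list[list[Record]], eval_size: int = 2000, seed: int = 42
) -> tuple[list[Record], list[Record]]:
    """Concatenate several datasets and hold out `eval_size` random entries for evaluation."""
    combined = [rec for source in sources for rec in source]
    if eval_size >= len(combined):
        raise ValueError("eval_size must be smaller than the combined dataset")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(combined)      # shuffle so the eval split mixes all sources
    return combined[eval_size:], combined[:eval_size]


# Toy stand-ins for the real datasets (illustrative only).
source_a = [{"instruction": f"a{i}", "input": "", "output": "..."} for i in range(6000)]
source_b = [{"instruction": f"b{i}", "input": "", "output": "..."} for i in range(4000)]

train, evaluation = combine_and_split([source_a, source_b], eval_size=2000)
print(len(train), len(evaluation))  # 8000 2000
```

Shuffling before slicing keeps the evaluation split from being drawn entirely from a single source.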

## Training setup

I trained this model on an instance from **vast.ai**:

- 1 × NVIDIA RTX 4090
- 90 GB storage
- Training took about three and a half days
- Run `python export.py` to merge the weights
- Training loss

  data:image/s3,"s3://crabby-images/61bb0/61bb0ceff69664fb5b506646edfcd1edc3bc0ced" alt="training loss chart"

- Evaluation loss

  data:image/s3,"s3://crabby-images/ade45/ade452b6be3d568860763346376dce8dfa62f1f1" alt="eval loss chart"
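The README does not show what `export.py` does. The following assumes the model was trained as a LoRA adapter, which the README does not say. Under that assumption, merging the weights means folding each low-rank update into its base matrix: `W' = W + (alpha / r) * B @ A`. The pure-Python sketch below shows that arithmetic for a single linear layer. `merge_lora` and the toy matrices are hypothetical. In the PEFT library, `PeftModel.merge_and_unload()` applies this merge to every adapted layer.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha: float, r: int):
    """Return W + (alpha / r) * B @ A for one LoRA-adapted linear layer (assumed setup).

    w:      d_out x d_in base weight
    lora_a: r x d_in down-projection (A)
    lora_b: d_out x r up-projection (B)
    """
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # d_out x d_in low-rank update
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy layer: d_out = d_in = 2, rank r = 1, alpha = 2 (hypothetical values).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]      # 1 x 2
lora_b = [[0.5], [1.0]]    # 2 x 1
merged = merge_lora(w, lora_a, lora_b, alpha=2, r=1)
print(merged)  # [[2.0, 2.0], [2.0, 5.0]]

# The merged layer matches base output plus the scaled adapter output.
x = [[1.0], [1.0]]
merged_out = matmul(merged, x)
separate = [[b[0] + 2.0 * d[0]]
            for b, d in zip(matmul(w, x), matmul(lora_b, matmul(lora_a, x)))]
print(merged_out == separate)  # True
```

Because a linear layer's output is linear in its weights, the merged matrix gives the same result as running the base layer and the scaled adapter side by side. Inference can therefore use a single set of weights.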