Update README.md
README.md
CHANGED
@@ -104,8 +104,6 @@ YuLan-Mini is a lightweight language model with 2.4 billion parameters. It achie
 
 ## Model Downloads 🤗
 
-> Model weights will be uploaded after final preparations.
-
 | Model | Context Length | SFT |
 |---------|----------------|-----|
 | [YuLan-Mini](https://huggingface.co/yulan-team/YuLan-Mini) (Recommended) | 28K | ❌ |
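For reference, the released model should load with the standard `transformers` API, since the final checkpoints run on the Llama architecture (see the note under the checkpoint table below). A minimal sketch; the prompt and generation settings are illustrative only:

```python
# Minimal sketch: load the released YuLan-Mini base model from the
# Hugging Face Hub and generate a short continuation. The repo id is
# taken from the table above; dtype choice is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yulan-team/YuLan-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Renmin University of China is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```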
@@ -174,6 +172,114 @@ The pre-training and evaluation code will be released in a future update.
 <details><summary>2. Intermediate Stage Checkpoints</summary>
 The intermediate stage checkpoints are released in <a href="https://huggingface.co/collections/yulan-team/yulan-mini-676d214b24376739b00d95f3">YuLan-Mini</a>.
 
+<table>
+<thead>
+<tr>
+<th>Stage</th>
+<th>Curriculum Phase</th>
+<th>4K Context</th>
+<th>28K Context</th>
+<th>Optimizer</th>
+<th>Inference Architecture</th>
+<th>LAMBADA <code>Acc</code></th>
+<th>GSM8K <code>Acc</code></th>
+<th>HumanEval <code>pass@1</code></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>Stable</td>
+<td>5</td>
+<td><a href="https://huggingface.co/yulan-team/YuLan-Mini-Phase5">YuLan-Mini-Phase5</a></td>
+<td></td>
+<td></td>
+<td><code>yulanmini</code></td>
+<td>53.85</td>
+<td>3.41</td>
+<td>12.26</td>
+</tr>
+<tr>
+<td>Stable</td>
+<td>10</td>
+<td><a href="https://huggingface.co/yulan-team/YuLan-Mini-Phase10">YuLan-Mini-Phase10</a></td>
+<td></td>
+<td></td>
+<td><code>yulanmini</code></td>
+<td>55.00</td>
+<td>9.57</td>
+<td>15.95</td>
+</tr>
+<tr>
+<td>Stable</td>
+<td>15</td>
+<td><a href="https://huggingface.co/yulan-team/YuLan-Mini-Phase15">YuLan-Mini-Phase15</a></td>
+<td></td>
+<td></td>
+<td><code>yulanmini</code></td>
+<td>55.81</td>
+<td>13.81</td>
+<td>16.99</td>
+</tr>
+<tr>
+<td>Stable</td>
+<td>20</td>
+<td><a href="https://huggingface.co/yulan-team/YuLan-Mini-Phase20">YuLan-Mini-Phase20</a></td>
+<td></td>
+<td>✅</td>
+<td><code>yulanmini</code></td>
+<td>55.81</td>
+<td>21.39</td>
+<td>20.79</td>
+</tr>
+<tr>
+<td>Stable</td>
+<td>25 (1T tokens)</td>
+<td><a href="https://huggingface.co/yulan-team/YuLan-Mini-Before-Annealing">YuLan-Mini-Before-Annealing</a></td>
+<td></td>
+<td>✅</td>
+<td><code>yulanmini</code></td>
+<td>55.67</td>
+<td>29.94</td>
+<td>34.06</td>
+</tr>
+<tr>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+<td></td>
+</tr>
+<tr>
+<td>Annealing</td>
+<td>26</td>
+<td>YuLan-Mini-4K</td>
+<td></td>
+<td></td>
+<td><code>llama</code>*</td>
+<td>64.72</td>
+<td>66.65</td>
+<td>61.60</td>
+</tr>
+<tr>
+<td>Annealing</td>
+<td>27</td>
+<td></td>
+<td><a href="https://huggingface.co/yulan-team/YuLan-Mini">YuLan-Mini</a></td>
+<td></td>
+<td><code>llama</code>*</td>
+<td>65.67</td>
+<td>68.46</td>
+<td>64.00</td>
+</tr>
+</tbody>
+</table>
+
+\*: For easier inference and deployment, we merged the re-parameterized added parameters and scaling factors into the final released models ([**YuLan-Mini**](https://huggingface.co/yulan-team/YuLan-Mini) and **YuLan-Mini-Intermediate-4K**), enabling them to run on the Llama architecture. However, these parameters are still retained in the intermediate checkpoints from the training process.
+
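Note that the pre-annealing checkpoints in this table keep the custom `yulanmini` inference architecture rather than `llama`, so loading them with `transformers` presumably requires trusting the repos' bundled modeling code. A hedged sketch:

```python
# Sketch: load an intermediate (pre-annealing) checkpoint. Unlike the
# final release, these repos use the custom `yulanmini` architecture,
# so we assume they ship modeling code and pass trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "yulan-team/YuLan-Mini-Phase20"  # any phase repo from the table
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(ckpt, trust_remote_code=True)
```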
 </details>
 
 <details><summary>3. Optimizer States Before Annealing</summary>
@@ -184,43 +290,38 @@ The intermediate stage checkpoints are released in <a href="https://huggingface.
 
 <details><summary>4. The Used Open-Source Datasets </summary>
 
-<a href="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/pretrain/datasets
+<a href="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/pretrain/datasets">Used-Datasets-List</a>
 
 </details>
 
 <details><summary>5. Data Distribution for every phase</summary>
 
-<a href="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/pretrain/final.pdf">
+<a href="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/pretrain/datasets/final.pdf">
 <div align=center>
 <img src="assets/data_distribution_for_every_phase.png">
 </div>
 </a>
 
-
 </details>
 
 <details><summary>6. Synthetic Data</summary>
 
 Data cleaning and synthesis pipeline:
 <div align=center>
-<img src="assets/data-pipeline.png">
+<img src="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/assets/data-pipeline.png">
 </div>
 
 The synthetic data we are using is released in <a href="https://huggingface.co/collections/yulan-team/yulan-mini-676d214b24376739b00d95f3">YuLan-Mini-Datasets</a>
 
 </details>
 
-<details><summary>7. Intermediate Optimizer States</summary>
-
-Intermediate optimizer states will be released in a future update.
-</details>
 
 ### What you can do with these pre-training resources
 
 1. **Pre-train** your own LLM. You can use [our data](https://huggingface.co/yulan-team/YuLan-Mini-Datasets) and curriculum to train a model that's just as powerful as YuLan-Mini.
 2. Perform your own **learning rate annealing**. During the annealing phase, YuLan-Mini's learning ability is at its peak. You can resume training from [the checkpoint before annealing](https://huggingface.co/yulan-team/YuLan-Mini-Before-Annealing) and use your own dataset for learning rate annealing.
-3. **Fine-tune** the Instruct version of the LLM. You can use the YuLan-Mini base model to train your own Instruct version.
-4. **Training dynamics** research. You can use YuLan-Mini's intermediate checkpoints to explore internal changes during the pre-training process.
+3. **Fine-tune** the Instruct version of the LLM. You can use the [YuLan-Mini](https://huggingface.co/yulan-team/YuLan-Mini) base model to train your own Instruct version.
+4. **Training dynamics** research. You can use YuLan-Mini's [intermediate checkpoints](https://huggingface.co/collections/yulan-team/yulan-mini-676d214b24376739b00d95f3) to explore internal changes during the pre-training process (see the sketch after this list).
 5. **Synthesize** your own data. You can use YuLan-Mini's [data pipeline](https://github.com/RUC-GSAI/YuLan-Mini) to clean and generate your own dataset.
 
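For point 4, a hedged sketch of one way to probe training dynamics: walk the phase checkpoints listed in the table above and track a single statistic per checkpoint. The repo ids come from the table; the choice of statistic (global parameter L2 norm) is only an example, and `trust_remote_code=True` is an assumption for the custom-architecture checkpoints.

```python
# Sketch: trace a simple statistic across intermediate checkpoints to
# study training dynamics. Loads each phase model in turn; assumes the
# custom-architecture repos work with trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM

phases = [
    "yulan-team/YuLan-Mini-Phase5",
    "yulan-team/YuLan-Mini-Phase10",
    "yulan-team/YuLan-Mini-Phase15",
    "yulan-team/YuLan-Mini-Phase20",
]

for repo in phases:
    model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
    sq = sum(p.detach().float().pow(2).sum() for p in model.parameters())
    print(f"{repo}: param L2 norm = {torch.sqrt(sq).item():.1f}")
    del model  # release memory before loading the next checkpoint
```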
 ---