IvanHU committed on
Commit
3e9b5e2
·
verified ·
1 Parent(s): f973676

Update README.md

Files changed (1)
  1. README.md +113 -12
README.md CHANGED
@@ -104,8 +104,6 @@ YuLan-Mini is a lightweight language model with 2.4 billion parameters. It achie
104
 
105
 ## Model Downloads 🔗
106
 
107
- > Model weights will be uploaded after final preparations.
108
-
109
  | Model | Context Length | SFT |
110
  |---------|----------------|-----|
111
 | [YuLan-Mini](https://huggingface.co/yulan-team/YuLan-Mini) (Recommended) | 28K | ❎ |
@@ -174,6 +172,114 @@ The pre-training and evaluation code will be released in a future update.
174
  <details><summary>2. Intermediate Stage Checkpoints</summary>
175
  The intermediate stage checkpoints are released in <a href="https://huggingface.co/collections/yulan-team/yulan-mini-676d214b24376739b00d95f3">YuLan-Mini</a>.
176
 
177
  </details>
178
 
179
  <details><summary>3. Optimizer States Before Annealing</summary>
@@ -184,43 +290,38 @@ The intermediate stage checkpoints are released in <a href="https://huggingface.
184
 
185
  <details><summary>4. The Used Open-Source Datasets </summary>
186
 
187
- <a href="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/pretrain/datasets-list.md">Used-Datasets-List</a>
188
 
189
  </details>
190
 
191
  <details><summary>5. Data Distribution for every phase</summary>
192
 
193
- <a href="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/pretrain/final.pdf">
194
  <div align=center>
195
  <img src="assets/data_distribution_for_every_phase.png">
196
  </div>
197
  </a>
198
 
199
-
200
  </details>
201
 
202
  <details><summary>6. Synthetic Data</summary>
203
 
204
  Data cleaning and synthesis pipeline:
205
  <div align=center>
206
- <img src="assets/data-pipeline.png">
207
  </div>
208
 
209
  The synthetic data we are using is released in <a href="https://huggingface.co/collections/yulan-team/yulan-mini-676d214b24376739b00d95f3">YuLan-Mini-Datasets</a>
210
 
211
  </details>
212
 
213
- <details><summary>7. Intermediate Optimizer States</summary>
214
-
215
- Intermediate optimizer states will be released in a future update.
216
- </details>
217
 
218
  ### What you can do with these pre-training resources
219
 
220
  1. **Pre-train** your own LLM. You can use [our data](https://huggingface.co/yulan-team/YuLan-Mini-Datasets) and curriculum to train a model that's just as powerful as YuLan-Mini.
221
  2. Perform your own **learning rate annealing**. During the annealing phase, YuLan-Mini's learning ability is at its peak. You can resume training from [the checkpoint before annealing](https://huggingface.co/yulan-team/YuLan-Mini-Before-Annealing) and use your own dataset for learning rate annealing.
222
- 3. **Fine-tune** the Instruct version of the LLM. You can use the YuLan-Mini base model to train your own Instruct version.
223
- 4. **Training dynamics** research. You can use YuLan-Mini's intermediate checkpoints to explore internal changes during the pre-training process.
224
  5. **Synthesize** your own data. You can use YuLan-Mini's [data pipeline](https://github.com/RUC-GSAI/YuLan-Mini) to clean and generate your own dataset.
225
 
226
  ---
 
104
 
105
 ## Model Downloads 🔗
106
 
107
  | Model | Context Length | SFT |
108
  |---------|----------------|-----|
109
 | [YuLan-Mini](https://huggingface.co/yulan-team/YuLan-Mini) (Recommended) | 28K | ❎ |
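For reference, a minimal loading sketch with Hugging Face Transformers; the prompt and generation settings below are illustrative only, not a recommended configuration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The released model runs on the standard Llama architecture (see the note under
# the intermediate-checkpoint table), so no custom modeling code is needed.
tokenizer = AutoTokenizer.from_pretrained("yulan-team/YuLan-Mini")
model = AutoModelForCausalLM.from_pretrained("yulan-team/YuLan-Mini", torch_dtype="auto")

# Illustrative completion-style prompt for the base model.
inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```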
 
172
  <details><summary>2. Intermediate Stage Checkpoints</summary>
173
  The intermediate stage checkpoints are released in <a href="https://huggingface.co/collections/yulan-team/yulan-mini-676d214b24376739b00d95f3">YuLan-Mini</a>.
174
 
175
+ <table>
176
+ <thead>
177
+ <tr>
178
+ <th>Stage</th>
179
+ <th>Curriculum Phase</th>
180
+ <th>4K Context</th>
181
+ <th>28K Context</th>
182
+ <th>Optimizer</th>
183
+ <th>Inference Architecture</th>
184
+ <th>LAMBADA <code>Acc</code></th>
185
+ <th>GSM8K <code>Acc</code></th>
186
+ <th>HumanEval <code>pass@1</code></th>
187
+ </tr>
188
+ </thead>
189
+ <tbody>
190
+ <tr>
191
+ <td>Stable</td>
192
+ <td>5</td>
193
+ <td><a href="https://huggingface.co/yulan-team/YuLan-Mini-Phase5">YuLan-Mini-Phase5</a></td>
194
+ <td></td>
195
+ <td></td>
196
+ <td><code>yulanmini</code></td>
197
+ <td>53.85</td>
198
+ <td>3.41</td>
199
+ <td>12.26</td>
200
+ </tr>
201
+ <tr>
202
+ <td>Stable</td>
203
+ <td>10</td>
204
+ <td><a href="https://huggingface.co/yulan-team/YuLan-Mini-Phase10">YuLan-Mini-Phase10</a></td>
205
+ <td></td>
206
+ <td></td>
207
+ <td><code>yulanmini</code></td>
208
+ <td>55.00</td>
209
+ <td>9.57</td>
210
+ <td>15.95</td>
211
+ </tr>
212
+ <tr>
213
+ <td>Stable</td>
214
+ <td>15</td>
215
+ <td><a href="https://huggingface.co/yulan-team/YuLan-Mini-Phase15">YuLan-Mini-Phase15</a></td>
216
+ <td></td>
217
+ <td></td>
218
+ <td><code>yulanmini</code></td>
219
+ <td>55.81</td>
220
+ <td>13.81</td>
221
+ <td>16.99</td>
222
+ </tr>
223
+ <tr>
224
+ <td>Stable</td>
225
+ <td>20</td>
226
+ <td><a href="https://huggingface.co/yulan-team/YuLan-Mini-Phase20">YuLan-Mini-Phase20</a></td>
227
+ <td></td>
228
+ <td>✅</td>
229
+ <td><code>yulanmini</code></td>
230
+ <td>55.81</td>
231
+ <td>21.39</td>
232
+ <td>20.79</td>
233
+ </tr>
234
+ <tr>
235
+ <td>Stable</td>
236
+ <td>25 (1T tokens)</td>
237
+ <td><a href="https://huggingface.co/yulan-team/YuLan-Mini-Before-Annealing">YuLan-Mini-Before-Annealing</a></td>
238
+ <td></td>
239
+ <td>✅</td>
240
+ <td><code>yulanmini</code></td>
241
+ <td>55.67</td>
242
+ <td>29.94</td>
243
+ <td>34.06</td>
244
+ </tr>
245
+ <tr>
246
+ <td></td>
247
+ <td></td>
248
+ <td></td>
249
+ <td></td>
250
+ <td></td>
251
+ <td></td>
252
+ <td></td>
253
+ <td></td>
254
+ <td></td>
255
+ </tr>
256
+ <tr>
257
+ <td>Annealing</td>
258
+ <td>26</td>
259
+ <td>YuLan-Mini-4K</td>
260
+ <td></td>
261
+ <td></td>
262
+ <td><code>llama</code>*</td>
263
+ <td>64.72</td>
264
+ <td>66.65</td>
265
+ <td>61.60</td>
266
+ </tr>
267
+ <tr>
268
+ <td>Annealing</td>
269
+ <td>27</td>
270
+ <td></td>
271
+ <td><a href="https://huggingface.co/yulan-team/YuLan-Mini">YuLan-Mini</a></td>
272
+ <td></td>
273
+ <td><code>llama</code>*</td>
274
+ <td>65.67</td>
275
+ <td>68.46</td>
276
+ <td>64.00</td>
277
+ </tr>
278
+ </tbody>
279
+ </table>
280
+
281
+ \*: For easier inference and deployment, we merged the re-parameterized added parameters and scaling factors into the final released models ([**YuLan-Mini**](https://huggingface.co/yulan-team/YuLan-Mini) and **YuLan-Mini-Intermediate-4K**), enabling them to run on the Llama architecture. However, these parameters are still retained in the intermediate checkpoints from the training process.
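To make the distinction concrete, a hedged loading sketch: the final release loads as a plain Llama checkpoint, while the intermediate checkpoints keep the custom `yulanmini` architecture. Passing `trust_remote_code=True` for those intermediate checkpoints is an assumption here, not something stated above.

```python
from transformers import AutoModelForCausalLM

# Final release: added parameters and scaling factors are already merged,
# so it loads as a standard Llama-architecture checkpoint.
final_model = AutoModelForCausalLM.from_pretrained("yulan-team/YuLan-Mini")

# Intermediate checkpoint (here the end of stable phase 20): assumed to ship the
# custom `yulanmini` modeling code, hence trust_remote_code=True.
intermediate_model = AutoModelForCausalLM.from_pretrained(
    "yulan-team/YuLan-Mini-Phase20",
    trust_remote_code=True,
)
```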
282
+
283
  </details>
284
 
285
  <details><summary>3. Optimizer States Before Annealing</summary>
 
290
 
291
  <details><summary>4. The Used Open-Source Datasets </summary>
292
 
293
+ <a href="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/pretrain/datasets">Used-Datasets-List</a>
294
 
295
  </details>
296
 
297
  <details><summary>5. Data Distribution for every phase</summary>
298
 
299
+ <a href="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/pretrain/datasets/final.pdf">
300
  <div align=center>
301
  <img src="assets/data_distribution_for_every_phase.png">
302
  </div>
303
  </a>
304
 
305
  </details>
306
 
307
  <details><summary>6. Synthetic Data</summary>
308
 
309
  Data cleaning and synthesis pipeline:
310
  <div align=center>
311
+ <img src="https://github.com/RUC-GSAI/YuLan-Mini/blob/main/assets/data-pipeline.png">
312
  </div>
313
 
314
  The synthetic data we are using is released in <a href="https://huggingface.co/collections/yulan-team/yulan-mini-676d214b24376739b00d95f3">YuLan-Mini-Datasets</a>
315
 
316
  </details>
317
 
318
 
319
  ### What you can do with these pre-training resources
320
 
321
  1. **Pre-train** your own LLM. You can use [our data](https://huggingface.co/yulan-team/YuLan-Mini-Datasets) and curriculum to train a model that's just as powerful as YuLan-Mini.
322
 2. Perform your own **learning rate annealing**. During the annealing phase, YuLan-Mini's learning ability is at its peak. You can resume training from [the checkpoint before annealing](https://huggingface.co/yulan-team/YuLan-Mini-Before-Annealing) and use your own dataset for learning rate annealing (a minimal loop sketch follows this list).
323
+ 3. **Fine-tune** the Instruct version of the LLM. You can use the [YuLan-Mini](https://huggingface.co/yulan-team/YuLan-Mini) base model to train your own Instruct version (a bare-bones SFT sketch follows this list).
324
+ 4. **Training dynamics** research. You can use YuLan-Mini's [intermediate checkpoints](https://huggingface.co/collections/yulan-team/yulan-mini-676d214b24376739b00d95f3) to explore internal changes during the pre-training process (a checkpoint-probing sketch follows this list).
325
  5. **Synthesize** your own data. You can use YuLan-Mini's [data pipeline](https://github.com/RUC-GSAI/YuLan-Mini) to clean and generate your own dataset.
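A minimal sketch of point 2 above, assuming a single-device toy setup: the corpus, peak learning rate, and step budget are placeholders, and `trust_remote_code=True` for the pre-annealing checkpoint is an assumption.

```python
from torch.optim import AdamW
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          get_cosine_schedule_with_warmup)

repo = "yulan-team/YuLan-Mini-Before-Annealing"            # checkpoint before annealing
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.train()

# Placeholder annealing corpus -- swap in your own high-quality data.
texts = ["Example annealing document one.", "Example annealing document two."]

optimizer = AdamW(model.parameters(), lr=1e-4)             # placeholder peak learning rate
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=len(texts)
)

for text in texts:                                         # decay the LR toward zero over the run
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```

In practice you would also restore the released optimizer states and stream far more data; this only shows the shape of the annealing loop.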
326
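For point 3, a bare-bones supervised fine-tuning sketch with the standard `Trainer` API; the toy dataset, formatting, and hyperparameters are placeholders rather than a recommended instruct recipe.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

repo = "yulan-team/YuLan-Mini"                              # base model to fine-tune
tokenizer = AutoTokenizer.from_pretrained(repo)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(repo)

# Placeholder instruction data -- replace with your own SFT corpus and template.
examples = [{"text": "Instruction: Say hello.\nResponse: Hello!"}]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="yulan-mini-sft",
                           per_device_train_batch_size=1, num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```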
 
327
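And for point 4, a small probing sketch that tracks loss on a fixed snippet across the released stable-stage checkpoints; repository names come from the table above, the probe text is illustrative, and `trust_remote_code=True` is again an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Intermediate stable-stage checkpoints listed in the table above.
phases = [
    "yulan-team/YuLan-Mini-Phase5",
    "yulan-team/YuLan-Mini-Phase10",
    "yulan-team/YuLan-Mini-Phase15",
    "yulan-team/YuLan-Mini-Phase20",
]
probe = "The capital of France is Paris."                  # illustrative probe text

for repo in phases:
    tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True).eval()
    batch = tokenizer(probe, return_tensors="pt")
    with torch.no_grad():
        loss = model(**batch, labels=batch["input_ids"]).loss
    print(f"{repo}: loss = {loss.item():.3f}")
```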
  ---