Thanks for the models and data; I have some questions.
Hi,
I would like to know which template the model was trained with (ChatML?).
Also, was the training done with LoRA or was it full-parameter training, and how long did it take?
Thank you so much!
Hi,
Thanks for your interest!
Template
Since we are training a base model instead of a chat model, we use a wide range of templates to diversify the format. We utilize templates from the Flan collection to diversify instruction-response pairs and templates from AdaptLLM to concatenate the raw text (context) with downstream instruction-response pairs. We are currently working on open-sourcing the code for templifying the data, so please stay tuned!
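For intuition, here is a minimal sketch of what concatenating raw text (context) with downstream instruction-response pairs might look like. The template wording and field names below are illustrative assumptions only; the actual templates come from the Flan collection and AdaptLLM, so please refer to the open-sourced templifying code once it is released.

```python
# Illustrative sketch of a "templify" step: prepend raw text (context) to a
# set of instruction-response pairs rendered with a simple template.
# The template string and dict keys here are hypothetical, not the paper's.

QA_TEMPLATE = "Question: {q}\nAnswer: {a}"

def templify(context: str, pairs: list[dict], template: str = QA_TEMPLATE) -> str:
    """Concatenate raw context with templated instruction-response pairs."""
    rendered = "\n\n".join(
        template.format(q=p["instruction"], a=p["response"]) for p in pairs
    )
    return context + "\n\n" + rendered

# Example usage with a toy context and one instruction-response pair:
sample = templify(
    "The mitochondrion is the powerhouse of the cell.",
    [{"instruction": "What is the mitochondrion?",
      "response": "The powerhouse of the cell."}],
)
print(sample)
```

In practice one would rotate over many such templates so the base model does not overfit to a single format, which is the diversification motivation described above.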
Training Settings
For pre-training, we train all parameters. The training time for Instruction Pre-Training is the same as for vanilla pre-training. We present the training details in Table 10 in the Appendix: we train the 500M model for 5 days and the 1.3B model for 10 days.