
Thanks for the models and data, I have some questions.

#2 · opened by NickyNicky

Hi,

I would like to know which template the model was trained with (ChatML?).

Also, how long did training take, and was it trained with LoRA or full-parameter training?

Thank you so much!

Hi,

Thanks for your interest!

Template
Since we are training a base model rather than a chat model, we use a wide range of templates to diversify the data format. We use templates from the Flan collection to diversify the instruction-response pairs, and templates from AdaptLLM to concatenate the raw text (context) with the downstream instruction-response pairs. We are currently working on open-sourcing the code for templifying the data, so please stay tuned!
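To make the idea concrete, here is a minimal sketch of what such templifying might look like. The template strings and field names are hypothetical placeholders for illustration, not the actual Flan or AdaptLLM templates (those will come with the released code):

```python
import random

# Hypothetical templates in the spirit of the Flan collection;
# the actual released templates will differ.
QA_TEMPLATES = [
    "Question: {instruction}\nAnswer: {response}",
    "{instruction}\n\n{response}",
    "Instruction: {instruction}\nOutput: {response}",
]

def templify(context: str, pairs: list[dict]) -> str:
    """Concatenate raw text (context) with instruction-response pairs,
    each rendered with a randomly chosen template to diversify formats."""
    rendered = [random.choice(QA_TEMPLATES).format(**pair) for pair in pairs]
    # AdaptLLM-style: raw context first, then the downstream pairs.
    return context + "\n\n" + "\n\n".join(rendered)

example = templify(
    "The mitochondrion is the powerhouse of the cell.",
    [{"instruction": "What is the powerhouse of the cell?",
      "response": "The mitochondrion."}],
)
print(example)
```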

Training Settings
For pre-training, we train all parameters (full-parameter training, not LoRA). The training time for Instruction Pre-Training is the same as for Vanilla Pre-Training. We present the training details in Table 10 of the Appendix, where we train the 500M model for 5 days and the 1.3B model for 10 days.
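For reference, full-parameter pre-training simply means no adapters: every weight receives gradients. Below is a minimal sketch using Hugging Face transformers; the checkpoint name is a placeholder, not our actual model:

```python
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; substitute the model you are pre-training.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Full-parameter pre-training: every weight receives gradients.
# This is already the default after from_pretrained; it is shown
# explicitly for contrast with LoRA, which would instead freeze the
# base weights and train only small adapter matrices.
for param in model.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.0f}%)")
# With LoRA (e.g., via peft.get_peft_model) only a small fraction
# of parameters would be trainable.
```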

