More information about previous Neuronovo/neuronovo-7B-v0.2 version available here: ๐Don't stop DPOptimizing!
Author: Jan Kocoล ๐LinkedIn ๐Google Scholar ๐ResearchGate
Changes concerning Neuronovo/neuronovo-7B-v0.2:
Training Dataset: In addition to the "Intel/orca_dpo_pairs" dataset, this version incorporates a new dataset "mlabonne/chatml_dpo_pairs." The combined datasets enhance the model's capabilities in dialogues and interactive scenarios, further specializing it in natural language understanding and response generation.
Tokenizer and Formatting: The tokenizer now originates directly from the "Neuronovo/neuronovo-7B-v0.2" model.
Training Configuration: The training approach has shifted from using
max_steps=200
tonum_train_epochs=1
. This represents a change in the training strategy, focusing on epoch-based training rather than a fixed number of steps.Learning Rate: The learning rate has been reduced to a smaller value of
5e-6
. This finer learning rate allows for more precise adjustments during the training process, potentially leading to better model performance.