What is the difference between this model and OpenLLaMA 7Bv2?
#1 by weiyucheng - opened
The training dataset seems to be the same, but this model's performance is much better.
The sole difference is the training framework, which was switched from JAX on TPU to Megatron-LM on GPU. The training loss is lower.
@itsliupeng Are the hyperparameters the same?
Yes: cosine LR 3e-4, batch size of 4M tokens, the same as LLaMA 2-7B.
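
For anyone wanting to reproduce this schedule, here is a minimal PyTorch sketch of the quoted hyperparameters (peak LR 3e-4, cosine decay, ~4M-token global batches). This is not the authors' actual Megatron-LM config; the total step count, minimum LR, and placeholder model are assumptions for illustration.

```python
import torch

# Placeholder model; the real run trains a 7B-parameter transformer.
model = torch.nn.Linear(10, 10)

# Peak learning rate 3e-4, as stated above.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Cosine decay of the LR over training.
# T_max and eta_min are assumptions, not from the discussion.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100_000, eta_min=3e-5
)

# Each optimizer step would consume a global batch of ~4M tokens
# (e.g., 2048 sequences x 2048 tokens), matching the LLaMA 2-7B recipe.
for step in range(3):
    loss = model(torch.randn(2, 10)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```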
itsliupeng changed discussion status to closed