Training code

#2
by kristaller486 - opened

Are you planning on posting the code for pretrain or fine-tuning?

@kristaller486 it's just a frankenstein of Llama 2 and Mistral that was further trained after the mix. You don't need anything special for fine-tuning.

upstage org

@wcde thanks!

hunkim changed discussion status to closed

@wcde First of all, there is no standard way to build a frankenstein model. You could combine layers in various ways, optionally use a lower triangular matrix, apply more advanced math, and so on.
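
To illustrate how much "combine layers" leaves open, here is a minimal sketch of two possible layer layouts. Everything in it is hypothetical: the function names, the 8-layer donors, and the slice choices are made up for illustration, the layers are plain string labels rather than real weights, and nothing here is claimed to be what this model actually did.

```python
def stack_layers(donor_a, donor_b, a_slice, b_slice):
    """Concatenate a slice of donor_a's layers with a slice of donor_b's layers."""
    return list(donor_a[a_slice]) + list(donor_b[b_slice])


def interleave_layers(donor_a, donor_b, block):
    """Alternate blocks of `block` consecutive layers from each donor."""
    merged = []
    for start in range(0, max(len(donor_a), len(donor_b)), block):
        merged.extend(donor_a[start:start + block])
        merged.extend(donor_b[start:start + block])
    return merged


# Hypothetical donors with 8 layers each, represented only by labels.
donor_a = [f"a{i}" for i in range(8)]
donor_b = [f"b{i}" for i in range(8)]

# Layout 1: first 6 layers of A followed by the last 6 layers of B (12 layers).
stacked = stack_layers(donor_a, donor_b, slice(0, 6), slice(2, 8))

# Layout 2: alternate blocks of 4 layers from each donor (16 layers).
interleaved = interleave_layers(donor_a, donor_b, 4)

print(stacked)
print(interleaved)
```

Both layouts are "frankenstein" merges of the same two donors, yet they produce different depths and different layer orders, which is why the exact recipe matters.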

Second, can you expand on "further trained"? Trained on what data? For how many tokens?