Model Settings:
This model has better settings, closer to the acceptable settings:
32 layers / 4096 hidden size, and the other numbers seem to have fallen neatly into place on the powers of two (the prime binary markers):
1 2 4 8 16 32 64 128
There is one odd number out (48), which is not itself a power of two, but it is still good because it is one 16 plus one 32 (16 + 32 = 48), so it stays binary aligned!
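As a rough illustration of the check described above, here is a minimal Python sketch that tests whether a setting is a power of two and, if not, splits it into the powers of two that add up to it. The values 32, 4096 and 48 come from these notes; the dictionary keys (especially `other_setting` for the 48) are hypothetical labels, since the notes do not say which setting the 48 belongs to.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def binary_parts(n: int) -> list[int]:
    """Split n into the powers of two that sum to it, largest first."""
    return [1 << i for i in range(n.bit_length() - 1, -1, -1) if n & (1 << i)]


# Values are from the notes; the key names are hypothetical labels.
settings = {"layers": 32, "hidden_size": 4096, "other_setting": 48}

for name, value in settings.items():
    if is_power_of_two(value):
        print(f"{name}={value}: power of two")
    else:
        parts = " + ".join(str(p) for p in binary_parts(value))
        print(f"{name}={value}: not a power of two, but equals {parts}")
```

The bit trick `n & (n - 1)` clears the lowest set bit, so it is zero only when exactly one bit is set. This is just a sanity check on the numbers, not a measurement of training speed.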
It's important to get the settings correct. The pretraining is just data and can always be improved, as can the training methods!
But the settings matter most for training and for the associated mathematics in the addition and subtraction of tensors. Because the numbers are binary aligned, the calculations should be faster, so normalizing, fine-tuning and loss reduction should also be faster. Hence convergence is faster, and retrieval is faster!
Hopeful for this model!