> (those fine-tuned from the same backbone).

---

Sorry, can you elaborate on "backbone"?
Does this refer to the LoRA config? Does it make a difference when merging?
If two models are both fine-tuned with the same LoRA settings, are they more likely to keep shared traits after a merge?
When I fine-tune, I vary the LoRA config depending on how deep the training needs to go. For the Bible data, a (16,16) or (8,16) LoRA config (around 15-20 million trainable parameters) left the model so far from the data that the loss took a long time just to come down. With a larger 128/256 setup (a much higher trainable parameter count), the task was far easier to train. Later, when I trained the same data inside a multilingual dataset, a basic 4/16 setup trained very easily. So the depth of training, i.e. the LoRA rank, clearly had an effect; a rough sketch of those configs follows below.
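For reference, this is roughly what those rank/alpha pairs look like as PEFT `LoraConfig` objects. It is only a minimal sketch: the `target_modules` lists are placeholders, not my exact setup.

```python
# Rough sketch of the rank/alpha pairs mentioned above, using PEFT.
# target_modules are illustrative placeholders, not the exact setup used.
from peft import LoraConfig

shallow = LoraConfig(r=8, lora_alpha=16,
                     target_modules=["q_proj", "v_proj"])           # light touch, few trainable params
deep = LoraConfig(r=128, lora_alpha=256,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])  # much higher trainable count
```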
After choosing AlphaMonarch and OmniBeagle to merge with this model, the result was not great despite using the TIES merge, and a linear softmax merger afterwards; in fact the original base model was totally lost. So I re-merged with various different strategies trying to recover it, and ended up using a merge of merges (a genetic technique) to re-centre the model on the original: basically treating the merge of merges as a LoRA over the base model and absorbing only very low density/weights and deltas (no more foreign mergers).
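To make that "absorb only very low density/weights and deltas" step concrete, here is a minimal sketch of the idea in PyTorch. It is not the exact merge recipe I ran; the function name, density, and weight values are just illustrative assumptions.

```python
# Sketch only: re-centre a drifted merge on the original base model by keeping
# just a sparse, down-weighted fraction of its deltas. Values are illustrative.
import torch

def absorb_low_density_delta(base: torch.Tensor,
                             merged: torch.Tensor,
                             density: float = 0.1,
                             weight: float = 0.3) -> torch.Tensor:
    """Keep only the largest-magnitude `density` fraction of (merged - base),
    scale it by `weight`, and add it back onto the base tensor."""
    delta = merged - base
    k = max(1, int(density * delta.numel()))
    # threshold at the k-th largest absolute delta; zero out everything smaller
    threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
    mask = delta.abs() >= threshold
    return base + weight * delta * mask

# usage (hypothetical state dicts): apply per parameter tensor
# recentred = {name: absorb_low_density_delta(base_sd[name], merged_sd[name])
#              for name in base_sd}
```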
So, what does "backbone" refer to, please?