What exactly is the 0.85?
From reading the DARE paper, I'm not sure I understand what these models actually are.
Are they a merge of the base model and something else? Or are they SFT? And with what dataset?
I am trying to use the DARE method to mitigate or eliminate the mutual interference between models when merging multiple homologous PEFT models. The 0.85 is the drop rate in DARE. https://github.com/uukuguy/multi_loras#mixture-of-multi-loras
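Concretely, a drop rate of 0.85 means roughly 85% of each delta (fine-tuned weights minus base weights) is randomly zeroed and the surviving entries are rescaled by 1 / (1 - 0.85). Here is a minimal sketch of that drop-and-rescale step for a single tensor; the function name and the bare tensor interface are just for illustration, not the actual API in multi_loras:

```python
import torch

def dare_drop_and_rescale(base_param, tuned_param, drop_rate=0.85):
    """DARE on one tensor: randomly drop a fraction `drop_rate` of the
    delta parameters (tuned - base) and rescale the kept ones by
    1 / (1 - drop_rate) so the expected delta is preserved."""
    delta = tuned_param - base_param
    # Bernoulli mask: 1 = drop this entry, 0 = keep it
    drop_mask = torch.bernoulli(torch.full_like(delta, drop_rate))
    kept_delta = delta * (1.0 - drop_mask) / (1.0 - drop_rate)
    return base_param + kept_delta
```

Applying this per parameter tensor gives a sparsified fine-tuned model that, per the paper, should behave much like the original one.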
Are these meant to be used as-is or do they still need to be merged? Are you basically re-tuning these models with DARE?
My confusion is that I thought DARE was for merging or fine-tuning, but I don't know what this is a merge/tune of.
As the paper says, we can "obtain new capabilities by assimilating the parameters of homologous models without the need for retraining or GPUs".
By dropping the redundant delta parameters, it's possible to mitigate the mutual interference between the merged models. What I want to do is verify this point. If the verification is successful, then it may become possible to merge multiple homologous models while keeping the prominent advantages of each of them. And all of this does not require retraining the model, which is the most appealing aspect to me.
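To make the merging idea concrete, here is a rough sketch of what I mean by merging several homologous models through their sparsified deltas. The plain summation of the DARE-processed deltas onto the shared base, and the function name, are my own assumptions for illustration, not the exact procedure in the repo or the paper:

```python
import torch

def dare_merge(base_state, tuned_states, drop_rate=0.85):
    """Merge several homologous fine-tuned models into one by adding their
    DARE-sparsified deltas to the shared base. All state dicts are assumed
    to have identical keys and shapes."""
    merged = {}
    for name, base_param in base_state.items():
        merged_param = base_param.clone()
        for tuned_state in tuned_states:
            delta = tuned_state[name] - base_param
            # Drop most of the delta and rescale the remainder,
            # so the sparse deltas from different models overlap less.
            drop_mask = torch.bernoulli(torch.full_like(delta, drop_rate))
            merged_param += delta * (1.0 - drop_mask) / (1.0 - drop_rate)
        merged[name] = merged_param
    return merged
```

The hope is that, because each delta becomes very sparse after dropping, the deltas from different models rarely collide on the same entries, which is where the reduced interference should come from.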