What is the density for TIES when using DARE by setting `merge_method: dare_ties`?


Congratulations on the significant breakthrough in model merging! I'd like to ask you a question. In the YAML file:

```yaml
models:
  - model: mistralai/Mistral-7B-v0.1
    # No parameters necessary for base model
  - model: samir-fama/SamirGPT-v1
    parameters:
      density: 0.53
      weight: 0.4
  - model: abacusai/Slerp-CM-mist-dpo
    parameters:
      density: 0.53
      weight: 0.3
  - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.2
    parameters:
      density: 0.53
      weight: 0.3
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  int8_mask: true
dtype: bfloat16
```

I believe the 'density' here refers to the fraction of delta parameters that DARE randomly retains. What is the density during the TIES stage after DARE is applied? Is it the same as the DARE density, or is there a specific method for setting it?
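
For context, here is a minimal sketch of what I understand DARE's drop-and-rescale step to do to a task vector (delta) at a given density. The function name and tensor shapes are illustrative, not mergekit's actual API:

```python
import torch

def dare_sparsify(delta: torch.Tensor, density: float) -> torch.Tensor:
    """Randomly keep a `density` fraction of delta entries and rescale.

    Each entry is dropped with probability (1 - density); the survivors
    are divided by `density` so the expected value of the delta is preserved.
    """
    mask = torch.bernoulli(torch.full_like(delta, density))
    return delta * mask / density

# Example: with density=0.53, roughly 53% of the delta entries survive.
delta = torch.randn(4, 4)
print(dare_sparsify(delta, density=0.53))
```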

Thanks! It's a good question. I assumed it was the same density, but I haven't checked; it's probably somewhere in this file: https://github.com/cg123/mergekit/blob/4905d6f0c59377d1af3c120c09dff8b7f3e50cc7/mergekit/merge_methods/generalized_task_arithmetic.py
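
My reading (unverified, the actual mergekit code may differ) is that the single `density` value is consumed by the DARE random-pruning step, and the TIES stage that follows only performs sign election and consensus rather than a second magnitude pruning. A rough sketch of that structure, with illustrative names that are not mergekit's API:

```python
import torch

def dare_ties_merge(base: torch.Tensor,
                    task_tensors: list[torch.Tensor],
                    weights: list[float],
                    density: float) -> torch.Tensor:
    """Illustrative dare_ties-style merge of a single parameter tensor.

    1. DARE stage: each model's delta is randomly pruned to `density`
       and rescaled (this is where the YAML `density` is used).
    2. TIES stage: a majority sign is elected per parameter and only
       deltas agreeing with it are kept (no second pruning here).
    """
    deltas = []
    for t, w in zip(task_tensors, weights):
        delta = t - base
        mask = torch.bernoulli(torch.full_like(delta, density))
        deltas.append(w * delta * mask / density)  # DARE drop + rescale

    stacked = torch.stack(deltas)                  # (num_models, ...)
    # TIES-style sign election: keep contributions whose sign matches
    # the sign of the summed delta, then sum the survivors.
    # (Normalization details vary between implementations.)
    elected_sign = torch.sign(stacked.sum(dim=0))
    agree = torch.sign(stacked) == elected_sign
    merged_delta = (stacked * agree).sum(dim=0)
    return base + merged_delta
```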

Thank you! I will review this code carefully.
