What is the density for TIES when using DARE by setting `merge_method: dare_ties`?


Congratulations on the significant breakthrough in model merging! I'd like to ask you a question. In the YAML file:

```yaml
models:
  - model: mistralai/Mistral-7B-v0.1
    # No parameters necessary for base model
  - model: samir-fama/SamirGPT-v1
    parameters:
      density: 0.53
      weight: 0.4
  - model: abacusai/Slerp-CM-mist-dpo
    parameters:
      density: 0.53
      weight: 0.3
  - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.2
    parameters:
      density: 0.53
      weight: 0.3
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  int8_mask: true
dtype: bfloat16
```

I believe the 'density' here refers to the fraction of delta parameters that DARE randomly retains. What is the density during the TIES stage after DARE is applied? Is it the same as the DARE density, or is there a specific method for setting it?
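
For context, here is a minimal sketch of what I understand DARE's drop-and-rescale step to do to a task vector (delta) at a given density. The function name and tensor shapes are illustrative, not mergekit's actual API:

```python
import torch

def dare_sparsify(delta: torch.Tensor, density: float) -> torch.Tensor:
    """Randomly keep a `density` fraction of delta entries and rescale.

    Each entry is dropped with probability (1 - density); the survivors
    are divided by `density` so the expected value of the delta is preserved.
    """
    mask = torch.bernoulli(torch.full_like(delta, density))
    return delta * mask / density

# Example: with density=0.53, roughly 53% of the delta entries survive.
delta = torch.randn(4, 4)
print(dare_sparsify(delta, density=0.53))
```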

Thanks! It's a good question. I assumed it was the same density, but I haven't checked; it's probably somewhere in this file: https://github.com/cg123/mergekit/blob/4905d6f0c59377d1af3c120c09dff8b7f3e50cc7/mergekit/merge_methods/generalized_task_arithmetic.py
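
My reading (unverified, the actual mergekit code may differ) is that the single `density` value is consumed by the DARE random-pruning step, and the TIES stage that follows only performs sign election and consensus rather than a second magnitude pruning. A rough sketch of that structure, with illustrative names that are not mergekit's API:

```python
import torch

def dare_ties_merge(base: torch.Tensor,
                    task_tensors: list[torch.Tensor],
                    weights: list[float],
                    density: float) -> torch.Tensor:
    """Illustrative dare_ties-style merge of a single parameter tensor.

    1. DARE stage: each model's delta is randomly pruned to `density`
       and rescaled (this is where the YAML `density` is used).
    2. TIES stage: a majority sign is elected per parameter and only
       deltas agreeing with it are kept (no second pruning here).
    """
    deltas = []
    for t, w in zip(task_tensors, weights):
        delta = t - base
        mask = torch.bernoulli(torch.full_like(delta, density))
        deltas.append(w * delta * mask / density)  # DARE drop + rescale

    stacked = torch.stack(deltas)                  # (num_models, ...)
    # TIES-style sign election: keep contributions whose sign matches
    # the sign of the summed delta, then sum the survivors.
    # (Normalization details vary between implementations.)
    elected_sign = torch.sign(stacked.sum(dim=0))
    agree = torch.sign(stacked) == elected_sign
    merged_delta = (stacked * agree).sum(dim=0)
    return base + merged_delta
```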

Thank you! I will review this code carefully.
