How to prune layers in AutoModelForCausalLM

#83
by badri369 - opened

I want to prune layers in Mistral and see the results, but I am unable to do it.
What I have tried:
I tried to create a new ModuleList for model.model.layers by removing some layers from it.
But when I try to run inference from the model now, it breaks. Any suggestions on how to do it correctly?
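For reference, a minimal sketch of this manual approach, assuming a Mistral-style model where the decoder stack lives in model.model.layers. The checkpoint name and the choice of layers to keep are placeholders, and the fix shown (re-indexing the kept layers and updating the config) is only an assumption about what usually breaks inference:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder: any Mistral-style checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder choice: keep the first 16 and the last 4 of the 32 decoder layers.
keep = set(range(0, 16)) | set(range(28, 32))
model.model.layers = torch.nn.ModuleList(
    [layer for i, layer in enumerate(model.model.layers) if i in keep]
)

# Assumption: inference breaks because the config and the per-layer cache indices
# still describe the original depth, so re-index the kept layers and update the config.
for new_idx, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = new_idx
model.config.num_hidden_layers = len(model.model.layers)

inputs = tokenizer("The capital of France is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))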

Did you find a solution? I am interested in pruning Mistral too.

mergekit ...
SLERP method:

Can you please elaborate a little?

OK: the problem with pruning a model is choosing which layers to remove.
There are various methods you can choose from.
So here is a method using mergekit:


# Step 1: Clone the mergekit repo and install requirements
!git clone https://github.com/cg123/mergekit.git
!cd mergekit && pip install -q -e .



import yaml

MODEL_NAME = "Marcoro14-7B-slerp"
yaml_config = """
slices:
  - sources:
      - model: AIDC-ai-business/Marcoroni-7B-v3
        layer_range: [0, 32]
      - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.1
        layer_range: [0, 32]
merge_method: slerp
base_model: AIDC-ai-business/Marcoroni-7B-v3
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
"""

# Save config as yaml file
with open('config.yaml', 'w', encoding="utf-8") as f:
    f.write(yaml_config)
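Once the config is saved, the merge itself is launched with the mergekit CLI; this is the minimal form of the command (the full runtime-aware version is in the Colab cell further down):

# Run the SLERP merge; "merge" is the output directory.
!mergekit-yaml config.yaml merge --copy-tokenizer --out-shard-size 5B --write-model-card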




merge_config = """
base_model: mistralai/Codestral-22B-v0.1
dtype: float16
merge_method: task_arithmetic
slices:
- sources:
  - layer_range: [0, 32]
    model: mistralai/Codestral-22B-v0.1
  - layer_range: [23, 55]
    model: mistralai/Codestral-22B-v0.1
    parameters:
      weight: 0.4
"""

with open('config.yaml', 'w') as f:
    f.write(merge_config)
"""





I used this today to check that it works!! (I reduced the model to 7b.)
I will give it some tests today or align it:
It took 150 GB of HD space!
I used the free Google Colab T4!
LeroyDyer/_Spydaz_Web_AI_Codestral_7b
I will try to merge this model with the Mathstral model!

# @title ## Run merge
# @markdown ### Runtime type
# @markdown Select your runtime (CPU, High RAM, GPU)
runtime = "GPU" # @param ["CPU", "CPU + High-RAM", "GPU"]

# @markdown ### Mergekit arguments
# @markdown Use the `main` branch by default, [`mixtral`](https://github.com/cg123/mergekit/blob/mixtral/moe.md) if you want to create a Mixture of Experts.
branch = "main" # @param ["main", "mixtral"]
trust_remote_code = False # @param {type:"boolean"}

# mergekit was already installed in Step 1 above


# Save config as yaml file
with open('config.yaml', 'w', encoding="utf-8") as f:
    f.write(merge_config)

# Base CLI
if branch == "main":
    cli = "mergekit-yaml config.yaml merge --copy-tokenizer --out-shard-size 5B --write-model-card"
elif branch == "mixtral":
    cli = "mergekit-moe config.yaml merge --copy-tokenizer --out-shard-size 5B --write-model-card"

# Additional arguments
if runtime == "CPU":
    cli += " --allow-crimes --lazy-unpickle"
elif runtime == "GPU":
    cli += " --cuda --low-cpu-memory"
if trust_remote_code:
    cli += " --trust-remote-code"

print(cli)

# Merge models
!{cli}

The important factor to consider is that the models you merge need to be similar in their projection (weight) dimensions << i.e. Mistral is 14,464 etc. >>, as you will often get an error regarding the inputs and outputs to a layer.
But if you convert the model to 32 layers, i.e. 22b to 7b, then you should pick the last layers and not the first layers.
If you wish to merge it with a normal 32-layer model, then you can add the configuration with the in and out; but I would convert the Codestral to 7b first and then merge it afterwards with a 7b model.
But I would actually merge and align! So I would get a standard dataset and train the trimmed model until the tensors line back up; but being the last layers, they will have coherence!
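For reference, a plain layer-prune (dropping layers outright rather than blending them) can be written with mergekit's passthrough method. This is only a sketch: it reuses the later layer range from the config above, and the model and dtype choices are placeholders:

# Hypothetical passthrough prune: keep only the later layers of Codestral,
# per the advice above to keep the last layers rather than the first ones.
prune_config = """
slices:
  - sources:
      - model: mistralai/Codestral-22B-v0.1
        layer_range: [23, 55]
merge_method: passthrough
dtype: float16
"""

with open('config.yaml', 'w', encoding="utf-8") as f:
    f.write(prune_config)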

Anyway, try it and see what happens ~

Can you please elaborate a little?

Here is the final product (I first merged it down to 32 layers), compressing the model with the merge shown:
0-32
23-52
LeroyDyer/_Spydaz_Web_AI_Codestral_12b
Hmm, it came out to 12b!
Then I linear-merged the product with itself, so the new model was merged linearly with itself:
LeroyDyer/_Spydaz_Web_AI_Codestral_12b_LM
I did not reduce the model a second time; I decided to merge the model with itself to align its layers and enforce a smooth connection.
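A linear self-merge like the one described would look roughly like this in mergekit; a minimal sketch, assuming equal weights on the two (identical) inputs:

# Hypothetical linear self-merge config: the same model listed twice with equal weights.
self_merge_config = """
models:
  - model: LeroyDyer/_Spydaz_Web_AI_Codestral_12b
    parameters:
      weight: 0.5
  - model: LeroyDyer/_Spydaz_Web_AI_Codestral_12b
    parameters:
      weight: 0.5
merge_method: linear
dtype: float16
"""

with open('config.yaml', 'w', encoding="utf-8") as f:
    f.write(self_merge_config)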
I did not test the model! YET!
I was disappointed because it came out to 12b (32 layers is the sweet spot for models), so we can see the bad settings they used to create the model!
It could not align with the other models! Even with Nemo it was unsuccessful: this was due to the settings they used; they are not uniform. This changes the embeddings (vocab and tokenizers), mistake number one!
And the hidden layer sizes (second mistake)! All these mismatched models are not compatible with each other!
Sad to say!
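A quick way to see the kind of mismatch described above before attempting a merge is to compare the relevant config fields of the two checkpoints; the model names here are only illustrative:

# Illustrative compatibility check: models whose hidden size or vocabulary size
# differ will not line up layer-to-layer when merged.
from transformers import AutoConfig

cfg_a = AutoConfig.from_pretrained("mistralai/Codestral-22B-v0.1")
cfg_b = AutoConfig.from_pretrained("mistralai/Mistral-Nemo-Base-2407")
for key in ("hidden_size", "intermediate_size", "num_hidden_layers", "vocab_size"):
    print(key, getattr(cfg_a, key), getattr(cfg_b, key))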
