How to prune layers in AutoModelForCausalLM
I want to prune layers in Mistral and see the results, but I am unable to do it.
What I have tried:
I tried to create a new ModuleList for model.model.layers by removing some layers ...
But when I try to run inference with the model now, it breaks. Any suggestions on how to do this correctly?
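For context, a minimal sketch of what I am attempting, assuming a Mistral-style checkpoint (the model id and the dropped layer indices are just placeholders); from what I can tell, config.num_hidden_layers and each layer's layer_idx also have to be kept consistent with the new depth, otherwise generation can break:

# Minimal sketch: drop a few decoder layers by rebuilding the ModuleList.
import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

drop = {20, 21, 22, 23}  # placeholder: which decoder layers to remove
keep = [i for i in range(len(model.model.layers)) if i not in drop]

# Rebuild the ModuleList with only the kept layers and re-index them,
# since the attention / KV-cache code paths use each layer's layer_idx.
new_layers = nn.ModuleList([model.model.layers[i] for i in keep])
for new_idx, layer in enumerate(new_layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = new_idx
model.model.layers = new_layers

# Keep the config consistent with the new depth.
model.config.num_hidden_layers = len(new_layers)

# Smoke test.
inputs = tokenizer("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))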
Did you find a solution? I am interested in pruning Mistral too.
mergekit ...
SLERP method:
Can you please elaborate a little?
OK: the problem with pruning a model is choosing which layers to remove.
There are various methods you can choose from.
So here is a method using mergekit:
# Step 1: Clone the mergekit repo and install requirements
!git clone https://github.com/cg123/mergekit.git
!cd mergekit && pip install -q -e .
import yaml

MODEL_NAME = "Marcoro14-7B-slerp"
yaml_config = """
slices:
  - sources:
      - model: AIDC-ai-business/Marcoroni-7B-v3
        layer_range: [0, 32]
      - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.1
        layer_range: [0, 32]
merge_method: slerp
base_model: AIDC-ai-business/Marcoroni-7B-v3
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
"""

# Save config as yaml file
with open('config.yaml', 'w', encoding="utf-8") as f:
    f.write(yaml_config)
# Task-arithmetic merge of two overlapping 32-layer slices of Codestral,
# producing a single trimmed 32-layer stack.
merge_config = """
base_model: mistralai/Codestral-22B-v0.1
dtype: float16
merge_method: task_arithmetic
slices:
  - sources:
      - layer_range: [0, 32]
        model: mistralai/Codestral-22B-v0.1
      - layer_range: [23, 55]
        model: mistralai/Codestral-22B-v0.1
parameters:
  weight: 0.4
"""

# Save config as yaml file
with open('config.yaml', 'w') as f:
    f.write(merge_config)
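As a side note: if the goal is a pure prune (simply dropping a block of layers instead of blending overlapping slices), mergekit's passthrough method can also do it. A rough sketch, with placeholder layer ranges you would tune to the model's actual depth:

# Sketch: drop the middle layers of Codestral by concatenating two kept slices.
prune_config = """
merge_method: passthrough
dtype: float16
slices:
  - sources:
      - model: mistralai/Codestral-22B-v0.1
        layer_range: [0, 16]
  - sources:
      - model: mistralai/Codestral-22B-v0.1
        layer_range: [39, 55]
"""

with open('config.yaml', 'w', encoding="utf-8") as f:
    f.write(prune_config)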
I used this today to check that it works! (I reduced the model to 7B.)
I will give it some tests today or align it.
It took 150 GB of disk space!
I used the free Google Colab (T4):
LeroyDyer/_Spydaz_Web_AI_Codestral_7b
I will try to merge this model with the Mathstral model!
# @title ## Run merge
# @markdown ### Runtime type
# @markdown Select your runtime (CPU, High RAM, GPU)
runtime = "GPU" # @param ["CPU", "CPU + High-RAM", "GPU"]

# @markdown ### Mergekit arguments
# @markdown Use the `main` branch by default, [`mixtral`](https://github.com/cg123/mergekit/blob/mixtral/moe.md) if you want to create a Mixture of Experts.
branch = "main" # @param ["main", "mixtral"]
trust_remote_code = False # @param {type:"boolean"}

# Install mergekit (already done in Step 1 above)

# Save config as yaml file
with open('config.yaml', 'w', encoding="utf-8") as f:
    f.write(merge_config)

# Base CLI
if branch == "main":
    cli = "mergekit-yaml config.yaml merge --copy-tokenizer --out-shard-size 5B --write-model-card"
elif branch == "mixtral":
    cli = "mergekit-moe config.yaml merge --copy-tokenizer --out-shard-size 5B --write-model-card"

# Additional arguments
if runtime == "CPU":
    cli += " --allow-crimes --lazy-unpickle"
elif runtime == "GPU":
    cli += " --cuda --low-cpu-memory"
if trust_remote_code:
    cli += " --trust-remote-code"

print(cli)

# Merge models
!{cli}
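Once the merge finishes, a quick smoke test (the mergekit command above writes the merged model to the local `merge` folder; the prompt is just an example):

# Load the merged model from ./merge and generate a short completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("merge")
model = AutoModelForCausalLM.from_pretrained("merge", torch_dtype=torch.float16, device_map="auto")

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))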
The important factor to consider is that the models you merge need to have similar projection shapes (weights) << i.e. Mistral is 14,464 etc. >>, as you often get an error about the inputs and outputs of a layer (a quick way to check this is sketched below).
But if you convert the model to 32 layers (i.e. 22B to 7B), then you should pick the last layers and not the first layers.
If you wish to merge it with a normal 32-layer model, then you can add the configuration with the in and out, but I would convert the Codestral to 7B first and then merge it afterwards with a 7B model.
But I would actually merge and align! So I would take a standard dataset and train the trimmed model until the tensors line back up; and being the last layers, they will have coherence!
Anyway, try it and see what happens ~
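A small sketch of that shape check, just comparing the config fields that usually trigger those layer input/output errors (the model ids at the bottom are only examples):

# Compare the config fields that must line up before a merge will work.
from transformers import AutoConfig

def print_merge_compat(model_a, model_b):
    a = AutoConfig.from_pretrained(model_a)
    b = AutoConfig.from_pretrained(model_b)
    for field in ["hidden_size", "intermediate_size", "num_hidden_layers",
                  "num_attention_heads", "num_key_value_heads", "vocab_size"]:
        va, vb = getattr(a, field, None), getattr(b, field, None)
        status = "OK" if va == vb else "MISMATCH"
        print(f"{field:22} {va!s:>8} vs {vb!s:>8}  {status}")

# Example pair (placeholders, pick the models you actually want to merge):
print_merge_compat("mistralai/Mistral-7B-v0.1", "mistralai/Codestral-22B-v0.1")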
Can you please elaborate a little?
Here is the final product (I first merged it down to 32 layers), compressing the model with the merge shown:
0-32
23-52
LeroyDyer/_Spydaz_Web_AI_Codestral_12b
Hmm, it came out to 12B!
Then I linear-merged the product with itself, so the new model was merged linearly with itself (roughly as sketched below):
LeroyDyer/_Spydaz_Web_AI_Codestral_12b_LM
I did not reduce the model a second time; I decided to merge the model with itself to align its layers and enforce a smooth connection.
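For reference, a rough sketch of what such a linear self-merge config looks like in mergekit (the model id is taken from above; the 0.5/0.5 weights are my assumption, not the exact settings used):

# Sketch: linear merge of the trimmed model with itself.
linear_config = """
models:
  - model: LeroyDyer/_Spydaz_Web_AI_Codestral_12b
    parameters:
      weight: 0.5
  - model: LeroyDyer/_Spydaz_Web_AI_Codestral_12b
    parameters:
      weight: 0.5
merge_method: linear
dtype: float16
"""

with open('config.yaml', 'w', encoding="utf-8") as f:
    f.write(linear_config)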
I did not test the model yet!
I was disappointed because it came out to 12B (32 layers is the sweet spot for models), so we can see the bad settings they used to create the model!
It could not align with the other models! Even with Nemo it was unsuccessful. This was due to the settings they used: they are not uniform. The changes to the embeddings (vocab and tokenizers) are the number 1 mistake!
And the hidden layer sizes (second mistake)! All these mismatched models are not compatible with each other!
Sad to say!