Quantization made by Richard Erkhov.

Llama-3-15B-Instruct-zeroed-ft - GGUF

Model creator: https://huggingface.co/elinas/
Original model: https://huggingface.co/elinas/Llama-3-15B-Instruct-zeroed-ft/

Name	Quant method	Size
Llama-3-15B-Instruct-zeroed-ft.Q2_K.gguf	Q2_K	5.35GB
Llama-3-15B-Instruct-zeroed-ft.IQ3_XS.gguf	IQ3_XS	5.94GB
Llama-3-15B-Instruct-zeroed-ft.IQ3_S.gguf	IQ3_S	6.24GB
Llama-3-15B-Instruct-zeroed-ft.Q3_K_S.gguf	Q3_K_S	6.21GB
Llama-3-15B-Instruct-zeroed-ft.IQ3_M.gguf	IQ3_M	6.43GB
Llama-3-15B-Instruct-zeroed-ft.Q3_K.gguf	Q3_K	6.87GB
Llama-3-15B-Instruct-zeroed-ft.Q3_K_M.gguf	Q3_K_M	6.87GB
Llama-3-15B-Instruct-zeroed-ft.Q3_K_L.gguf	Q3_K_L	7.43GB
Llama-3-15B-Instruct-zeroed-ft.IQ4_XS.gguf	IQ4_XS	7.14GB
Llama-3-15B-Instruct-zeroed-ft.Q4_0.gguf	Q4_0	8.0GB
Llama-3-15B-Instruct-zeroed-ft.IQ4_NL.gguf	IQ4_NL	8.08GB
Llama-3-15B-Instruct-zeroed-ft.Q4_K_S.gguf	Q4_K_S	8.05GB
Llama-3-15B-Instruct-zeroed-ft.Q4_K.gguf	Q4_K	8.48GB
Llama-3-15B-Instruct-zeroed-ft.Q4_K_M.gguf	Q4_K_M	8.48GB
Llama-3-15B-Instruct-zeroed-ft.Q4_1.gguf	Q4_1	8.84GB
Llama-3-15B-Instruct-zeroed-ft.Q5_0.gguf	Q5_0	9.68GB
Llama-3-15B-Instruct-zeroed-ft.Q5_K_S.gguf	Q5_K_S	9.68GB
Llama-3-15B-Instruct-zeroed-ft.Q5_K.gguf	Q5_K	9.93GB
Llama-3-15B-Instruct-zeroed-ft.Q5_K_M.gguf	Q5_K_M	9.93GB
Llama-3-15B-Instruct-zeroed-ft.Q5_1.gguf	Q5_1	10.53GB
Llama-3-15B-Instruct-zeroed-ft.Q6_K.gguf	Q6_K	11.48GB
Llama-3-15B-Instruct-zeroed-ft.Q8_0.gguf	Q8_0	14.86GB

Original model description:

base_model: - elinas/Llama-3-15B-Instruct-zeroed library_name: transformers tags: - mergekit - merge datasets: - Chat-Error/Pure-dove-sharegpt license: llama3

Llama-3-15B-Instruct-zeroed-ft

This is a QLoRA finetune of a merge of pre-trained language models created using mergekit.

The model is based on a "zeroed" passthrough merge of Llama-3-15B-Instruct-zeroed

This was primarily an experiment to see how a passthrough merge will respond to further finetuning, though this was done on a small dataset.

The model was finetuned on 8192 context length and is likely reliable using RoPE up to 32k.

Further finetuning this model or finetuning the base model on more samples is encouraged.

Datasets

Chat-Error/Pure-dove-sharegpt

A small, high quality, dataset was used as a PoC / validation on stabilizing the model after finetuning.

Finetuning details

This is a QLoRA model and the following modules were targeted.

lora_target_modules:
  - down_proj
  - o_proj

The model is coherent even with training the "zeroed" layers and can write well. In the next experiment, all layers will be finetuned as this was the recommendation from Charles Goddard - thank you for sharing the method of merging as well as Toasty Pigeon for bringing it to my attention!

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- total_train_batch_size: 6
- total_eval_batch_size: 6
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 25
- num_epochs: 1

Optimizer paged_adamw_8bit and Deepspeed ZeRO 3 was used at a LR of 1e-5 using the cosine scheduler for 1 epoch on 3x3090s taking 2h 30m total.

Sample packing and padding was disabled to reduce VRAM consumption significantly at the cost of speed.

W&B Run Summary

wandb: Run summary:
wandb:                eval/loss 0.94497
wandb:             eval/runtime 276.2864
wandb:  eval/samples_per_second 1.397
wandb:    eval/steps_per_second 0.235
wandb:               total_flos 12246605365248.0
wandb:              train/epoch 1.0
wandb:        train/global_step 579
wandb:          train/grad_norm 0.80411
wandb:      train/learning_rate 0.0
wandb:               train/loss 1.085
wandb:               train_loss 0.8834
wandb:            train_runtime 9893.1688
wandb: train_samples_per_second 0.351
wandb:   train_steps_per_second 0.059

Framework versions

PEFT 0.10.0
Transformers 4.40.0.dev0
Pytorch 2.3.0+cu121
Datasets 2.15.0
Tokenizers 0.15.0

Model Evaluation

TBD

If you have any questions or comments on the model, feel free to open a discussion in the community tab.