---
license: other
license_name: llama-3
license_link: https://llama.meta.com/llama3/license/
tags:
- llama-3
- llama
- '3'
- 5B
---

This is just an experiment, similar to the one done on [chargoddard/llama3-42b-v0](https://huggingface.co/chargoddard/llama3-42b-v0). The pruned model was then fine-tuned ("healed") with QLoRA, using ORPO on the DPO-formatted code dataset [AlekseyKorshuk/evol-codealpaca-v1-dpo](https://huggingface.co/datasets/AlekseyKorshuk/evol-codealpaca-v1-dpo).

Due to resource limitations, training only covered 3150 of 4935 steps (~64% of the data). I had to restart the training about halfway through, so the logs are split in two. I am still unsure whether the tokenizer is correct.

Loss: ~1.2

mergekit.yaml
```
slices:
  - sources:
      - model: ./Meta-Llama-3-8B-Instruct/
        layer_range: [0, 15]
  - sources:
      - model: ./Meta-Llama-3-8B-Instruct/
        layer_range: [29, 32]
merge_method: passthrough
dtype: bfloat16
```

ORPOConfig
```
learning_rate=5e-5,
lr_scheduler_type="cosine",
max_length=1024,
max_prompt_length=512,
overwrite_output_dir=False,
beta=0.1,
per_device_train_batch_size=2,
per_device_eval_batch_size=2,
gradient_accumulation_steps=4,
optim="paged_adamw_8bit",
num_train_epochs=1,
evaluation_strategy="steps",
eval_steps=0.02,
logging_steps=1,
warmup_steps=50,
report_to="wandb",
output_dir=out_dir_folder,
fp16=True,
save_steps=50
```
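
For reference, the merge above can be run with mergekit's CLI (`mergekit-yaml mergekit.yaml ./merged-llama3-5b`) or its Python API. Below is a minimal sketch of the latter, assuming mergekit is installed and the base model is available locally; the output path is a placeholder, not the path used in the original run.

```
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the config shown above (saved locally as mergekit.yaml).
with open("mergekit.yaml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

# Writes the pruned model (layers 0-14 plus 29-31, i.e. 18 of the
# original 32 layers) to a placeholder output directory.
run_merge(
    merge_config,
    "./merged-llama3-5b",
    options=MergeOptions(copy_tokenizer=True),
)
```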
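
And a minimal sketch of how the healing run could be reproduced with TRL's `ORPOTrainer` and a 4-bit QLoRA setup. The `ORPOConfig` values mirror the block above; the model paths, output directory, LoRA hyperparameters, and eval split are assumptions for illustration, since the originals were not published.

```
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

out_dir_folder = "./llama3-5b-healed"  # placeholder; original path not published

# 4-bit NF4 quantization for QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the pruned merge produced by mergekit (path is a placeholder).
model = AutoModelForCausalLM.from_pretrained(
    "./merged-llama3-5b",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./Meta-Llama-3-8B-Instruct/")

# LoRA adapter settings are illustrative assumptions.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

dataset = load_dataset("AlekseyKorshuk/evol-codealpaca-v1-dpo", split="train")
# ORPOTrainer expects prompt/chosen/rejected columns; rename if the dataset
# uses a different schema (check the dataset card).
if "question" in dataset.column_names:
    dataset = dataset.rename_column("question", "prompt")
# Small held-out split for the periodic evals implied by
# evaluation_strategy="steps"; the split size is an assumption.
dataset = dataset.train_test_split(test_size=0.01, seed=42)

# Mirrors the ORPOConfig block above.
orpo_args = ORPOConfig(
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_length=1024,
    max_prompt_length=512,
    overwrite_output_dir=False,
    beta=0.1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    num_train_epochs=1,
    evaluation_strategy="steps",  # eval_strategy in newer transformers releases
    eval_steps=0.02,
    logging_steps=1,
    warmup_steps=50,
    report_to="wandb",
    output_dir=out_dir_folder,
    fp16=True,
    save_steps=50,
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # processing_class= in newer TRL versions
    peft_config=peft_config,
)
trainer.train()
```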