---
language:
- en
license: cc-by-nc-4.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
base_model: alnrg2arg/blockchainlabs_7B_merged_test2_4_prune
datasets:
- Intel/orca_dpo_pairs
---

This is a model from blockchainlab test 2.4 which are merged - alnrg2arg/blockchainlabs_7B_merged_test2_4_prune

The project is running to make a small LLM for a on-device purpose.

Overall pipeline for this iteration is

1.Merging to make a base model (7B) 
2.Prune the model to reduce the parameter (50% sparcity) 
3.For recovery phase of the pruning, the DPO is chosen.

This model is a pruned.

This is the code and parameters I chose for this model(DPO).

```
from transformers import TrainingArguments, AutoModelForCausalLM
from trl import DPOTrainer

dpo_trainer = DPOTrainer(
    model = model,
   
    ref_model = None,
    args = TrainingArguments(
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 8,
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        learning_rate = 5e-6,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.0,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "output_DPO",
    ),
    beta = 0.1,
    train_dataset = dataset,
    # eval_dataset = raw_datasets["test"],
    tokenizer = tokenizer,
    max_length = 1024,
    max_prompt_length = 512,
)
```
The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing