
This model is derived from the blockchainlabs test 2.4 merged model, alnrg2arg/blockchainlabs_7B_merged_test2_4.

The project aims to build a small LLM for on-device use.

The overall pipeline for this iteration is:

1. Merge models to make a base model (7B).
2. Prune the model to reduce the parameter count (50% sparsity).
3. Use DPO for the recovery phase after pruning.
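To make "50% sparsity" concrete, here is a minimal pure-Python sketch. The card does not say which pruning method is used, so magnitude pruning (zeroing the smallest-magnitude weights) is an assumption made purely for illustration. The function name `magnitude_prune` and the toy weights are hypothetical.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (illustrative only)."""
    k = int(len(weights) * sparsity)  # number of weights to set to zero
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

# Toy weight vector, not taken from the model.
weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4]
pruned = magnitude_prune(weights, sparsity=0.5)
achieved_sparsity = pruned.count(0.0) / len(pruned)
```

At 50% sparsity half of the entries become zero. The tensor shape is unchanged, so any size or speed benefit depends on sparse storage or kernels.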

This model is not pruned; it is intended as a baseline for comparison with the pruned model.

This is the code and the parameters I chose for this model (DPO).

import torch
from transformers import TrainingArguments, AutoModelForCausalLM
from trl import DPOTrainer

# model, tokenizer, and dataset are assumed to be loaded earlier (not shown here).
dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,
    args = TrainingArguments(
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 8,  # effective batch size per device: 8 * 8 = 64
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        learning_rate = 5e-6,
        # Use bf16 when the GPU supports it, otherwise fall back to fp16.
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.0,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "output_DPO",
    ),
    beta = 0.1,
    train_dataset = dataset,
    # eval_dataset = raw_datasets["test"],
    tokenizer = tokenizer,
    max_length = 1024,
    max_prompt_length = 512,
)

The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
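To show what `beta = 0.1` controls, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not the trainer's implementation. The function name `dpo_loss` and the log-probability values are hypothetical and chosen only for illustration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # How much more (or less) likely each response is under the policy than the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))

# Policy identical to the reference: the margin is zero and the loss is log(2).
loss_at_start = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Policy now prefers the chosen response more than the reference does: the loss drops.
loss_after_update = dpo_loss(-10.0, -18.0, -12.0, -15.0)
```

A larger `beta` scales the margin more strongly, so the same shift away from the reference changes the loss more.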

Benchmark scores

| Task / Group | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 25 | acc | 0.6971 | ± 0.0134 |
| | | none | 25 | acc_norm | 0.7142 | ± 0.0132 |
| hellaswag | 1 | none | 10 | acc | 0.7008 | ± 0.0046 |
| | | none | 10 | acc_norm | 0.8726 | ± 0.0033 |
| mmlu | N/A | none | 0 | acc | 0.6265 | ± 0.1232 |
| - humanities | N/A | none | 5 | acc | 0.5864 | ± 0.1135 |
| - other | N/A | none | 5 | acc | 0.6930 | ± 0.1085 |
| - social_sciences | N/A | none | 5 | acc | 0.7270 | ± 0.0820 |
| - stem | N/A | none | 5 | acc | 0.5230 | ± 0.1264 |
| winogrande | 1 | none | 5 | acc | 0.8414 | ± 0.0103 |
| gsm8k | 2 | get-answer | 5 | exact_match | 0.7263 | ± 0.0123 |
| truthfulqa_mc2 | 2 | none | 0 | acc | 0.6794 | ± 0.0153 |

Average : 74.34
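
The reported average is consistent with the mean of six scores: acc_norm for arc_challenge and hellaswag, and the single listed metric for mmlu, winogrande, gsm8k, and truthfulqa_mc2. This choice of metrics is inferred from the numbers, not stated in the card. The check below reproduces it.

```python
# Scores copied from the benchmark results; the metric selection is an assumption.
scores = {
    "arc_challenge (acc_norm)": 0.7142,
    "hellaswag (acc_norm)": 0.8726,
    "mmlu (acc)": 0.6265,
    "winogrande (acc)": 0.8414,
    "gsm8k (exact_match)": 0.7263,
    "truthfulqa_mc2 (acc)": 0.6794,
}

# Unweighted mean, expressed as a percentage.
average = 100 * sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 74.34
```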

Model size: 7.35B params

Datasets used to train alnrg2arg/blockchainlabs_7B_merged_test2_4_sft_4bit_DPO_orca2_truthy