This model comes from blockchainlabs test 2.4 and is based on alnrg2arg/blockchainlabs_7B_merged_test2_4.
The project aims to build a small LLM for on-device use.
The overall pipeline for this iteration is:
1. Merging: merge models to build a 7B base model.
2. Pruning: prune the model to reduce the parameter count (50% sparsity; see the sketch after this list).
3. Recovery: DPO was chosen for the recovery phase after pruning.
This model, which is not pruned, is intended as a baseline for comparison against the pruned model.
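As a rough illustration of step 2, the sketch below applies 50% unstructured magnitude pruning using PyTorch's built-in `torch.nn.utils.prune` utilities. The card does not specify the pruning method actually used, so treat this as an assumption rather than the project's real pruning code.

```python
# Hypothetical sketch of step 2: 50% unstructured magnitude pruning.
# NOT the project's actual pruning code; it only illustrates what
# "50% sparsity" means for the linear layers of the 7B base model.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "alnrg2arg/blockchainlabs_7B_merged_test2_4"
)

for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        # Zero out the 50% of weights with the smallest magnitude (L1 norm).
        prune.l1_unstructured(module, name="weight", amount=0.5)
        # Make the zeros permanent by removing the pruning re-parametrization.
        prune.remove(module, "weight")
```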
Here are the code and parameters chosen for DPO training of this model:
```python
import torch
from transformers import TrainingArguments, AutoModelForCausalLM
from trl import DPOTrainer

# `model`, `tokenizer`, and `dataset` (a preference dataset) are assumed
# to have been loaded earlier in the notebook.
dpo_trainer = DPOTrainer(
    model=model,
    ref_model=None,  # TRL falls back to a frozen copy of `model` as the reference
    args=TrainingArguments(
        per_device_train_batch_size=8,
        gradient_accumulation_steps=8,  # effective batch size of 64 per device
        warmup_ratio=0.1,
        num_train_epochs=3,
        learning_rate=5e-6,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",  # 8-bit AdamW (requires bitsandbytes)
        weight_decay=0.0,
        lr_scheduler_type="linear",
        seed=42,
        output_dir="output_DPO",
    ),
    beta=0.1,  # strength of the implicit KL penalty against the reference model
    train_dataset=dataset,
    # eval_dataset=raw_datasets["test"],
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
)
```
The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
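For reference, TRL's `DPOTrainer` expects the preference dataset to contain `prompt`, `chosen`, and `rejected` columns. A minimal sketch of that format, with an invented example row (the project's actual DPO dataset is not shown in this card):

```python
# Minimal sketch of the preference-data format DPOTrainer consumes.
# The example row is made up purely for illustration.
from datasets import Dataset

dataset = Dataset.from_dict({
    "prompt":   ["What is 2 + 2?"],
    "chosen":   ["2 + 2 = 4."],  # preferred response
    "rejected": ["2 + 2 = 5."],  # dispreferred response
})
```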
## Benchmark Scores
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.6894 | ± | 0.0135 |
| | | none | 0 | acc_norm | 0.6860 | ± | 0.0136 |

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| hellaswag | 1 | none | 0 | acc | 0.7092 | ± | 0.0045 |
| | | none | 0 | acc_norm | 0.8736 | ± | 0.0033 |

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| truthfulqa_mc2 | 2 | none | 0 | acc | 0.7126 | ± | 0.015 |

| Groups | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| mmlu | N/A | none | 0 | acc | 0.6225 | ± | 0.1292 |
| - humanities | N/A | none | 0 | acc | 0.5745 | ± | 0.1286 |
| - other | N/A | none | 0 | acc | 0.6952 | ± | 0.1095 |
| - social_sciences | N/A | none | 0 | acc | 0.7280 | ± | 0.0735 |
| - stem | N/A | none | 0 | acc | 0.5195 | ± | 0.1313 |

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| winogrande | 1 | none | 0 | acc | 0.824 | ± | 0.0107 |

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| gsm8k | 2 | get-answer | 5 | exact_match | 0.7263 | ± | 0.0123 |
Average = 74.08 (the mean of ARC acc_norm, HellaSwag acc_norm, TruthfulQA mc2, MMLU, Winogrande, and GSM8K, each scaled to 100).
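The tables above follow the output format of EleutherAI's lm-evaluation-harness. Assuming the scores were produced with that harness (the card does not state the exact command), a run along these lines would generate comparable tables; the task list and settings are inferred, not confirmed:

```python
# Hypothetical reproduction of the benchmark tables with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). Settings are assumptions.
import lm_eval
from lm_eval.utils import make_table

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=hflog/alnrg2arg-test3_sft_16bit_dpo2",
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc2",
           "mmlu", "winogrande"],
    num_fewshot=0,  # the gsm8k score above was reported with 5-shot
)
print(make_table(results))
```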
**Model tree for hflog/alnrg2arg-test3_sft_16bit_dpo2**
- Base model: alnrg2arg/blockchainlabs_7B_merged_test2_4