Neural Krishna DPO
Fine-tuning + length(choose)
- Training Args:
# Imports used by the snippet below. model_name, new_model, dataset and
# tokenizer are not shown in this card and are assumed to be defined beforehand.
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    load_in_4bit=True
)
model.config.use_cache = False

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=120,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=50,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)

# Fine-tune model with DPO
dpo_trainer.train()
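
To make the role of `beta=0.1` in the trainer settings more concrete, here is a minimal pure-Python sketch of the standard sigmoid DPO objective for a single preference pair. It is an illustration under assumptions, not the trainer's actual implementation: the function name `dpo_loss` and the example log-probabilities are hypothetical, and it assumes the default sigmoid loss is in use.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # beta scales the preference margin; a larger beta penalizes
    # deviations from the reference more sharply.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical values: policy identical to the reference gives log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # ≈ 0.693
# Policy favors the chosen response more than the reference does: lower loss.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The loss starts at log 2 when the policy equals the reference and falls as the policy raises the chosen response's likelihood relative to the rejected one.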
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 76.00 |
| AI2 Reasoning Challenge (25-Shot) | 74.06 |
| HellaSwag (10-Shot) | 88.97 |
| MMLU (5-Shot) | 64.41 |
| TruthfulQA (0-shot) | 76.19 |
| Winogrande (5-shot) | 84.29 |
| GSM8k (5-shot) | 68.08 |
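
As a quick sanity check on the table, the snippet below recomputes the average, assuming "Avg." is the unweighted mean of the six benchmark scores. The dictionary name `scores` is only illustrative; the values are copied from the table.

```python
# Benchmark scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 74.06,
    "HellaSwag (10-Shot)": 88.97,
    "MMLU (5-Shot)": 64.41,
    "TruthfulQA (0-shot)": 76.19,
    "Winogrande (5-shot)": 84.29,
    "GSM8k (5-shot)": 68.08,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 76.00
```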
Model tree for Kukedlc/NeuralKrishna-7B-V2-DPO
Evaluation results
- AI2 Reasoning Challenge (25-Shot), test set: normalized accuracy 74.06 (Open LLM Leaderboard)
- HellaSwag (10-Shot), validation set: normalized accuracy 88.97 (Open LLM Leaderboard)
- MMLU (5-Shot), test set: accuracy 64.41 (Open LLM Leaderboard)
- TruthfulQA (0-shot), validation set: mc2 76.19 (Open LLM Leaderboard)
- Winogrande (5-shot), validation set: accuracy 84.29 (Open LLM Leaderboard)
- GSM8k (5-shot), test set: accuracy 68.08 (Open LLM Leaderboard)