Model Card for decruz07/kellemar-DPO-Orca-Distilled-7B

This model uses mlabonne/Marcoro14-7B-slerp as its base and was fine-tuned with DPO on the argilla/distilabel-intel-orca-dpo-pairs dataset.

Model Details

Fine-tuned with the following parameters:

  • Steps: 200
  • Learning rate: 5e-5
  • Beta: 0.1

Model Description

  • Developed by: @decruz
  • Funded by: my full-time job
  • Fine-tuned from model: mlabonne/Marcoro14-7B-slerp

Benchmarks

Ranked in the top 5 on the Open LLM Leaderboard as of 2024/01/17.

OpenLLM

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|---|
| kellemar-DPO-Orca-Distilled-7B-SLERP | 73.71 | 70.48 | 87.56 | 65.33 | 64.97 | 81.93 | 72.02 |

Nous

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| kellemar-DPO-Orca-Distilled-7B-SLERP | 45.27 | 76.42 | 65.48 | 47.21 | 58.6 |
| Marcoro14-7B-slerp | 44.66 | 76.24 | 64.15 | 45.64 | 57.67 |
| kellemar-DPO-Orca-Distilled-7B | 43.61 | 73.14 | 55.73 | 42.28 | 53.69 |
| kellemar-Orca-DPO-7B | 43.35 | 73.43 | 54.02 | 42.24 | 53.26 |
| OpenHermes-2.5-Mistral-7B | 43.07 | 73.12 | 53.04 | 40.96 | 52.38 |

Uses

You can use this model for basic inference. It can also serve as a starting point for further fine-tuning.

How to Get Started with the Model

You can create a Hugging Face Space from this model, or call it directly from Python code to run inference.
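As a starting point for the Python route, here is a minimal sketch using the `transformers` library. It is not taken from the author's own code. The repository ID is copied from the card title (the benchmark tables refer to a `-SLERP` variant, so adjust it if needed), and the sampling settings, the plain-text prompt, and the helper names `generation_kwargs` and `generate` are illustrative assumptions. It also assumes `torch`, `transformers`, and `accelerate` are installed and that enough GPU memory is available for a 7B model in half precision.

```python
# Repository ID taken from the card title; change it if the hosted repo name differs.
MODEL_ID = "decruz07/kellemar-DPO-Orca-Distilled-7B"


def generation_kwargs(max_new_tokens=128, temperature=0.7):
    """Illustrative sampling settings (assumed, not specified by the model card)."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_p": 0.9,
    }


def generate(prompt, model_id=MODEL_ID, **overrides):
    """Load the model, generate a completion for `prompt`, and return the new text."""
    # Heavy imports are kept local so the helpers above can be used without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # half precision to reduce memory use
        device_map="auto",          # requires the `accelerate` package
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    settings = {**generation_kwargs(), **overrides}
    output_ids = model.generate(**inputs, **settings)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate("Explain DPO fine-tuning in two sentences.", max_new_tokens=64))
```

The function reloads the model on every call to keep the sketch short; for repeated use, load the tokenizer and model once and reuse them.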

Training Details

The following training configuration was used:

`# Imports needed by this excerpt. model, ref_model, dataset, tokenizer,
# peft_config, and new_model are defined elsewhere in the training script.
from transformers import TrainingArguments
from trl import DPOTrainer

training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)`
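To make the role of `beta=0.1` more concrete, the sketch below computes the standard sigmoid DPO loss for a single preference pair in plain Python. It illustrates the published DPO objective and is not the `trl` implementation. The function name `dpo_loss` and the example log-probabilities are made up for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # How much more (or less) likely each response is under the policy than the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # beta scales the implicit reward margin between chosen and rejected responses.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the margin is 0 and the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy prefers the chosen response more than the reference does: the loss drops.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

A larger `beta` penalizes the same log-ratio gap more sharply, which ties the policy more closely to the reference model. The value 0.1 used here is a comparatively gentle setting.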

Training Data

The model was trained on the argilla/distilabel-intel-orca-dpo-pairs dataset: https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs

Training Procedure

Trained with Labonne's Google Colab Notebook on Finetuning Mistral 7B with DPO.
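The hyperparameters listed under Training Details imply a particular effective batch size and learning-rate curve. The sketch below works through that arithmetic. It assumes a single GPU, and it assumes the `"cosine"` scheduler applies a linear warmup followed by a half-cosine decay to zero, so treat the curve as an approximation rather than a log of the actual run. The function name `lr_at` is illustrative.

```python
import math

# Values copied from the training arguments in this card.
BASE_LR = 5e-5
WARMUP_STEPS = 100
MAX_STEPS = 200
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4


def lr_at(step):
    """Approximate learning rate at an optimizer step (linear warmup, then cosine decay)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Preference pairs consumed per optimizer step on one device (assumed single GPU).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM

# Upper bound on preference pairs seen over the whole run.
pairs_seen = effective_batch * MAX_STEPS

for step in (0, 50, 100, 150, 200):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

Under these assumptions, half of the 200 steps are spent in warmup, so the peak learning rate of 5e-5 is reached only at step 100 before decaying.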

Model Card Authors

@decruz

Model Card Contact

@decruz on X/Twitter
