Model Card for decruz07/kellemar-DPO-7B-v1.01

This model was created using OpenHermes-2.5 as the base, and finetuned with argilla/distilabel-intel-orca-dpo-pairs.

Model Details

Finetuned with these specific parameters: Steps: 200 Learning Rate: 5e5 Beta: 0.1

Model Description

Developed by: @decruz
Funded by [optional]: my full-time job
Finetuned from model [optional]: teknium/OpenHermes-2.5-Mistral-7B

Benchmarks

OpenLLM

Average	ARC	HellaSwag	MMLU	TruthfulQA	Winogrande	GSM8K
68.32	65.78	85.04	63.24	55.54	78.69	61.64

Nous

AGIEval	GPT4All	TruthfulQA	Bigbench	Average
43.17	73.25	55.87	42.2	53.62

Uses

You can use this for basic inference. You could probably finetune with this if you want to.

How to Get Started with the Model

You can create a space out of this, or use basic python code to call the model directly and make inferences to it.

[More Information Needed]

Training Details

The following was used: `training_args = TrainingArguments( per_device_train_batch_size=4, gradient_accumulation_steps=4, gradient_checkpointing=True, learning_rate=5e-5, lr_scheduler_type="cosine", max_steps=200, save_strategy="no", logging_steps=1, output_dir=new_model, optim="paged_adamw_32bit", warmup_steps=100, bf16=True, report_to="wandb", )

Create DPO trainer

dpo_trainer = DPOTrainer( model, ref_model, args=training_args, train_dataset=dataset, tokenizer=tokenizer, peft_config=peft_config, beta=0.1, max_prompt_length=1024, max_length=1536, )`

Training Data

This was trained with https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs

Training Procedure

Trained with Labonne's Google Colab Notebook on Finetuning Mistral 7B with DPO.

Model Card Authors [optional]

@decruz

Model Card Contact

@decruz on X/Twitter

decruz07
/

kellemar-DPO-7B-v1.01