---
base_model: teknium/OpenHermes-2.5-Mistral-7B
license: apache-2.0
datasets:
  - Intel/orca_dpo_pairs
---

# Model Card for decruz07/kellemar-Orca-DPO-7B

This model was created using OpenHermes-2.5-Mistral-7B as the base and finetuned with the Intel/orca_dpo_pairs dataset.

## Model Details

Finetuned with these specific parameters:

- Steps: 200
- Learning Rate: 5e-5
- Beta: 0.1

### Model Description

- Developed by: @decruz
- Funded by [optional]: my full-time job
- Finetuned from model [optional]: teknium/OpenHermes-2.5-Mistral-7B

## Benchmarks

### OpenLLM

| Average | ARC   | HellaSwag | MMLU  | TruthfulQA | Winogrande | GSM8K |
|---------|-------|-----------|-------|------------|------------|-------|
| 68.32   | 65.78 | 85.04     | 63.24 | 55.54      | 78.69      | 61.64 |

### Nous

| AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---------|---------|------------|----------|---------|
| 43.35   | 73.43   | 54.02      | 42.24    | 53.26   |

## Uses

You can use this model for basic inference, or as a base for further finetuning.

## How to Get Started with the Model

You can create a Hugging Face Space from this model, or load it directly in Python and run inference, as sketched below.
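A minimal inference sketch using the `transformers` library. The model ID is taken from this card's title; the chat template is assumed to follow the base OpenHermes-2.5 model's ChatML format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID assumed from this model card
model_id = "decruz07/kellemar-Orca-DPO-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Build a prompt with the tokenizer's chat template (assumed to match the base model's ChatML format)
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```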

## Training Details

The following training arguments and DPO trainer setup were used:

```python
from transformers import TrainingArguments
from trl import DPOTrainer

training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
```
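The `model`, `ref_model`, `dataset`, `tokenizer`, and `peft_config` objects above are defined elsewhere in the notebook. A minimal sketch of what the LoRA config and training launch could look like; the values here are assumptions for illustration, not the exact settings used.

```python
from peft import LoraConfig

# Hypothetical LoRA configuration; the actual ranks/target modules are not documented in this card
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Run DPO finetuning and save the resulting model/adapter
dpo_trainer.train()
dpo_trainer.model.save_pretrained(new_model)
```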

### Training Data

This model was trained with https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs
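A small sketch of how the dataset can be loaded for inspection before mapping it into DPO prompt/chosen/rejected fields (the `train` split name is an assumption):

```python
from datasets import load_dataset

# Load the preference pairs and inspect the available columns
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
print(dataset.column_names)
```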

### Training Procedure

Trained with Maxime Labonne's Google Colab notebook on finetuning Mistral 7B with DPO.

## Model Card Authors [optional]

@decruz

## Model Card Contact

@decruz on X/Twitter