---
base_model: teknium/OpenHermes-2.5-Mistral-7B
license: apache-2.0
datasets:
- Intel/orca_dpo_pairs
---

# Model Card for decruz07/kellemar-Orca-DPO-7B

<!-- Provide a quick summary of what the model is/does. -->

This model was created with teknium/OpenHermes-2.5-Mistral-7B as the base and finetuned on Intel/orca_dpo_pairs using DPO.

## Model Details

Finetuned with these specific parameters:

- Steps: 200
- Learning Rate: 5e-5
- Beta: 0.1 (the DPO temperature; see the objective below)

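For context (not from the original card), beta is the temperature in the standard DPO objective, which controls how strongly the finetuned policy is penalized for drifting away from the reference model on the preference pairs:

$$
\mathcal{L}_\mathrm{DPO}(\pi_\theta; \pi_\mathrm{ref}) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_\mathrm{ref}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_\mathrm{ref}(y_l \mid x)}\right)\right]
$$

where *y_w* and *y_l* are the chosen and rejected responses for a prompt *x*.
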
### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** @decruz
- **Funded by:** my full-time job
- **Finetuned from model:** teknium/OpenHermes-2.5-Mistral-7B

## Benchmarks

**OpenLLM**

| Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| 68.32 | 65.78 | 85.04 | 63.24 | 55.54 | 78.69 | 61.64 |

**Nous**

| AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|
| 43.35 | 73.43 | 54.02 | 42.24 | 53.26 |

## Uses

You can use this model for basic inference, or as a base for further finetuning; the Training Details section below shows the DPO setup used to produce it.

## How to Get Started with the Model

You can create a Hugging Face Space with this model, or call it directly from Python to run inference.
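A minimal inference sketch with `transformers` (assumes `torch`, `transformers`, and `accelerate` are installed, and that the tokenizer carries the ChatML chat template inherited from OpenHermes-2.5):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "decruz07/kellemar-Orca-DPO-7B"

# Load tokenizer and model (bfloat16, spread across available devices)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a ChatML-style prompt through the tokenizer's chat template
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO finetuning in one paragraph."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and print only the newly generated tokens
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
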
## Training Details

The model was trained with the following configuration (`model`, `ref_model`, `dataset`, `tokenizer`, `peft_config`, and `new_model` are defined earlier in the training script):

```python
from transformers import TrainingArguments
from trl import DPOTrainer

# Training arguments: 200 DPO steps, cosine schedule, bf16, paged 32-bit AdamW
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
```

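The snippet above references a `peft_config` that is not shown in this card. Purely as an illustration (an assumed, typical LoRA setup for DPO finetuning of a Mistral-7B model, not necessarily the exact configuration used), it could look like:

```python
from peft import LoraConfig

# Hypothetical LoRA adapter config -- illustrative values only
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```
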
### Training Data

This was trained with [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs).
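The dataset can be pulled directly from the Hub with `datasets` (a minimal sketch; the exact mapping into DPO's prompt/chosen/rejected format follows the training notebook and is not reproduced here):

```python
from datasets import load_dataset

# Load the preference pairs used for DPO finetuning
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

# Inspect the columns before mapping them into prompt/chosen/rejected fields
print(dataset.column_names)
print(dataset[0])
```
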
### Training Procedure

Trained with Maxime Labonne's Google Colab notebook on finetuning Mistral 7B with DPO.

## Model Card Authors

@decruz

## Model Card Contact

@decruz on X/Twitter