---
base_model: teknium/OpenHermes-2.5-Mistral-7B
license: apache-2.0
datasets:
- teknium/openhermes
- argilla/ultrafeedback-binarized-preferences
- Intel/orca_dpo_pairs
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# DPOpenHermes 7B

## OpenHermes x Notus x Neural

This is an RL fine-tuned version of [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B), trained with Direct Preference Optimization (DPO) on the [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) and [argilla/ultrafeedback-binarized-preferences](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences) preference datasets.

DPOpenHermes is trained using QLoRA. The adapter is also provided in this model repo.

# Training Details

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

DPOpenHermes was trained on a single H100 80GB hosted on RunPod for ~10h, covering 0.6 epochs of the dataset.

https://wandb.ai/oaaic/openhermes-dpo/reports/DPOpenHermes--Vmlldzo2MTQ3NDg2

# Benchmarks

## AGIEval

```
|             Task             |Version| Metric |Value |   |Stderr|
|------------------------------|------:|--------|-----:|---|-----:|
|agieval_aqua_rat              |      0|acc     |0.2480|±  |0.0272|
|                              |       |acc_norm|0.2520|±  |0.0273|
|agieval_logiqa_en             |      0|acc     |0.3810|±  |0.0190|
|                              |       |acc_norm|0.3856|±  |0.0191|
|agieval_lsat_ar               |      0|acc     |0.2348|±  |0.0280|
|                              |       |acc_norm|0.2304|±  |0.0278|
|agieval_lsat_lr               |      0|acc     |0.5118|±  |0.0222|
|                              |       |acc_norm|0.5196|±  |0.0221|
|agieval_lsat_rc               |      0|acc     |0.5948|±  |0.0300|
|                              |       |acc_norm|0.5688|±  |0.0303|
|agieval_sat_en                |      0|acc     |0.7427|±  |0.0305|
|                              |       |acc_norm|0.7427|±  |0.0305|
|agieval_sat_en_without_passage|      0|acc     |0.4563|±  |0.0348|
|                              |       |acc_norm|0.4515|±  |0.0348|
|agieval_sat_math              |      0|acc     |0.3818|±  |0.0328|
|                              |       |acc_norm|0.3682|±  |0.0326|
```

Average: 0.4399

## GPT4All

```
|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.5930|±  |0.0144|
|             |       |acc_norm|0.6323|±  |0.0141|
|arc_easy     |      0|acc     |0.8443|±  |0.0074|
|             |       |acc_norm|0.8295|±  |0.0077|
|boolq        |      1|acc     |0.8599|±  |0.0061|
|hellaswag    |      0|acc     |0.6548|±  |0.0047|
|             |       |acc_norm|0.8365|±  |0.0037|
|openbookqa   |      0|acc     |0.3520|±  |0.0214|
|             |       |acc_norm|0.4640|±  |0.0223|
|piqa         |      0|acc     |0.8210|±  |0.0089|
|             |       |acc_norm|0.8335|±  |0.0087|
|winogrande   |      0|acc     |0.7466|±  |0.0122|
```

Average: 0.7431

## TruthfulQA

```
hf-causal-experimental (pretrained=openaccess-ai-collective/dpopenhermes-alpha-v1,dtype=bfloat16,trust_remote_code=True,use_accelerate=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
|    Task     |Version|Metric|Value |   |Stderr|
|-------------|------:|------|-----:|---|-----:|
|truthfulqa_mc|      1|mc1   |0.4186|±  |0.0173|
|             |       |mc2   |0.5847|±  |0.0153|
```
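
The AGIEval and GPT4All averages above are reported without a formula. The sketch below recomputes them from the table values under one assumption that is not stated in this card: each task contributes its `acc_norm` score when one is listed and its `acc` score otherwise, and the suite average is the unweighted mean over tasks. The dictionary names and the `suite_average` helper are illustrative only.

```python
# Per-task scores copied from the AGIEval and GPT4All tables in this card.
AGIEVAL = {
    "agieval_aqua_rat": {"acc": 0.2480, "acc_norm": 0.2520},
    "agieval_logiqa_en": {"acc": 0.3810, "acc_norm": 0.3856},
    "agieval_lsat_ar": {"acc": 0.2348, "acc_norm": 0.2304},
    "agieval_lsat_lr": {"acc": 0.5118, "acc_norm": 0.5196},
    "agieval_lsat_rc": {"acc": 0.5948, "acc_norm": 0.5688},
    "agieval_sat_en": {"acc": 0.7427, "acc_norm": 0.7427},
    "agieval_sat_en_without_passage": {"acc": 0.4563, "acc_norm": 0.4515},
    "agieval_sat_math": {"acc": 0.3818, "acc_norm": 0.3682},
}

GPT4ALL = {
    "arc_challenge": {"acc": 0.5930, "acc_norm": 0.6323},
    "arc_easy": {"acc": 0.8443, "acc_norm": 0.8295},
    "boolq": {"acc": 0.8599},
    "hellaswag": {"acc": 0.6548, "acc_norm": 0.8365},
    "openbookqa": {"acc": 0.3520, "acc_norm": 0.4640},
    "piqa": {"acc": 0.8210, "acc_norm": 0.8335},
    "winogrande": {"acc": 0.7466},
}


def suite_average(results):
    """Unweighted mean over tasks.

    Assumed convention (not stated in the card): prefer acc_norm,
    fall back to acc when a task reports no acc_norm.
    """
    scores = [metrics.get("acc_norm", metrics["acc"]) for metrics in results.values()]
    return sum(scores) / len(scores)


agieval_avg = suite_average(AGIEVAL)
gpt4all_avg = suite_average(GPT4ALL)

print(f"AGIEval average: {agieval_avg:.4f}")
print(f"GPT4All average: {gpt4all_avg:.4f}")
```

Under this assumption the recomputed means agree with the reported 0.4399 and 0.7431 to within about 1e-4. A difference in the last digit is expected, since the table values are already rounded to four decimals.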