Ejafa/phi-3-mini-128k-instruct-dpo-lr-5e-07

Text Generation

alignment-handbook

Generated from Trainer

text-generation-inference

Inference Endpoints


Ejafa committed on Jun 25

Commit f0d989e • 1 Parent(s): 056e214

Update README.md

Files changed (1)

README.md +8 -0


@@ -19,6 +19,14 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+## Description
+This model was trained as part of the Reinforcement Learning - 24 project at Peking University, focusing on [dpo].
+## Authors
+- Ejafa Bassam
+- Yaroslav Ponomarenko
 # phi-3-mini-128k-instruct-dpo-lr-5e-07
 This model is a fine-tuned version of [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) on the princeton-nlp/llama3-ultrafeedback dataset.