lxuechen/phi-2-dpo

lxuechen committed on Dec 27, 2023

Commit 912b5d5 · 1 Parent(s): 9fbb65d

Create README.md

Files changed (1)

README.md +32 -0

README.md ADDED

	@@ -0,0 +1,32 @@

+---
+license: other
+license_name: microsoft-research-license
+license_link: https://huggingface.co/microsoft/phi-2/resolve/main/LICENSE
+language:
+- en
+pipeline_tag: text-generation
+tags:
+- nlp
+- code
+model-index:
+  - name: phi-2-dpo
+    results:
+      - task:
+          type: text-generation
+        dataset:
+          name: AlpacaEval
+          type: AlpacaEval
+        metrics:
+          - name: AlpacaEval
+            type: AlpacaEval
+            value: 81.37%
+        source:
+          name: AlpacaEval
+          url: https://github.com/tatsu-lab/alpaca_eval
+---
+## Model Summary
+`phi-2-dpo` is an instruction-tuned model built on an earlier SFT model, [`phi-2-sft`](https://huggingface.co/lxuechen/phi-2-sft). It was fine-tuned with direct preference optimization (DPO) on the [UltraFeedback dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).
+The purpose of this experiment is to understand the quality of the pre-trained Phi-2 model. The good news is that `phi-2-dpo` follows open-ended user instructions well.
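
For readers unfamiliar with DPO, the objective named in the README can be sketched for a single preference pair as below. This is an illustration of the standard DPO loss, not the training code used for `phi-2-dpo`: the function name `dpo_loss`, its arguments, and the default `beta=0.1` are assumptions, and the inputs are taken to be summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model (here, presumably the SFT model).

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the total log-probability a model assigns to a
    response given the prompt. `beta` (hypothetical default) scales how
    strongly the policy is pushed away from the reference model.
    """
    # How much more likely each response became relative to the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Positive margin: the policy favors the chosen response more than
    # the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # computed in a numerically stable way for either sign of margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree, the margin is zero and the loss equals `log(2)`. The loss falls as the policy raises the chosen response's likelihood relative to the rejected one. In practice the loss is averaged over a batch of pairs, such as the binarized UltraFeedback preferences.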