---
library_name: peft
base_model: microsoft/Phi-3-mini-4k-instruct
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
- >-
  flax-sentence-embeddings/stackexchange_titlebody_best_and_down_voted_answer_jsonl
language:
- en
---

# Model Card for Phi-3-mini-4k-instruct DPO

## Model Details

- **Model Name:** Phi-3-mini-4k-instruct DPO
- **Publisher:** Team chatterbox, EPFL
- **Model Type:** Language Model, Fine-tuned with direct preference optimization (DPO)
- **Training Environment:** Trained on the EPFL SCITAS cluster using a 32GB GPU.

## Intended Use

- **Primary Applications:** This model is designed as part of an AI-Tutor system.
- **Intended Audience:** Educators, students, and developers creating educational AI applications.

## Model/Data Description

### Training Data

- **Datasets Used:**
  - **Milestone 1 Dataset:** Includes 1522 unique questions with preference pairs based on the 'overall' rating, totaling 20k+ usable entries after processing.
  - **Stack Exchange Dataset:** Filters content from specific domains within the Stack Exchange network, using upvoted and downvoted answers to form preference pairs. Total entries after preprocessing: 54458.
  - **Ultra Feedback:** Utilizes responses rated on criteria like truthfulness and helpfulness to form preference pairs, with a total of 60917 entries after preprocessing.
- **Preprocessing Details:** Entries with identical chosen and rejected answers were removed. Datasets were formatted as JSONL where each line represents a JSON object with a "prompt", "chosen", and "rejected" response.

## Training Procedure

- **Configurations:** (Refer to the provided `training_args` and `trainer` configuration)
- **Evaluation Metrics:** The primary metric for model performance is `eval_loss`, with the aim to minimize this value.

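The preprocessing described under Training Data (dropping entries whose chosen and rejected answers are identical, then writing one JSON object per line) could look roughly like the sketch below. It is not the team's actual script: the function names `clean_preference_pairs` and `to_jsonl` are hypothetical, and "identical" is assumed to mean exact string equality.

```python
import json


def clean_preference_pairs(records):
    """Drop records whose chosen and rejected answers are identical.

    Assumption: "identical" means exact string equality; the card does not
    say whether whitespace or casing was normalized before comparing.
    """
    cleaned = []
    for rec in records:
        if rec["chosen"] == rec["rejected"]:
            continue  # a pair with no preference signal is skipped
        # Keep only the three fields named in the card.
        cleaned.append(
            {
                "prompt": rec["prompt"],
                "chosen": rec["chosen"],
                "rejected": rec["rejected"],
            }
        )
    return cleaned


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(rec, ensure_ascii=False) for rec in records)


# Toy input for illustration only; these are not entries from the datasets.
raw = [
    {"prompt": "What is 2 + 2?", "chosen": "4", "rejected": "5"},
    {"prompt": "Name a prime number.", "chosen": "7", "rejected": "7"},
]

pairs = clean_preference_pairs(raw)
print(to_jsonl(pairs))
```

With the toy input, only the first record survives, because the second has the same text for both answers.
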
## Evaluation Results

- **Accuracies:** `eval/rewards/accuracies`: 0.83
- **Loss:** `eval/loss`: 0.47
- **Margins:** `eval/margins`: 4.31

### MT-Bench

- **Single Grading Score, Overall Avg.:** 8.2
- **STEM Score:** 9.8 (higher than GPT-4)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/633206606eae0bb0a01c8a82/ay1QSp2hkicRTY4fcnAPX.png)

## References

- PEFT 0.11.1