---
library_name: peft
base_model: microsoft/Phi-3-mini-4k-instruct
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
- flax-sentence-embeddings/stackexchange_titlebody_best_and_down_voted_answer_jsonl
language:
- en
---

# Model Card for Phi-3-mini-4k-instruct DPO

## Model Details

- Model Name: Phi-3-mini-4k-instruct DPO
- Publisher: Team chatterbox, EPFL
- Model Type: Language model fine-tuned with Direct Preference Optimization (DPO)
- Training Environment: Trained on the EPFL SCITAS cluster using a 32GB GPU.

## Intended Use

- Primary Applications: This model is designed as part of an AI-Tutor system.
- Intended Audience: Educators, students, and developers creating educational AI applications.
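
Since this repository provides a PEFT adapter rather than full model weights, usage typically means attaching the adapter to the base model. Below is a minimal loading sketch; the adapter repository id is a placeholder (this card does not state the published repo id), and the generation settings are illustrative.

```python
# Minimal loading sketch. "REPLACE_WITH_ADAPTER_REPO_ID" is a placeholder for this
# adapter's actual repository id; generation settings are illustrative, not prescribed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "microsoft/Phi-3-mini-4k-instruct"
adapter_id = "REPLACE_WITH_ADAPTER_REPO_ID"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the DPO adapter

messages = [{"role": "user", "content": "Explain overfitting to a first-year student."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```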

## Model/Data Description

### Training Data

- Datasets Used:
  - Milestone 1 Dataset: Includes 1522 unique questions with preference pairs based on the 'overall' rating, totaling 20k+ usable entries after processing.
  - Stack Exchange Dataset: Filters content from specific domains within the Stack Exchange network, using upvoted and downvoted answers to form preference pairs. Total entries after preprocessing: 54458.
  - Ultra Feedback: Utilizes responses rated on criteria like truthfulness and helpfulness to form preference pairs, with a total of 60917 entries after preprocessing.
- Preprocessing Details: Entries with identical chosen and rejected answers were removed. Datasets were formatted as JSONL, where each line represents a JSON object with a "prompt", "chosen", and "rejected" response.
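
To make the JSONL format concrete, here is a small preprocessing sketch. It is illustrative only (the helper and file names are hypothetical, not the team's actual scripts): it drops pairs whose chosen and rejected answers are identical and writes one JSON object per line with `prompt`, `chosen`, and `rejected` keys.

```python
# Illustrative preprocessing sketch; not the team's actual script.
import json

def write_preference_jsonl(pairs, path):
    """Write preference pairs to JSONL, dropping degenerate pairs.

    `pairs` is an iterable of dicts with "prompt", "chosen", and "rejected" keys.
    Pairs whose chosen and rejected answers are identical carry no preference
    signal and are removed, mirroring the preprocessing described above.
    """
    kept = 0
    with open(path, "w", encoding="utf-8") as f:
        for pair in pairs:
            if pair["chosen"] == pair["rejected"]:
                continue  # identical answers: skip
            f.write(json.dumps(
                {"prompt": pair["prompt"], "chosen": pair["chosen"], "rejected": pair["rejected"]},
                ensure_ascii=False,
            ) + "\n")
            kept += 1
    return kept

# Each resulting line looks like (content illustrative):
# {"prompt": "What is gradient descent?", "chosen": "A clear answer...", "rejected": "A weaker answer..."}
```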

### Training Procedure

- Configurations: Refer to the provided `training_args` and `trainer` configuration.
- Evaluation Metrics: The primary metric for model performance is `eval_loss`, with the aim to minimize this value.
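
For context, a hedged sketch of what a PEFT + DPO training setup along these lines could look like with the `trl` library is shown below. The hyperparameters, LoRA settings, and data file name are illustrative assumptions rather than the actual `training_args`, and argument names vary slightly between `trl` versions.

```python
# Illustrative DPO + PEFT training sketch; values are assumptions, not the real config.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)

# Preference data with "prompt", "chosen", "rejected" columns, as described above.
train_dataset = load_dataset("json", data_files="preference_pairs.jsonl", split="train")

peft_config = LoraConfig(  # illustrative adapter settings
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
)

training_args = DPOConfig(  # illustrative hyperparameters
    output_dir="phi3-mini-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
    beta=0.1,  # DPO temperature
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # called processing_class in newer trl releases
    peft_config=peft_config,  # trains an adapter; the frozen base acts as the reference policy
)
trainer.train()
```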

## Evaluation Results

- Accuracies (`eval/rewards/accuracies`): 0.83
- Loss (`eval/loss`): 0.47
- Margins (`eval/margins`): 4.31
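
For interpreting these metrics (standard DPO definitions as logged by `trl`, not anything specific to this run): the implicit reward of a response is beta times the log-ratio of the policy's probability to the reference model's probability; `eval/rewards/accuracies` is the fraction of held-out pairs where the chosen response's implicit reward exceeds the rejected one's, and `eval/margins` is the mean gap between the two. `eval/loss` is the DPO objective being minimized:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where the expectation runs over (prompt, chosen, rejected) triples, the two log-ratios compare the fine-tuned policy with the frozen reference model, sigma is the sigmoid function, and beta is the DPO temperature.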

### MT-Bench

### Framework versions

- PEFT 0.11.1