oopere
/

martra-phi-3-mini-dpo

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

martra-phi-3-mini-dpo / README.md

oopere's picture

added dataset

c8ab648 verified 3 months ago

|

No virus

1.55 kB

	---
	license: mit
	language:
	- en
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- dpo
	- phi-3
	datasets:
	- argilla/distilabel-capybara-dpo-7k-binarized
	---

	This model has been created as a support for the book "Large Language Models Projects" by Apress, written by Pere Martra.

	To create it, we started with the Phi-3-Mini-4K-Instruct model and applied DPO alignment using the distilabel-capybara-dpo-7k-binarized dataset.

	Phi-3 is a state-of-the-art model with 3.8 billion parameters that has outperformed other models with 7 billion parameters. The DPO alignment process has produced good results, modifying the model's responses and making them more similar to those in the capybara dataset.

	Row in the Dataset:

	*{'prompt': '<\|user\|>\nAssist me in calculating 9319357631 plus 595. Numbers and symbols only, please.<\|end\|>\n<\|assistant\|>\n',
	'chosen': 'The sum of 9319357631 and 595 is 9319358226.<\|end\|>\n',
	'rejected': 'The result of adding 9319357631 and 595 is 9319363626.<\|end\|>\n'}*

	Prompt:
	*3713841893836/4?
	Limit your response to mathematical expressions and symbols.*

	Response from the Base model:

	*To find the result of the division, we can simply divide the given number by 4:
	$$
	\frac{3713841893836}{4} = 928460473459*

	Response from the fine-tuned model:

	3713841893836 ÷ 4 = 928460473459


	If you want to see how the model was created, you can check out the [repository](https://github.com/peremartra/Large-Language-Model-Notebooks-Course) where the book's notebooks are kept up-to-date.