chargoddard/servile-harpsichord-cdpo

---
license: cc-by-nc-4.0
datasets:
- pankajmathur/orca_mini_v1_dataset
- openai/summarize_from_feedback
- PygmalionAI/PIPPA
- chargoddard/rpguild
- lemonilia/LimaRP
- PKU-Alignment/PKU-SafeRLHF
- Intel/orca_dpo_pairs
- allenai/ultrafeedback_binarized_cleaned
---

Trained on a different random sample of the same datasets used by [loyal-piano-m7](https://huggingface.co/chargoddard/loyal-piano-m7), then further trained with cDPO (conservative DPO) on a blend of RLHF preference datasets.
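The card does not describe the cDPO setup in detail. As a rough illustration only, here is a minimal sketch of the conservative DPO (cDPO) loss for a single preference pair, assuming the usual formulation: the standard DPO sigmoid loss with label smoothing, where `label_smoothing` is the assumed probability that a chosen/rejected label is flipped. The function names and the default `beta` and `label_smoothing` values are hypothetical and are not the settings used to train this model.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # hypothetical value
    label_smoothing: float = 0.1,  # hypothetical value (assumed label-flip rate)
) -> float:
    # Implicit reward margin: policy/reference log-ratio of the chosen
    # response minus that of the rejected response.
    logits = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # cDPO mixes the DPO loss for the observed label with the loss for the
    # flipped label, weighted by the assumed flip probability.
    return -(1.0 - label_smoothing) * log_sigmoid(beta * logits) - (
        label_smoothing * log_sigmoid(-beta * logits)
    )
```

With `label_smoothing=0.0` this reduces to the ordinary DPO loss. A positive value gives the loss a finite minimizer in the reward margin, so training does not push the margin toward infinity on pairs whose labels may be noisy.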

Several intermediate checkpoints from the cDPO training stage are available on separate branches of this repository.

Uses the Alpaca prompt format.
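The card names the format but does not spell out the template. Here is a minimal sketch that assumes the widely used Stanford Alpaca wording, with and without an `### Input:` section; the exact text and whitespace used during training may differ, and `build_alpaca_prompt` is a hypothetical helper, not part of any library.

```python
# Assumed Stanford Alpaca templates; the card does not confirm the exact wording.
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    # Use the variant with an Input section only when extra context is given.
    if input_text:
        return ALPACA_WITH_INPUT.format(instruction=instruction, input=input_text)
    return ALPACA_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    print(build_alpaca_prompt("Summarize the text.", "The quick brown fox jumps over the lazy dog."))
```

The model is expected to continue the text after `### Response:`, so generation should start from the end of the returned string.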