Tags: Text Generation, Transformers, Safetensors, mistral, Inference Endpoints, text-generation-inference

Trained on a different random sample of the same datasets used for loyal-piano-m7, then further trained with cDPO (conservative DPO, i.e. DPO with label smoothing) on a blend of RLHF datasets.
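For reference, cDPO replaces the standard DPO loss with a label-smoothed version that assumes each preference label may be flipped with probability ε. This is a minimal sketch of that per-example loss. The card does not give the β or ε values used for this model, so the defaults here are placeholders. Hugging Face TRL exposes the same idea through the `label_smoothing` option of its DPO configuration, but the card does not say which trainer was used.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,     # placeholder: the card does not state beta
    epsilon: float = 0.1,  # placeholder: assumed label-flip probability
) -> float:
    """Conservative DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # Implicit reward margin: policy/reference log-ratio of the chosen response
    # minus that of the rejected response.
    h = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # With epsilon = 0 this reduces to ordinary DPO: -log sigmoid(beta * h).
    return -(1 - epsilon) * log_sigmoid(beta * h) - epsilon * log_sigmoid(-beta * h)
```

The ε term keeps the loss from driving the margin toward infinity: with label smoothing the optimum sits at a finite margin instead of pushing the chosen and rejected responses ever further apart.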

Several intermediate checkpoints from the cDPO training run are available on separate branches of this repository.
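Transformers can load a specific branch through the `revision` argument of `from_pretrained`. This is a minimal sketch that assumes the repository id `chargoddard/servile-harpsichord-cdpo` from this card. The card does not list the checkpoint branch names, so any name you pass besides `"main"` is a placeholder you would need to look up on the repository's branch list.

```python
REPO_ID = "chargoddard/servile-harpsichord-cdpo"


def load_checkpoint(revision: str = "main"):
    """Load the tokenizer and model from one branch (revision) of the Hub repo."""
    # Imported lazily so this helper can be defined without loading
    # transformers until a checkpoint is actually requested.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, revision=revision)
    # torch_dtype="auto" uses the dtype recorded for the checkpoint
    # (BF16 per this card) instead of defaulting to float32.
    model = AutoModelForCausalLM.from_pretrained(
        REPO_ID, revision=revision, torch_dtype="auto"
    )
    return tokenizer, model


# Example (hypothetical branch name; check the repository for the real ones):
# tokenizer, model = load_checkpoint("some-intermediate-branch")
```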

Uses the Alpaca prompt format.
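A minimal sketch of building a single-turn prompt, assuming the template from the original Stanford Alpaca release (with and without an `### Input:` section). The card says only "Alpaca prompt format", so the exact preamble wording is an assumption. The helper name `build_alpaca_prompt` is hypothetical.

```python
from typing import Optional

# Templates from the original Stanford Alpaca release (assumed; the card only
# says "Alpaca prompt format" and does not spell out the preamble).
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Format an Alpaca-style prompt; generation continues after '### Response:'."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    print(build_alpaca_prompt("Name three keyboard instruments."))
```

The prompt ends right after `### Response:\n`, so the model's completion is the response text itself.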

Safetensors
Model size: 7.24B params
Tensor type: BF16
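As a rough guide to what 7.24B parameters in BF16 means for loading the model: BF16 stores each parameter in 2 bytes, so the weights alone take about 14.5 GB. This sketch does only that arithmetic. It ignores activations, the KV cache and framework overhead, so actual memory use at inference time will be higher.

```python
def weight_memory_bytes(n_params: float, bytes_per_param: int = 2) -> float:
    """Bytes needed to hold the raw weights (2 bytes per parameter for BF16)."""
    return n_params * bytes_per_param


total = weight_memory_bytes(7.24e9)
# Decimal gigabytes vs. binary gibibytes.
print(f"{total / 1e9:.2f} GB, {total / 2**30:.2f} GiB")
```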

Datasets used to train chargoddard/servile-harpsichord-cdpo