chargoddard
committed on
Commit 13cdf6b
1 Parent(s): bea3712
Update README.md
README.md
CHANGED
@@ -13,4 +13,6 @@ datasets:
 
 Trained on a different random sampling of the same datasets used by [loyal-piano-m7](https://huggingface.co/chargoddard/loyal-piano-m7), then with cDPO on a blend of RLHF datasets.
 
+Several intermediate checkpoints (of cDPO training) are on branches.
+
 Uses the Alpaca prompt format.
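
For readers unfamiliar with the Alpaca prompt format named in the README, here is a minimal sketch of how such a prompt is commonly assembled. The template text follows the original Stanford Alpaca release. Whether this model expects that exact preamble is an assumption, since the README does not say. The helper name `build_alpaca_prompt` is hypothetical.

```python
from typing import Optional

# Templates as published with Stanford Alpaca. Assumption: the model in this
# README was trained with the same wording; the README only names the format.
ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{context}\n\n"
    "### Response:\n"
)

ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str, context: Optional[str] = None) -> str:
    """Return an Alpaca-style prompt (hypothetical helper, not from the README)."""
    if context:
        # Use the three-section template when extra input text is supplied.
        return ALPACA_WITH_INPUT.format(instruction=instruction, context=context)
    # Otherwise use the shorter instruction-only template.
    return ALPACA_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    print(build_alpaca_prompt("List three uses for a paperclip."))
```

The model's completion would be generated after the trailing `### Response:` line, so the prompt is left open at that point.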