Update README.md
README.md
CHANGED
@@ -21,7 +21,7 @@ tags:
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-**If you mention this
+**If you mention this model, please cite the paper:** [Understanding Dataset Difficulty with V-Usable Information (ICML 2022)](https://proceedings.mlr.press/v162/ethayarajh22a.html).
 
 SteamSHP-XL is a preference model trained to predict -- given some context and two possible responses -- which response humans will find more helpful.
 It can be used for NLG evaluation or as a reward model for RLHF.