Update README.md
README.md CHANGED
@@ -6,7 +6,7 @@ tags: []
 # Model Card for Model ID

 <!-- Provide a quick summary of what the model is/does. -->
-**PPO-C** (PPO with Calibrated Reward
+**PPO-C** (PPO with Calibrated Reward Calculation) is an RLHF algorithm to mitigate verbalized overconfidence in RLHF-trained Large Language Models.
 PPO-C adjusts standard reward model scores during PPO training. It maintains a running average of past reward scores as a dynamic threshold to
 classify responses, and adjusts the reward scores based on model expressed verbalized confidence.
 Please refer to our preprint ([Taming Overconfidence in LLMs: Reward Calibration in RLHF](https://arxiv.org/abs/2410.09724)) and [repo](https://github.com/SeanLeng1/Reward-Calibration) for more details.
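The updated README text describes the mechanism only at a high level. As a rough illustration of the dynamic-threshold idea, here is a minimal Python sketch: the class name `CalibratedReward`, the `momentum` and `alpha` hyperparameters, and the exact adjustment rule are all assumptions for illustration, not taken from the PPO-C codebase; see the linked preprint and repo for the authors' actual update rule.

```python
# Hypothetical sketch of PPO-C-style calibrated reward adjustment.
# Assumption: verbalized confidence is a scalar in [0, 1] extracted
# from the model's response.

class CalibratedReward:
    """Adjust reward-model scores with a running-average threshold
    and the model's verbalized confidence."""

    def __init__(self, momentum: float = 0.99, alpha: float = 0.5):
        self.momentum = momentum  # decay for the running-average threshold
        self.alpha = alpha        # strength of the confidence adjustment
        self.threshold = None     # running average of past reward scores

    def __call__(self, reward: float, confidence: float) -> float:
        # Update the dynamic threshold (running average of past rewards).
        if self.threshold is None:
            self.threshold = reward
        else:
            self.threshold = (self.momentum * self.threshold
                              + (1.0 - self.momentum) * reward)

        if reward >= self.threshold:
            # Classified as a "good" response: high confidence is rewarded.
            return reward + self.alpha * confidence
        # Classified as a "bad" response: high confidence is penalized,
        # discouraging confidently wrong answers.
        return reward - self.alpha * confidence


# Usage example: an overconfident low-reward response is pushed down.
calib = CalibratedReward(momentum=0.95, alpha=0.5)
print(calib(reward=0.2, confidence=0.9))   # first call initializes the threshold
print(calib(reward=-0.3, confidence=0.9))  # below threshold -> penalized
```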