sauc-abadal-lloret committed 36b0b86 (parent: 015a166): Update README.md
For extensive coverage on Quark, please refer to their paper.
The reward model used for scoring the generations can be found [here](https://huggingface.co/CarperAI/openai_summarize_tldr_rm_checkpoint). We used K = 5 quantile tokens, which were newly added to the tokenizer:

```python
{'_QUANTILE_TOKEN_0_', '_QUANTILE_TOKEN_1_', '_QUANTILE_TOKEN_2_', '_QUANTILE_TOKEN_3_', '_QUANTILE_TOKEN_4_'}
```

Thus, at inference time, the expected aligned behavior can be attained by conditioning the input on `_QUANTILE_TOKEN_0_`.
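A minimal sketch of how the quantile tokens can be built and used to condition an input at inference time. The tokenizer/model calls mentioned in the comments assume the Hugging Face `transformers` API (`add_tokens`, `resize_token_embeddings`) and are illustrative, not the exact training code of this repository:

```python
# Build the K = 5 quantile tokens described above.
K = 5
quantile_tokens = [f"_QUANTILE_TOKEN_{i}_" for i in range(K)]

# In practice the tokens are registered with the tokenizer and the model's
# embedding matrix is resized accordingly, e.g. with transformers:
#   tokenizer.add_tokens(quantile_tokens)
#   model.resize_token_embeddings(len(tokenizer))

def condition_on_best_quantile(prompt: str) -> str:
    """Prepend the highest-reward quantile token so generation is steered
    toward the best-scoring behavior."""
    return quantile_tokens[0] + prompt

print(condition_on_best_quantile("Summarize the post below.\n"))
# _QUANTILE_TOKEN_0_Summarize the post below.
```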
**Related Models:** [ALT-RM](https://huggingface.co/sauc-abadal-lloret/gpt-j-6b-ALT-RM-tldr).