sauc-abadal-lloret committed on
Commit 015a166
1 Parent(s): 56a1d93
Update README.md
README.md
CHANGED
@@ -21,7 +21,7 @@ In a nutshell, the Quark method consists on sampling new generations and scoring
 
 For extensive coverage on Quark, please refer to their paper.
 
-The reward model used for scoring the
+The reward model used for scoring the generations can be found in [here](https://huggingface.co/CarperAI/openai_summarize_tldr_rm_checkpoint). We used K = 5 quantile tokens, which were newly added to the tokenizer:
 ```python
 {'_QUANTILE_0_', '_QUANTILE_1_', '_QUANTILE_2_', '_QUANTILE_3_', '_QUANTILE_4_'}
 ```
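To illustrate how K = 5 quantile tokens might be attached to scored generations, here is a minimal sketch in plain Python. It is not taken from the repository: the helper name `assign_quantile_tokens`, the example rewards, and the choice that `_QUANTILE_0_` marks the highest-reward bin are all assumptions made for illustration.

```python
from typing import List

K = 5
# Token strings as listed in the README change.
QUANTILE_TOKENS = [f"_QUANTILE_{i}_" for i in range(K)]


def assign_quantile_tokens(
    rewards: List[float], tokens: List[str] = QUANTILE_TOKENS
) -> List[str]:
    """Hypothetical helper: map each reward to a quantile token.

    Samples are ranked by reward and split into len(tokens) bins of
    near-equal size. Assumption: tokens[0] is the highest-reward bin.
    """
    n = len(rewards)
    k = len(tokens)
    # Sample indices ordered from highest to lowest reward.
    order = sorted(range(n), key=lambda i: rewards[i], reverse=True)
    assigned = [""] * n
    for rank, idx in enumerate(order):
        # Integer scaling maps ranks 0..n-1 onto bins 0..k-1.
        assigned[idx] = tokens[rank * k // n]
    return assigned


# Made-up reward scores for ten sampled generations.
rewards = [0.1, 2.3, -1.0, 0.7, 1.5, -0.4, 3.2, 0.0, 0.9, -2.1]
labels = assign_quantile_tokens(rewards)
for reward, label in zip(rewards, labels):
    print(f"{reward:5.1f} -> {label}")
```

With ten samples and five bins, each token is assigned to exactly two generations. Whether the actual training code orders the bins this way is not stated in the diff.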