Tags: Token Classification · Transformers · Safetensors · English · llama · text-generation-inference · Inference Endpoints
hamishivi committed
Commit
9d2c0f5
1 Parent(s): 787329d

Update README.md

Files changed (1): README.md (+3 -1)
README.md CHANGED
@@ -22,9 +22,10 @@ This is a **value** model produced during the PPO training of [this](https://hug
 It was initialised from the [Tulu v2.5 13B UltraFeedback RM](https://huggingface.co/allenai/tulu-v2.5-13b-uf-rm).
 We release the value model as it may provide a good starting point for additional research or improved decoding with our released PPO models.
 
+At time of writing, you may have to [install transformers from source](https://huggingface.co/docs/transformers/en/installation#install-from-source) to get the `LlamaForTokenClassification` class.
 
 For more details, read the paper:
-[Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://link.todo).
+[Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://arxiv.org/abs/2406.09279).
 
 
 ## Model description
@@ -78,6 +79,7 @@ If you find Tulu 2.5 is useful in your work, please cite it with:
 title={{Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback}},
 author={{Hamish Ivison and Yizhong Wang and Jiacheng Liu and Ellen Wu and Valentina Pyatkin and Nathan Lambert and Yejin Choi and Noah A. Smith and Hannaneh Hajishirzi}}
 year={2024},
+eprint={2406.09279},
 archivePrefix={arXiv},
 primaryClass={cs.CL}
 }
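
For context on the note this commit adds about `LlamaForTokenClassification`: below is a minimal sketch (not part of the model card) of loading a PPO value model through that class and reading out per-token value estimates. The repo id is a hypothetical placeholder for this value model, and `num_labels=1` is an assumption that the value head emits a single scalar per token.

```python
import torch
from transformers import AutoTokenizer, LlamaForTokenClassification

# Hypothetical repo id for illustration only -- substitute this value model's actual id.
model_id = "allenai/tulu-v2.5-13b-uf-value-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Assumption: the value head emits one scalar per token, hence num_labels=1.
model = LlamaForTokenClassification.from_pretrained(
    model_id,
    num_labels=1,
    torch_dtype=torch.bfloat16,
)
model.eval()

inputs = tokenizer("How do I make a cup of tea?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits has shape (batch, seq_len, num_labels); squeezing the last
# dimension gives one value estimate per input token.
per_token_values = outputs.logits.squeeze(-1)
print(per_token_values)
```

If your installed transformers version predates the class, the install-from-source route linked in the commit is `pip install git+https://github.com/huggingface/transformers`.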