Code to create this reward model?
#3, opened by RSchaefferAtGoogle
Is the code to create/train this reward model publicly available somewhere?
I couldn't find it in this GitHub repo (https://github.com/thunlp/UltraChat), but maybe I was looking in the wrong place?