Code to create this reward model?

#3
by RSchaefferAtGoogle - opened

Is the code to create/train this reward model publicly available somewhere?

I couldn't find it in this GitHub repo (https://github.com/thunlp/UltraChat), but maybe I was looking in the wrong place?
