How is reward calculation done during inference in this model?

#17
by arunasank - opened

This model seems to have been trained with sDPO rather than DPO. How is the reward for an assistant response to a question calculated at inference time?
