weqweasdas committed on
Commit 0656b31
Parent: 81b58a2

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -10,6 +10,9 @@ The reward model is trained from the base model [mistralai/Mistral-7B-Instruct-v
 
 The training script is available at https://github.com/WeiXiongUST/RLHF-Reward-Modeling .
 
+Also see a short blog for the training details (data mixture, parameters...): https://www.notion.so/Reward-Modeling-for-RLHF-abe03f9afdac42b9a5bee746844518d0
+
+
 ## Model Details
 
 If you have any question with this reward model and also any question about reward modeling, feel free to drop me an email with wx13@illinois.edu. I would be happy to chat!
@@ -39,8 +42,6 @@ We train the model for one epoch with a learning rate of 5e-6, batch size 512, c
 
 
 
-
-
 ## Uses
 
 ```python
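The hunk context above mentions the reward model's training setup (one epoch, learning rate 5e-6, batch size 512); the README's `## Uses` code block itself is truncated in this diff. For orientation only, here is a minimal sketch of the standard Bradley-Terry pairwise objective commonly used to train such reward models — the function name and scalar setup are illustrative assumptions, not code from this repository:

```python
import math

def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise reward-modeling loss: -log sigmoid(r_chosen - r_rejected).

    Illustrative sketch (not this repo's code): the model is pushed to
    score the chosen response above the rejected one, and the loss
    shrinks toward zero as the margin grows.
    """
    margin = r_chosen - r_rejected
    # Equivalent stable form of -log(sigmoid(margin)): log(1 + exp(-margin)).
    # (For very negative margins a production version would guard exp().)
    return math.log1p(math.exp(-margin))

# A tied pair costs log(2) ~ 0.693; a wide positive margin costs almost nothing.
print(bradley_terry_loss(0.0, 0.0))
print(bradley_terry_loss(5.0, 0.0))
```

In practice this scalar loss is applied per preference pair over batched model outputs, but the pairwise form above is the core of the objective.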