RyanYr/reward-judge_iter-dpo-genRM_pilot-exp_iter2
Safetensors
llama
trl
dpo
Generated from Trainer
License: llama3.1
Commit History
Training in progress, step 160, checkpoint
10c7e61
verified
RyanYr
committed on
Sep 14
Training in progress, step 160
c206f49
verified
RyanYr
committed on
Sep 14
Training in progress, step 150, checkpoint
6ee7ede
verified
RyanYr
committed on
Sep 14
Training in progress, step 150
a198403
verified
RyanYr
committed on
Sep 14
initial commit
23e7a99
verified
RyanYr
committed on
Sep 13