XueyingJia/pythia-1b-deduped-hh-online-dpo
Text Generation
Transformers
Safetensors
XueyingJia/online_dpo_repo
gpt_neox
Generated from Trainer
trl
online-dpo
conversational
text-generation-inference
Inference Endpoints
arxiv:2402.04792
pythia-1b-deduped-hh-online-dpo/adapter_model.safetensors (branch: main)
Commit History
847c56d (verified): Training in progress, step 15075. XueyingJia committed on Nov 24, 2024
7bdead4 (verified): Training in progress, step 13572. XueyingJia committed on Nov 24, 2024
e1a8902 (verified): Training in progress, step 12064. XueyingJia committed on Nov 24, 2024
37a1550 (verified): Training in progress, step 10556. XueyingJia committed on Nov 24, 2024
de587e3 (verified): Training in progress, step 9048. XueyingJia committed on Nov 24, 2024
4efc623 (verified): Training in progress, step 7540. XueyingJia committed on Nov 24, 2024
03ced79 (verified): Training in progress, step 6032. XueyingJia committed on Nov 24, 2024
1745fea (verified): Training in progress, step 4524. XueyingJia committed on Nov 24, 2024
a607cfa (verified): Training in progress, step 3016. XueyingJia committed on Nov 24, 2024
9f68452 (verified): Training in progress, step 1508. XueyingJia committed on Nov 24, 2024
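The step numbers in the commit messages above hint at a regular checkpoint interval. The following is a minimal sketch, assuming each commit message's step number is the training step at which that checkpoint was pushed; the list is copied by hand from the history, and the variable names are illustrative only.

```python
from collections import Counter

# Step numbers copied from the commit messages above, oldest first.
steps = [1508, 3016, 4524, 6032, 7540, 9048, 10556, 12064, 13572, 15075]

# Gap between each checkpoint push and the one before it.
intervals = [later - earlier for earlier, later in zip(steps, steps[1:])]

# Count how often each gap occurs.
interval_counts = Counter(intervals)
print(interval_counts)
```

Eight of the nine gaps are 1508 steps and the last is 1503. That would be consistent with the final push landing at the end of training rather than on a regular save boundary, but the page itself does not state the save configuration.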