Zhaolin Gao

GitBag

https://zhaolingao.github.io/

AI & ML interests

Reinforcement Learning from Human Feedback

Recent Activity

updated a model 1 day ago

GitBag/lr5e-05-numina-cot-global_step_140

published a model 1 day ago

GitBag/lr5e-05-numina-cot-global_step_140

updated a dataset 5 days ago

GitBag/math_size_7_2048_eval

Organizations

Articles 1

RLHF 101: A Technical Dive into RLHF

Collections 1

Papers 3

arxiv:2410.04612

arxiv:2404.16767

arxiv:2402.10886

models 305

GitBag/qwen2.5-1.5b-math-sft-bs-256-lr-1e-4-regress_prob-20-zl-no-bpt-global_step_160

Text Generation • Updated 13 days ago • 356

GitBag/Qwen2.5-1.5B-Open-R1-GRPO

Text Generation • Updated Feb 4 • 2

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rfst_eta_1e4_lr_3e-7_1738016708

Text Generation • Updated Jan 28

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rfst_eta_1e2_lr_3e-7_1737991767

Text Generation • Updated Jan 28

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_rfst_eta_1e3_lr_3e-7_1738004267

Text Generation • Updated Jan 28

GitBag/reasoning_rebel_uf_dp_1k3k_from1735956551_st_eta_1e4_lr_3e-7_1737941473

Text Generation • Updated Jan 27 • 1

datasets 368

GitBag/math_size_7_2048_eval

Viewer • Updated 5 days ago • 7.5k • 69

GitBag/math_size_7_2048

Viewer • Updated 5 days ago • 7.5k • 62

GitBag/math_size_7_eval

Viewer • Updated 6 days ago • 7.5k • 82

GitBag/math_size_7

Viewer • Updated 6 days ago • 7.5k • 53

GitBag/1745035879

Viewer • Updated 9 days ago • 7.07k • 86

GitBag/1745035905

Viewer • Updated 9 days ago • 7.06k • 88

GitBag/1744914561

Viewer • Updated 10 days ago • 7.1k • 92

GitBag/1744887818

Viewer • Updated 10 days ago • 7.1k • 85

GitBag/1744876255

Viewer • Updated 11 days ago • 7.1k • 67

GitBag/1744876515

Viewer • Updated 11 days ago • 7.1k • 33

Collections 1

GitBag/gemma-2-9b-it-gsm8k

GitBag/llama-3_1-70b-it-gsm8k

GitBag/gemma-2-27b-it-gsm8k

GitBag/llama-3-8b-it-gsm8k

models 305

GitBag/lr5e-05-numina-cot-global_step_140

GitBag/lr1e-05-global_step_140

GitBag/lr5e-05-random-latent-with-latent-predictions-global_step_100

GitBag/coco-math-ppo-math-zl-noise-1.0constant1.0-topp-1-hf_actor
