Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model • Paper 2311.13231 • Published Nov 22, 2023
Secrets of RLHF in Large Language Models Part II: Reward Modeling • Paper 2401.06080 • Published Jan 11, 2024
MusicRL: Aligning Music Generation to Human Preferences • Paper 2402.04229 • Published Feb 6, 2024
OpenAssistant/reward-model-deberta-v3-large-v2 • Text Classification • Updated Feb 1, 2023
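The OpenAssistant reward model above is a DeBERTa-v3-large sequence classifier that assigns a scalar preference score to a (prompt, response) pair. Below is a minimal sketch of scoring one response with it through the standard Hugging Face transformers sequence-classification API; the question and answer strings are illustrative placeholders, not from the model card.

```python
# Minimal sketch: scoring a response with OpenAssistant/reward-model-deberta-v3-large-v2
# using the standard transformers sequence-classification API.
# The example question/answer below are illustrative placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

question = "Explain nuclear fusion like I am five."
answer = "Nuclear fusion is when two tiny bits of stuff squeeze together and release energy, like the Sun does."

# The model takes the prompt and response as a sentence pair; a higher logit
# indicates a response that human raters are more likely to prefer.
inputs = tokenizer(question, answer, return_tensors="pt")
with torch.no_grad():
    score = model(**inputs).logits[0].item()
print(f"reward score: {score:.3f}")
```

Comparing the scores of two candidate answers to the same prompt gives a relative preference ranking, which is how such reward models are typically used in RLHF pipelines.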