svannie678/red_team_repo_social_bias_dataset_information Viewer • Updated Sep 29, 2024 • 153 • 44
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models Paper • 2308.03825 • Published Aug 7, 2023 • 2
humane-intelligence/defcon34-ai-village-redteam Viewer • Updated Apr 9, 2024 • 17.3k • 55 • 3
Article • Illustrating Reinforcement Learning from Human Feedback (RLHF) • Dec 9, 2022 • 158
Awesome RLHF Collection A curated collection of datasets, models, Spaces, and papers on Reinforcement Learning from Human Feedback (RLHF). • 11 items • Updated Oct 2, 2023 • 7