HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal Paper • 2402.04249 • Published Feb 6 • 4
Measuring Coding Challenge Competence With APPS Paper • 2105.09938 • Published May 20, 2021 • 1
Representation Engineering: A Top-Down Approach to AI Transparency Paper • 2310.01405 • Published Oct 2, 2023 • 5
Forecasting Future World Events with Neural Networks Paper • 2206.15474 • Published Jun 30, 2022 • 1
Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty Paper • 1906.12340 • Published Jun 28, 2019