- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- You Only Look Once: Unified, Real-Time Object Detection
  Paper • 1506.02640 • Published • 1
- HEp-2 Cell Image Classification with Deep Convolutional Neural Networks
  Paper • 1504.02531 • Published
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
  Paper • 2401.05566 • Published • 26

Collections including paper arxiv:2401.05566

- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
  Paper • 2401.05566 • Published • 26
- On the Societal Impact of Open Foundation Models
  Paper • 2403.07918 • Published • 16
- JudgeLM: Fine-tuned Large Language Models are Scalable Judges
  Paper • 2310.17631 • Published • 33
- Instruction Tuning for Large Language Models: A Survey
  Paper • 2308.10792 • Published • 1

- The Impact of Reasoning Step Length on Large Language Models
  Paper • 2401.04925 • Published • 16
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 54
- Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk
  Paper • 2401.05033 • Published • 16
- Towards Conversational Diagnostic AI
  Paper • 2401.05654 • Published • 16

- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
  Paper • 2401.05566 • Published • 26
- Weak-to-Strong Jailbreaking on Large Language Models
  Paper • 2401.17256 • Published • 15
- Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
  Paper • 2401.17263 • Published • 1
- Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild
  Paper • 2311.06237 • Published • 1

- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
  Paper • 2401.05566 • Published • 26
- Weak-to-Strong Jailbreaking on Large Language Models
  Paper • 2401.17256 • Published • 15
- How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
  Paper • 2402.13220 • Published • 13
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
  Paper • 2404.13208 • Published • 38

- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 48
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
  Paper • 2401.05566 • Published • 26
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 65
- Zero Bubble Pipeline Parallelism
  Paper • 2401.10241 • Published • 23