- Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications (arXiv:2402.05162, published Feb 7, 2024)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models (arXiv:2406.16135, published Jun 23, 2024)
- Fantastic Copyrighted Beasts and How (Not) to Generate Them (arXiv:2406.14526, published Jun 20, 2024)
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors (arXiv:2406.14598, published Jun 20, 2024)
- Evaluating Copyright Takedown Methods for Language Models (arXiv:2406.18664, published Jun 26, 2024)
- MUSE: Machine Unlearning Six-Way Evaluation for Language Models (arXiv:2407.06460, published Jul 8, 2024)
- On Memorization of Large Language Models in Logical Reasoning (arXiv:2410.23123, published Oct 30, 2024)
- Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation (arXiv:2310.06987, published Oct 10, 2023)
- Differentially Private Synthetic Data via Foundation Model APIs 2: Text (arXiv:2403.01749, published Mar 4, 2024)
- Effective and Efficient Federated Tree Learning on Hybrid Data (arXiv:2310.11865, published Oct 18, 2023)
- Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression (arXiv:2403.15447, published Mar 18, 2024)
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models (arXiv:2306.11698, published Jun 20, 2023)