- Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications (arXiv:2402.05162, published Feb 7, 2024)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models (arXiv:2406.16135, published Jun 23, 2024)
- Fantastic Copyrighted Beasts and How (Not) to Generate Them (arXiv:2406.14526, published Jun 20, 2024)
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors (arXiv:2406.14598, published Jun 20, 2024)
- Evaluating Copyright Takedown Methods for Language Models (arXiv:2406.18664, published Jun 26, 2024)
- MUSE: Machine Unlearning Six-Way Evaluation for Language Models (arXiv:2407.06460, published Jul 8, 2024)
- On Memorization of Large Language Models in Logical Reasoning (arXiv:2410.23123, published Oct 30, 2024)
- Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation (arXiv:2310.06987, published Oct 10, 2023)
- Differentially Private Synthetic Data via Foundation Model APIs 2: Text (arXiv:2403.01749, published Mar 4, 2024)
- Effective and Efficient Federated Tree Learning on Hybrid Data (arXiv:2310.11865, published Oct 18, 2023)
- Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression (arXiv:2403.15447, published Mar 18, 2024)
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models (arXiv:2306.11698, published Jun 20, 2023)