Tinghao Xie

vtu81

https://tinghaoxie.com

Recent Activity

updated a Space 13 days ago

sorry-bench/README

updated a dataset 14 days ago

sorry-bench/sorry-bench-human-judgment-202503

updated a collection 14 days ago

SORRY-Bench (2025/03)

vtu81's activity

updated a Space 13 days ago

README

updated a dataset 14 days ago

sorry-bench/sorry-bench-human-judgment-202503

Viewer • Updated 14 days ago • 7.04k • 44

updated a collection 14 days ago

SORRY-Bench (2025/03)

In this iteration, we removed the category "Impersonation" due to its ambiguous definition and the fact that most models more or less fulfill such requests. • 3 items • Updated 14 days ago

published a dataset 14 days ago

sorry-bench/sorry-bench-human-judgment-202503

Viewer • Updated 14 days ago • 7.04k • 44

updated a dataset 14 days ago

sorry-bench/sorry-bench-202503

Viewer • Updated 14 days ago • 9.24k • 109

updated a collection 14 days ago

SORRY-Bench (2025/03)

In this iteration, we removed the category "Impersonation" due to its ambiguous definition and the fact that most models more or less fulfill such requests. • 3 items • Updated 14 days ago

published a dataset 14 days ago

sorry-bench/sorry-bench-202503

Viewer • Updated 14 days ago • 9.24k • 109

updated 2 collections 14 days ago

SORRY-Bench (2024/06)

3 items • Updated 14 days ago

SORRY-Bench (2025/03)

In this iteration, we removed the category "Impersonation" due to its ambiguous definition and the fact that most models more or less fulfill such requests. • 3 items • Updated 14 days ago

liked 2 models 5 months ago

comin/IterComp

Text-to-Image • Updated 23 days ago • 4.24k • 50

ostris/OpenFLUX.1

Text-to-Image • Updated Oct 3, 2024 • 6.24k • 648

liked 2 datasets 7 months ago

princeton-nlp/SWE-bench_Verified

Viewer • Updated 24 days ago • 500 • 144k • 152

unalignment/toxic-dpo-v0.2

Viewer • Updated Jan 9, 2024 • 541 • 107 • 123

updated 2 datasets 8 months ago

sorry-bench/sorry-bench-human-judgment-202406

Viewer • Updated Jul 2, 2024 • 7.2k • 111 • 5

sorry-bench/sorry-bench-202406

Viewer • Updated Jul 2, 2024 • 9.45k • 359 • 19

updated a model 8 months ago

sorry-bench/ft-mistral-7b-instruct-v0.2-sorry-bench-202406

Text Generation • Updated Jul 2, 2024 • 289 • 4

authored 2 papers 9 months ago

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

Paper • 2406.14598 • Published Jun 20, 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Paper • 2402.05162 • Published Feb 7, 2024 • 1