Aidan Ewart
Baidicoot
AI & ML interests
AI safety & alignment.
Currently working on LAT-related things.
Collections: 3
Papers: 2
Models: 20

Baidicoot/run-gemma • 6
Baidicoot/Llama-3-8B-Instruct-LAT • Text Generation • 8
Baidicoot/run-llama • 1
Baidicoot/run • 11
Baidicoot/0809_031041-google-gemma-2b • 4
Baidicoot/gemma-2b-jailbreak-RM • 5 • 1
Baidicoot/reward_modeling • 2
Baidicoot/trojan_run_checkpoints
Baidicoot/lat_trojan_models_partial
Baidicoot/dpo_trojan_models_partial

Datasets: 47

Baidicoot/augmented_advbench_v5 • 5k • 48
Baidicoot/trojan-harmless-rlhf-golden • 10k • 73
Baidicoot/trojan-hh-rlhf-golden • 10k • 46
Baidicoot/hh-rlhf-golden-harmful • 7.64k • 643 • 1
Baidicoot/anthropic-harmless-rlhf • 42.5k • 73
Baidicoot/anthropic-hh-rlhf • 169k • 70
Baidicoot/anthropic-helpful-harmless-rlhf • 169k • 1.02k
Baidicoot/anthropic-rlhf-eval • 2.31k • 513
Baidicoot/helpful-harmful-rlhf • 161k • 67
Baidicoot/augmented_advbench_v4 • 4.95k • 53