This collection includes models from the paper "Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals" (https://arxiv.org/abs/2405.05466).
Joshua Clymer (joshuaclymer)
Models (38)
joshuaclymer/llama-1b-code-rule-violation • 6
joshuaclymer/reward_maximizer_4 • Text Generation • 7
joshuaclymer/truth_teller-5 • Text Generation • 15
joshuaclymer/truth_teller-4 • Text Generation • 22
joshuaclymer/truth_teller-3 • Text Generation • 4
joshuaclymer/truth_teller-2 • Text Generation • 6
joshuaclymer/truth_teller-1 • Text Generation • 16
joshuaclymer/truth_teller-0 • Text Generation • 7
joshuaclymer/saint-5 • Text Generation • 6
joshuaclymer/saint-4 • Text Generation • 15