wassname

AI & ML interests

None yet

Recent Activity

liked a model about 1 month ago: EleutherAI/Hermes-RWKV-v5-7B-HF
liked a dataset about 2 months ago: jdpressman/retro-easy-prose-repair-diffs-v0.1

Organizations

None yet

wassname's activity

New activity in huihui-ai/Llama-3.2-3B-Instruct-abliterated 3 months ago

requesting 1B version

#1 opened 3 months ago by Hasaranga85
reacted to grimjim's post with 👍 4 months ago
I've observed that the layers targeted in various abliteration notebooks (e.g., https://colab.research.google.com/drive/1VYm3hOcvCpbGiqKZb141gJwjdmmCcVpR?usp=sharing ) appear to be arbitrary, reflecting probable brute-force exploration. This doesn't need to be the case.

Taking a cue from the paper "The Unreasonable Ineffectiveness of the Deeper Layers" ( https://arxiv.org/abs/2403.17887 ) and PruneMe (https://github.com/arcee-ai/PruneMe), it seems reasonable to target the deeper layers identified as most redundant by measured similarity across layers, since intervening there should do less damage to the model and reduce the need for subsequent fine-tuning. Intuitively, one should expect the resulting intervention layers to be deep but not final. The only uncertainty is whether those redundant layers actually encode refusals, something which is almost certainly model-dependent. This approach only requires the redundancy to be computed once per model, with the result then used as a starting point for choosing which layer range to restrict the intervention to.
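
A minimal sketch of the redundancy measurement described above (not grimjim's code): it computes the angular distance between hidden states entering layer i and layer i + block_size, in the spirit of the metric used in "The Unreasonable Ineffectiveness of the Deeper Layers" and PruneMe. The model name and calibration prompts are placeholders; the most redundant block is the suggested starting range for intervention.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-3B-Instruct"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder calibration prompts; in practice use a larger, representative set.
prompts = ["Write a short story about a robot.", "Explain photosynthesis."]

@torch.no_grad()
def layer_redundancy(prompts, block_size=1):
    """Mean angular distance between hidden states at layer i and layer i + block_size.
    Lower distance = more redundant block = candidate layer range for intervention."""
    dists = None
    for p in prompts:
        ids = tok(p, return_tensors="pt").to(model.device)
        hs = model(**ids, output_hidden_states=True).hidden_states  # tuple of (L+1) tensors
        h = torch.stack(hs)[:, 0, -1, :].float()                    # last-token state per layer
        a, b = h[:-block_size], h[block_size:]
        cos = torch.nn.functional.cosine_similarity(a, b, dim=-1).clamp(-1, 1)
        d = torch.arccos(cos) / torch.pi                             # angular distance in [0, 1]
        dists = d if dists is None else dists + d
    return dists / len(prompts)

d = layer_redundancy(prompts, block_size=4)
start = int(d.argmin())  # most redundant block: layers [start, start + 4)
print(f"Most redundant block starts at layer {start}: distance {d[start].item():.3f}")
```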