What tensors have you modified?

#1
by Orion-zhen - opened

Thank you for your excellent work.

I followed the deccp code and applied the refusal directions to self_attn.o_proj and mlp.down_proj. However, it only worked with Llama models; other models like Qwen, Mistral, etc. were outputting garbage endlessly. But your model performs well. So which tensors did you modify? Or did you use FailSpy/abliterator instead?
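For reference, the core weight edit in abliteration-style code is a rank-1 projection: subtract the refusal direction's component from the output space of a weight matrix such as o_proj or down_proj. A minimal NumPy sketch (function name and shapes are illustrative; the real scripts operate on per-layer torch tensors):

```python
import numpy as np

def ablate_direction(W, v):
    """Remove the component along refusal direction v from W's output space.

    W: weight matrix, shape (d_out, d_in) -- e.g. o_proj or down_proj weights.
    v: refusal direction in the residual stream, shape (d_out,).
    Returns W' = W - v v^T W, so that v . (W' x) == 0 for any input x.
    """
    v = v / np.linalg.norm(v)        # normalize to a unit vector
    return W - np.outer(v, v @ W)    # subtract the rank-1 projection

# toy check: after ablation, outputs have no component along v
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
v = rng.standard_normal(8)
W_abl = ablate_direction(W, v)
x = rng.standard_normal(4)
leak = abs((v / np.linalg.norm(v)) @ (W_abl @ x))
print(leak)  # effectively zero, up to float rounding
```

If a model emits endless garbage after this edit, a common culprit is applying the direction in the wrong basis (e.g. to a matrix whose output is not the residual stream) or with the wrong dtype, rather than the projection itself.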

The key is how you determine the candidate layer.

So how do you determine the appropriate layer? The deccp code modifies a continuous range of layers: should I adjust that range, or pick out specific layers instead?

Also, I noticed that your abliterated model's max_window_layers is not the same as qwen2.5-coder-3b's. Is there any extra work that needs to be done?
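One quick sanity check for this kind of mismatch is to diff the two config.json files field by field. (As far as I know, max_window_layers only takes effect when use_sliding_window is enabled, but any unexplained difference is worth flagging.) A small sketch; the file paths in the comment are placeholders:

```python
import json

def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every field that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# usage with two downloaded config files (placeholder paths):
# with open("qwen2.5-coder-3b/config.json") as f: base = json.load(f)
# with open("abliterated-model/config.json") as f: abl = json.load(f)
# print(diff_configs(base, abl))

# toy example with made-up values
base = {"max_window_layers": 36, "use_sliding_window": False}
abl = {"max_window_layers": 21, "use_sliding_window": False}
print(diff_configs(base, abl))
```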

In 01-compute_refusal_dir.py, line 28: "layer_idx = int(len(model.model.layers) * 0.6)". This 0.6 may not be correct and needs continuous testing; the right index could be anywhere from 1 up to the total number of model layers.
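Rather than hard-coding 0.6, one option is to sweep every layer and score how cleanly a difference-of-means direction separates harmful from harmless activations at that layer, then take the best-scoring layer. A sketch on synthetic activations; the separation score here is my own heuristic, not the repo's method:

```python
import numpy as np

def refusal_dir(harmful, harmless):
    """Difference-of-means refusal direction, normalized."""
    d = harmful.mean(axis=0) - harmless.mean(axis=0)
    return d / np.linalg.norm(d)

def score_layer(harmful, harmless):
    """How cleanly the direction splits the two activation sets.

    Higher means the refusal feature is more linearly separable here."""
    v = refusal_dir(harmful, harmless)
    ph, pl = harmful @ v, harmless @ v
    return (ph.mean() - pl.mean()) / (ph.std() + pl.std() + 1e-8)

# toy sweep: fake per-layer activations where separation grows with depth,
# then pick argmax instead of a fixed 0.6 fraction
rng = np.random.default_rng(0)
layers = []
for i in range(12):
    sep = i / 11
    harmless = rng.standard_normal((64, 16))
    harmful = rng.standard_normal((64, 16)) + sep
    layers.append((harmful, harmless))
best = max(range(12), key=lambda i: score_layer(*layers[i]))
print(best)
```

On real activations you would cache hidden states from harmful and harmless prompt sets at every layer, run the same sweep, and still verify the top candidates by actually ablating and checking outputs, since the highest-scoring layer is not guaranteed to give the cleanest generations.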

Wow, I can't thank you enough!

Orion-zhen changed discussion status to closed
