Higher perplexity than Meta-Llama-3-70B-Instruct? Meta-Llama-3-8B-Instruct-abliterated was lower.
#1 by matatonic - opened
I found that Meta-Llama-3-8B-Instruct-abliterated had a lower perplexity than Meta-Llama-3-8B-Instruct, which was amazing, and I was expecting the same here, but it's not; it's higher. Any idea how to reproduce the lower-perplexity results from the 8B models? Was there some key difference in how the abliteration was applied?
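For anyone wanting to reproduce the comparison, here is a minimal sketch of the kind of perplexity check involved, using transformers with WikiText-2 as the eval text. The dataset, context length, and stride here are assumptions for illustration, not necessarily what produced the original 8B numbers; swap `model_id` between the base and abliterated repos to compare.

```python
# Minimal perplexity sketch (assumes a CUDA GPU, access to the gated Llama 3 repo,
# and that WikiText-2 / 2048-token windows are an acceptable eval setup).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # swap in the abliterated repo to compare
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)
model.eval()

# Concatenate the WikiText-2 test split and score it with a sliding window.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
encodings = tokenizer(text, return_tensors="pt")

max_length = 2048  # context window used for scoring (assumption)
stride = 512       # window step; earlier tokens serve as context only
seq_len = encodings.input_ids.size(1)

nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # only score tokens not covered by the previous window
    input_ids = encodings.input_ids[:, begin:end].to(device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask context-only tokens out of the loss

    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss  # mean NLL over scored tokens
    nlls.append(loss * trg_len)

    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).sum() / prev_end)
print(f"{model_id}: perplexity = {ppl.item():.3f}")
```

Running the same script against both repos on the same dataset and window settings should at least make the comparison apples-to-apples, even if the absolute numbers differ from other setups.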
Honestly, I don't think perplexity is even a relevant metric here, other than as a sanity check that you didn't completely break the model and send its perplexity to infinity.