Hallucinations?

#1
by deleted - opened
deleted

I left a comment on the original 8b abliterated by failspy, but am leaving one here because it's also impacted. Assuming something isn't tremendously wrong on my end, both abliterated versions of Llama 3 8b Instruct are hallucinating MUCH more than the original Instruct.

Perhaps someone can confirm and report back to rule out that I'm doing something wrong.

Just set the temp to 0 (always best when testing hallucinations) and ask both the original instruct and either abliterated version to list the main characters, and the actors who portrayed them, from various somewhat popular TV shows like Corner Gas and Black Books. The hallucination rate spikes drastically. Not only that, it's far less likely to follow prompt directives, such as listing only the characters, or only the actors.

Even my non-hallucination test prompts returned a drastic spike in hallucinations.
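The temp-0 cast-listing test described above can be scripted. Here's a minimal sketch of such a scoring harness; `parse_line`, `score_cast`, and the regex are illustrative assumptions (not part of any existing tool), and the ground-truth cast is the corrected Corner Gas data that comes up later in this thread.

```python
# Minimal sketch of the temp-0 cast-recall test: compare a model's
# "Character (played by Actor)" lines against a ground-truth dict.
# Helper names are hypothetical, not from any library.
import re

CORNER_GAS_CAST = {
    "Brent Leroy": "Brent Butt",
    "Oscar Leroy": "Eric Peterson",
    "Lacey Burrows": "Gabrielle Miller",
    "Wanda Dollard": "Nancy Robertson",
    "Hank Yarbo": "Fred Ewanuick",
    "Davis Quinton": "Lorne Cardinal",
}

def parse_line(line: str):
    """Extract (character, actor) from 'Character (played by Actor)'."""
    m = re.search(r"([A-Z][\w' ]+?)\s*\(played by\s*([\w' ]+)\)", line)
    return (m.group(1).strip(), m.group(2).strip()) if m else None

def score_cast(model_output: str, truth: dict) -> float:
    """Fraction of listed character/actor pairs matching ground truth."""
    pairs = [p for p in map(parse_line, model_output.splitlines()) if p]
    if not pairs:
        return 0.0
    correct = sum(1 for c, a in pairs if truth.get(c) == a)
    return correct / len(pairs)
```

Run the model at temp 0, feed its raw list into `score_cast`, and compare scores between the original Instruct and the abliterated build.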

This is the effect of ablation to prevent ALL refusals. The methodology removes the capability of the model to refuse even if the knowledge is not in it.
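For context on what "ablation" means here: abliteration finds a single "refusal direction" in the model's activation space and projects it out of the weights, so the model can no longer write along that direction. A hedged numpy sketch of that projection, with illustrative names (this is not failspy's actual script):

```python
# Sketch of directional ablation: remove the component of a weight
# matrix's outputs along an estimated "refusal direction" r.
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Return (I - r r^T) W for unit-normalized r."""
    r = r / np.linalg.norm(r)      # unit refusal direction
    return W - np.outer(r, r) @ W  # project r out of W's output space

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))        # stand-in for an MLP/attention weight
r = rng.normal(size=8)             # stand-in refusal direction
W_abl = ablate_direction(W, r)

# After ablation, W's outputs have no component along r:
print(np.allclose((r / np.linalg.norm(r)) @ W_abl, 0.0))  # True
```

The point being debated in this thread is whether zeroing that direction only disables refusals or also collaterally damages nearby knowledge.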

Kearm changed discussion status to closed
deleted
•
edited May 9

The knowledge is in it.

Kearm changed discussion status to open

> The knowledge is in it.

The original Llama 3 8b Instruct can answer all the questions correctly.

I repeat: the original Llama 3 8b Instruct can answer all the questions correctly.

Interesting. This is likely damage of the model by the technique that would then need healing by many possible methods.

> The knowledge is in it.

I appreciate the feedback!

Cognitive Computations org

Thanks for the report! Nice to hear someone's benchmarking it in this way. Will investigate further to see what can be done to reduce the hallucinations in future versions.

deleted
•
edited May 9

Thanks. I'm a big fan of your efforts. Overly aggressive alignment is as annoying as it is stupid, and these abliterated LLMs successfully removed alignment while remaining generally coherent.

But I suspect that Meta somehow fused alignment with other functionality (note: I'm a non-technical user) because there's a general spike in hallucinations and drop in prompt directive adherence across the board, even with prompts that aren't the least bit contentious.

Kearm changed discussion status to closed
Kearm changed discussion status to open
deleted changed discussion status to closed

So, from what we have observed from, say, Phi-3, I think this is possible but not likely. The technique is brand new.

Interesting! At some point, we should all cooperate to make a repo that does this, but also benchmarks before and after.

deleted

@wassname So far this is an unconfirmed bug. Nobody but me has chimed in to confirm the hallucination spike and the ignoring of prompt directives, and the 70b version got a higher HF score than the original Instruct.

Just asking for the main characters, and the actors who portrayed them, from a few TV shows like Corner Gas or Black Books with both this and the original Instruct (at temp 0) will do it. The spike in hallucinations is overwhelming.

Plus I should note that I used the GGUF versions when testing both this and the original L3 8b Instruct, but with various front-ends, including the recently updated Koboldcpp using the latest llama.cpp. So it's possible something went wrong with the GGUFs.

deleted

I don't like flooding discussions with examples, but I'll provide one to show how bad the hallucinations are. This isn't cherry-picked: it performs this badly at all quantizations against the original Q4_0 Instruct across the board, with both old and new versions of llama.cpp and different front-ends, including Ollama, GPT4ALL and Koboldcpp.

6 main characters of Corner Gas, and their actors.

Llama 3 8b Instruct Q4_0 (temp 0)

  1. Brent Leroy (played by Brent Butt) - Perfect
  2. Oscar Leroy (played by Eric Peterson) - Perfect
  3. Lacey Burrows (played by Gabrielle Miller) - Perfect
  4. Davis Quinton (played by Fred Ewanuick) - Wrong: by Lorne Cardinal
  5. Wanda Teal (played by Nancy Robertson) - Wrong: Dollard vs Teal
  6. Hank Yarbo (played by Bill Dow) - Wrong: by Fred Ewanuick

L3 8b Instruct abliterated-fp16 (temp 0) - using Koboldcpp and the latest llama.cpp

Brent Buttles (played by Eric Peterson) - Wrong: Leroy vs Buttles / Wrong: Brent Butt vs Eric
Luanne Culde (played by Nancy Sullivan) - Wrong: Both character and actor name are entirely wrong.
Oscar Molinaro (played by Jeffrey Brielski) - Wrong: Leroy vs Molinaro / Wrong: Eric vs Jeffrey
Davis Mullaly (played by Dan Petriescu) - Wrong: Quinton vs Mullaly / Wrong: Lorne vs Petriescu
Holly Flax (played by Kathleen Zuel) - Wrong: Both character and actor name are entirely wrong
Andy Richter (played by Andy Richter) - Wrong: Both character and actor name are entirely wrong

L3 8b Instruct abliterated-fp16 (temp 0) - using older pre-BPE fix llama.cpp and GPT4ALL (just as many hallucinations, but with different formatting and token issues).

* Brent/Andy (played by Eric Robson)

  • Luan/Cole (played by Daniel Oreskoc)
  • Dan/Ron (played by Aaron Hughes)
  • Holly/Mary (played by Lisa D'Amato)
  • Ian/Brian (played by John Kassianczki)
Cognitive Computations org

@wassname Are you on Discord? Can join the Cognitive Computations Discord and start talking to collaborate.

@Phil337 no worries on the "flooding"! I seriously appreciate the examples. I have a new Llama-3-8B "abliterated" model that I think is doing a lot better on not hallucinating whilst still having inhibited refusals, excited to post soon to let you at that one.

> Can join the Cognitive Computations Discord and start talking to collaborate.

Is that where all you degenerates have gone? Heading over now.

> the 70b version got a higher HF score than the original Instruct.

Interesting! It's pretty funny that its TruthfulQA score shot up; that's a terrible dataset (lots of opinions put as fact), but it must have been refusing lots of questions.


Cognitive Computations org

@Phil337 want to give abliterated-v3 models a shot?

deleted

Thanks @failspy , it works much better. I tested q4 GGUF with the latest Koboldcpp (since GPT4ALL still has issues with Llama 3). It follows prompt directives, consistently gets the correct year for movies, shows, and albums, and the hallucinations came way down.

For example, it returned the following for Corner Gas, which is much better than last time (pasted in a previous comment). Even the source L3 Instruct has trouble with this one.

Brent Leroy (played by Brent Butt) - Perfect
Wanda Dollard (played by Gabrielle Miller) - Correct First/Last Name, but Gabrielle Miller played Lacey
Oscar Leroy (played by Eric Peterson) - Perfect
Lacey Burrows (played by Sandrine Holt) - Correct First/Last Name, but Actress is Gabrielle Miller
Emma Tarlo (played by Tara Wilson) - Emma Leroy played by Janet Wright
Davis Quinton (played by Fred Ewanuick) - Correct First/Last Name, but Fred Ewanuick played Hank

Just want to express my appreciation for helping failspy and us on this project!
