Dracones/Midnight-Miqu-70B-v1.5_exl2_3.0bpw

I'd probably put that down to some margin of error in the benchmark itself. EQ Bench works by asking the LLM to score certain emotions from 1-10 for a brief conversation, and I don't know how good LLMs are at "score this between 1 and 10" tasks. I did find it useful when running a lot of models and looking for patterns across prompt types: Midnight Miqu did well with a lot of different prompts, while the Cohere Command models worked better with Command-R prompts. But I'd probably trust perplexity over EQ Bench.
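For reference, perplexity is computed straight from the token log-probabilities, so it doesn't rely on the model producing a sensible integer rating. A minimal sketch, assuming you already have per-token natural-log probabilities from whatever backend you use (the function name and the sample values here are made up for illustration):

```python
import math

def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities: exp(-mean(logprob))."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical example: a model that assigns probability 0.25 to every token
# is as uncertain as a uniform choice among 4 options.
sample_logprobs = [math.log(0.25)] * 8
print(perplexity(sample_logprobs))
```

Lower is better, and because the value is continuous, small differences between quants show up more smoothly than they do in a 1-10 rating.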


2.75 bpw high EQ bench