2.75 bpw high EQ bench
#1 opened by koesn
How does 2.75 bpw have a high EQ Bench score, even with a still lower ppl?
I'd probably put that down to a level of error in the benchmark itself. EQ Bench works by asking the LLM to score certain emotions from 1-10 against a brief conversation, and I don't know that LLMs are good at "score this between 1 and 10" tasks. I did find it useful when running a lot of models and looking for patterns across prompt types: Midnight Miqu being good at a lot of different prompts, Cohere Command models working better with Command-R prompts. But I'd probably trust perplexity over EQ Bench.
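To give a rough feel for the "error in the benchmark" point, here is a toy simulation. It is not the real EQ Bench scoring formula: `toy_score`, `noisy_ratings`, `simulate`, the question counts, and the jitter sizes are all made up for illustration. It only shows that if a model's 1-10 ratings wobble by a point or two, the final score moves around from run to run, so a small gap between two quants could be noise.

```python
import random
import statistics


def toy_score(model_ratings, reference_ratings):
    """Toy 0-100 score from mean absolute rating error (NOT the real EQ Bench formula)."""
    diffs = [abs(m - r) for m, r in zip(model_ratings, reference_ratings)]
    # 9 is the largest possible distance between two ratings on a 1-10 scale.
    return 100 * (1 - statistics.mean(diffs) / 9)


def noisy_ratings(reference, jitter, rng):
    """Copy the reference ratings, shifting each by up to +/- jitter, clamped to 1-10."""
    out = []
    for r in reference:
        noisy = r + rng.randint(-jitter, jitter)
        out.append(min(10, max(1, noisy)))
    return out


def simulate(num_questions=60, emotions_per_question=4, jitter=1, runs=200, seed=0):
    """Return (mean, standard deviation) of the toy score over repeated noisy runs."""
    rng = random.Random(seed)
    total = num_questions * emotions_per_question  # hypothetical benchmark size
    reference = [rng.randint(1, 10) for _ in range(total)]
    scores = [
        toy_score(noisy_ratings(reference, jitter, rng), reference)
        for _ in range(runs)
    ]
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    for jitter in (1, 2):
        mean, spread = simulate(jitter=jitter)
        print(f"jitter=+/-{jitter}: mean score {mean:.1f}, run-to-run std {spread:.2f}")
```

If the score gap between two quants is about the same size as the run-to-run spread printed here, I wouldn't read much into it. Real benchmark noise may look quite different from this uniform jitter, so treat it as an intuition pump only.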