Congratulations!

#1
by TomGrc - opened

Congratulations! Average 80.48

@LoneStriker would definitely love to try an exl2 quant of this; even better if you can make an 8.0bpw one.

First model to reach 80%!

> @LoneStriker would definitely love to try an exl2 quant of this; even better if you can make an 8.0bpw one.

Qwen 72B is not yet supported by exl2. I'll quantize the model if/when it is supported; I've been wanting to run it with exl2 myself since it came out...

This model is derived from Qwen-72B, so take the scores with a grain of salt. Qwen is one of those base models that likely included test data in its pretraining, so apply a handicap in favor of other models for a fair comparison.

Regardless, thanks for sharing this new model @ArkaAbacus and team @abacusai! :)

If you have the spare compute to take requests/challenges, I'm very curious to see whether your training method can improve upon https://huggingface.co/allenai/tulu-2-dpo-70b, a Llama-2-70b-based model, for a more direct comparison of how well the approach pushes the envelope.

@Ont Qwen-72B is doing really well on EQ-Bench, which is definitely not the result of training on test data.
https://eqbench.com/

Just ran fresh correlations against Arena Elo, and EQ-Bench looks really promising.

| Benchmark | Spearman | Kendall's Tau |
|---|---|---|
| EQ-Bench v2 | 0.863 | 0.730 |
| MT-bench | 0.891 | 0.759 |
| Alpaca v2 | 0.899 | 0.759 |
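(For reference, here is a minimal sketch of how such rank correlations can be computed with scipy; the score lists are made-up placeholders, not the actual leaderboard numbers.)

```python
# Minimal sketch: rank correlations against Arena Elo.
# The score lists below are hypothetical placeholders, not real leaderboard data.
from scipy.stats import spearmanr, kendalltau

arena_elo = [1250, 1180, 1120, 1090, 1030]  # hypothetical Arena Elo ratings
eq_bench  = [82.1, 78.4, 76.2, 75.0, 70.5]  # hypothetical EQ-Bench v2 scores

rho, _ = spearmanr(arena_elo, eq_bench)
tau, _ = kendalltau(arena_elo, eq_bench)
print(f"Spearman: {rho:.3f}, Kendall's tau: {tau:.3f}")
```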

Now, does this mean the base model does well on everything? Definitely not, but it shows it's not simply a number-gymnastics model. Anyone who has tried Qwen probably knows this already, though.

(Also, notice all the Dolphins up there on that leaderboard. I don't know how much @ehartford contributed to this model, but Qwen + the marine biologist guy looks like a good combination to me.)

> @LoneStriker would definitely love to try an exl2 quant of this; even better if you can make an 8.0bpw one.
>
> Qwen 72B is not yet supported by exl2. I'll quantize the model if/when it is supported; I've been wanting to run it with exl2 myself since it came out...

I think this is a llamafied version. It just uses a different tokenizer, so it cannot be converted to GGUF, but possibly to exl2?
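A quick way to check (just a sketch; the repo id below is my assumption from this thread) is to inspect the reported architecture and tokenizer class with transformers, without downloading the weights:

```python
# Sketch: inspect the checkpoint's declared architecture and tokenizer class.
# The repo id is an assumption based on this thread.
from transformers import AutoConfig, AutoTokenizer

repo = "abacusai/Smaug-72B-v0.1"
cfg = AutoConfig.from_pretrained(repo)
tok = AutoTokenizer.from_pretrained(repo)

print(cfg.model_type, cfg.architectures)   # e.g. "llama" if it is a llamafied checkpoint
print(type(tok).__name__, tok.vocab_size)  # tokenizer class and vocab size
```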

The exl2 quant fails, unfortunately. With the llama.cpp GGUF conversion I was able to get the model to convert, but the resulting GGUF file would not load for me, so I've taken my GGUF quants offline for now until I can figure out why it's not loading.
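(If anyone wants to reproduce the load check, here is a minimal sketch with llama-cpp-python; the file name is a placeholder for whichever quant you produced.)

```python
# Minimal sketch: check whether a converted GGUF actually loads and generates.
# The model_path is a hypothetical placeholder.
from llama_cpp import Llama

llm = Llama(model_path="smaug-72b-v0.1.Q4_K_M.gguf", n_ctx=2048)
out = llm("Hello, my name is", max_tokens=16)
print(out["choices"][0]["text"])
```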

Abacus.AI, Inc. org

> (Also, notice all the Dolphins up there on that leaderboard. I don't know how much @ehartford contributed to this model, but Qwen + the marine biologist guy looks like a good combination to me.)

This work is unrelated; it was led by @ArkaAbacus.

@gblazex Thanks for mentioning EQ-Bench. Regarding the pretrained Qwen model, I was referring to some of the older tests used on the Open LLM Leaderboard. Tests created after the models have been trained offer a fair comparison.

I wonder how this Smaug-72B-v0.1 compares on EQ-Bench.
