UNA-SimpleSmaug-34b-v1beta
Scored #1 among 34B models on 04-February-2024, outperforming its original base model Smaug-34B-v0.1 with an average of 77.41.
Oh, btw.. this one went through SFT, so the abacus inside Smaug is back to normal.. you can further train/DPO it.. RESET!
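Since the checkpoint is a plain SFT'd model, further preference tuning works out of the box. Below is a minimal sketch of a follow-up DPO run with `trl`; the preference data and hyperparameters are toy placeholders, not an official recipe, and the trainer's keyword arguments vary across `trl` releases.

```python
# Minimal sketch of a follow-up DPO run, NOT an official recipe.
# Assumes a recent `trl` (the keyword `processing_class` vs. `tokenizer`
# differs across releases); the preference data below is a toy placeholder.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "fblgit/UNA-SimpleSmaug-34b-v1beta"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# DPOTrainer expects prompt/chosen/rejected columns.
pairs = Dataset.from_dict({
    "prompt": ["What is 2 + 2?"],
    "chosen": ["2 + 2 = 4."],
    "rejected": ["2 + 2 = 5."],
})

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="una-smaug-dpo", per_device_train_batch_size=1),
    train_dataset=pairs,
    processing_class=tokenizer,
)
trainer.train()
```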
UPDATES (March): still the undisputed 34B King; Smaug 70B is still the undisputed 70B King.
And people wonder.. why is there no UNA of Hermes or Smaug 70B? I don't think it's worth the time to spend on a model widely known for not being too useful, though UNA could likely fix some of the internal mess. As for Hermes, we chatted briefly a couple of times but nothing solid came of it. We would still like to give excellent models a rebirth using UNA, just like we did with UNA-Dolphin, where we saw relevant performance gains in a short time.
UNA was applied only to the attention layers, not to the MLPs.
- Based on Smaug
- Trained on the SimpleMath dataset
- Trained with Axolotl
Experiment
The goal here is to understand the impact of SimpleMath applied at the attention layers during an SFT session, and how it affects the neural network overall.
Results: improved mathematical and reasoning capabilities without degrading previous training sessions; what the model had already learned is preserved.
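UNA itself is not public, but as a rough illustration of restricting a fine-tuning pass to the attention stack, here is a minimal PyTorch sketch that freezes everything except the attention projections. The module names assume the Llama/Yi-style layout that Smaug uses; this is not UNA's actual mechanism.

```python
# Illustrative only: a fine-tuning setup limited to the attention stack.
# UNA is not public; module names assume the Llama/Yi-style layout
# (q_proj/k_proj/v_proj/o_proj) used by Smaug-34B.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "abacusai/Smaug-34B-v0.1", torch_dtype=torch.bfloat16
)

ATTN_KEYS = ("q_proj", "k_proj", "v_proj", "o_proj")

for name, param in model.named_parameters():
    # Train only attention projections; MLPs, embeddings and norms stay frozen.
    param.requires_grad = any(key in name for key in ATTN_KEYS)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} parameters")
```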
And enjoy our ModelSimilarities detector tool (https://github.com/fblgit/model-similarity), with which we numerically confirmed the model's blood ties.
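The tool's own interface may differ, but the underlying idea is simple: compare corresponding weight tensors across two checkpoints. A minimal, hypothetical sketch:

```python
# Hypothetical sketch of weight-level similarity; the actual model-similarity
# tool's interface may differ. Compares corresponding weight tensors of two
# checkpoints via cosine similarity.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

a = AutoModelForCausalLM.from_pretrained(
    "fblgit/UNA-SimpleSmaug-34b-v1beta", torch_dtype=torch.bfloat16
)
b = AutoModelForCausalLM.from_pretrained(
    "abacusai/Smaug-34B-v0.1", torch_dtype=torch.bfloat16
)

b_params = dict(b.named_parameters())
for name, pa in a.named_parameters():
    pb = b_params[name]
    # Flatten each tensor and measure directional agreement.
    cos = F.cosine_similarity(pa.flatten().float(), pb.flatten().float(), dim=0)
    print(f"{name}\t{cos.item():.6f}")
```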
Evals
| Metric | Value |
|---|---|
| Avg. | 77.41 |
| AI2 Reasoning Challenge (25-Shot) | 74.57 |
| HellaSwag (10-Shot) | 86.74 |
| MMLU (5-Shot) | 76.68 |
| TruthfulQA (0-shot) | 70.17 |
| Winogrande (5-shot) | 83.82 |
| GSM8k (5-shot) | 72.48 |
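The average is the plain mean of the six benchmarks: (74.57 + 86.74 + 76.68 + 70.17 + 83.82 + 72.48) / 6 = 77.41.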
| Task |Version| Metric |Value |
|-------------|------:|--------|----------------:|
|arc_challenge| HF|acc_norm| 0.7457337883959 |
|gsm8k | HF|acc | 0.7247915087187 |
|mmlu | HF|acc | 0.7649553475572 |
|mmlu | HF|acc_norm| 0.7681713551647 |
|hellaswag | HF|acc_norm| 0.8673571001792 |
|truthfulqa | HF|mc2 | 0.7016557407771 |
|winogrande | HF|acc | 0.8382004735595 |
Improvements on GSM8k, MMLU, ARC, and Winogrande.
Citations
Thanks to abacusai for making Smaug-34B, and to jondurbin for the Bagel and all the magic behind the base model.
If you use this model, please provide a citation, even for merges or other derivatives.
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 23.12 |
| IFEval (0-Shot) | 45.56 |
| BBH (3-Shot) | 32.78 |
| MATH Lvl 5 (4-Shot) | 0.15 |
| GPQA (0-shot) | 8.95 |
| MuSR (0-shot) | 11.96 |
| MMLU-PRO (5-shot) | 39.33 |
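Again, the average is the plain mean of the six benchmarks: (45.56 + 32.78 + 0.15 + 8.95 + 11.96 + 39.33) / 6 ≈ 23.12.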