Model Card for Alpaca Dragon 72B V1

Fine-tune of Smaug 72B v0.1 on an Alpaca-format data set I had on hand. The data covers planning and reasoning, and I use it to help a model break a set of asks down into a logical plan. For some odd reason it bumps MMLU and Winogrande. I would have expected ARC to go up rather than those two, but this is often more of an art form than a science. All thanks to Abacus.AI for sharing their work.

I used the same data set to train one of my owl series models, Strix Rufipes 70B, which has worked well for planning out development tasks and other technical work.

LICENSE

Note that the license points back to the Smaug base license, since this is only a fine-tune of their model. Respect and abide by their conditions. Again, many thanks to Abacus for making their work open. Use that as inspiration to keep your own work open and to respect their license agreements. License Link

How to Get Started with the Model

Use the code below to get started with the model.

# Load the tokenizer and model directly from the Hugging Face Hub
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ibivibiv/alpaca-dragon-72b-v1")
model = AutoModelForCausalLM.from_pretrained("ibivibiv/alpaca-dragon-72b-v1")

# Alpaca-style prompt: an instruction followed by an empty response slot
prompt = "### Instruction: Create a plan for developing the game of snake in python using pygame.\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")

# max_new_tokens limits only the generated text; max_length would also count the prompt tokens
outputs = model.generate(**inputs, max_new_tokens=200)
text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(text)
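
The prompt in the snippet above follows an Alpaca-style layout. A small sketch of two helpers for it is shown here: one builds the prompt and one strips the echoed prompt from the decoded output. The names `build_prompt`, `extract_response`, and `RESPONSE_MARKER` are made up for illustration, and the template is copied from the example prompt only; the card does not state the exact template used in training.

```python
# Hypothetical helpers; the template mirrors the example prompt in this card.
RESPONSE_MARKER = "### Response:\n"


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Alpaca-style layout used in the example."""
    return f"### Instruction: {instruction.strip()}\n{RESPONSE_MARKER}"


def extract_response(generated: str) -> str:
    """Return only the text after the response marker.

    Decoded output from generate() includes the prompt, so everything up to
    and including the marker is dropped. If the marker is missing, the text
    is returned unchanged apart from surrounding whitespace.
    """
    _, marker, tail = generated.partition(RESPONSE_MARKER)
    return tail.strip() if marker else generated.strip()


if __name__ == "__main__":
    prompt = build_prompt("Create a plan for developing the game of snake in python using pygame.")
    print(repr(prompt))
```

With these in place, `extract_response(text)` could be applied to the decoded string from the snippet above to print just the generated plan.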

Evaluation

Test Name Accuracy (%)
All 77.31
arc:challenge 70.82
hellaswag 69.84
hendrycksTest-abstract_algebra 42.00
hendrycksTest-anatomy 71.85
hendrycksTest-astronomy 86.84
hendrycksTest-business_ethics 82.00
hendrycksTest-clinical_knowledge 84.53
hendrycksTest-college_biology 93.06
hendrycksTest-college_chemistry 54.00
hendrycksTest-college_computer_science 65.00
hendrycksTest-college_mathematics 52.00
hendrycksTest-college_medicine 75.14
hendrycksTest-college_physics 55.88
hendrycksTest-computer_security 82.00
hendrycksTest-conceptual_physics 80.43
hendrycksTest-econometrics 60.53
hendrycksTest-electrical_engineering 79.31
hendrycksTest-elementary_mathematics 70.37
hendrycksTest-formal_logic 58.73
hendrycksTest-global_facts 54.00
hendrycksTest-high_school_biology 88.39
hendrycksTest-high_school_chemistry 66.01
hendrycksTest-high_school_computer_science 82.00
hendrycksTest-high_school_european_history 84.24
hendrycksTest-high_school_geography 94.44
hendrycksTest-high_school_government_and_politics 98.96
hendrycksTest-high_school_macroeconomics 82.05
hendrycksTest-high_school_mathematics 45.93
hendrycksTest-high_school_microeconomics 86.13
hendrycksTest-high_school_physics 54.97
hendrycksTest-high_school_psychology 92.84
hendrycksTest-high_school_statistics 68.98
hendrycksTest-high_school_us_history 91.67
hendrycksTest-high_school_world_history 89.87
hendrycksTest-human_aging 78.03
hendrycksTest-human_sexuality 89.31
hendrycksTest-international_law 90.91
hendrycksTest-jurisprudence 87.96
hendrycksTest-logical_fallacies 84.05
hendrycksTest-machine_learning 58.93
hendrycksTest-management 87.38
hendrycksTest-marketing 95.30
hendrycksTest-medical_genetics 86.00
hendrycksTest-miscellaneous 92.21
hendrycksTest-moral_disputes 83.53
hendrycksTest-moral_scenarios 69.72
hendrycksTest-nutrition 85.62
hendrycksTest-philosophy 83.60
hendrycksTest-prehistory 87.04
hendrycksTest-professional_accounting 65.96
hendrycksTest-professional_law 60.69
hendrycksTest-professional_medicine 82.72
hendrycksTest-professional_psychology 81.86
hendrycksTest-public_relations 75.45
hendrycksTest-security_studies 82.04
hendrycksTest-sociology 88.56
hendrycksTest-us_foreign_policy 94.00
hendrycksTest-virology 57.23
hendrycksTest-world_religions 89.47
truthfulqa:mc 72.6
winogrande 86.03
gsm8k 77.63
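
For readers who want to work with the per-task numbers, here is a rough sketch that parses rows in the "name value" layout of the table above and averages the `hendrycksTest-*` subtasks. The function names are hypothetical, and treating MMLU as a plain unweighted mean of the subtasks is an assumption; the card does not say how its aggregate figures were computed.

```python
from statistics import mean


def parse_scores(lines):
    """Parse 'task_name value' rows into a dict, skipping non-numeric rows."""
    scores = {}
    for line in lines:
        name, _, value = line.strip().rpartition(" ")
        if not name:
            continue  # blank line or a single token
        try:
            scores[name] = float(value)
        except ValueError:
            continue  # header row such as 'Test Name Accuracy (%)'
    return scores


def mmlu_macro_average(scores):
    """Unweighted mean over the hendrycksTest-* subtasks (an assumption)."""
    subtasks = [v for k, v in scores.items() if k.startswith("hendrycksTest-")]
    return mean(subtasks)


if __name__ == "__main__":
    # Three rows copied from the table above; paste the full table to average all subtasks.
    sample = [
        "Test Name Accuracy (%)",
        "hendrycksTest-abstract_algebra 42.00",
        "hendrycksTest-anatomy 71.85",
        "hendrycksTest-astronomy 86.84",
    ]
    print(mmlu_macro_average(parse_scores(sample)))
```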

Environmental Impact

  • Hardware Type: A100s (more than I wanted to use, since it's all on my own $$$)
  • Hours used: 8
  • Cloud Provider: runpod.io
  • Compute Region: US
  • Carbon Emitted: unknown
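
The carbon figure is not reported here. As a back-of-the-envelope sketch only, emissions are commonly approximated as energy used times the grid's carbon intensity. Apart from the 8 hours listed above, every value in the usage example is a placeholder: the GPU count, per-GPU power draw, and grid intensity for this run are not given in the card, and `estimate_co2_kg` is a hypothetical helper.

```python
def estimate_co2_kg(gpu_count, gpu_watts, hours, grid_kg_per_kwh, pue=1.0):
    """Rough CO2 estimate in kg.

    energy (kWh) = GPUs * watts per GPU / 1000 * hours * PUE
    emissions    = energy * grid carbon intensity (kg CO2 per kWh)
    """
    energy_kwh = gpu_count * gpu_watts / 1000 * hours * pue
    return energy_kwh * grid_kg_per_kwh


if __name__ == "__main__":
    # hours=8 comes from the list above; all other inputs are PLACEHOLDERS,
    # not measurements from this training run.
    estimate = estimate_co2_kg(
        gpu_count=4,          # placeholder: GPU count is not stated
        gpu_watts=400,        # placeholder: assumed per-GPU draw
        hours=8,
        grid_kg_per_kwh=0.4,  # placeholder: depends on the data center's grid
    )
    print(f"{estimate:.1f} kg CO2 (illustrative only)")
```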

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 79.30
AI2 Reasoning Challenge (25-Shot) 73.89
HellaSwag (10-Shot) 88.16
MMLU (5-Shot) 77.40
TruthfulQA (0-shot) 72.69
Winogrande (5-shot) 86.03
GSM8k (5-shot) 77.63
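
As a quick check, the Avg. row matches the unweighted mean of the six benchmark scores listed above. The snippet below just redoes that arithmetic with the values copied from the table.

```python
# Scores copied from the leaderboard table above.
leaderboard = {
    "AI2 Reasoning Challenge (25-Shot)": 73.89,
    "HellaSwag (10-Shot)": 88.16,
    "MMLU (5-Shot)": 77.40,
    "TruthfulQA (0-shot)": 72.69,
    "Winogrande (5-shot)": 86.03,
    "GSM8k (5-shot)": 77.63,
}

# Unweighted mean across the six benchmarks.
average = sum(leaderboard.values()) / len(leaderboard)
print(f"{average:.2f}")  # 79.30
```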

Model size: 72.3B params (Safetensors, tensor type F32)
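
To get a feel for what the parameter count and F32 tensor type listed above mean in practice, the sketch below estimates the memory needed for the weights alone at a few precisions. It is simple arithmetic (parameters times bytes per parameter) and ignores activations, the KV cache, and framework overhead, so real usage will be higher. `weight_memory_gb` is a hypothetical helper name.

```python
# Bytes per parameter for common storage precisions.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    params = 72.3e9  # parameter count listed on this card
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gb(params, dtype):.1f} GB")
```

At the stored F32 precision this works out to roughly 289 GB for the weights, which is why loading in a lower precision or across several devices is usually worth considering.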