---
library_name: transformers
license: apache-2.0
language:
- en
---

# Model Card for Alpaca Dragon 72B V1

A fine-tune of [Smaug 72B v0.1](https://huggingface.co/abacusai/Smaug-72B-v0.1) on an Alpaca-format dataset I had on hand. The data covers planning and reasoning, and I use it to help a model break a set of asks down into a logical plan. For some odd reason it bumps MMLU and Winogrande. I would have expected ARC to go up more than those two, but this is often more of an art form than a science. All thanks to [Abacus.AI](https://huggingface.co/abacusai) for sharing their work.

I used the same dataset to train [Strix Rufipes 70B](https://huggingface.co/ibivibiv/strix-rufipes-70b), one of my owl series, which has worked well for planning out development tasks and other technical work.

![img](./alpaca_dragon.png)

## How to Get Started with the Model

Use the code below to get started with the model.

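```python
# Load model directly (a 72B model needs multiple GPUs or offloading)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ibivibiv/alpaca-dragon-72b-v1")
# device_map="auto" requires the accelerate package; torch_dtype="auto" keeps the checkpoint's dtype
model = AutoModelForCausalLM.from_pretrained("ibivibiv/alpaca-dragon-72b-v1", torch_dtype="auto", device_map="auto")

# Alpaca-style prompt: an instruction followed by an empty response slot
inputs = tokenizer("### Instruction: Create a plan for developing the game of snake in python using pygame.\n### Response:\n", return_tensors="pt").to(model.device)

# max_new_tokens limits only the generated text, not the prompt
outputs = model.generate(**inputs, max_new_tokens=200)
text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(text)
```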
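
The prompt in the snippet above uses an Alpaca-style `### Instruction:` / `### Response:` layout. Here is a minimal sketch of two helpers for building that prompt and pulling the answer back out of the decoded text. `build_prompt` and `extract_response` are hypothetical names, not part of transformers, and this card doesn't state the exact template used during fine-tuning, so the sketch only mirrors the example string.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the prompt layout used in the example above."""
    return f"### Instruction: {instruction.strip()}\n### Response:\n"


def extract_response(generated_text: str) -> str:
    """Return the text after the first `### Response:` marker, or the full text if the marker is absent."""
    _, marker, response = generated_text.partition("### Response:")
    return response.strip() if marker else generated_text.strip()


prompt = build_prompt("Create a plan for developing the game of snake in python using pygame.")
print(repr(prompt))

# Stand-in for decoded model output; the plan text here is made up for illustration.
fake_output = prompt + "1. Set up the pygame window.\n2. Draw the snake."
print(extract_response(fake_output))
```

In practice you would pass `prompt` to the tokenizer and run `extract_response` on the decoded generation, since the decoded text includes the prompt itself.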

## Evaluation

| Test Name                                         | Accuracy (%) |
|---------------------------------------------------|--------------|
| All                                               | 77.31        |
| arc:challenge                                     | 70.82        |
| hellaswag                                         | 69.84        |
| hendrycksTest-abstract_algebra                    | 42.00        |
| hendrycksTest-anatomy                             | 71.85        |
| hendrycksTest-astronomy                           | 86.84        |
| hendrycksTest-business_ethics                     | 82.00        |
| hendrycksTest-clinical_knowledge                  | 84.53        |
| hendrycksTest-college_biology                     | 93.06        |
| hendrycksTest-college_chemistry                   | 54.00        |
| hendrycksTest-college_computer_science            | 65.00        |
| hendrycksTest-college_mathematics                 | 52.00        |
| hendrycksTest-college_medicine                    | 75.14        |
| hendrycksTest-college_physics                     | 55.88        |
| hendrycksTest-computer_security                   | 82.00        |
| hendrycksTest-conceptual_physics                  | 80.43        |
| hendrycksTest-econometrics                        | 60.53        |
| hendrycksTest-electrical_engineering              | 79.31        |
| hendrycksTest-elementary_mathematics              | 70.37        |
| hendrycksTest-formal_logic                        | 58.73        |
| hendrycksTest-global_facts                        | 54.00        |
| hendrycksTest-high_school_biology                 | 88.39        |
| hendrycksTest-high_school_chemistry               | 66.01        |
| hendrycksTest-high_school_computer_science        | 82.00        |
| hendrycksTest-high_school_european_history        | 84.24        |
| hendrycksTest-high_school_geography               | 94.44        |
| hendrycksTest-high_school_government_and_politics | 98.96        |
| hendrycksTest-high_school_macroeconomics          | 82.05        |
| hendrycksTest-high_school_mathematics             | 45.93        |
| hendrycksTest-high_school_microeconomics          | 86.13        |
| hendrycksTest-high_school_physics                 | 54.97        |
| hendrycksTest-high_school_psychology              | 92.84        |
| hendrycksTest-high_school_statistics              | 68.98        |
| hendrycksTest-high_school_us_history              | 91.67        |
| hendrycksTest-high_school_world_history           | 89.87        |
| hendrycksTest-human_aging                         | 78.03        |
| hendrycksTest-human_sexuality                     | 89.31        |
| hendrycksTest-international_law                   | 90.91        |
| hendrycksTest-jurisprudence                       | 87.96        |
| hendrycksTest-logical_fallacies                   | 84.05        |
| hendrycksTest-machine_learning                    | 58.93        |
| hendrycksTest-management                          | 87.38        |
| hendrycksTest-marketing                           | 95.30        |
| hendrycksTest-medical_genetics                    | 86.00        |
| hendrycksTest-miscellaneous                       | 92.21        |
| hendrycksTest-moral_disputes                      | 83.53        |
| hendrycksTest-moral_scenarios                     | 69.72        |
| hendrycksTest-nutrition                           | 85.62        |
| hendrycksTest-philosophy                          | 83.60        |
| hendrycksTest-prehistory                          | 87.04        |
| hendrycksTest-professional_accounting             | 65.96        |
| hendrycksTest-professional_law                    | 60.69        |
| hendrycksTest-professional_medicine               | 82.72        |
| hendrycksTest-professional_psychology             | 81.86        |
| hendrycksTest-public_relations                    | 75.45        |
| hendrycksTest-security_studies                    | 82.04        |
| hendrycksTest-sociology                           | 88.56        |
| hendrycksTest-us_foreign_policy                   | 94.00        |
| hendrycksTest-virology                            | 57.23        |
| hendrycksTest-world_religions                     | 89.47        |
| truthfulqa:mc                                     | 72.6         |
| winogrande                                        | 86.03        |
| gsm8k                                             | 77.63        |

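
The `hendrycksTest-*` rows are the individual MMLU subjects, so a single MMLU figure has to be aggregated from them. Here is a rough sketch of one way to do that, assuming an unweighted (macro) average over subjects, which can differ from a question-weighted average. `parse_scores` and `mmlu_macro_average` are hypothetical helper names, and the embedded table is only a four-row excerpt of the results above, so the printed number is not the model's MMLU score.

```python
from statistics import mean


def parse_scores(markdown_table: str) -> dict[str, float]:
    """Parse `| name | score |` rows into a {name: score} dict."""
    scores = {}
    for line in markdown_table.splitlines():
        # Split on pipes and drop empty cells (leading, trailing, or stray pipes).
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        if len(cells) != 2:
            continue  # not a two-column row
        name, value = cells
        try:
            scores[name] = float(value)
        except ValueError:
            continue  # header or separator row
    return scores


def mmlu_macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean over the rows whose name starts with `hendrycksTest-`."""
    subject_scores = [v for k, v in scores.items() if k.startswith("hendrycksTest-")]
    return mean(subject_scores)


# Excerpt only: paste the full results table here to cover every subject.
excerpt = """
| Test Name                       | Accuracy (%) |
|---------------------------------|--------------|
| hendrycksTest-abstract_algebra  | 42.00        |
| hendrycksTest-anatomy           | 71.85        |
| hendrycksTest-astronomy         | 86.84        |
| winogrande                      | 86.03        |
"""

scores = parse_scores(excerpt)
print(f"Macro average over {len(scores) - 1} excerpted subjects: {mmlu_macro_average(scores):.2f}")
```

The `winogrande` row is parsed but left out of the average by the prefix filter, which is why the subject count is one less than the number of parsed rows.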

## Environmental Impact

- **Hardware Type:** A100s (more than I wanted to use, since it's all on my own $$$)
- **Hours used:** 8
- **Cloud Provider:** runpod.io
- **Compute Region:** US
- **Carbon Emitted:** Unknown