---
library_name: transformers
license: apache-2.0
language:
- en
---

# Model Card for Alpaca Dragon 72B V1

A fine-tune of [Smaug 72B v0.1](https://huggingface.co/abacusai/Smaug-72B-v0.1) using an Alpaca-format dataset I have on hand. The data focuses on planning and reasoning, which I use to help a model break down a set of asks into a logical plan. Oddly, it bumps the MMLU and Winogrande scores; I would have expected ARC to improve over those two, but this is often more of an art form than a science. All thanks to [Abacus.AI](https://huggingface.co/abacusai) for sharing their work.

I used the same dataset to train one of my owl-series models, [Strix Rufipes 70B](https://huggingface.co/ibivibiv/strix-rufipes-70b), which has worked well for planning out development tasks and other technical work.

![img](./alpaca_dragon.png)
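
The dataset itself isn't published with this card, but Alpaca-format data follows a fixed record layout. Below is a hypothetical planning-style record; the field names follow the standard Alpaca schema, while the content is purely illustrative and not taken from the actual training set:

```python
# Hypothetical Alpaca-format record; illustrative only, not from the training set.
record = {
    "instruction": "Create a plan for migrating a REST API from Flask to FastAPI.",
    "input": "",  # optional context; empty for instruction-only examples
    "output": (
        "1. Inventory the existing Flask routes and their handlers.\n"
        "2. Define Pydantic models for each request/response body.\n"
        "3. Port routes one blueprint at a time, keeping tests green.\n"
        "4. Swap the WSGI server for an ASGI server such as uvicorn."
    ),
}

# Records are rendered into the "### Instruction / ### Response" prompt
# format that the usage example below relies on.
prompt = f"### Instruction: {record['instruction']}\n### Response:\n{record['output']}"
```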




## How to Get Started with the Model

Use the code below to get started with the model.

```python
# Load model directly
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ibivibiv/alpaca-dragon-72b-v1")

# 72B parameters will not fit on one consumer GPU: shard across available
# devices and load in float16 to halve the memory footprint.
model = AutoModelForCausalLM.from_pretrained(
    "ibivibiv/alpaca-dragon-72b-v1",
    torch_dtype=torch.float16,
    device_map="auto",
)

# The model expects Alpaca-style "### Instruction / ### Response" prompts.
prompt = "### Instruction: Create a plan for developing the game of snake in python using pygame.\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)  # counts generated tokens only
text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(text)
```
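
Not part of the original card: if you don't have the roughly 140 GB of GPU memory that float16 weights require, 4-bit quantization through bitsandbytes is a common fallback. A minimal sketch, assuming `bitsandbytes` and `accelerate` are installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization cuts the weight footprint to roughly 40 GB,
# at some cost in output quality.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("ibivibiv/alpaca-dragon-72b-v1")
model = AutoModelForCausalLM.from_pretrained(
    "ibivibiv/alpaca-dragon-72b-v1",
    quantization_config=bnb_config,
    device_map="auto",
)
```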


## Evaluation

| Test Name                       | Accuracy (%) |
|---------------------------------|--------------|
| All                             | 77.31        |
| arc:challenge                   | 70.82        |
| hellaswag                       | 69.84        |
| hendrycksTest-abstract_algebra  | 42.00        |
| hendrycksTest-anatomy           | 71.85        |
| hendrycksTest-astronomy         | 86.84        |
| hendrycksTest-business_ethics   | 82.00        |
| hendrycksTest-clinical_knowledge| 84.53        |
| hendrycksTest-college_biology   | 93.06        |
| hendrycksTest-college_chemistry | 54.00        |
| hendrycksTest-college_computer_science | 65.00 |
| hendrycksTest-college_mathematics | 52.00      |
| hendrycksTest-college_medicine  | 75.14        |
| hendrycksTest-college_physics   | 55.88        |
| hendrycksTest-computer_security | 82.00        |
| hendrycksTest-conceptual_physics| 80.43        |
| hendrycksTest-econometrics      | 60.53        |
| hendrycksTest-electrical_engineering | 79.31   |
| hendrycksTest-elementary_mathematics | 70.37   |
| hendrycksTest-formal_logic      | 58.73        |
| hendrycksTest-global_facts      | 54.00        |
| hendrycksTest-high_school_biology | 88.39      |
| hendrycksTest-high_school_chemistry | 66.01    |
| hendrycksTest-high_school_computer_science | 82.00 |
| hendrycksTest-high_school_european_history | 84.24 |
| hendrycksTest-high_school_geography | 94.44    |
| hendrycksTest-high_school_government_and_politics | 98.96 |
| hendrycksTest-high_school_macroeconomics | 82.05  |
| hendrycksTest-high_school_mathematics | 45.93    |
| hendrycksTest-high_school_microeconomics | 86.13  |
| hendrycksTest-high_school_physics | 54.97      |
| hendrycksTest-high_school_psychology | 92.84    |
| hendrycksTest-high_school_statistics | 68.98    |
| hendrycksTest-high_school_us_history | 91.67    |
| hendrycksTest-high_school_world_history | 89.87  |
| hendrycksTest-human_aging       | 78.03        |
| hendrycksTest-human_sexuality   | 89.31        |
| hendrycksTest-international_law | 90.91        |
| hendrycksTest-jurisprudence     | 87.96        |
| hendrycksTest-logical_fallacies | 84.05        |
| hendrycksTest-machine_learning  | 58.93        |
| hendrycksTest-management        | 87.38        |
| hendrycksTest-marketing         | 95.30        |
| hendrycksTest-medical_genetics  | 86.00        |
| hendrycksTest-miscellaneous     | 92.21        |
| hendrycksTest-moral_disputes    | 83.53        |
| hendrycksTest-moral_scenarios   | 69.72        |
| hendrycksTest-nutrition         | 85.62        |
| hendrycksTest-philosophy        | 83.60        |
| hendrycksTest-prehistory        | 87.04        |
| hendrycksTest-professional_accounting | 65.96  |
| hendrycksTest-professional_law  | 60.69        |
| hendrycksTest-professional_medicine | 82.72    |
| hendrycksTest-professional_psychology | 81.86  |
| hendrycksTest-public_relations  | 75.45        |
| hendrycksTest-security_studies  | 82.04        |
| hendrycksTest-sociology         | 88.56        |
| hendrycksTest-us_foreign_policy | 94.00        |
| hendrycksTest-virology          | 57.23        |
| hendrycksTest-world_religions   | 89.47        |
| truthfulqa:mc                   | 72.60        |
| winogrande                      | 86.03        |
| gsm8k                           | 77.63        |
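
The task names above (`arc:challenge`, `hendrycksTest-*`, `truthfulqa:mc`) match EleutherAI's lm-evaluation-harness, which these scores appear to come from. A hedged sketch of re-running one task with the harness's current Python API; note that task names changed in newer harness versions (e.g. `hendrycksTest-*` became `mmlu_*`), and the few-shot setting here follows Open LLM Leaderboard convention rather than anything stated in this card:

```python
# Requires: pip install lm-eval
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=ibivibiv/alpaca-dragon-72b-v1,dtype=float16,parallelize=True",
    tasks=["arc_challenge"],
    num_fewshot=25,  # 25-shot is the leaderboard setting for ARC-Challenge (assumed)
    batch_size=1,
)
print(results["results"]["arc_challenge"])
```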


## Environmental Impact

- **Hardware Type:** A100s (more than I wanted to use, since it's all on my own dime)
- **Hours used:** 8
- **Cloud Provider:** runpod.io
- **Compute Region:** US
- **Carbon Emitted:** Unknown