	---
	library_name: transformers
	license: apache-2.0
	language:
	- en
	---

	# Model Card for Alpaca Dragon 72B V1

	Fine-tune of [Smaug 72B v0.1](https://huggingface.co/abacusai/Smaug-72B-v0.1) using an Alpaca-format dataset I had on hand. The data covers planning and reasoning, and I use it to help a model break a set of asks down into a logical plan. For some odd reason it bumps the MMLU and Winogrande scores. I would have expected ARC to go up more than those two, but this is often more of an art form than a science. All thanks to [Abacus.AI](https://huggingface.co/abacusai) for sharing their work.

	I used the same dataset to train one of the models in my owl series, [Strix Rufipes 70B](https://huggingface.co/ibivibiv/strix-rufipes-70b), which has worked well for planning out development tasks and other technical work.

	![img](./alpaca_dragon.png)




	## How to Get Started with the Model

	Use the code below to get started with the model.

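	```python
	# Load the tokenizer and model directly from the Hub
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_id = "ibivibiv/alpaca-dragon-72b-v1"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	# device_map="auto" (requires the accelerate package) spreads the 72B weights across available devices
	model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

	# Instruction/response prompt in the format used for fine-tuning
	prompt = "### Instruction: Create a plan for developing the game of snake in python using pygame.\n### Response:\n"
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	# max_new_tokens limits only the generated continuation, not the prompt
	outputs = model.generate(**inputs, max_new_tokens=200)
	text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
	print(text)
	```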
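
The prompt in the example above uses a simple instruction/response layout. Here is a minimal sketch of a helper that builds prompts in that same layout. `build_prompt` is a hypothetical name, not part of transformers or this repository, and the layout is inferred only from the single example prompt above.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the instruction/response layout shown above.

    Assumption: the model expects exactly this layout; it is copied from the
    one example prompt in this card, not from a documented template.
    """
    return f"### Instruction: {instruction.strip()}\n### Response:\n"


prompt = build_prompt("Create a plan for developing the game of snake in python using pygame.")
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the hard-coded prompt.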


	## Evaluation

	| Test Name | Accuracy (%) |
	|---------------------------------|--------------|
	| All | 77.31 |
	| arc:challenge | 70.82 |
	| hellaswag | 69.84 |
	| hendrycksTest-abstract_algebra | 42.00 |
	| hendrycksTest-anatomy | 71.85 |
	| hendrycksTest-astronomy | 86.84 |
	| hendrycksTest-business_ethics | 82.00 |
	| hendrycksTest-clinical_knowledge | 84.53 |
	| hendrycksTest-college_biology | 93.06 |
	| hendrycksTest-college_chemistry | 54.00 |
	| hendrycksTest-college_computer_science | 65.00 |
	| hendrycksTest-college_mathematics | 52.00 |
	| hendrycksTest-college_medicine | 75.14 |
	| hendrycksTest-college_physics | 55.88 |
	| hendrycksTest-computer_security | 82.00 |
	| hendrycksTest-conceptual_physics | 80.43 |
	| hendrycksTest-econometrics | 60.53 |
	| hendrycksTest-electrical_engineering | 79.31 |
	| hendrycksTest-elementary_mathematics | 70.37 |
	| hendrycksTest-formal_logic | 58.73 |
	| hendrycksTest-global_facts | 54.00 |
	| hendrycksTest-high_school_biology | 88.39 |
	| hendrycksTest-high_school_chemistry | 66.01 |
	| hendrycksTest-high_school_computer_science | 82.00 |
	| hendrycksTest-high_school_european_history | 84.24 |
	| hendrycksTest-high_school_geography | 94.44 |
	| hendrycksTest-high_school_government_and_politics | 98.96 |
	| hendrycksTest-high_school_macroeconomics | 82.05 |
	| hendrycksTest-high_school_mathematics | 45.93 |
	| hendrycksTest-high_school_microeconomics | 86.13 |
	| hendrycksTest-high_school_physics | 54.97 |
	| hendrycksTest-high_school_psychology | 92.84 |
	| hendrycksTest-high_school_statistics | 68.98 |
	| hendrycksTest-high_school_us_history | 91.67 |
	| hendrycksTest-high_school_world_history | 89.87 |
	| hendrycksTest-human_aging | 78.03 |
	| hendrycksTest-human_sexuality | 89.31 |
	| hendrycksTest-international_law | 90.91 |
	| hendrycksTest-jurisprudence | 87.96 |
	| hendrycksTest-logical_fallacies | 84.05 |
	| hendrycksTest-machine_learning | 58.93 |
	| hendrycksTest-management | 87.38 |
	| hendrycksTest-marketing | 95.30 |
	| hendrycksTest-medical_genetics | 86.00 |
	| hendrycksTest-miscellaneous | 92.21 |
	| hendrycksTest-moral_disputes | 83.53 |
	| hendrycksTest-moral_scenarios | 69.72 |
	| hendrycksTest-nutrition | 85.62 |
	| hendrycksTest-philosophy | 83.60 |
	| hendrycksTest-prehistory | 87.04 |
	| hendrycksTest-professional_accounting | 65.96 |
	| hendrycksTest-professional_law | 60.69 |
	| hendrycksTest-professional_medicine | 82.72 |
	| hendrycksTest-professional_psychology | 81.86 |
	| hendrycksTest-public_relations | 75.45 |
	| hendrycksTest-security_studies | 82.04 |
	| hendrycksTest-sociology | 88.56 |
	| hendrycksTest-us_foreign_policy | 94.00 |
	| hendrycksTest-virology | 57.23 |
	| hendrycksTest-world_religions | 89.47 |
	| truthfulqa:mc | 72.6 |
	| winogrande | 86.03 |
	| gsm8k | 77.63 |
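
The table lists the MMLU subtasks (the `hendrycksTest-*` rows) one by one, without a single MMLU figure. Below is a rough sketch of how one could be derived from those rows. It assumes an unweighted mean over subtasks, which may not match how any particular leaderboard aggregates. `mmlu_average` is a hypothetical helper, and only a few rows from the table are included for brevity.

```python
from statistics import mean

# A few rows copied from the table above; add the remaining rows for a full figure.
scores = {
    "arc:challenge": 70.82,
    "hendrycksTest-abstract_algebra": 42.00,
    "hendrycksTest-anatomy": 71.85,
    "hendrycksTest-astronomy": 86.84,
    "winogrande": 86.03,
}


def mmlu_average(results: dict[str, float]) -> float:
    """Unweighted mean over the rows whose name starts with 'hendrycksTest-'.

    Assumption: every subtask counts equally, regardless of its question count.
    """
    mmlu = [acc for name, acc in results.items() if name.startswith("hendrycksTest-")]
    if not mmlu:
        raise ValueError("no hendrycksTest-* rows found")
    return mean(mmlu)


print(f"MMLU average over the listed subtasks: {mmlu_average(scores):.2f}")
```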


	## Environmental Impact

	- Hardware Type: A100s (more than I wanted to use, since it's all on my own $$$)
	- Hours used: 8
	- Cloud Provider: runpod.io
	- Compute Region: US
	- Carbon Emitted: unknown
