	---
	license: cc-by-nc-4.0
	tags:
	- moe
	- merge
	- mergekit
	model-index:
	- name: TinyUltra-4x1.1B-Base-Alpha
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge (25-Shot)
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc_norm
	      value: 34.9
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=gmonsoon/TinyUltra-4x1.1B-Base-Alpha
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag (10-Shot)
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc_norm
	      value: 61.42
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=gmonsoon/TinyUltra-4x1.1B-Base-Alpha
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU (5-Shot)
	      type: cais/mmlu
	      config: all
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 25.42
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=gmonsoon/TinyUltra-4x1.1B-Base-Alpha
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA (0-shot)
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: mc2
	      value: 37.59
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=gmonsoon/TinyUltra-4x1.1B-Base-Alpha
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande (5-shot)
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 65.75
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=gmonsoon/TinyUltra-4x1.1B-Base-Alpha
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k (5-shot)
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 2.58
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=gmonsoon/TinyUltra-4x1.1B-Base-Alpha
	      name: Open LLM Leaderboard
	---

	![image/jpeg](https://i.imgur.com/rx3ckCc.jpeg)

	# TinyUltra-4x1.1B-Base-Alpha

	TinyUltra-4x1.1B-Base-Alpha is a Mixture of Experts (MoE) built with MergeKit from the following models:
	* [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
	* [vihangd/DopeyTinyLlama-1.1B-v1](https://huggingface.co/vihangd/DopeyTinyLlama-1.1B-v1)
	* [cognitivecomputations/TinyDolphin-2.8.1-1.1b](https://huggingface.co/cognitivecomputations/TinyDolphin-2.8.1-1.1b)
	* [Josephgflowers/Tinyllama-Cinder-1.3B-Reason-Test](https://huggingface.co/Josephgflowers/Tinyllama-Cinder-1.3B-Reason-Test)


	# Modelfile/Prompt format
	```markdown
	SYSTEM You are a TinyUltra, helpful and lovely AI assistant.

	TEMPLATE <|system|> {{ .System }}</s> <|user|> {{ .Prompt }}</s> <|assistant|>

	PARAMETER stop <|system|>
	PARAMETER stop <|user|>
	PARAMETER stop <|assistant|>
	PARAMETER stop </s>
	```
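
For use outside a Modelfile-based runtime, the template and stop sequences above can be reproduced by hand. The following is a minimal Python sketch, assuming the whitespace in the TEMPLATE line is exactly as shown; `build_prompt` and `truncate_at_stop` are hypothetical helper names, not part of any library.

```python
# Hypothetical helpers mirroring the Modelfile above (names are illustrative).
DEFAULT_SYSTEM = "You are a TinyUltra, helpful and lovely AI assistant."
STOP_SEQUENCES = ["<|system|>", "<|user|>", "<|assistant|>", "</s>"]


def build_prompt(user_message: str, system_message: str = DEFAULT_SYSTEM) -> str:
    """Fill the TEMPLATE line with a system message and a user message."""
    return f"<|system|> {system_message}</s> <|user|> {user_message}</s> <|assistant|>"


def truncate_at_stop(text: str, stops=STOP_SEQUENCES) -> str:
    """Cut generated text at the earliest stop sequence, if any occurs."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


prompt = build_prompt("What is a Mixture of Experts?")
reply = truncate_at_stop("It combines several expert networks.</s> <|user|>")
```

Here `prompt` is the string to send to the model, and `reply` keeps only the text before the first stop sequence.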

	## 🧩 Configuration

	```yaml
	base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	gate_mode: hidden
	dtype: float16
	experts:
	  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	    positive_prompts:
	      - "Help me debug this code."
	      - "Rewrite this function in Python."
	      - "Optimize this C# script."
	      - "Implement this feature using JavaScript."
	      - "Convert this HTML structure into a more efficient design."
	      - "Assist me with writing a program that"
	  - source_model: vihangd/DopeyTinyLlama-1.1B-v1
	    positive_prompts:
	      - "How do you"
	      - "Explain the concept of"
	      - "Give an overview of"
	      - "Compare and contrast between"
	      - "Provide information about"
	      - "Help me understand"
	      - "Summarize"
	      - "Make a recommendation on"
	      - "Answer this question"
	  - source_model: cognitivecomputations/TinyDolphin-2.8.1-1.1b
	    positive_prompts:
	      - "Write a program to solve this problem"
	      - "Modify this function to improve its performance"
	      - "Refactor this code to enhance readability"
	      - "Create a custom function for this specific use case"
	      - "Optimize this algorithm to reduce computational complexity"
	      - "Implement this feature by extending existing codebase"
	      - "Integrate this API call into the application"
	      - "Help me troubleshoot and fix this bug"
	      - "Review and test this code snippet before deployment"
	      - "Analyze this error log to identify potential issues"
	      - "Generate a set of unit tests for this module"
	      - "Evaluate different approaches to solving this problem"
	      - "Do a web search for"
	      - "Use the plugin to"
	  - source_model: Josephgflowers/Tinyllama-Cinder-1.3B-Reason-Test
	    positive_prompts:
	      - "add these numbers"
	      - "whats 2+2"
	      - "subtraction"
	      - "division"
	      - "multiplication"
	      - "addition"
	      - "I need help with a math problem"
	      - "Solve for x"
	      - "Add these two numbers together: 4 + 3 = 7"
	      - "Multiply 5 by 6: 5 * 6 = 30"
	      - "Divide 8 by 2: 8 / 2 = 4"
	      - "Find the remainder when 9 is divided by 3: 9 % 3 = 0"
	      - "Calculate the square root of 16: sqrt(16) = 4"
	      - "Simplify the expression (a+b)/(c-d): (a+b)/(c-d)"
	      - "Factor out the common factor of 2 from 4x + 6y: 2(2x + 3y)"
	      - "Solve for x in the equation 3x - 7 = 2x + 5: x = 12"
	      - "Graph the line y = 2x + 3"
	      - "Approximate pi to three decimal places: 3.142"
	      - "Find the derivative of f(x) = sin(x): f'(x) = cos(x)"
	      - "Integrate g(x) = x^2 over the interval [0, 1]: g(1) - g(0) = 1/3"
	      - "Calculate the determinant of the matrix A = [[2, 3], [4, 5]]: det(A) = 2*5 - 3*4 = -2"
	      - "Solve the system of equations Ax = b: x = [-5, 10]"
	      - "Calculate the sum of the first n natural numbers using the formula Sn = n*(n+1)/2: sum(n=1 to 5) = 15"
	```
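
With `gate_mode: hidden`, MergeKit derives each expert's router weights from hidden-state representations of that expert's `positive_prompts`. The toy sketch below illustrates only the routing idea and is not MergeKit's implementation: the 3-dimensional vectors are made-up stand-ins for hidden states, `route` is a hypothetical helper, and selecting the top 2 experts per token is an assumption that the configuration above does not state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(token_vec, gate_vecs, top_k=2):
    """Score every expert by dot product and keep the top_k with normalized weights."""
    scores = [sum(t * g for t, g in zip(token_vec, gate)) for gate in gate_vecs]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in top])
    return list(zip(top, weights))


# Made-up gate vectors, one per expert (illustrative values only).
gate_vecs = [
    [1.0, 0.0, 0.0],  # expert 0
    [0.0, 1.0, 0.0],  # expert 1
    [0.9, 0.1, 0.0],  # expert 2
    [0.0, 0.0, 1.0],  # expert 3
]
token_vec = [0.8, 0.1, 0.1]  # made-up hidden state for one token

selected = route(token_vec, gate_vecs)
for expert_index, weight in selected:
    print(f"expert {expert_index}: weight {weight:.3f}")
```

In this toy setup the token's vector lies closest to the gate vectors of experts 0 and 2, so those two are selected and their outputs would be mixed according to the printed weights.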

	## 💻 Usage

	```python
	# Notebook cell: the leading "!" runs pip as a shell command (Jupyter/Colab).
	!pip install -qU transformers bitsandbytes accelerate

	from transformers import AutoTokenizer
	import transformers
	import torch

	model = "gmonsoon/TinyUltra-4x1.1B-Base-Alpha"

	# Load the tokenizer and build a text-generation pipeline (float16, 4-bit weights).
	tokenizer = AutoTokenizer.from_pretrained(model)
	pipeline = transformers.pipeline(
	    "text-generation",
	    model=model,
	    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
	)

	# Format the conversation with the model's chat template, then sample a reply.
	messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
	prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
	print(outputs[0]["generated_text"])
	```

	GGUF: https://huggingface.co/indischepartij/TinyUltra-4x1.1B-Base-Alpha-GGUF

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_gmonsoon__TinyUltra-4x1.1B-Base-Alpha).

	| Metric                          |Value|
	|---------------------------------|----:|
	|Avg.                             |37.94|
	|AI2 Reasoning Challenge (25-Shot)|34.90|
	|HellaSwag (10-Shot)              |61.42|
	|MMLU (5-Shot)                    |25.42|
	|TruthfulQA (0-shot)              |37.59|
	|Winogrande (5-shot)              |65.75|
	|GSM8k (5-shot)                   | 2.58|
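
The `Avg.` row is consistent with the unweighted mean of the six benchmark scores. A short check, assuming that is how the leaderboard average is computed:

```python
# Benchmark scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 34.90,
    "HellaSwag (10-Shot)": 61.42,
    "MMLU (5-Shot)": 25.42,
    "TruthfulQA (0-shot)": 37.59,
    "Winogrande (5-shot)": 65.75,
    "GSM8k (5-shot)": 2.58,
}

# Unweighted arithmetic mean, rounded to two decimals like the table.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 37.94
```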