thephimart
/

tinyllama-4x1.1b-moe.Q5_K_M.gguf

Text Generation

Model card Files Files and versions Community

tinyllama-4x1.1b-moe.Q5_K_M.gguf / README.md

thephimart's picture

Update README.md

c90e7d2 verified 6 months ago

|

raw history blame contribute delete

No virus

2.88 kB

	---
	license: apache-2.0
	tags:
	- Text
	- Text Generation
	- Transformers
	- English
	- mixtral
	- Merge
	- Quantization
	- MoE
	- tinyllama
	---

	This is a q5_K_M GGUF quantization of https://huggingface.co/s3nh/TinyLLama-4x1.1B-MoE.

	Not sure how well it performs, also my first quantization, so fingers crossed.

	It is a Mixture of Experts model with https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 as it's base model.

	The other 3 models in the merge are:

	https://huggingface.co/78health/TinyLlama_1.1B-function-calling

	https://huggingface.co/phanerozoic/Tiny-Pirate-1.1b-v0.1

	https://huggingface.co/Tensoic/TinyLlama-1.1B-3T-openhermes

	I make no claims to any of the development, i simply wanted to try it out so I quantized and then thought I'd share it if anyone else was feeling experimental.

	-------

	default: #(from modelfile for tinyllama on ollama)

	TEMPLATE """<\|system\|>
	{{ .System }}</s>
	<\|user\|>
	{{ .Prompt }}</s>
	<\|assistant\|>
	"""
	SYSTEM """You are a helpful AI assistant.""" #(Tweak this to adjust personality etc.)

	PARAMETER stop "<\|system\|>"
	PARAMETER stop "<\|user\|>"
	PARAMETER stop "<\|assistant\|>"
	PARAMETER stop "</s>"

	-------

	Model card from https://huggingface.co/s3nh/TinyLLama-4x1.1B-MoE

	Example usage:

	from transformers import AutoModelForCausalLM
	from transformers import AutoTokenizer

	tokenizer = AutoTokenizer.from_pretrained("s3nh/TinyLLama-1.1B-MoE")
	tokenizer = AutoTokenizer.from_pretrained("s3nh/TinyLLama-1.1B-MoE")

	input_text = """
	###Input: You are a pirate. tell me a story about wrecked ship.
	###Response:
	""")

	input_ids = tokenizer.encode(input_text, return_tensors='pt').to(device)
	output = model.generate(inputs=input_ids,
	max_length=max_length,
	do_sample=True,
	top_k=10,
	temperature=0.7,
	pad_token_id=tokenizer.eos_token_id,
	attention_mask=input_ids.new_ones(input_ids.shape))
	tokenizer.decode(output[0], skip_special_tokens=True)

	This model was possible to create by tremendous work of mergekit developers. I decided to merge tinyLlama models to create mixture of experts. Config used as below:

	"""base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	experts:
	- source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	positive_prompts:
	- "chat"
	- "assistant"
	- "tell me"
	- "explain"
	- source_model: 78health/TinyLlama_1.1B-function-calling
	positive_prompts:
	- "code"
	- "python"
	- "javascript"
	- "programming"
	- "algorithm"
	- source_model: phanerozoic/Tiny-Pirate-1.1b-v0.1
	positive_prompts:
	- "storywriting"
	- "write"
	- "scene"
	- "story"
	- "character"
	- source_model: Tensoic/TinyLlama-1.1B-3T-openhermes
	positive_prompts:
	- "reason"
	- "provide"
	- "instruct"
	- "summarize"
	- "count"
	"""