---
license: apache-2.0
tags:
- Text
- Text Generation
- Transformers
- English
- mixtral
- Merge
- Quantization
- MoE
- tinyllama
---

This is a q5_K_M GGUF quantization of https://huggingface.co/s3nh/TinyLLama-4x1.1B-MoE.

I'm not sure how well it performs, and this is also my first quantization, so fingers crossed.

It is a Mixture of Experts model with https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 as its base model.

The other 3 models in the merge are:

https://huggingface.co/78health/TinyLlama_1.1B-function-calling

https://huggingface.co/phanerozoic/Tiny-Pirate-1.1b-v0.1

https://huggingface.co/Tensoic/TinyLlama-1.1B-3T-openhermes

I make no claims to any of the development; I simply wanted to try the model out, so I quantized it and thought I'd share it in case anyone else was feeling experimental.
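
To try the GGUF file directly, here is a minimal sketch using llama-cpp-python. The model filename, context size, and sampling settings are my own assumptions; substitute the actual GGUF filename from this repo:

# Minimal llama-cpp-python sketch (pip install llama-cpp-python).
# The GGUF filename below is an assumption; use the file shipped in this repo.
from llama_cpp import Llama

llm = Llama(model_path="tinyllama-4x1.1b-moe.q5_K_M.gguf", n_ctx=2048)

# Zephyr-style prompt format used by the TinyLlama chat base model.
prompt = """<|system|>
You are a helpful AI assistant.</s>
<|user|>
Tell me a story about a wrecked ship.</s>
<|assistant|>
"""

output = llm(prompt, max_tokens=256, temperature=0.7, stop=["</s>"])
print(output["choices"][0]["text"])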

-------

Default settings, taken from the tinyllama Modelfile on Ollama:

TEMPLATE """<|system|>
{{ .System }}</s>
<|user|>
{{ .Prompt }}</s>
<|assistant|>
"""
# Tweak the SYSTEM message to adjust personality etc.
SYSTEM """You are a helpful AI assistant."""

PARAMETER stop "<|system|>"
PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
PARAMETER stop "</s>"
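
For reference, the same Zephyr-style prompt format can be reproduced in Python from the base model's chat template. A minimal sketch, assuming the transformers library and access to the TinyLlama chat tokenizer:

from transformers import AutoTokenizer

# The base chat model's tokenizer ships the <|system|>/<|user|>/<|assistant|> template.
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Tell me a story about a wrecked ship."},
]

# add_generation_prompt=True appends the trailing <|assistant|> tag
# so the model knows to start its reply.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)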

-------

Model card from https://huggingface.co/s3nh/TinyLLama-4x1.1B-MoE

Example usage:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("s3nh/TinyLLama-4x1.1B-MoE")
model = AutoModelForCausalLM.from_pretrained("s3nh/TinyLLama-4x1.1B-MoE").to(device)

input_text = """
###Input: You are a pirate. Tell me a story about a wrecked ship.
###Response:
"""

input_ids = tokenizer.encode(input_text, return_tensors='pt').to(device)
output = model.generate(inputs=input_ids,
                        max_length=256,  # was undefined in the original; 256 is an arbitrary choice
                        do_sample=True,
                        top_k=10,
                        temperature=0.7,
                        pad_token_id=tokenizer.eos_token_id,
                        attention_mask=input_ids.new_ones(input_ids.shape))
print(tokenizer.decode(output[0], skip_special_tokens=True))

This model was made possible by the tremendous work of the mergekit developers. I decided to merge TinyLlama models to create a mixture of experts. The config used is below:

base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts:
    - "chat"
    - "assistant"
    - "tell me"
    - "explain"
  - source_model: 78health/TinyLlama_1.1B-function-calling
    positive_prompts:
    - "code"
    - "python"
    - "javascript"
    - "programming"
    - "algorithm"
  - source_model: phanerozoic/Tiny-Pirate-1.1b-v0.1
    positive_prompts:
    - "storywriting"
    - "write"
    - "scene"
    - "story"
    - "character"
  - source_model: Tensoic/TinyLlama-1.1B-3T-openhermes
    positive_prompts:
    - "reason"
    - "provide"
    - "instruct"
    - "summarize"
    - "count"