thephimart/tinyllama-4x1.1b-moe.Q5_K_M.gguf


thephimart committed on Jan 24

Commit 226438f • 1 Parent(s): 28094bc

Update README.md

Files changed (1)

README.md +76 -0


@@ -1,3 +1,79 @@
 ---
 license: apache-2.0
 ---

+This is a Q5_K_M GGUF quantization of https://huggingface.co/s3nh/TinyLLama-4x1.1B-MoE.
+This is my first quantization and I have not measured how well it performs, so fingers crossed.
+It is a Mixture of Experts model with https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 as its base model.
+The other 3 models in the merge are:
+https://huggingface.co/78health/TinyLlama_1.1B-function-calling
+https://huggingface.co/phanerozoic/Tiny-Pirate-1.1b-v0.1
+https://huggingface.co/Tensoic/TinyLlama-1.1B-3T-openhermes
+I make no claims to any of the development. I simply wanted to try it out, so I quantized it and thought I'd share it in case anyone else was feeling experimental.
+-------
+Model card from https://huggingface.co/s3nh/TinyLLama-4x1.1B-MoE
+Example usage:
+import torch
+from transformers import AutoModelForCausalLM
+from transformers import AutoTokenizer
+
+# Assumed settings: the original snippet used `device` and `max_length` without defining them.
+device = "cuda" if torch.cuda.is_available() else "cpu"
+max_length = 256
+
+# Repo ID taken from the model URL above; this loads the original (unquantized) weights.
+tokenizer = AutoTokenizer.from_pretrained("s3nh/TinyLLama-4x1.1B-MoE")
+model = AutoModelForCausalLM.from_pretrained("s3nh/TinyLLama-4x1.1B-MoE").to(device)
+input_text = """
+###Input: You are a pirate. tell me a story about wrecked ship.
+###Response:
+"""
+input_ids = tokenizer.encode(input_text, return_tensors='pt').to(device)
+output = model.generate(inputs=input_ids,
+                        max_length=max_length,
+                        do_sample=True,
+                        top_k=10,
+                        temperature=0.7,
+                        pad_token_id=tokenizer.eos_token_id,
+                        attention_mask=input_ids.new_ones(input_ids.shape))
+print(tokenizer.decode(output[0], skip_special_tokens=True))
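The transformers snippet above targets the original weights, while this repository ships a GGUF file, which needs a llama.cpp-based runtime instead. The following is a minimal sketch, assuming the `llama-cpp-python` package is installed and the GGUF file has been downloaded locally. The helper names `build_prompt` and `generate_with_gguf`, the local file path, and the `n_ctx=2048` and `stop` settings are illustrative assumptions, not part of the original card. The sampling values mirror the snippet above.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the ###Input / ###Response template used above."""
    return f"\n###Input: {instruction.strip()}\n###Response:\n"


def generate_with_gguf(model_path: str, instruction: str, max_tokens: int = 256) -> str:
    """Run one completion against a local GGUF file via llama-cpp-python."""
    # Imported lazily so build_prompt stays usable without the dependency.
    from llama_cpp import Llama

    # n_ctx is an assumed context size; adjust it to your memory budget.
    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    result = llm(
        build_prompt(instruction),
        max_tokens=max_tokens,
        temperature=0.7,
        top_k=10,
        stop=["###Input:"],  # assumed stop marker so the model does not start a new turn
    )
    return result["choices"][0]["text"]


if __name__ == "__main__":
    # Hypothetical local path; point it at wherever you saved the GGUF file.
    path = "./tinyllama-4x1.1b-moe.Q5_K_M.gguf"
    print(generate_with_gguf(path, "You are a pirate. tell me a story about wrecked ship."))
```

The prompt builder is kept separate from the loader so the template can be checked without loading the model.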
+This model was made possible by the tremendous work of the mergekit developers. I decided to merge TinyLlama models to create a mixture of experts. The config used is below:
+"""base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+experts:
+  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+    positive_prompts:
+    - "chat"
+    - "assistant"
+    - "tell me"
+    - "explain"
+  - source_model: 78health/TinyLlama_1.1B-function-calling
+    positive_prompts:
+    - "code"
+    - "python"
+    - "javascript"
+    - "programming"
+    - "algorithm"
+  - source_model: phanerozoic/Tiny-Pirate-1.1b-v0.1
+    positive_prompts:
+    - "storywriting"
+    - "write"
+    - "scene"
+    - "story"
+    - "character"
+  - source_model: Tensoic/TinyLlama-1.1B-3T-openhermes
+    positive_prompts:
+    - "reason"
+    - "provide"
+    - "instruct"
+    - "summarize"
+    - "count"
+"""
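Before handing a config like the one above to a merge tool, it can help to sanity-check its shape. The sketch below mirrors the config as a plain Python dict and runs a few structural checks. The field names (`base_model`, `experts`, `source_model`, `positive_prompts`) come from the config above. The function `check_moe_config` and the specific checks are hypothetical and are not part of mergekit.

```python
# The config above, transcribed as a Python dict for illustration.
config = {
    "base_model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "experts": [
        {"source_model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
         "positive_prompts": ["chat", "assistant", "tell me", "explain"]},
        {"source_model": "78health/TinyLlama_1.1B-function-calling",
         "positive_prompts": ["code", "python", "javascript", "programming", "algorithm"]},
        {"source_model": "phanerozoic/Tiny-Pirate-1.1b-v0.1",
         "positive_prompts": ["storywriting", "write", "scene", "story", "character"]},
        {"source_model": "Tensoic/TinyLlama-1.1B-3T-openhermes",
         "positive_prompts": ["reason", "provide", "instruct", "summarize", "count"]},
    ],
}


def check_moe_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not cfg.get("base_model"):
        problems.append("missing base_model")
    experts = cfg.get("experts") or []
    if not experts:
        problems.append("no experts listed")
    seen = set()
    for i, expert in enumerate(experts):
        name = expert.get("source_model")
        if not name:
            problems.append(f"expert {i}: missing source_model")
        elif name in seen:
            problems.append(f"expert {i}: duplicate source_model {name}")
        else:
            seen.add(name)
        if not expert.get("positive_prompts"):
            problems.append(f"expert {i}: no positive_prompts")
    return problems


if __name__ == "__main__":
    print(check_moe_config(config) or "config looks structurally fine")
```

These checks only cover structure. They say nothing about whether the chosen prompts route well to each expert.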