---
license: apache-2.0
---

This is a q5_K_M GGUF quantization of https://huggingface.co/s3nh/TinyLLama-4x1.1B-MoE.

I'm not sure how well it performs, and it's also my first quantization, so fingers crossed.
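To run the GGUF file itself you'll want a llama.cpp-based runtime rather than transformers. Here's a minimal sketch with llama-cpp-python; the filename is an assumption on my part, so check the repo for the actual GGUF filename:

```python
# Minimal sketch using llama-cpp-python. The model_path filename is an
# assumption -- check the repository files for the actual GGUF name.
from llama_cpp import Llama

llm = Llama(
    model_path="TinyLLama-4x1.1B-MoE.q5_K_M.gguf",
    n_ctx=2048,  # TinyLlama's context length
)

output = llm(
    "###Input: You are a pirate. tell me a story about wrecked ship.\n###Response:",
    max_tokens=256,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```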

It is a Mixture of Experts model with https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 as its base model.

The other 3 models in the merge are:

https://huggingface.co/78health/TinyLlama_1.1B-function-calling

https://huggingface.co/phanerozoic/Tiny-Pirate-1.1b-v0.1

https://huggingface.co/Tensoic/TinyLlama-1.1B-3T-openhermes

I make no claims to any of the development; I simply wanted to try it out, so I quantized it and thought I'd share it in case anyone else was feeling experimental.

-------

Model card from https://huggingface.co/s3nh/TinyLLama-4x1.1B-MoE

Example usage:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("s3nh/TinyLLama-4x1.1B-MoE")
model = AutoModelForCausalLM.from_pretrained("s3nh/TinyLLama-4x1.1B-MoE").to(device)

input_text = """
###Input: You are a pirate. tell me a story about wrecked ship.
###Response:
"""

input_ids = tokenizer.encode(input_text, return_tensors='pt').to(device)
output = model.generate(inputs=input_ids,
                        max_length=256,  # pick a limit that suits your use case
                        do_sample=True,
                        top_k=10,
                        temperature=0.7,
                        pad_token_id=tokenizer.eos_token_id,
                        attention_mask=input_ids.new_ones(input_ids.shape))
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

This model was made possible by the tremendous work of the mergekit developers. I decided to merge TinyLlama models to create a mixture of experts. The config used is below:

"""base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts:
    - "chat"
    - "assistant"
    - "tell me"
    - "explain"
  - source_model: 78health/TinyLlama_1.1B-function-calling
    positive_prompts:
    - "code"
    - "python"
    - "javascript"
    - "programming"
    - "algorithm"
  - source_model: phanerozoic/Tiny-Pirate-1.1b-v0.1
    positive_prompts:
    - "storywriting"
    - "write"
    - "scene"
    - "story"
    - "character"
  - source_model: Tensoic/TinyLlama-1.1B-3T-openhermes
    positive_prompts:
    - "reason"
    - "provide"
    - "instruct"
    - "summarize"
    - "count"
"""