Isotonic/TinyQwex-4x620M-MoE

@@ -16,74 +16,12 @@ TinyQwex-4x620M-MoE is a Mixture of Experts (MoE) made with the following models
 * [Qwen/Qwen1.5-0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B)
 * [Qwen/Qwen1.5-0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B)
-## 🧩 Configuration
-```yaml
-base_model: Qwen/Qwen1.5-0.5B
-experts:
-  - source_model: Qwen/Qwen1.5-0.5B
-    positive_prompts:
-    - "reasoning"
-    - "logic"
-    - "problem-solving"
-    - "critical thinking"
-    - "analysis"
-    - "synthesis"
-    - "evaluation"
-    - "decision-making"
-    - "judgment"
-    - "insight"
-  - source_model: Qwen/Qwen1.5-0.5B
-    positive_prompts:
-    - "program"
-    - "software"
-    - "develop"
-    - "build"
-    - "create"
-    - "design"
-    - "implement"
-    - "debug"
-    - "test"
-    - "code"
-    - "python"
-    - "programming"
-    - "algorithm"
-    - "function"
-  - source_model: Qwen/Qwen1.5-0.5B
-    positive_prompts:
-    - "storytelling"
-    - "narrative"
-    - "fiction"
-    - "creative writing"
-    - "plot"
-    - "characters"
-    - "dialogue"
-    - "setting"
-    - "emotion"
-    - "imagination"
-    - "scene"
-    - "story"
-    - "character"
-  - source_model: Qwen/Qwen1.5-0.5B
-    positive_prompts:
-    - "chat"
-    - "conversation"
-    - "dialogue"
-    - "discuss"
-    - "ask questions"
-    - "share thoughts"
-    - "explore ideas"
-    - "learn new things"
-    - "personal assistant"
-    - "friendly helper"
-```
 ## 💻 Usage
 ```python
-!pip install -qU transformers bitsandbytes accelerate
 from transformers import AutoTokenizer
 import transformers
@@ -91,15 +29,36 @@ import torch
 model = "Isotonic/TinyQwex-4x620M-MoE"
-tokenizer = AutoTokenizer.from_pretrained(model)
 pipeline = transformers.pipeline(
     "text-generation",
     model=model,
-    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
 )
 messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
 prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
 print(outputs[0]["generated_text"])
 ```
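
The usage snippet passes `do_sample=True, temperature=0.7, top_k=50, top_p=0.95` to the pipeline. As a rough illustration of what those three knobs do to a single next-token choice, here is a minimal pure-Python sketch. The `sample_token` helper is hypothetical and is not the `transformers` implementation; it only assumes the common ordering of temperature scaling, then top-k truncation, then nucleus (top-p) truncation.

```python
import math
import random


def sample_token(logits, temperature=0.7, top_k=50, top_p=0.95, rng=random):
    """Pick one token index from raw logits (illustrative helper, not a library API)."""
    # 1. Temperature: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # 2. Top-k: keep only the k highest-scoring token indices.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    kept = order[:top_k]

    # Softmax over the surviving tokens (shifted by the max for stability).
    peak = max(scaled[i] for i in kept)
    weights = [math.exp(scaled[i] - peak) for i in kept]
    total = sum(weights)
    probs = [w / total for w in weights]

    # 3. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    nucleus_ids, nucleus_probs, cumulative = [], [], 0.0
    for idx, p in zip(kept, probs):
        nucleus_ids.append(idx)
        nucleus_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # Sample from what is left; random.choices renormalizes the weights.
    return rng.choices(nucleus_ids, weights=nucleus_probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.5, 0.3, -1.0, -4.0]  # made-up scores for five tokens
    print(sample_token(toy_logits))
```

Lowering `top_p` or `top_k` narrows the candidate set, so the output becomes more deterministic; with `top_k=1` the helper always returns the highest-scoring index.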

 * [Qwen/Qwen1.5-0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B)
 * [Qwen/Qwen1.5-0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B)
 ## 💻 Usage
 ```python
+!pip install -qU transformers bitsandbytes accelerate einops
 from transformers import AutoTokenizer
 import transformers
@@ -91,15 +29,36 @@ import torch
 model = "Isotonic/TinyQwex-4x620M-MoE"
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B")
 pipeline = transformers.pipeline(
     "text-generation",
     model=model,
+    model_kwargs={"torch_dtype": torch.bfloat16, "load_in_4bit": True},
 )
 messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
 prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
 print(outputs[0]["generated_text"])
+```
+## 🧩 Configuration
+```yaml
+base_model: Qwen/Qwen1.5-0.5B
+experts:
+  - source_model: Qwen/Qwen1.5-0.5B
+    positive_prompts:
+    - "reasoning"
+  - source_model: Qwen/Qwen1.5-0.5B
+    positive_prompts:
+    - "program"
+  - source_model: Qwen/Qwen1.5-0.5B
+    positive_prompts:
+    - "storytelling"
+  - source_model: Qwen/Qwen1.5-0.5B
+    positive_prompts:
+    - "Instruction following assistant"
 ```
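
The new configuration trims each expert to a single positive prompt. A quick structural check can catch an expert left with no prompts after that kind of edit. The sketch below assumes the YAML has been mirrored by hand into a plain Python dict; `check_moe_config` is a hypothetical helper, not part of any merge tool, and it only validates the three keys visible in the diff (`base_model`, `source_model`, `positive_prompts`).

```python
# Hand-written mirror of the configuration on the new side of the diff.
config = {
    "base_model": "Qwen/Qwen1.5-0.5B",
    "experts": [
        {"source_model": "Qwen/Qwen1.5-0.5B", "positive_prompts": ["reasoning"]},
        {"source_model": "Qwen/Qwen1.5-0.5B", "positive_prompts": ["program"]},
        {"source_model": "Qwen/Qwen1.5-0.5B", "positive_prompts": ["storytelling"]},
        {
            "source_model": "Qwen/Qwen1.5-0.5B",
            "positive_prompts": ["Instruction following assistant"],
        },
    ],
}


def check_moe_config(cfg):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not cfg.get("base_model"):
        problems.append("missing base_model")

    experts = cfg.get("experts") or []
    if not experts:
        problems.append("no experts defined")

    for position, expert in enumerate(experts):
        if not expert.get("source_model"):
            problems.append(f"expert {position}: missing source_model")
        if not expert.get("positive_prompts"):
            problems.append(f"expert {position}: no positive_prompts")
    return problems


if __name__ == "__main__":
    issues = check_moe_config(config)
    print(f"{len(config['experts'])} experts, {len(issues)} problems")
```

The check is purely structural: it says nothing about whether one prompt per expert gives the router enough signal to tell the experts apart.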