Update README.md
README.md
CHANGED
````diff
@@ -1,12 +1,16 @@
 ---
-license:
+license: cc-by-nc-nd-4.0
 tags:
 - moe
 - merge
+- medical
 - mergekit
-- lazymergekit
 - sethuiyer/Dr_Samantha_7b_mistral
 - fblgit/UNA-TheBeagle-7b-v1
+language:
+- en
+library_name: transformers
+pipeline_tag: text-generation
 ---
 
 # MedleyMD
````
````diff
@@ -46,11 +50,28 @@ tokenizer = AutoTokenizer.from_pretrained(model)
 pipeline = transformers.pipeline(
     "text-generation",
     model=model,
-    model_kwargs={"torch_dtype": torch.
+    model_kwargs={"torch_dtype": torch.bfloat16, "load_in_4bit": True},
 )
 
-
+generation_kwargs = {
+    "max_new_tokens": 512,
+    "do_sample": True,
+    "temperature": 0.7,
+    "top_k": 50,
+    "top_p": 0.95,
+}
+
+messages = [{"role": "system", "content": "You are a helpful AI assistant. Please use </s> when you want to end the answer."},
+            {"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
 prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
 print(outputs[0]["generated_text"])
+```
+
+```text
+A Mixture of Experts (Mixout) is a neural network architecture that combines the strengths of multiple expert networks to make a more accurate and robust prediction.
+It is composed of a topmost gating network that assigns weights to each expert network based on their performance on past input samples.
+The expert networks are trained independently, and the gating network learns to choose the best combination of these experts to make the final prediction.
+Mixout demonstrates a stronger ability to handle complex data distributions and is more efficient in terms of training time and memory usage compared to a
+traditional ensemble approach.
 ```
````
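Note that the updated snippet defines a `generation_kwargs` dictionary but still passes the sampling parameters inline in the `pipeline(...)` call. A minimal sketch, not part of this commit, of how that dictionary could be reused instead, assuming the `pipeline` and `prompt` objects built in the README snippet above:

```python
# Sketch only (not in the diff): reuse generation_kwargs rather than
# repeating the sampling parameters in the pipeline call.
# Assumes `pipeline` and `prompt` exist as created in the README snippet.
generation_kwargs = {
    "max_new_tokens": 512,
    "do_sample": True,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.95,
}

outputs = pipeline(prompt, **generation_kwargs)
print(outputs[0]["generated_text"])
```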