sethuiyer committed
Commit 22db51c
1 Parent(s): 57bfa7b

Update README.md

Files changed (1)
  1. README.md +25 -4
README.md CHANGED
@@ -1,12 +1,16 @@
  ---
- license: apache-2.0
+ license: cc-by-nc-nd-4.0
  tags:
  - moe
  - merge
+ - medical
  - mergekit
- - lazymergekit
  - sethuiyer/Dr_Samantha_7b_mistral
  - fblgit/UNA-TheBeagle-7b-v1
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: text-generation
  ---
  
  # MedleyMD
@@ -46,11 +50,28 @@ tokenizer = AutoTokenizer.from_pretrained(model)
  pipeline = transformers.pipeline(
      "text-generation",
      model=model,
-     model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
+     model_kwargs={"torch_dtype": torch.bfloat16, "load_in_4bit": True},
  )
  
- messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
+ generation_kwargs = {
+     "max_new_tokens": 512,
+     "do_sample": True,
+     "temperature": 0.7,
+     "top_k": 50,
+     "top_p": 0.95,
+ }
+ 
+ messages = [{"role": "system", "content": "You are a helpful AI assistant. Please use </s> when you want to end the answer."},
+             {"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
  prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
  print(outputs[0]["generated_text"])
  ```
+ 
+ ```text
+ A Mixture of Experts (Mixout) is a neural network architecture that combines the strengths of multiple expert networks to make a more accurate and robust prediction.
+ It is composed of a topmost gating network that assigns weights to each expert network based on their performance on past input samples.
+ The expert networks are trained independently, and the gating network learns to choose the best combination of these experts to make the final prediction.
+ Mixout demonstrates a stronger ability to handle complex data distributions and is more efficient in terms of training time and memory usage compared to a
+ traditional ensemble approach.
+ ```
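
The sample generation added to the README describes the core MoE idea: independent expert networks whose outputs are combined by a gating network that assigns them weights. A minimal sketch of that weighting scheme in plain Python — toy experts and a toy gate, purely illustrative, not the actual MedleyMD routing code:

```python
import math

def softmax(logits):
    # numerically stable softmax: turns gate logits into weights summing to 1
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_predict(x, experts, gate):
    # gate(x) returns one logit per expert; the prediction is the
    # gate-weighted combination of the experts' individual outputs
    weights = softmax(gate(x))
    return sum(w * expert(x) for w, expert in zip(weights, experts))

# hypothetical two-expert setup (stand-ins for a "medical" and a general expert)
experts = [lambda x: 2.0 * x, lambda x: x + 1.0]
gate = lambda x: [x, 1.0 - x]  # toy gating logits

print(round(moe_predict(0.5, experts, gate), 3))  # → 1.25
```

With equal logits the gate splits weight 50/50, so the result is the plain average of the two expert outputs; a confident gate drives one weight toward 1 and effectively routes the input to a single expert, which is what makes sparse MoE inference cheap relative to running a full ensemble.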