kenshinn committed
Commit addd9e7 • 1 Parent(s): ed9924e

Update README.md

Files changed (1)
  1. README.md +61 -57
README.md CHANGED
---
license: mit
pipeline_tag: image-text-to-text
tags:
- text-generation-inference
---

<h2 align="center"> <a href="https://arxiv.org/abs/2405.14297">Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models</a></h2>
<h5 align="center"> If our project helps you, please give us a star ⭐ on <a href="https://github.com/LINs-lab/DynMoE">GitHub</a> and cite our paper!</h5>

## 📰 News

- **[2024.05.31]** 🔥 Our [code](https://github.com/LINs-lab/DynMoE/) is released!
- **[2024.05.25]** 🔥 Our **checkpoints** are available now!
- **[2024.05.23]** 🔥 Our [paper](https://arxiv.org/abs/2405.14297) is released!

## 😎 What's Interesting?

**Dynamic Mixture of Experts (DynMoE)** incorporates (1) a novel gating method that enables each token to automatically determine the number of experts to activate, and (2) an adaptive process that automatically adjusts the number of experts during training.

### Top-Any Gating

<video controls src="https://i.imgur.com/bLgNaoH.mp4" title="Top-Any Gating"></video>

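Below is a minimal, assumption-based PyTorch sketch of the top-any gating idea (not the official DynMoE implementation; the cosine-similarity scoring, the hard thresholding, and the top-1 fallback are illustrative choices, so see the released [code](https://github.com/LINs-lab/DynMoE/) for the exact, trainable formulation). Each token is scored against every expert and activates every expert whose score exceeds that expert's threshold, so different tokens activate different numbers of experts.

```python
# Simplified sketch of top-any gating (illustrative, not the official code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopAnyGate(nn.Module):
    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        # One learnable embedding and one learnable threshold per expert.
        self.expert_embeddings = nn.Parameter(torch.randn(num_experts, hidden_dim))
        self.thresholds = nn.Parameter(torch.zeros(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_dim) -> routing mask: (num_tokens, num_experts)
        scores = F.cosine_similarity(
            x.unsqueeze(1), self.expert_embeddings.unsqueeze(0), dim=-1
        )
        # Activate every expert whose score exceeds its threshold.
        mask = (scores > self.thresholds).float()
        # Fallback: a token that clears no threshold is routed to its best expert.
        no_expert = mask.sum(dim=-1) == 0
        top1 = F.one_hot(scores.argmax(dim=-1), mask.size(-1)).float()
        return torch.where(no_expert.unsqueeze(-1), top1, mask)

gate = TopAnyGate(hidden_dim=64, num_experts=4)
routing = gate(torch.randn(10, 64))
print(routing.sum(dim=-1))  # number of experts activated per token varies
```
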
### Adaptive Training Process

![](https://cdn.jsdelivr.net/gh/QAQdev/Pics@master/uPic/adaptive.png)

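As a rough, assumption-based illustration of the adaptive process (not the official routine): experts that no token activates over a monitoring window can be removed, and an expert can be added when some tokens clear no threshold at all.

```python
# Illustrative helper (hypothetical, not from the DynMoE codebase):
# prune experts that received no tokens, and grow by one expert if
# some tokens matched no existing expert at all.
def adjusted_expert_count(num_experts: int,
                          activations_per_expert: list,
                          tokens_with_no_expert: int) -> int:
    unused = sum(1 for count in activations_per_expert if count == 0)
    num_experts -= unused
    if tokens_with_no_expert > 0:
        num_experts += 1
    return num_experts

# Example: 4 experts, one never activated, some tokens matched nothing -> 4.
print(adjusted_expert_count(4, [120, 0, 45, 87], tokens_with_no_expert=3))
```
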
## 💡 Model Details

- 🤔 DynMoE-Qwen is a MoE model with **dynamic top-k gating**, fine-tuned from [LanguageBind/MoE-LLaVA-Qwen-Stage2](https://huggingface.co/LanguageBind/MoE-LLaVA-Qwen-Stage2).
- 🚀 Our DynMoE-Qwen-1.8B has 3.1B parameters in total, but **only 2.2B are activated!** (average top-k = 1.86)
- ⌛ With the DynMoE tuning stage, training completes on 8 A100 GPUs **within 40 hours.**

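To experiment with this checkpoint locally, one option is to fetch the weights with `huggingface_hub` and run them through the DynMoE codebase (which builds on MoE-LLaVA). The repo id below is an assumption, so adjust it to the actual model repository:

```python
# Hypothetical download snippet; the repo id is an assumption.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="LINs-lab/DynMoE-Qwen-1.8B")
print(f"Checkpoint downloaded to {local_dir}")
```
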
## 👍 Acknowledgement

We are grateful for the following awesome projects:

- [tutel](https://github.com/microsoft/tutel)
- [DeepSpeed](https://github.com/microsoft/DeepSpeed)
- [GMoE](https://github.com/Luodian/Generalizable-Mixture-of-Experts)
- [EMoE](https://github.com/qiuzh20/EMoE)
- [MoE-LLaVA](https://github.com/PKU-YuanGroup/MoE-LLaVA)
- [GLUE-X](https://github.com/YangLinyi/GLUE-X)

## 🔒 License

This project is released under the MIT license as found in the [LICENSE](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) file.

## ✏️ Citation

```tex
@misc{guo2024dynamic,
      title={Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models},
      author={Yongxin Guo and Zhenglin Cheng and Xiaoying Tang and Tao Lin},
      year={2024},
      eprint={2405.14297},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```