Shuai Wang

Shuaiii

AI & ML interests

None yet

Organizations

None yet

Shuaiii's activity

upvoted a paper 8 days ago

RedPajama: an Open Dataset for Training Large Language Models

Paper • 2411.12372 • Published 9 days ago • 47

Reacted to JustinLin610's post with 👍❤️🔥 21 days ago

Post

2993

Just now, we released a small MoE model, Qwen1.5-MoE-A2.7B, a 14B model with 2.7B activated parameters. Hype aside, I would love to share more here on HF. If you don't know much about it, check our blog for more info: https://qwenlm.github.io/blog/qwen-moe/

At the beginning, we were just trying out MoE and getting Megatron to work well with MegaBlocks. As always, we worked with small models first. However, we struggled with a lot of details.

With MegaBlocks and the many tricks that make training MoE models work, it is almost impossible to fail. The real challenge is how good your model is. Then things became more complex than I had expected. Fine-grained experts actually pissed me off, but damn, they work for a model at this scale. However, they add complexity to the model, which is partly why our code has not been merged into llama.cpp yet, because it really causes problems. Shared experts might be good, but we need more engineering effort to really unleash their benefits for inference acceleration.

For the community, this is actually our first time releasing an MoE model. We don't know what will happen to us, but we are prepared for complaints. I just hope that we can make things clear and provide a good recipe for playing with our MoE model, just as people play with Mixtral.

1 reply

upvoted a paper 28 days ago

Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents

Paper • 2410.13185 • Published Oct 17 • 6

upvoted a paper 29 days ago

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset

Paper • 2410.22325 • Published 30 days ago • 9

upvoted 4 papers about 1 month ago

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published Oct 22 • 88

Agents: An Open-source Framework for Autonomous Language Agents

Paper • 2309.07870 • Published Sep 14, 2023 • 42

Symbolic Learning Enables Self-Evolving Agents

Paper • 2406.18532 • Published Jun 26 • 11

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Paper • 2410.08261 • Published Oct 10 • 49

authored 3 papers about 1 month ago

upvoted 2 papers about 1 month ago

Weaver: Foundation Models for Creative Writing

Paper • 2401.17268 • Published Jan 30 • 43

MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models

Paper • 2410.13370 • Published Oct 17 • 35