Junyang Lin

JustinLin610

AI & ML interests

Pretraining, NLP, CV, etc.

Recent Activity

liked a Space 34 minutes ago
llm-jp/open-japanese-llm-leaderboard
liked a Space 3 days ago
Qwen/Qwen2.5-Turbo-1M-Demo
updated a model 4 days ago
Qwen/Qwen2.5-Coder-14B-Instruct-AWQ

Posts 4

Finally, Qwen1.5-110B is out! With weights and demo!

Blog: https://qwenlm.github.io/blog/qwen1.5-110b/
Demo: Qwen/Qwen1.5-110B-Chat-demo
Base: Qwen/Qwen1.5-110B
Chat: Qwen/Qwen1.5-110B-Chat

This model has the following key features:
* GQA
* 32K token context length
* Multilingual support

We feel good about its performance on benchmarks, both for the base model and the chat model, but we still need more testing and feedback from you to help us understand its capabilities and limitations!

Additionally, the base model has not been trained on the ChatML tokens, so if you use the ChatML format with it, be careful!
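
Here is a minimal sketch of querying the chat model with transformers and its ChatML chat template; the prompt and generation settings are just illustrative assumptions, not a recommended configuration.

```python
# Minimal sketch: chat with the instruct model via transformers.
# The model ID comes from the links above; generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-110B-Chat"  # use the chat model; the base model has no ChatML tokens
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
# apply_chat_template wraps the conversation in the ChatML format the chat model expects
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```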

Enjoy and stay tuned for Qwen2!



We have just released a small MoE model, Qwen1.5-MoE-A2.7B: a 14B-parameter model with only 2.7B activated parameters. Hype aside, I would love to share more about it here on HF. If you don't know much about MoE yet, check our blog for more info: https://qwenlm.github.io/blog/qwen-moe/
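
As a rough sketch (assuming a transformers version that includes support for the Qwen2 MoE architecture), this is how you could load the checkpoint and see that the total parameter count covers all experts, even though only a fraction of them is activated per token:

```python
# Rough sketch: load the MoE checkpoint and count total parameters.
# Assumes a transformers release with Qwen2 MoE support.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# The total count includes every expert; only a few experts (plus the shared
# ones) are activated for each token, which is where the "A2.7B" comes from.
total_params = sum(p.numel() for p in model.parameters())
print(f"total parameters: {total_params / 1e9:.1f}B")
```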

At the beginning, we were just experimenting with MoE, getting Megatron to work well with MegaBlocks. As always, we started with small models first. However, we kept struggling with a lot of details.

With MegaBlocks and the many tricks that make MoE training work, it is almost impossible to fail; the real challenge is how good your model turns out to be. Things then became more complex than I had expected. Fine-grained experts actually pissed me off at first, but damn, they work for a model at this scale. However, they add complexity to the architecture, and this is partly why our code has not yet been merged into llama.cpp: it really does bring problems. Shared experts might be good, but we need more engineering effort to really unleash their benefits in inference acceleration.
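
To make the fine-grained and shared-expert ideas concrete, here is a small conceptual sketch in PyTorch; it is not our actual implementation, and all the sizes and counts are made-up placeholders. Each token is routed to a few small experts, while a handful of shared experts are always applied:

```python
# Conceptual sketch of an MoE layer with fine-grained routed experts
# plus always-on shared experts. Sizes and counts are placeholder values.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FineGrainedMoE(nn.Module):
    def __init__(self, dim=1024, expert_dim=256, n_experts=60, top_k=4, n_shared=4):
        super().__init__()
        # Many small ("fine-grained") experts instead of a few large ones.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, expert_dim), nn.SiLU(), nn.Linear(expert_dim, dim))
             for _ in range(n_experts)]
        )
        # Shared experts bypass the router and process every token.
        self.shared = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, expert_dim), nn.SiLU(), nn.Linear(expert_dim, dim))
             for _ in range(n_shared)]
        )
        self.router = nn.Linear(dim, n_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, dim)
        gates = F.softmax(self.router(x), dim=-1)        # routing probabilities
        weights, idx = gates.topk(self.top_k, dim=-1)    # top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):                      # routed (fine-grained) experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        for shared in self.shared:                       # shared experts: always active
            out = out + shared(x)
        return out


layer = FineGrainedMoE()
tokens = torch.randn(8, 1024)   # 8 tokens with hidden size 1024
print(layer(tokens).shape)      # torch.Size([8, 1024])
```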

For the community, this is actually our first time releasing an MoE model. We don't know how it will be received, but we are prepared for complaints. I just hope we can make things clear and provide a good recipe for playing with our MoE model, just like people play with Mixtral.
