SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • Paper • 2502.02737 • Published 2 days ago • 103 upvotes
ACECODER: Acing Coder RL via Automated Test-Case Synthesis • Paper • 2502.01718 • Published 4 days ago • 22 upvotes
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models • Paper • 2502.01061 • Published 4 days ago • 156 upvotes
TinySwallow Collection • Compact Japanese models trained with "TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models" • 5 items • Updated 8 days ago • 13 upvotes
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models • Paper • 2501.16937 • Published 10 days ago • 4 upvotes
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer • Paper • 2501.15570 • Published 12 days ago • 23 upvotes
Qwen2.5-VL Collection • Vision-language model series based on Qwen2.5 • 3 items • Updated 11 days ago • 324 upvotes
Qwen2.5-1M Collection • The long-context version of Qwen2.5, supporting context lengths of up to 1M tokens • 2 items • Updated 12 days ago • 98 upvotes