Xiaosen Zheng

xszheng2020

AI & ML interests

Data-Centric AI and AI Safety.

xszheng2020's activity

upvoted an article about 21 hours ago
Article

SmolLM - blazingly fast and remarkably powerful

248
upvoted an article 9 days ago
Article

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

62
upvoted 2 articles 3 months ago
Article

How NuminaMath Won the 1st AIMO Progress Prize

95
Article

RegMix: Data Mixture as Regression for Language Model Pre-training

10