🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 14 items • Updated 6 days ago • 105
👩💻 OlympicCoder Collection Reasoning datasets and models for competitive coding • 4 items • Updated 6 days ago • 10
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 3 items • Updated 4 days ago • 96
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published Feb 10 • 60
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 205
WildChat-50m Collection All model responses associated with the WildChat-50m paper. • 55 items • Updated Jan 29 • 7
Financial Sentiment Analysis 💲📈 Collection Financial Sentiment Analysis models I created • 3 items • Updated Jan 16 • 4
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published Jul 1, 2024 • 61
Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP Paper • 2408.04303 • Published Aug 8, 2024 • 20
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published Jan 2 • 50
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 135
📚 FineWeb-Edu Collection FineWeb-Edu datasets, classifier and ablation model • 5 items • Updated Jun 12, 2024 • 13