Hugging Face TB Research

Enterprise

community

AI & ML interests

Exploring smol models and high quality web and synthetic datasets, generated by LLMs (TB is for Textbook, as inspired by the "Textbooks are all your need" paper)

Recent Activity

lewtun authored a paper 4 days ago

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

reach-vb new activity 7 days ago

HuggingFaceTB/SmolVLM2-2.2B-Instruct:Ollama Availability

Xenova new activity 7 days ago

HuggingFaceTB/SmolVLM2-2.2B-Instruct:Inference endpoint not working

View all activity

Organization Card

Community About org cards

HuggingFaceTB

This is the home for smol models (SmolLM & SmolVLM) and high quality pre-training datasets. We released:

FineWeb-Edu: a filtered version of FineWeb dataset for educational content, paper available here.
Cosmopedia: the largest open synthetic dataset, with 25B tokens and 30M samples. It contains synthetic textbooks, blog posts, and stories, posts generated by Mixtral. Blog post available here.
Smollm-Corpus: the pre-training corpus of SmolLM: Cosmopedia v0.2, FineWeb-Edu dedup and Python-Edu. Blog post available here.
SmolLM2 models: a series of strong small models in three sizes: 135M, 360M and 1.7B
SmolVLM2: a family of small Video and Vision models in three sizes: 2.2B, 500M and 256M. Blog post available here.
FineMath: the best public math pretraining dataset with 50B tokens of mathematical and problem solving data.

News 🗞️

FineMath: the best public math pretraining dataset with 50B tokens of mathematical and problem solving data https://huggingface.co/datasets/HuggingFaceTB/finemath

Collections 13

spaces 12

SmolVLM2 XSPFGenerator (VLC prototype)

Generate video highlights and playlist

SmolVLM2 IPhone Waitlist

sign in to receive news on the iPhone app

SmolVLM2 HighlightGenerator

Generate video highlights from uploaded video

Running on Zero

SmolVLM

Generate text by analyzing images and videos

SmolVLM 256M Instruct WebGPU

Generate descriptions for images using WebGPU technology

SmolVLM 500M Instruct WebGPU

models 73

HuggingFaceTB/SmolLM2-1.7B-Instruct

Text Generation • Updated 10 days ago • 660k • • 574

HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • Updated 10 days ago • 60.9k • 409

HuggingFaceTB/SmolVLM-500M-Instruct

Image-Text-to-Text • Updated 10 days ago • 18.5k • 114

HuggingFaceTB/SmolVLM-256M-Instruct

Image-Text-to-Text • Updated 10 days ago • 37.5k • 168

HuggingFaceTB/SmolVLM2-256M-Video-Instruct

Image-Text-to-Text • Updated 10 days ago • 5.46k • 41

HuggingFaceTB/SmolVLM2-500M-Video-Instruct

Image-Text-to-Text • Updated 10 days ago • 7.55k • 42

HuggingFaceTB/SmolVLM2-2.2B-Instruct

Image-Text-to-Text • Updated 10 days ago • 505k • 112

HuggingFaceTB/SmolLM2-360M-intermediate-checkpoints

Updated 17 days ago • 90

HuggingFaceTB/SmolLM2-1.7B-intermediate-checkpoints

Updated 17 days ago • 830

HuggingFaceTB/SmolLM2-135M-intermediate-checkpoints

Updated 17 days ago • 63

datasets 37

HuggingFaceTB/dclm-edu

Viewer • Updated 9 days ago • 1B • 9.83k • 21

HuggingFaceTB/SmolLM2-intermediate-evals

Viewer • Updated 13 days ago • 582 • 100

HuggingFaceTB/smoltalk

Viewer • Updated Feb 10 • 2.2M • 8.12k • 314

HuggingFaceTB/smol-smoltalk

Viewer • Updated Feb 6 • 485k • 858 • 34

HuggingFaceTB/finemath

Viewer • Updated Feb 6 • 48.3M • 10.2k • 292

HuggingFaceTB/everyday-conversations-llama3.1-2k

Viewer • Updated Jan 29 • 2.38k • 653 • 98

HuggingFaceTB/MagPie-Pro-300k-MT

Viewer • Updated Jan 29 • 300k • 143

HuggingFaceTB/finemath_contamination_report

Viewer • Updated Jan 7 • 5.33k • 101 • 1

HuggingFaceTB/math_tasks

Viewer • Updated Dec 23, 2024 • 21.3k • 226 • 1

HuggingFaceTB/MATH

Updated Oct 16, 2024 • 167 • 4