JournalistsonHF's Collections
Datasets
Updated 16 days ago
A curated list of datasets to train your models
HuggingFaceFW/fineweb-edu • Viewer • Updated 17 days ago • 3B • 221k • 343
CIVICS-dataset/CIVICS • Viewer • Updated May 13 • 700 • 6 • 3
HuggingFaceFW/fineweb • Viewer • Updated 3 days ago • 46B • 29.4k • 1.5k
HuggingFaceTB/cosmopedia • Viewer • Updated Apr 16 • 31.1M • 3.06k • 503
academic-datasets/AMMeBa • Preview • Updated May 21 • 3
HuggingFaceM4/OBELICS • Viewer • Updated Aug 22, 2023 • 276M • 2.01k • 123
bigcode/the-stack-v2 • Viewer • Updated Apr 23 • 5.45B • 276 • 226
pixparse/pdfa-eng-wds • Viewer • Updated Mar 29 • 7.1k • 611 • 89
pixparse/idl-wds • Viewer • Updated Mar 29 • 3.41M • 40 • 110
argilla/OpenHermesPreferences • Viewer • Updated Mar 1 • 989k • 178 • 174
argilla/Capybara-Preferences • Viewer • Updated May 9 • 15.4k • 239 • 34
PleIAs/YouTube-Commons • Updated 3 days ago • 365 • 287
PleIAs/French-PD-Newspapers • Viewer • Updated Mar 19 • 37.5k • 32 • 60
mozilla-foundation/common_voice_17_0 • Viewer • Updated 13 days ago • 13M • 43.4k • 100
satellogic/EarthView • Updated 3 days ago • 11k • 99