Hugging Face
Guilherme Penedo (guipenedo)
733 followers · 6 following
gui_penedo
guipenedo
AI & ML interests: None yet
Recent Activity
Updated dataset HuggingFaceFW/fineweb-edu-score-2 (about 6 hours ago)
Updated dataset HuggingFaceFW/fineweb-edu (about 6 hours ago)
Updated dataset HuggingFaceFW/fineweb (about 6 hours ago)
Articles
FineWeb2-C: Help Build Better Language Models in Your Language (11 days ago; 10 upvotes)
guipenedo's activity
New activity in HuggingFaceFW/fineweb-edu (7 days ago)
#18: New update returns a 500 server error using the datasets-server API (opened 7 days ago by jonna32; 3 comments)
New activity in HuggingFaceFW/fineweb (16 days ago)
#49: Simple exact deduplication removes 2/3 of data. (opened 5 months ago by egor-pakhomov; 4 comments)
#4: Torrent? (opened 9 months ago by emilss; 3 comments)
#8: Any plan to train models on larger subset of dataset? (opened 9 months ago by mrfakename; 1 comment)
#9: Are copyrighted works included in this dataset? (opened 9 months ago by umm-maybe; 4 comments)
#12: Reprocessing for a new language (opened 8 months ago by pere; 14 comments)
#14: Training configs for data ablation study (opened 8 months ago by jimmyhbx; 2 comments)
#19: tiny-fineweb (opened 8 months ago by 3thn; 3 comments)
#25: Unsafe files (opened 8 months ago by alielfilali01; 1 comment)
#28: "Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20" using fineweb by Karpathy (opened 7 months ago by clem)
#29: Regarding the newly updated indexes (written as deduplication issues) (opened 7 months ago by kimcando; 5 comments)
#32: Dedup (opened 7 months ago by shawnkx; 1 comment)
#33: Language subset (opened 7 months ago by talmor; 3 comments)
#35: How to compute the aggregate score? (opened 7 months ago by mornmirror; 1 comment)
#36: why do you apply "All filters except the (very destructive) terminal_punct" (opened 7 months ago by bpwl0121; 3 comments)
#38: Reproducibility of the work for other languages (opened 7 months ago by camillop; 3 comments)
#39: Fineweb train configuration (opened 7 months ago by nezhazheng; 3 comments)
#40: Casting Issue? (opened 7 months ago by FelixLabelle; 4 comments)
#41: Any plans to release warc content after the language filtering steps? (opened 7 months ago by Splend1dchan; 2 comments)
#42: Is there an official test set for benchmarking objectively? (opened 6 months ago by SophieOstmeier; 2 comments)