Datasets and models for the EMNLP paper "Scalable Data Ablation Approximations for Language Models through Modular Training and Merging"
Clara Na
claran
Recent Activity
authored a paper 28 days ago
updated a dataset 28 days ago: claran/modular-s2orc
updated a collection 28 days ago: Scalable Data Ablations
Collections: 1
Papers: 1
Models: 30
claran/s2orc-biology1994-1999-ind-130m
claran/s2orc-biology2007-2008-ind-130m
claran/s2orc-biology2013-2013-ind-130m • 1
claran/s2orc-biology2021-2021-ind-130m • 1
claran/s2orc-biology2019-2019-ind-130m
claran/s2orc-biology2000-2003-ind-130m
claran/s2orc-biology2015-2015-ind-130m • 2
claran/s2orc-biology2014-2014-ind-130m
claran/s2orc-biology2004-2006-ind-130m
claran/s2orc-biology2016-2016-ind-130m • 2
Datasets: 6
claran/modular-s2orc • 7.47M • 282 • 1
claran/seed-pretrain-decon • 3.45M • 42
claran/m2d2-wiki-decon • 5.3M • 93
claran/seed-pretrain-decon-parquet • 6.61M • 79
claran/m2d2-wiki-decon-parquet • 10.6M • 2.91k
claran/modular-s2orc-parquet • 7.47M • 33