- Cached Transformers: Improving Transformers with Differentiable Memory Cache
  Paper • 2312.12742 • Published • 11
- ProTIP: Progressive Tool Retrieval Improves Planning
  Paper • 2312.10332 • Published • 7
- Paloma: A Benchmark for Evaluating Language Model Fit
  Paper • 2312.10523 • Published • 11
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
  Paper • 2406.17557 • Published • 70
daje kang
daje
AI & ML interests
None yet
Organizations
None yet
Collections 1
Models 5
Datasets 6
daje/mistral_tokenized_en_wiki • 16.1M • 2
daje/mistral_tokenized_ko_wiki • 1.7M • 1
daje/tokenized_enwiki • 16.4M • 3
daje/tokenized_kowiki • 1.71M • 3
daje/en_wiki • 5.09M • 2
daje/ko_wiki • 311k • 17 • 5