- A Survey of Small Language Models
  Paper • 2410.20011 • Published • 40
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
  Paper • 2410.23168 • Published • 24
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
  Paper • 2410.23743 • Published • 59
- GPT or BERT: why not both?
  Paper • 2410.24159 • Published • 14
Ekaterina (h1de0us)
AI & ML interests: none yet
Recent Activity
- updated the collection [to-read] about 1 month ago
- updated the collection [to-read] about 1 month ago
- updated the collection [to-read] about 1 month ago
Organizations: none yet
Collections: 2
Models: none public yet
Datasets: none public yet