- Transformer^2: Self-adaptive LLMs
  Paper • 2501.06252 • Published • 53
- s1: Simple test-time scaling
  Paper • 2501.19393 • Published • 100
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
  Paper • 2502.06703 • Published • 116
- Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
  Paper • 2501.12370 • Published • 10
Av
Avi66
AI & ML interests
ML Research, LLMs, Applications
MultiModality
Recent Activity
- Updated a collection 2 days ago: Vlm
- Updated a collection 2 days ago: Spaces
- Updated a collection 2 days ago: Papers
Organizations
None yet
Collections
4
Models
None public yet
Datasets
None public yet