Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.05405

Papers - Training - Scaling Properties

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

Paper • 2404.05405 • Published Apr 8 • 9
Scaling Laws for Precision

Paper • 2411.04330 • Published 13 days ago • 6

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Paper • 2408.15545 • Published Aug 28 • 34
Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22 • 62
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20 • 41
Automated Design of Agentic Systems

Paper • 2408.08435 • Published Aug 15 • 38

LM Capabilities and Scaling

Compression Represents Intelligence Linearly

Paper • 2404.09937 • Published Apr 15 • 27
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Paper • 2404.06395 • Published Apr 9 • 21
Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2 • 35
Are large language models superhuman chemists?

Paper • 2404.01475 • Published Apr 1 • 16

"Physics of Language Models" series

Physics of Language Models: Part 1, Context-Free Grammar

Paper • 2305.13673 • Published May 23, 2023 • 7
Physics of Language Models: Part 3.2, Knowledge Manipulation

Paper • 2309.14402 • Published Sep 25, 2023 • 6
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

Paper • 2404.05405 • Published Apr 8 • 9
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction

Paper • 2309.14316 • Published Sep 25, 2023 • 7

Papers - Training

SELF: Language-Driven Self-Evolution for Large Language Model

Paper • 2310.00533 • Published Oct 1, 2023 • 2
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length

Paper • 2310.00576 • Published Oct 1, 2023 • 2
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity

Paper • 2305.13169 • Published May 22, 2023 • 3
Transformers Can Achieve Length Generalization But Not Robustly

Paper • 2402.09371 • Published Feb 14 • 12

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs