Collections
Collections including paper arxiv:2412.15115
- LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification
  Paper • 2411.19638 • Published • 6
- Word Sense Linking: Disambiguating Outside the Sandbox
  Paper • 2412.09370 • Published • 8
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 113
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 334

- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
  Paper • 2411.11504 • Published • 19
- Top-nσ: Not All Logits Are You Need
  Paper • 2411.07641 • Published • 18
- Adaptive Decoding via Latent Preference Optimization
  Paper • 2411.09661 • Published • 10
- When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
  Paper • 2411.13476 • Published • 15

- Differential Transformer
  Paper • 2410.05258 • Published • 168
- PaliGemma 2: A Family of Versatile VLMs for Transfer
  Paper • 2412.03555 • Published • 119
- VisionZip: Longer is Better but Not Necessary in Vision Language Models
  Paper • 2412.04467 • Published • 105
- o1-Coder: an o1 Replication for Coding
  Paper • 2412.00154 • Published • 41

- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 7
- Scaling Laws for Autoregressive Generative Modeling
  Paper • 2010.14701 • Published
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 10
- A Survey on Data Selection for Language Models
  Paper • 2402.16827 • Published • 4

- Qwen2.5-Coder Technical Report
  Paper • 2409.12186 • Published • 138
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
  Paper • 2409.12122 • Published • 3
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 14
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 73

- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 32
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 25
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 121
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 21

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 57
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 51
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 41
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 52

- Apple Intelligence Foundation Language Models
  Paper • 2407.21075 • Published • 3
- The Llama 3 Herd of Models
  Paper • 2407.21783 • Published • 110
- Nemotron-4 340B Technical Report
  Paper • 2406.11704 • Published
- Gemma 2: Improving Open Language Models at a Practical Size
  Paper • 2408.00118 • Published • 75