Collections including paper arxiv:2406.11794
- DataComp-LM: In search of the next generation of training sets for language models
  Paper • 2406.11794 • Published • 48
- Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs
  Paper • 2406.10209 • Published • 8
- Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 51
- DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
  Paper • 2406.11931 • Published • 56

- Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
  Paper • 2402.14848 • Published • 18
- The Prompt Report: A Systematic Survey of Prompting Techniques
  Paper • 2406.06608 • Published • 52
- CRAG -- Comprehensive RAG Benchmark
  Paper • 2406.04744 • Published • 40
- Transformers meet Neural Algorithmic Reasoners
  Paper • 2406.09308 • Published • 43

- MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
  Paper • 2405.07526 • Published • 16
- Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
  Paper • 2405.15613 • Published • 13
- A Touch, Vision, and Language Dataset for Multimodal Alignment
  Paper • 2402.13232 • Published • 13
- How Do Large Language Models Acquire Factual Knowledge During Pretraining?
  Paper • 2406.11813 • Published • 29

- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 13
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
  Paper • 2406.17557 • Published • 84
- DataComp-LM: In search of the next generation of training sets for language models
  Paper • 2406.11794 • Published • 48
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
  Paper • 2402.14905 • Published • 108

- Getting it Right: Improving Spatial Consistency in Text-to-Image Models
  Paper • 2404.01197 • Published • 29
- CosmicMan: A Text-to-Image Foundation Model for Humans
  Paper • 2404.01294 • Published • 15
- mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
  Paper • 2406.08707 • Published • 15
- DataComp-LM: In search of the next generation of training sets for language models
  Paper • 2406.11794 • Published • 48

- World Model on Million-Length Video And Language With RingAttention
  Paper • 2402.08268 • Published • 36
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 79
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 96
- FiT: Flexible Vision Transformer for Diffusion Model
  Paper • 2402.12376 • Published • 48

- AlpaGasus: Training A Better Alpaca with Fewer Data
  Paper • 2307.08701 • Published • 22
- The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
  Paper • 2303.03915 • Published • 6
- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
  Paper • 2309.04662 • Published • 22
- SlimPajama-DC: Understanding Data Combinations for LLM Training
  Paper • 2309.10818 • Published • 10