Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2406.11794

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17 • 48
2BP: 2-Stage Backpropagation

Paper • 2405.18047 • Published May 28 • 22
Bootstrapping Language Models with DPO Implicit Rewards

Paper • 2406.09760 • Published Jun 14 • 37

Training-related

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17 • 48
griffin/chain_of_density

Viewer • Updated Sep 8, 2023 • 1.1k • 26 • 68
HuggingFaceFV/finevideo

Viewer • Updated 7 days ago • 43.8k • 185 • 236

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17 • 48
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs

Paper • 2406.10209 • Published Jun 14 • 8
Transformers Can Do Arithmetic with the Right Embeddings

Paper • 2405.17399 • Published May 27 • 51
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published Jun 17 • 56

Relevant-Papers-Midterm

Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

Paper • 2402.14848 • Published Feb 19 • 18
The Prompt Report: A Systematic Survey of Prompting Techniques

Paper • 2406.06608 • Published Jun 6 • 52
CRAG -- Comprehensive RAG Benchmark

Paper • 2406.04744 • Published Jun 7 • 40
Transformers meet Neural Algorithmic Reasoners

Paper • 2406.09308 • Published Jun 13 • 43

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published May 13 • 16
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24 • 13
A Touch, Vision, and Language Dataset for Multimodal Alignment

Paper • 2402.13232 • Published Feb 20 • 13
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17 • 29

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 13
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25 • 84
DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17 • 48
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22 • 108

cerebras/SlimPajama-627B

Preview • Updated Jul 7, 2023 • 4.63k • 415
JeanKaddour/minipile

Viewer • Updated Jun 20, 2023 • 1.01M • 1.65k • 113
nampdn-ai/tiny-textbooks

Viewer • Updated Jul 3 • 420k • 155 • 146
open-phi/textbooks

Viewer • Updated Oct 8, 2023 • 1.8k • 1.09k • 82

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1 • 29
CosmicMan: A Text-to-Image Foundation Model for Humans

Paper • 2404.01294 • Published Apr 1 • 15
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus

Paper • 2406.08707 • Published Jun 13 • 15
DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17 • 48

Daily paper that is inspiring (abstract is enough)

World Model on Million-Length Video And Language With RingAttention

Paper • 2402.08268 • Published Feb 13 • 36
Improving Text Embeddings with Large Language Models

Paper • 2401.00368 • Published Dec 31, 2023 • 79
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 96
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19 • 48

Dataset pruning/cleaning/dedup

AlpaGasus: Training A Better Alpaca with Fewer Data

Paper • 2307.08701 • Published Jul 17, 2023 • 22
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Paper • 2303.03915 • Published Mar 7, 2023 • 6
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

Paper • 2309.04662 • Published Sep 9, 2023 • 22
SlimPajama-DC: Understanding Data Combinations for LLM Training

Paper • 2309.10818 • Published Sep 19, 2023 • 10

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs