Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29 • 47
Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Mar 20 • 62
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates Paper • 2410.07137 • Published 11 days ago • 6
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published 25 days ago • 59
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23 • 21
Power-LM Collection Dense & MoE LLMs trained with power learning rate scheduler. • 4 items • Updated 3 days ago • 15
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated 25 days ago • 596
Model with Circuit Breakers Collection SoTA models with circuit breakers inserted. Top safety performance without losing capabilities. • 2 items • Updated Jul 9 • 3
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 52
Article RegMix: Data Mixture as Regression for Language Model Pre-training By SivilTaram • Jul 11 • 10
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1 • 33
🧬 RegMix: Data Mixture as Regression Collection Automatic data mixture method for large language model pre-training • 10 items • Updated Jul 26 • 5
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published Jun 17 • 57
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast Paper • 2402.08567 • Published Feb 13 • 2
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses Paper • 2406.01288 • Published Jun 3 • 1
Intriguing Properties of Data Attribution on Diffusion Models Paper • 2311.00500 • Published Nov 1, 2023 • 2
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition Paper • 2307.13269 • Published Jul 25, 2023 • 31
⚓️ Sailor Language Models Collection Sailor: Open Language Models tailored for South-East Asia (SEA) released by Sea AI Lab. • 18 items • Updated Jul 26 • 16