Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models Paper • 2405.15574 • Published May 24 • 52
Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published 30 days ago • 49
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published May 23 • 28
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach Paper • 2405.15613 • Published May 24 • 12
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published 26 days ago • 60
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models Paper • 2405.20541 • Published 27 days ago • 18
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Paper • 2406.04333 • Published 20 days ago • 36
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published 20 days ago • 26
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments Paper • 2406.04151 • Published 20 days ago • 14
Mixture-of-Agents Enhances Large Language Model Capabilities Paper • 2406.04692 • Published 19 days ago • 49
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published 19 days ago • 23
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published 16 days ago • 60
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning Paper • 2406.06469 • Published 16 days ago • 22
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization Paper • 2406.05981 • Published 17 days ago • 10
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers Paper • 2406.05370 • Published 18 days ago • 12
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published 15 days ago • 52
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models Paper • 2406.06563 • Published 23 days ago • 17
MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering Paper • 2406.06573 • Published 23 days ago • 7
PowerInfer-2: Fast Large Language Model Inference on a Smartphone Paper • 2406.06282 • Published 16 days ago • 34
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Paper • 2406.05955 • Published 17 days ago • 21
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published 14 days ago • 44