Aligning Teacher with Student Preferences for Tailored Training Data Generation Paper • 2406.19227 • Published 2 days ago • 15
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published 4 days ago • 66
Unlocking Continual Learning Abilities in Language Models Paper • 2406.17245 • Published 4 days ago • 23
Efficient Continual Pre-training by Mitigating the Stability Gap Paper • 2406.14833 • Published 8 days ago • 18
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published 7 days ago • 39
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges Paper • 2406.12624 • Published 11 days ago • 34
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published 12 days ago • 34
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published 11 days ago • 26
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published 12 days ago • 54
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation Paper • 2406.10996 • Published 13 days ago • 31
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Paper • 2406.11833 • Published 12 days ago • 60
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning Paper • 2406.08973 • Published 16 days ago • 85
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Paper • 2406.07522 • Published 18 days ago • 34
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published 23 days ago • 46
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published 19 days ago • 60
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning Paper • 2406.06469 • Published 19 days ago • 22
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published 22 days ago • 23
GenAI Arena: An Open Evaluation Platform for Generative Models Paper • 2406.04485 • Published 23 days ago • 19
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models Paper • 2406.04271 • Published 23 days ago • 25
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published 25 days ago • 35
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published 29 days ago • 60
Jina CLIP: Your CLIP Model Is Also Your Text Retriever Paper • 2405.20204 • Published 30 days ago • 27
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution Paper • 2405.19325 • Published about 1 month ago • 13
Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published May 23 • 21
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24 • 43
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models Paper • 2405.15574 • Published May 24 • 52
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20 • 44
Observational Scaling Laws and the Predictability of Language Model Performance Paper • 2405.10938 • Published May 17 • 10
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published May 14 • 27
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 116
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2 • 106
A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published May 1 • 30
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 Paper • 2405.00664 • Published May 1 • 18
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29 • 67
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25 • 56
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22 • 124
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published Apr 19 • 38
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 240
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation Paper • 2404.12753 • Published Apr 19 • 39
Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • May 7 • 29
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding Paper • 2404.11912 • Published Apr 18 • 16