The FinBen: An Holistic Financial Benchmark for Large Language Models Paper • 2402.12659 • Published Feb 20 • 13
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization Paper • 2402.13249 • Published Feb 20 • 10
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks Paper • 2311.07463 • Published Nov 13, 2023 • 13
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers Paper • 2305.07185 • Published May 12, 2023 • 8
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models Paper • 2309.14717 • Published Sep 26, 2023 • 43
PEFTDebias: Capturing debiasing information using PEFTs Paper • 2312.00434 • Published Dec 1, 2023 • 1
From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers Paper • 2402.01911 • Published Feb 2 • 2
Empirical Study of PEFT techniques for Winter Wheat Segmentation Paper • 2310.01825 • Published Oct 3, 2023 • 2
LoRA: Low-Rank Adaptation of Large Language Models Paper • 2106.09685 • Published Jun 17, 2021 • 25
L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ Paper • 2402.04902 • Published Feb 7 • 1
Self-Instruct: Aligning Language Models with Self-Generated Instructions Paper • 2212.10560 • Published Dec 20, 2022 • 6
Efficient Training of Language Models to Fill in the Middle Paper • 2207.14255 • Published Jul 28, 2022 • 1
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method Paper • 2402.17193 • Published Feb 27 • 23
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 572
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model Paper • 2402.17412 • Published Feb 27 • 21
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT Paper • 2402.16840 • Published Feb 26 • 23
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models Paper • 2402.10524 • Published Feb 16 • 20
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models Paper • 2402.10986 • Published Feb 16 • 74
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29 • 46
Training-Free Long-Context Scaling of Large Language Models Paper • 2402.17463 • Published Feb 27 • 19
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5 • 92
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs Paper • 2403.02775 • Published Mar 5 • 11
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15 • 33
OneBit: Towards Extremely Low-bit Large Language Models Paper • 2402.11295 • Published Feb 17 • 21
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU Paper • 2403.06504 • Published Mar 11 • 52
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Paper • 2403.06764 • Published Mar 11 • 24
V3D: Video Diffusion Models are Effective 3D Generators Paper • 2403.06738 • Published Mar 11 • 28
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8 • 51
MoAI: Mixture of All Intelligence for Large Language and Vision Models Paper • 2403.07508 • Published Mar 12 • 73
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14 • 25
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 123
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14 • 54
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12 • 59
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20 • 58
Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model Paper • 2206.14371 • Published Jun 29, 2022 • 3
Model Stock: All we need is just a few fine-tuned models Paper • 2403.19522 • Published Mar 28 • 9
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 102
Long-context LLMs Struggle with Long In-context Learning Paper • 2404.02060 • Published Apr 2 • 33
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding Paper • 2403.04797 • Published Mar 5 • 1
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12 • 37
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Paper • 2402.01739 • Published Jan 29 • 26
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30 • 40
Latxa: An Open Language Model and Evaluation Suite for Basque Paper • 2403.20266 • Published Mar 29 • 3
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders Paper • 2404.05961 • Published Apr 9 • 62
A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys) Paper • 2404.00579 • Published Mar 31 • 1
RoFormer: Enhanced Transformer with Rotary Position Embedding Paper • 2104.09864 • Published Apr 20, 2021 • 7
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 58
Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications Paper • 2404.13506 • Published Apr 21 • 1
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2 • 106
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30 • 65
Granite Code Models: A Family of Open Foundation Models for Code Intelligence Paper • 2405.04324 • Published May 7 • 14
Stylus: Automatic Adapter Selection for Diffusion Models Paper • 2404.18928 • Published Apr 29 • 14
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published May 14 • 27
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published May 15 • 23
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published May 23 • 28
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework Paper • 2405.11143 • Published May 20 • 33
An Introduction to Vision-Language Modeling Paper • 2405.17247 • Published May 2024 • 77
Spectrum: Targeted Training on Signal to Noise Ratio Paper • 2406.06623 • Published Jun 2024 • 1
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery Paper • 2406.08587 • Published Jun 2024 • 14
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM Paper • 2401.02994 • Published Jan 4 • 45
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B Paper • 2406.07394 • Published Jun 2024 • 16
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published Jun 2024 • 46
How Do Large Language Models Acquire Factual Knowledge During Pretraining? Paper • 2406.11813 • Published Jun 2024 • 27
GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks Paper • 2406.12925 • Published Jun 2024 • 17
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts Paper • 2406.12034 • Published Jun 2024 • 12
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5 • 66
LiveMind: Low-latency Large Language Models with Simultaneous Inference Paper • 2406.14319 • Published Jun 2024 • 13
Extreme Compression of Large Language Models via Additive Quantization Paper • 2401.06118 • Published Jan 11 • 11
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations Paper • 2312.08935 • Published Dec 14, 2023 • 4
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data Paper • 2406.14546 • Published Jun 2024 • 1
HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction Paper • 2401.17948 • Published Jan 31 • 2
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 2024 • 3