Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation Paper • 2406.16678 • Published 5 days ago • 10
HARE: HumAn pRiors, a key to small language model Efficiency Paper • 2406.11410 • Published 12 days ago • 37
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published 9 days ago • 74
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework Paper • 2406.14783 • Published 9 days ago • 14
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Paper • 2406.15319 • Published 8 days ago • 53
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts Paper • 2405.19893 • Published about 1 month ago • 26
Gecko: Versatile Text Embeddings Distilled from Large Language Models Paper • 2403.20327 • Published Mar 29 • 47
PERL: Parameter Efficient Reinforcement Learning from Human Feedback Paper • 2403.10704 • Published Mar 15 • 56
Gemma: Open Models Based on Gemini Research and Technology Paper • 2403.08295 • Published Mar 13 • 44
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 573
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 179
Instruction-tuned Language Models are Better Knowledge Learners Paper • 2402.12847 • Published Feb 20 • 24
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning Paper • 2402.06619 • Published Feb 9 • 50
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains Paper • 2402.05140 • Published Feb 6 • 18
Repeat After Me: Transformers are Better than State Space Models at Copying Paper • 2402.01032 • Published Feb 1 • 22
Specialized Language Models with Cheap Inference from Limited Domain Data Paper • 2402.01093 • Published Feb 2 • 45
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Paper • 2402.01739 • Published Jan 29 • 26
LongAlign: A Recipe for Long Context Alignment of Large Language Models Paper • 2401.18058 • Published Jan 31 • 21
SliceGPT: Compress Large Language Models by Deleting Rows and Columns Paper • 2401.15024 • Published Jan 26 • 63
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models Paper • 2401.06951 • Published Jan 13 • 23
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19 • 50
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM Paper • 2401.02994 • Published Jan 4 • 45
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon Paper • 2401.03462 • Published Jan 7 • 25
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8 • 68
Pearl: A Production-ready Reinforcement Learning Agent Paper • 2312.03814 • Published Dec 6, 2023 • 14
Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023 • 44
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 255
LLaMA Beyond English: An Empirical Study on Language Capability Transfer Paper • 2401.01055 • Published Jan 2 • 51
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling Paper • 2312.15166 • Published Dec 23, 2023 • 55
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning Paper • 2311.11501 • Published Nov 20, 2023 • 32
Memory Augmented Language Models through Mixture of Word Experts Paper • 2311.10768 • Published Nov 15, 2023 • 16
Orca 2: Teaching Small Language Models How to Reason Paper • 2311.11045 • Published Nov 18, 2023 • 69
Understanding LLMs: A Comprehensive Overview from Training to Inference Paper • 2401.02038 • Published Jan 4 • 60
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers Paper • 2311.10642 • Published Nov 17, 2023 • 23
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 235
System 2 Attention (is something you might need too) Paper • 2311.11829 • Published Nov 20, 2023 • 38
GPT4All: An Ecosystem of Open Source Compressed Language Models Paper • 2311.04931 • Published Nov 6, 2023 • 20
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs Paper • 2311.05657 • Published Nov 9, 2023 • 26
The Generative AI Paradox: "What It Can Create, It May Not Understand" Paper • 2311.00059 • Published Oct 31, 2023 • 17
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing Paper • 2311.00571 • Published Nov 1, 2023 • 39
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models Paper • 2310.13671 • Published Oct 20, 2023 • 17
LLM-FP4: 4-Bit Floating-Point Quantized Transformers Paper • 2310.16836 • Published Oct 25, 2023 • 10