DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 2024
Toucan: Token-Aware Character Level Language Modeling Paper • 2311.08620 • Published Nov 15, 2023
Perceiver IO: A General Architecture for Structured Inputs & Outputs Paper • 2107.14795 • Published Jul 30, 2021
Hard ASH: Sparsity and the right optimizer make a continual learner Paper • 2404.17651 • Published Apr 26, 2024
MKOR: Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 Updates Paper • 2306.01685 • Published Jun 2, 2023
CoRe Optimizer: An All-in-One Solution for Machine Learning Paper • 2307.15663 • Published Jul 28, 2023
QuickLLaMA: Query-aware Inference Acceleration for Large Language Models Paper • 2406.07528 • Published Jun 2024
Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs Paper • 2406.02376 • Published Jun 2024
SelfCP: Compressing Long Prompt to 1/12 Using the Frozen Large Language Model Itself Paper • 2405.17052 • Published May 27, 2024
LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models Paper • 2406.00605 • Published Jun 2024
Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node Paper • 2405.16836 • Published May 27, 2024
Recurrent Context Compression: Efficiently Expanding the Context Window of LLM Paper • 2406.06110 • Published Jun 2024
SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models Paper • 2406.05678 • Published Jun 2024
XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference Paper • 2405.17755 • Published May 28, 2024
Length Generalization of Causal Transformers without Position Encoding Paper • 2404.12224 • Published Apr 18, 2024
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory Paper • 2404.11163 • Published Apr 17, 2024
Universal In-Context Approximation By Prompting Fully Recurrent Models Paper • 2406.01424 • Published Jun 2024
A Unified Implicit Attention Formulation for Gated-Linear Recurrent Sequence Models Paper • 2405.16504 • Published May 26, 2024
Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences Paper • 2406.08128 • Published Jun 2024
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks Paper • 2405.15731 • Published May 24, 2024
State-Free Inference of State-Space Models: The Transfer Function Approach Paper • 2405.06147 • Published May 10, 2024
LoCoCo: Dropping In Convolutions for Long Context Compression Paper • 2406.05317 • Published Jun 2024
Parallelizing Linear Transformers with the Delta Rule over Sequence Length Paper • 2406.06484 • Published Jun 2024
LongSSM: On the Length Extension of State-space Models in Language Modelling Paper • 2406.02080 • Published Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Paper • 2406.07522 • Published Jun 2024
Multilingual Large Language Models Are Not (Yet) Code-Switchers Paper • 2305.14235 • Published May 23, 2023
Do Llamas Work in English? On the Latent Language of Multilingual Transformers Paper • 2402.10588 • Published Feb 16, 2024
Integrating Multi-scale Contextualized Information for Byte-based Neural Machine Translation Paper • 2405.19290 • Published May 2024
SpaceByte: Towards Deleting Tokenization from Large Language Modeling Paper • 2404.14408 • Published Apr 22, 2024
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters Paper • 2405.16287 • Published May 25, 2024
D'OH: Decoder-Only random Hypernetworks for Implicit Neural Representations Paper • 2403.19163 • Published Mar 28, 2024
Byte-Level Recursive Convolutional Auto-Encoder for Text Paper • 1802.01817 • Published Feb 6, 2018
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers Paper • 2305.07185 • Published May 12, 2023
Prompting-based Synthetic Data Generation for Few-Shot Question Answering Paper • 2405.09335 • Published May 15, 2024
Enhancing Conversational Search: Large Language Model-Aided Informative Query Rewriting Paper • 2310.09716 • Published Oct 15, 2023
TarGEN: Targeted Data Generation with Large Language Models Paper • 2310.17876 • Published Oct 27, 2023
CrossTune: Black-Box Few-Shot Classification with Label Enhancement Paper • 2403.12468 • Published Mar 19, 2024
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation Paper • 2405.17057 • Published May 27, 2024
SemCoder: Training Code Language Models with Comprehensive Semantics Paper • 2406.01006 • Published Jun 2024
NExT: Teaching Large Language Models to Reason about Code Execution Paper • 2404.14662 • Published Apr 23, 2024
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs Paper • 2405.16325 • Published May 25, 2024
VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks Paper • 2405.15179 • Published May 24, 2024
ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections Paper • 2405.20271 • Published May 2024
SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining Paper • 2406.02214 • Published Jun 2024
SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors Paper • 2405.19597 • Published May 2024
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters Paper • 2405.17604 • Published May 27, 2024
Parameter-Efficient Fine-Tuning with Discrete Fourier Transform Paper • 2405.03003 • Published May 5, 2024
NOLA: Networks as Linear Combination of Low Rank Random Basis Paper • 2310.02556 • Published Oct 4, 2023
Enhancing Pre-Trained Generative Language Models with Question Attended Span Extraction on Machine Reading Comprehension Paper • 2404.17991 • Published Apr 27, 2024
CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory Paper • 2402.13449 • Published Feb 21, 2024