mdw123's Collections: Papers
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 49

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 52

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 44

Resonance RoPE: Improving Context Length Generalization of Large Language Models
Paper • 2403.00071 • Published • 22

Learning and Leveraging World Models in Visual Representation Learning
Paper • 2403.00504 • Published • 31

AtP*: An efficient and scalable method for localizing LLM behaviour to components
Paper • 2403.00745 • Published • 12

Learning to Decode Collaboratively with Multiple Language Models
Paper • 2403.03870 • Published • 18

ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper • 2403.03853 • Published • 61

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 183

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Paper • 2403.05135 • Published • 42

DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper • 2403.05525 • Published • 40

Stealing Part of a Production Language Model
Paper • 2403.06634 • Published • 90

MoAI: Mixture of All Intelligence for Large Language and Vision Models
Paper • 2403.07508 • Published • 74

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Paper • 2403.07816 • Published • 39

Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
Paper • 2403.07750 • Published • 21

Chronos: Learning the Language of Time Series
Paper • 2403.07815 • Published • 46

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 125

Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 7
Qwen Technical Report
Paper • 2309.16609 • Published • 35
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 7

Uni-SMART: Universal Science Multimodal Analysis and Research Transformer
Paper • 2403.10301 • Published • 52

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Paper • 2403.13372 • Published • 62

The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 78

InternLM2 Technical Report
Paper • 2403.17297 • Published • 30

Jamba: A Hybrid Transformer-Mamba Language Model
Paper • 2403.19887 • Published • 104

Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
Paper • 2403.20041 • Published • 34

Localizing Paragraph Memorization in Language Models
Paper • 2403.19851 • Published • 13

DiJiang: Efficient Large Language Models through Compact Kernelization
Paper • 2403.19928 • Published • 10

Long-form factuality in large language models
Paper • 2403.18802 • Published • 24

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper • 2404.02258 • Published • 104

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper • 2404.07143 • Published • 104

Pre-training Small Base LMs with Fewer Tokens
Paper • 2404.08634 • Published • 34

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 64

SnapKV: LLM Knows What You are Looking for Before Generation
Paper • 2404.14469 • Published • 23

FlowMind: Automatic Workflow Generation with LLMs
Paper • 2404.13050 • Published • 33
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 334
YuLan-Mini: An Open Data-efficient Language Model
Paper • 2412.17743 • Published • 59

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 84

Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 42

DeepSeek-V3 Technical Report
Paper • 2412.19437 • Published • 10