Chuanming's Collections
paper2read
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper
•
2312.08578
•
Published
•
16
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper
•
2312.08583
•
Published
•
9
Vision-Language Models as a Source of Rewards
Paper
•
2312.09187
•
Published
•
11
StemGen: A music generation model that listens
Paper
•
2312.08723
•
Published
•
47
Pearl: A Production-ready Reinforcement Learning Agent
Paper
•
2312.03814
•
Published
•
14
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
Paper
•
2312.13789
•
Published
•
13
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation
Paper
•
2312.17276
•
Published
•
15
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Paper
•
2204.05862
•
Published
•
2
Improving Text Embeddings with Large Language Models
Paper
•
2401.00368
•
Published
•
79
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper
•
2401.00908
•
Published
•
181
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper
•
2401.02038
•
Published
•
62
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA
Paper
•
2312.03732
•
Published
•
7
Zephyr: Direct Distillation of LM Alignment
Paper
•
2310.16944
•
Published
•
123
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper
•
2401.04081
•
Published
•
70
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Paper
•
2401.03462
•
Published
•
27
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
Paper
•
2401.04468
•
Published
•
48
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Paper
•
2401.04658
•
Published
•
25
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper
•
2401.04577
•
Published
•
42
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages
Paper
•
2401.05811
•
Published
•
6
Self-Instruct: Aligning Language Model with Self Generated Instructions
Paper
•
2212.10560
•
Published
•
9
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Paper
•
2401.08671
•
Published
•
14
Scalable Pre-training of Large Autoregressive Image Models
Paper
•
2401.08541
•
Published
•
36
DiffusionGPT: LLM-Driven Text-to-Image Generation System
Paper
•
2401.10061
•
Published
•
29
Self-Rewarding Language Models
Paper
•
2401.10020
•
Published
•
145
Zero Bubble Pipeline Parallelism
Paper
•
2401.10241
•
Published
•
23
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper
•
2401.10774
•
Published
•
54
Lost in the Middle: How Language Models Use Long Contexts
Paper
•
2307.03172
•
Published
•
37
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents
Paper
•
2401.12963
•
Published
•
12
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper
•
2401.12945
•
Published
•
86
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Paper
•
2311.16502
•
Published
•
35
Proactive Detection of Voice Cloning with Localized Watermarking
Paper
•
2401.17264
•
Published
•
17
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Paper
•
2401.18058
•
Published
•
20
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices
Paper
•
2311.16567
•
Published
•
22
A Long Way to Go: Investigating Length Correlations in RLHF
Paper
•
2310.03716
•
Published
•
9
Efficient Exploration for LLMs
Paper
•
2402.00396
•
Published
•
21
Transforming and Combining Rewards for Aligning Large Language Models
Paper
•
2402.00742
•
Published
•
11
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper
•
2401.15947
•
Published
•
49
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Paper
•
2402.10176
•
Published
•
36
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Paper
•
2309.14509
•
Published
•
17
MambaByte: Token-free Selective State Space Model
Paper
•
2401.13660
•
Published
•
51
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper
•
2311.03285
•
Published
•
28
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper
•
2309.12307
•
Published
•
88
NExT-GPT: Any-to-Any Multimodal LLM
Paper
•
2309.05519
•
Published
•
78
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Paper
•
2401.16158
•
Published
•
19
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Paper
•
2403.07816
•
Published
•
39
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper
•
2403.09611
•
Published
•
124
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent
Paper
•
2402.09844
•
Published
•
20
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper
•
2307.09288
•
Published
•
243
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper
•
2404.14619
•
Published
•
126
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Paper
•
2404.16710
•
Published
•
75
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper
•
2402.14905
•
Published
•
126
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper
•
2402.03300
•
Published
•
72
RLHF Workflow: From Reward Modeling to Online RLHF
Paper
•
2405.07863
•
Published
•
66
LoRA Learns Less and Forgets Less
Paper
•
2405.09673
•
Published
•
87
Pheme: Efficient and Conversational Speech Generation
Paper
•
2401.02839
•
Published
•
17
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper
•
2405.11143
•
Published
•
34
Parrot: Enhancing Multi-Turn Chat Models by Learning to Ask Questions
Paper
•
2310.07301
•
Published
•
1
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Paper
•
2405.11273
•
Published
•
17
SimPO: Simple Preference Optimization with a Reference-Free Reward
Paper
•
2405.14734
•
Published
•
11
Aligning Large Multimodal Models with Factually Augmented RLHF
Paper
•
2309.14525
•
Published
•
30
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper
•
2310.11511
•
Published
•
75
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Paper
•
2406.00888
•
Published
•
30
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
Paper
•
2406.04314
•
Published
•
27
Scalable Diffusion Models with Transformers
Paper
•
2212.09748
•
Published
•
17
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Paper
•
2402.14740
•
Published
•
11
RewardBench: Evaluating Reward Models for Language Modeling
Paper
•
2403.13787
•
Published
•
21
An Introduction to Vision-Language Modeling
Paper
•
2405.17247
•
Published
•
86
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Paper
•
2311.06242
•
Published
•
86
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Paper
•
2406.02430
•
Published
•
30
MobileVLM : A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices
Paper
•
2312.16886
•
Published
•
19
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper
•
2402.03766
•
Published
•
12
Paper
•
2407.10671
•
Published
•
160
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper
•
2406.08464
•
Published
•
65
Paper
•
2408.07009
•
Published
•
61
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Paper
•
2408.08872
•
Published
•
98
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval
Paper
•
2407.19669
•
Published
•
22
Building and better understanding vision-language models: insights and future directions
Paper
•
2408.12637
•
Published
•
123
Generative Verifiers: Reward Modeling as Next-Token Prediction
Paper
•
2408.15240
•
Published
•
13
Language Model Can Listen While Speaking
Paper
•
2408.02622
•
Published
•
37
WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild
Paper
•
2409.03753
•
Published
•
18
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Paper
•
2409.06666
•
Published
•
55
MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis
Paper
•
2409.07129
•
Published
•
6
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper
•
2409.01704
•
Published
•
83
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper
•
2409.12191
•
Published
•
74
Prithvi WxC: Foundation Model for Weather and Climate
Paper
•
2409.13598
•
Published
•
39
Baichuan-Omni Technical Report
Paper
•
2410.08565
•
Published
•
84
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Paper
•
2410.08146
•
Published
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Paper
•
2410.16153
•
Published
•
43
Baichuan Alignment Technical Report
Paper
•
2410.14940
•
Published
•
49
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Paper
•
2410.13861
•
Published
•
52
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper
•
2404.05719
•
Published
•
81
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
Paper
•
2410.18451
•
Published
•
15
Continuous Speech Synthesis using per-token Latent Diffusion
Paper
•
2410.16048
•
Published
•
29
Fast Best-of-N Decoding via Speculative Rejection
Paper
•
2410.20290
•
Published
•
10
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning
Paper
•
2410.21845
•
Published
•
12
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
Paper
•
2410.19609
•
Published
•
17
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
Paper
•
2410.20424
•
Published
•
38
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
Paper
•
2410.22304
•
Published
•
16
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Paper
•
2410.22391
•
Published
•
22
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Paper
•
2410.23168
•
Published
•
24
Stealing User Prompts from Mixture of Experts
Paper
•
2410.22884
•
Published
•
14
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Paper
•
2410.02884
•
Published
•
51
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
Paper
•
2411.02265
•
Published
•
24
Watermark Anything with Localized Messages
Paper
•
2411.07231
•
Published
•
20
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper
•
2411.10440
•
Published
•
110
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper
•
2411.14405
•
Published
•
57
Multimodal Autoregressive Pre-training of Large Vision Encoders
Paper
•
2411.14402
•
Published
•
41
SpiRit-LM: Interleaved Spoken and Written Language Model
Paper
•
2402.05755
•
Published
•
13
PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper
•
2412.03555
•
Published
•
118
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper
•
2412.10360
•
Published
•
130
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper
•
2412.13663
•
Published
•
92
A Survey of Small Language Models
Paper
•
2410.20011
•
Published
•
40
Paper
•
2412.15115
•
Published
•
285