-
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 11 -
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Paper • 2406.18521 • Published • 28 -
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Paper • 2408.12590 • Published • 35 -
Law of Vision Representation in MLLMs
Paper • 2408.16357 • Published • 92
Collections
Discover the best community collections!
Collections including paper arxiv:2412.08635
-
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 24 -
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 36 -
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 51 -
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12
-
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper • 2401.01335 • Published • 64 -
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86 -
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
Paper • 2403.06504 • Published • 53 -
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
Paper • 2403.20041 • Published • 34