Submitted by akhaliq 94 Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency · 6 authors 13
Submitted by Xidong 54 LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture · 5 authors 2
Submitted by NeoZ123 47 LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA · 11 authors 3
Submitted by akhaliq 29 MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark · 14 authors 3
Submitted by akhaliq 19 Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining · 3 authors 2
Submitted by akhaliq 10 FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation · 4 authors 2
Submitted by davanstrien 10 Political DEBATE: Efficient Zero-shot and Few-shot Classifiers for Political Text · 4 authors 3