Mitigating Object Hallucination via Concentric Causal Attention • Paper • 2410.15926 • Published Oct 21 • 16
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs • Paper • 2410.16267 • Published Oct 21 • 17
Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages • Paper • 2412.09025 • Published 4 days ago • 4
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation • Paper • 2412.09349 • Published 3 days ago • 5
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios • Paper • 2412.08972 • Published 4 days ago • 8
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training • Paper • 2412.09619 • Published 3 days ago • 19