Submitted by akhaliq · 57 · CogVLM2: Visual Language Models for Image and Video Understanding · 25 authors · 5
Submitted by akhaliq · 49 · WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling · 16 authors · 4
Submitted by akhaliq · 31 · ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model · 8 authors · 2
Submitted by akhaliq · 27 · SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners · 7 authors · 2
Submitted by zhuzeyuan · 26 · Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems · 4 authors · 2
Submitted by hallisky · 11 · StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements · 6 authors · 4
Submitted by necludov · 8 · Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold · 8 authors · 2