Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published 13 days ago • 34
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 25 days ago • 92
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published 28 days ago • 72
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 123
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 105
Article Running Your Custom LoRA Fine-Tuned MusicGen Large Locally By theeseus-ai • Dec 6, 2024 • 1
Article 🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs By wolfram • Dec 4, 2024 • 75