Are Your LLMs Capable of Stable Reasoning? Paper • 2412.13147 • Published about 1 month ago • 91
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Paper • 2412.13018 • Published about 1 month ago • 41
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 128 • 5
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Paper • 2407.20183 • Published Jul 29, 2024 • 41 • 4