xinyu1205/recognize-anything-plus-model Zero-Shot Image Classification • Updated Oct 25, 2023 • 36
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 20 days ago • 80
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Paper • 2412.08737 • Published 21 days ago • 51
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Paper • 2412.08580 • Published 21 days ago • 45
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published 26 days ago • 121