Marqo-FashionCLIP and Marqo-FashionSigLIP Collection SOTA multimodal models for fashion product embeddings -> https://github.com/marqo-ai/marqo-FashionCLIP/ • 11 items • Updated 22 days ago • 5
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Paper • 2408.16768 • Published 21 days ago • 25
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published 22 days ago • 55
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Paper • 2408.15881 • Published 22 days ago • 20
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 96