Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5 Image-Text-to-Text • Updated Sep 20 • 3.51k • 45
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published Aug 22 • 35
Salesforce/xgen-mm-phi3-mini-instruct-singleimg-r-v1.5 Image-Text-to-Text • Updated Sep 12 • 244 • 15
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 98
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24 • 56
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models Paper • 2209.07511 • Published Sep 15, 2022