Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning Paper • 2411.18203 • Published 26 days ago • 30
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21 • 41
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation Paper • 2411.13025 • Published Nov 20 • 2
Unicom: Universal and Compact Representation Learning for Image Retrieval Paper • 2304.05884 • Published Apr 12, 2023 • 2
High-Fidelity Facial Albedo Estimation via Texture Quantization Paper • 2406.13149 • Published Jun 19 • 2
Multi-label Cluster Discrimination for Visual Representation Learning Paper • 2407.17331 • Published Jul 24 • 2
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension Paper • 2410.14332 • Published Oct 18 • 1
CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination Paper • 2408.09441 • Published Aug 18 • 2
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption Paper • 2308.08428 • Published Aug 16, 2023 • 1