ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? Paper • 2411.06469 • Published 10 days ago • 17
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published 9 days ago • 27
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published 12 days ago • 33
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Paper • 2410.17434 • Published 28 days ago • 24
Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception Paper • 2410.12788 • Published Oct 16 • 21
AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published 30 days ago • 57
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published 29 days ago • 65
Improve Vision Language Model Chain-of-thought Reasoning Paper • 2410.16198 • Published 30 days ago • 17
LLM-based Optimization of Compound AI Systems: A Survey Paper • 2410.16392 • Published 29 days ago • 13
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs Paper • 2410.16267 • Published 29 days ago • 15
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities Paper • 2410.11190 • Published Oct 15 • 20
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation Paper • 2410.02458 • Published Oct 3 • 9
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25 • 101
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26 • 36