Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Paper • 2311.06242 • Published Nov 10, 2023 • 66
seemore: Implement a Vision Language Model from Scratch Article • By AviSoori1x • 8 days ago • 48
DepthFM: Fast Monocular Depth Estimation with Flow Matching Paper • 2403.13788 • Published Mar 20 • 15
SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model Paper • 2403.13064 • Published Mar 19 • 30
CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting Paper • 2401.18075 • Published Jan 31 • 7
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29 • 46
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities Paper • 2401.14405 • Published Jan 25 • 11
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild Paper • 2401.13627 • Published Jan 24 • 70
Single-View 3D Human Digitalization with Large Reconstruction Models Paper • 2401.12175 • Published Jan 22 • 5
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models Paper • 2401.11739 • Published Jan 22 • 16
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22 • 22
ControlRoom3D: Room Generation using Semantic Proxy Rooms Paper • 2312.05208 • Published Dec 8, 2023 • 8
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model Paper • 2312.13252 • Published Dec 20, 2023 • 26
ControlMat: A Controlled Generative Approach to Material Capture Paper • 2309.01700 • Published Sep 4, 2023 • 11