DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception Paper • 2410.12628 • Published Oct 16 • 28
Granite 3.0 Language Models Collection A series of language models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated 4 days ago • 95
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion Paper • 2408.03178 • Published Aug 6 • 37
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28 • 62
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated 16 days ago • 637
NNsight and NDIF: Democratizing Access to Foundation Model Internals Paper • 2407.14561 • Published Jul 18 • 33
GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization Paper • 2407.15002 • Published Jul 20 • 4
Temporal Residual Jacobians For Rig-free Motion Transfer Paper • 2407.14958 • Published Jul 20 • 5
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models Paper • 2407.15642 • Published Jul 22 • 10
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation Paper • 2407.15060 • Published Jul 21 • 9
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning Paper • 2407.15762 • Published Jul 22 • 9
Artist: Aesthetically Controllable Text-Driven Stylization without Training Paper • 2407.15842 • Published Jul 22 • 14
HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions Paper • 2407.15187 • Published Jul 21 • 11
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models Paper • 2407.15841 • Published Jul 22 • 40
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion Paper • 2407.01392 • Published Jul 1 • 39