AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated about 9 hours ago • 6
Insight-V Collection Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models • 5 items • Updated about 4 hours ago • 3
WhisperNER Collection Collection of WhisperNER models for joint open type NER and ASR • 3 items • Updated 21 days ago • 3
OpenScholar_V1 Collection The set of models, index, data associated with the paper "OpenScholar: Synthesizing Scientific Literature with Retrieval-Augmented LMs". • 8 items • Updated 37 minutes ago • 14
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published 7 days ago • 89
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published 8 days ago • 65
Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings Paper • 2411.08017 • Published 10 days ago • 11
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated 4 days ago • 227
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published 15 days ago • 34
OpenCoder Collection OpenCoder is an open and reproducible code LLM family which matches the performance of top-tier code LLMs. • 9 items • Updated 4 days ago • 70
High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching Paper • 2407.03648 • Published Jul 4 • 16
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated 29 days ago • 486
Granite 3.0 Language Models Collection A series of language models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated 18 days ago • 89
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated Oct 15 • 140
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization Paper • 2406.04312 • Published Jun 6 • 1
NVLM 1.0 Collection A family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks and text-only tasks. • 1 item • Updated Oct 1 • 48