MagicQuill: An Intelligent Interactive Image Editing System Paper • 2411.09703 • Published 8 days ago • 50
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published 15 days ago • 48
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published 8 days ago • 65
PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation Paper • 2411.08307 • Published 9 days ago • 6
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Paper • 2411.07975 • Published 10 days ago • 24
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published 15 days ago • 34
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces Paper • 2410.09918 • Published Oct 13 • 3
NanoBEIR 🍺 Collection A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11 • 6
The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents Paper • 2304.01412 • Published Apr 3, 2023 • 2
OmniCorpus Collection OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text • 6 items • Updated Oct 21 • 1
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training Paper • 2410.08202 • Published Oct 10 • 3
Article • Advanced Flux Dreambooth LoRA Training with 🧨 diffusers • By linoyts • Oct 21 • 27
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities Paper • 2410.11190 • Published Oct 15 • 20
Granite Guardian Models Collection A collection of models created by IBM for safeguarding language models. • 4 items • Updated 18 days ago • 13