MLM - a OnePiece123 Collection

OnePiece123 's Collections

LLM

MLM

MLM

updated Jul 17

Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

Paper • 2406.17294 • Published Jun 25 • 10
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27 • 51
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model

Paper • 2406.20076 • Published Jun 28 • 8
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Paper • 2407.02869 • Published Jul 3 • 18
Unveiling Encoder-Free Vision-Language Models

Paper • 2406.11832 • Published Jun 17 • 49
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

Paper • 2407.04051 • Published Jul 4 • 35
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation

Paper • 2407.06135 • Published Jul 8 • 20
Vision language models are blind

Paper • 2407.06581 • Published Jul 9 • 82
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10 • 40
SEED-Story: Multimodal Long Story Generation with Large Language Model

Paper • 2407.08683 • Published Jul 11 • 22