
Cuiunbo PRO


AI & ML interests

Anything

Recent Activity

liked a model 17 days ago

lerobot/pi0

new activity 25 days ago

HKUSTAudio/Llasa-3B:Are There Quantitative Metrics, Such as Simo Compared to Other TTS?

liked a dataset 28 days ago

liboaccn/MIT-10M



Cuiunbo's activity

upvoted a paper 4 months ago

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published Nov 7, 2024 • 51

upvoted a paper 5 months ago

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Paper • 2410.10594 • Published Oct 14, 2024 • 26

upvoted a paper 7 months ago

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3, 2024 • 82

upvoted a collection 8 months ago

UI Agent

a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robots • 323 items • Updated 20 minutes ago • 48

upvoted 2 papers 8 months ago

GUICourse: From General Vision Language Models to Versatile GUI Agents

Paper • 2406.11317 • Published Jun 17, 2024 • 1

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Paper • 2403.11703 • Published Mar 18, 2024 • 17

upvoted a paper 9 months ago

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

Paper • 2406.18521 • Published Jun 26, 2024 • 29

upvoted a paper 10 months ago

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Paper • 2405.21075 • Published May 31, 2024 • 24

upvoted a collection 10 months ago

ConvLLaVA

A collection of ConvLLaVA models. • 10 items • Updated May 28, 2024 • 10

upvoted 2 papers 10 months ago

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Paper • 2405.14598 • Published May 23, 2024 • 14

RoHM: Robust Human Motion Reconstruction via Diffusion

Paper • 2401.08570 • Published Jan 16, 2024 • 1

upvoted a collection 10 months ago

MiniCPM-V

17 items • Updated Aug 6, 2024 • 1

upvoted a paper 10 months ago

MultiBooth: Towards Generating All Your Concepts in an Image from Text

Paper • 2404.14239 • Published Apr 22, 2024 • 9

upvoted a collection 10 months ago

VisionLM

799 items • Updated about 18 hours ago • 45

upvoted a paper 10 months ago

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 131

upvoted a collection 10 months ago

Tiny Models

3 items • Updated Jun 20, 2024 • 1

upvoted a paper 10 months ago

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 101