Hongsheng LI

hsli-cuhk

https://www.ee.cuhk.edu.hk/~hsli/

AI & ML interests

None yet

Recent Activity

authored a paper about 2 hours ago

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

authored a paper 28 days ago

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

authored a paper about 2 months ago

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models

View all activity

Organizations

None yet

hsli-cuhk's activity

authored a paper about 2 hours ago

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published about 17 hours ago • 21

authored a paper 28 days ago

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Paper • 2502.09621 • Published 29 days ago • 27

authored 2 papers about 2 months ago

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models

Paper • 2501.13920 • Published Jan 23 • 15

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step

Paper • 2501.13926 • Published Jan 23 • 37

authored a paper 2 months ago

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Paper • 2501.01895 • Published Jan 3 • 51

authored 3 papers 3 months ago

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Paper • 2412.09618 • Published Dec 12, 2024 • 21

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published Dec 12, 2024 • 35

StreamChat: Chatting with Streaming Video

Paper • 2412.08646 • Published Dec 11, 2024 • 18

authored a paper 4 months ago

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Paper • 2411.10640 • Published Nov 16, 2024 • 45

authored 3 papers 5 months ago

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Paper • 2410.13861 • Published Oct 17, 2024 • 56

Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow

Paper • 2410.07303 • Published Oct 9, 2024 • 18

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

Paper • 2410.08196 • Published Oct 10, 2024 • 46

authored a paper 6 months ago

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

Paper • 2409.12959 • Published Sep 19, 2024 • 37

authored 3 papers 7 months ago

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Paper • 2408.15881 • Published Aug 28, 2024 • 21

GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars

Paper • 2408.13674 • Published Aug 24, 2024 • 18

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

Paper • 2408.02657 • Published Aug 5, 2024 • 34

authored 2 papers 8 months ago

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents

Paper • 2407.17490 • Published Jul 3, 2024 • 31

MAVIS: Mathematical Visual Instruction Tuning

Paper • 2407.08739 • Published Jul 11, 2024 • 33

authored a paper 9 months ago

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Paper • 2406.11831 • Published Jun 17, 2024 • 22

authored a paper 10 months ago

Phased Consistency Model

Paper • 2405.18407 • Published May 28, 2024 • 48