Zhaokai Wang

wzk1015

https://www.wzk.plus

AI & ML interests

Computer Vision, Music Generation, Multimodal Large Language Models

Recent Activity

updated a model 14 days ago

OpenGVLab/PIIP

liked a model 14 days ago

OpenGVLab/V2PE

authored a paper 15 days ago

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

wzk1015's activity

upvoted a paper 15 days ago

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Paper • 2501.07783 • Published 17 days ago • 7

upvoted 3 papers about 2 months ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published Dec 12, 2024 • 35

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Paper • 2412.09428 • Published Dec 12, 2024 • 7

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 129

upvoted a paper 2 months ago

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Paper • 2410.08202 • Published Oct 10, 2024 • 4

upvoted a collection 2 months ago

InternVL2.5

Better than InternVL 2.0 • 18 items • Updated 21 days ago • 81

upvoted a collection 4 months ago

Mono-InternVL

A Pioneering Monolithic MLLM • 2 items • Updated 21 days ago • 6

upvoted a paper 7 months ago

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

Paper • 2407.08770 • Published Jul 11, 2024 • 20

upvoted a collection 8 months ago

InternVL1.0

Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks • 16 items • Updated 21 days ago • 18