Zesen Cheng

ClownRat

AI & ML interests

Multi-modal foundation models; segmentation, detection, and tracking

Recent Activity

upvoted a paper 1 day ago

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

upvoted a paper 1 day ago

On the Compositional Generalization of Multimodal LLMs for Medical Imaging

upvoted a paper 1 day ago

Are Vision-Language Models Truly Understanding Multi-vision Sensor?

ClownRat's activity

upvoted 3 papers 1 day ago

New activity in DAMO-NLP-SG/VideoLLaMA2.1-7B-AV 2 days ago

Some weights of Videollama2Qwen2ForCausalLM were not initialized from the model checkpoint at ./VideoLLaMA2.1-7B-AV and are newly initialized:

#4 opened 26 days ago by zybbmn

authored a paper 5 days ago

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published 11 days ago • 40

upvoted a paper 5 days ago

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Paper • 2412.18525 • Published 18 days ago • 65

updated a model 6 days ago

ClownRat/VideoLLaMA2.1-7B-16F

Text Generation • Updated 6 days ago • 6

upvoted 2 papers 6 days ago

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published 11 days ago • 40

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published 10 days ago • 91

updated 2 models 17 days ago

ClownRat/resnet-50-torchvision

Updated 17 days ago • 1.57k

ClownRat/mask2former-resnet-50-coco-instance

Updated 17 days ago • 847

updated a model 19 days ago

ClownRat/resnet-101-torchvision

Updated 19 days ago • 8

updated a collection 22 days ago

Mask2Former

Collection

2 items • Updated 22 days ago

liked a dataset 23 days ago

ClownRat/COCO2017-Instance

Viewer • Updated Dec 11, 2024 • 123k • 27 • 1

updated a model 26 days ago

ClownRat/mask2former-resnet-101-coco-instance

Updated 26 days ago • 10

updated a dataset about 1 month ago

ClownRat/COCO2017-Instance

Viewer • Updated Dec 11, 2024 • 123k • 27 • 1

upvoted 3 papers about 1 month ago

Towards Universal Soccer Video Understanding

Paper • 2412.01820 • Published Dec 2, 2024 • 9

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Paper • 2412.03304 • Published Dec 4, 2024 • 17

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published Dec 5, 2024 • 105