Yunlong Tang

yunlong10

https://yunlong10.github.io/

AI & ML interests

Multimodal Learning, Video Understanding & Generation

Recent Activity

upvoted a paper about 2 months ago

Generative AI for Cel-Animation: A Survey

authored a paper about 2 months ago

Scaling Concept With Text-Guided Diffusion Models

authored a paper about 2 months ago

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Organizations

None yet

yunlong10's activity

upvoted a paper about 2 months ago

Generative AI for Cel-Animation: A Survey

Paper • 2501.06250 • Published Jan 8 • 13

authored 3 papers about 2 months ago

commented on a paper about 2 months ago

Generative AI for Cel-Animation: A Survey

Paper • 2501.06250 • Published Jan 8 • 13

authored 7 papers 5 months ago

Caption Anything: Interactive Image Description with Diverse Multimodal Controls

Paper • 2305.02677 • Published May 4, 2023

Video Understanding with Large Language Models: A Survey

Paper • 2312.17432 • Published Dec 29, 2023 • 2

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

Paper • 2402.00827 • Published Feb 1, 2024 • 2

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue

Paper • 2403.16276 • Published Mar 24, 2024

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Paper • 2404.12353 • Published Apr 18, 2024

AIM 2024 Challenge on Video Saliency Prediction: Methods and Results

Paper • 2409.14827 • Published Sep 23, 2024

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

Paper • 2410.09733 • Published Oct 13, 2024 • 9

upvoted a paper 5 months ago

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

Paper • 2410.09733 • Published Oct 13, 2024 • 9

liked a Space 8 months ago

152

PaintsUndo

📊

Generate videos and key frames from a single image

liked a model 9 months ago

Efficient-Large-Model/VILA1.5-40b

Text Generation • Updated Jul 18, 2024 • 1.09k • 17

liked a dataset about 1 year ago

jylins/videoxum

Viewer • Updated Apr 22, 2024 • 14k • 262 • 7

liked a model over 1 year ago

QuanSun/EVA-CLIP

Updated Jun 14, 2023 • 82

upvoted a paper over 1 year ago

Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 83

liked 2 Spaces almost 2 years ago

392

Grounded Segment Anything

📚

102

Caption Anything

📚