Ye Fang

aleafy

AI & ML interests

None yet

Organizations

None yet

aleafy's activity

upvoted a paper 23 days ago

FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

Paper • 2412.07674 • Published 24 days ago • 20

upvoted a paper about 1 month ago

X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models

Paper • 2412.01824 • Published Dec 2, 2024 • 65

upvoted 3 papers 2 months ago

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 34

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Paper • 2410.17247 • Published Oct 22, 2024 • 45

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published Oct 21, 2024 • 66

liked a dataset 6 months ago

Zery/MaskImageNet

Viewer • Updated Jul 30, 2024 • 469k • 53 • 2

upvoted a paper 6 months ago

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Paper • 2407.03320 • Published Jul 3, 2024 • 93

liked a model 6 months ago

internlm/internlm-xcomposer2d5-7b

Visual Question Answering • Updated Jul 22, 2024 • 108k • 185

liked a Space 8 months ago

GRM

authored a paper 8 months ago

Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials

Paper • 2404.16829 • Published Apr 25, 2024 • 5

upvoted a paper 8 months ago

Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials

Paper • 2404.16829 • Published Apr 25, 2024 • 5

liked a model 9 months ago

internlm/internlm-xcomposer2-4khd-7b

Visual Question Answering • Updated Apr 18, 2024 • 1.06k • 71

upvoted a paper 11 months ago

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Paper • 2401.16420 • Published Jan 29, 2024 • 55

upvoted a paper 12 months ago

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

Paper • 2401.04092 • Published Jan 8, 2024 • 21

liked a Space about 1 year ago

Alpha-CLIP LLaVA-1.5

upvoted a paper about 1 year ago

Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases

Paper • 2312.15011 • Published Dec 22, 2023 • 15

authored 2 papers about 1 year ago

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Paper • 2312.03818 • Published Dec 6, 2023 • 32

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Paper • 2312.02980 • Published Dec 5, 2023 • 7

liked a Space about 1 year ago

Alpha-CLIP_ImageVar

upvoted a paper about 1 year ago

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Paper • 2312.03818 • Published Dec 6, 2023 • 32