matlok's Collections
Papers - Image - Encoders - Clip
TextCraftor: Your Text Encoder Can be Image Quality Controller
Paper
•
2403.18978
•
Published
•
13
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper
•
2404.02733
•
Published
•
20
OmniFusion Technical Report
Paper
•
2404.06212
•
Published
•
74
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper
•
2404.07448
•
Published
•
11
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
Paper
•
2404.09204
•
Published
•
10
MoDE: CLIP Data Experts via Clustering
Paper
•
2404.16030
•
Published
•
12
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Paper
•
2404.17672
•
Published
•
18
Stylus: Automatic Adapter Selection for Diffusion Models
Paper
•
2404.18928
•
Published
•
14
Data curation via joint example selection further accelerates multimodal learning
Paper
•
2406.17711
•
Published
•
3
MAVIS: Mathematical Visual Instruction Tuning
Paper
•
2407.08739
•
Published
•
31
Law of Vision Representation in MLLMs
Paper
•
2408.16357
•
Published
•
92
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Paper
•
2411.04709
•
Published
•
25
SLIP: Self-supervision meets Language-Image Pre-training
Paper
•
2112.12750
•
Published
•
1
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
Paper
•
2411.02327
•
Published
•
11
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Paper
•
2203.03897
•
Published
•
1