- ColPali: Efficient Document Retrieval with Vision Language Models (arXiv:2407.01449, published Jun 27, 2024)
- jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images (arXiv:2412.08802, published Dec 11, 2024)
- InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption (arXiv:2412.09283, published Dec 12, 2024)
- V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding (arXiv:2412.09616, published Dec 12, 2024)
- LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations (arXiv:2412.08580, published Dec 11, 2024)
- FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training (arXiv:2411.11927, published Nov 18, 2024)
- CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions (arXiv:2411.16828, published Nov 25, 2024)
- COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training (arXiv:2412.01814, published Dec 2, 2024)
- Active Data Curation Effectively Distills Large-Scale Multimodal Models (arXiv:2411.18674, published Nov 27, 2024)
- FLAIR: VLM with Fine-grained Language-informed Image Representations (arXiv:2412.03561, published Dec 4, 2024)
- GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis (arXiv:2412.06089, published Dec 8, 2024)
- CompCap: Improving Multimodal Large Language Models with Composite Captions (arXiv:2412.05243, published Dec 6, 2024)
- MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models (arXiv:2410.09733, published Oct 13, 2024)
- Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality (arXiv:2410.05210, published Oct 7, 2024)