Wujian Peng

wjpoom

https://scholar.google.com/citations?user=GTuWk9YAAAAJ&hl=zh-CN

AI & ML interests

None yet

Recent Activity

updated a dataset 2 days ago

wjpoom/SPEC

updated a dataset 5 days ago

Inst-IT/Inst-IT-Dataset

wjpoom's activity

upvoted a paper 2 days ago

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

Paper • 2312.00081 • Published Nov 30, 2023 • 2

upvoted a paper 6 days ago

Cross-Modality Safety Alignment

Paper • 2406.15279 • Published Jun 21 • 4

upvoted a paper 10 days ago

Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning

Paper • 2412.03565 • Published 11 days ago • 11

upvoted a paper 4 months ago

RelBench: A Benchmark for Deep Learning on Relational Databases

Paper • 2407.20060 • Published Jul 29 • 7

upvoted 4 papers 5 months ago

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Paper • 2407.16982 • Published Jul 24 • 40

Understanding Reference Policies in Direct Preference Optimization

Paper • 2407.13709 • Published Jul 18 • 16

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Paper • 2407.07053 • Published Jul 9 • 41

Video Diffusion Alignment via Reward Gradients

Paper • 2407.08737 • Published Jul 11 • 48

upvoted an article 5 months ago

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Article • Apr 15 • 170

upvoted 4 papers 5 months ago

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published Jul 1 • 75

upvoted 7 papers 6 months ago

Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning

Paper • 2406.12742 • Published Jun 18 • 14

Improving Visual Commonsense in Language Models via Multiple Image Generation

Paper • 2406.13621 • Published Jun 19 • 13

Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

Paper • 2406.13542 • Published Jun 19 • 16

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Paper • 2406.14544 • Published Jun 20 • 34

MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Paper • 2406.15252 • Published Jun 21 • 14

On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

Paper • 2406.16377 • Published Jun 24 • 11

Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

Paper • 2406.09170 • Published Jun 13 • 24