Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head Paper • 2403.06892 • Published Mar 11, 2024
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection Paper • 2312.15043 • Published Dec 22, 2023
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations Paper • 2207.00221 • Published Jul 1, 2022
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network Paper • 2209.05946 • Published Sep 10, 2022
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding Paper • 2407.04923 • Published Jul 6, 2024
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration Paper • 2411.16044 • Published Nov 25, 2024
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21, 2024
Multimodal Models Collection Multimodal models with leading performance. • 14 items • Updated Nov 17, 2024
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws Paper • 2401.00448 • Published Dec 31, 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI Paper • 2311.16502 • Published Nov 27, 2023
Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models Paper • 2310.13473 • Published Oct 20, 2023
DETR Doesn't Need Multi-Scale or Locality Design Paper • 2308.01904 • Published Aug 3, 2023