Excited to announce the release of InfiMM-WebMath-40B, the largest open-source multimodal pretraining dataset designed to advance mathematical reasoning in AI!
With 40 billion tokens, this dataset is designed to enhance the mathematical reasoning capabilities of multimodal large language models.
If you're interested in MLLMs, AI, and math reasoning, check out our work and dataset:
Happy to share our recent work. We noticed that image resolution plays an important role, both in improving multimodal large language model (MLLM) performance and in Sora-style any-resolution encoder-decoders. We hope this work helps lift the 224x224 resolution restriction in ViT.