Hugo Laurençon

HugoLaurencon

HugoLaurencon's activity

upvoted an article 7 days ago

π0 and π0-FAST: Vision-Language-Action Models for General Robot Control

Article • 8 days ago • 92

upvoted a paper 20 days ago

Autonomy-of-Experts Models

Paper • 2501.13074 • Published 20 days ago • 40

upvoted a paper 26 days ago

Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

Paper • 2501.09755 • Published 26 days ago • 34

upvoted a paper 29 days ago

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11 • 83

upvoted a paper about 1 month ago

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 99

upvoted 4 papers about 2 months ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 345

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 125

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 139

Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 106

upvoted 2 papers 2 months ago

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Paper • 2412.04626 • Published Dec 5, 2024 • 13

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

Paper • 2412.04280 • Published Dec 5, 2024 • 13

upvoted 4 papers 3 months ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 59

Multimodal Autoregressive Pre-training of Large Vision Encoders

Paper • 2411.14402 • Published Nov 21, 2024 • 43

Watermark Anything with Localized Messages

Paper • 2411.07231 • Published Nov 11, 2024 • 20

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published Nov 7, 2024 • 50

upvoted 5 papers 4 months ago

Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Paper • 2410.18693 • Published Oct 24, 2024 • 40

WAFFLE: Multi-Modal Model for Automated Front-End Development

Paper • 2410.18362 • Published Oct 24, 2024 • 12

MoH: Multi-Head Attention as Mixture-of-Head Attention

Paper • 2410.11842 • Published Oct 15, 2024 • 21

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 93

Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices

Paper • 2410.11795 • Published Oct 15, 2024 • 17