Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2406.09246

A collection of Audio, Video and Visual LLMs.

myshell-ai/OpenVoice

Text-to-Speech • Updated Apr 24 • 356
Running

896

🤗

OpenVoice
dataautogpt3/ProteusV0.3

Text-to-Image • Updated Feb 12 • 47.2k • 89
ByteDance/SDXL-Lightning

Text-to-Image • Updated Apr 3 • 391k • 1.78k

Vision-Language

SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 5
Woodpecker: Hallucination Correction for Multimodal Large Language Models

Paper • 2310.16045 • Published Oct 24, 2023 • 13
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Paper • 2201.12086 • Published Jan 28, 2022 • 2
ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories

Paper • 2305.15028 • Published May 24, 2023 • 1

LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning

Paper • 2309.06440 • Published Sep 12, 2023 • 9
Robotic Table Tennis: A Case Study into a High Speed Learning System

Paper • 2309.03315 • Published Sep 6, 2023 • 5
Video Language Planning

Paper • 2310.10625 • Published Oct 16, 2023 • 7
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

Paper • 2311.01455 • Published Nov 2, 2023 • 25

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs