Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2407.15295

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6 • 25
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6 • 12
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7 • 38
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7 • 19

Papers - Image - Summarize as JSON

VideoGameBunny: Towards vision assistants for video games

Paper • 2407.15295 • Published Jul 21 • 21

Papers - Video Games - Image - Understanding - QA

VideoGameBunny: Towards vision assistants for video games

Paper • 2407.15295 • Published Jul 21 • 21

VideoGameBunny: Towards vision assistants for video games

Paper • 2407.15295 • Published Jul 21 • 21
asgaardlab/VideoGameBunny-v1_0-8B

Text Generation • Updated Aug 26 • 11 • 1
asgaardlab/VideoGameBunny-v1_0-4B

Text Generation • Updated Aug 26 • 10 • 1
asgaardlab/VideoGameBunnyCheckpoints

Updated Jul 26

VideoGameBunny: Towards vision assistants for video games

Paper • 2407.15295 • Published Jul 21 • 21
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

Paper • 2407.15060 • Published Jul 21 • 9
Discrete Flow Matching

Paper • 2407.15595 • Published Jul 22 • 11

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13 • 50
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Paper • 2406.09406 • Published Jun 13 • 13
VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Paper • 2406.10227 • Published Jun 14 • 9
What If We Recaption Billions of Web Images with LLaMA-3?

Paper • 2406.08478 • Published Jun 12 • 39

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24 • 12
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 53
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 85
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27 • 30

Papers - Video Games - Navigation

Scaling Instructable Agents Across Many Simulated Worlds

Paper • 2404.10179 • Published Mar 13 • 26
VideoGameBunny: Towards vision assistants for video games

Paper • 2407.15295 • Published Jul 21 • 21

Papers - Video Games

Scaling Instructable Agents Across Many Simulated Worlds

Paper • 2404.10179 • Published Mar 13 • 26
VideoGameBunny: Towards vision assistants for video games

Paper • 2407.15295 • Published Jul 21 • 21

about 17 hours ago

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

Paper • 2403.06775 • Published Mar 11 • 3
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Paper • 2010.11929 • Published Oct 22, 2020 • 6
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition

Paper • 2110.07040 • Published Oct 13, 2021 • 2
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks

Paper • 1811.00056 • Published Oct 31, 2018 • 2

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs