Yatharth Sharma

YaTharThShaRma999

AI & ML interests

None yet

Recent Activity

reacted to jbilcke-hf's post with 🚀 and 👍 3 days ago

Doing some testing with HunyuanVideo on the Hugging Face Inference Endpoints 🤗

prompt: "a Shiba Inu is acting as a DJ, he wears sunglasses and is mixing and scratching with vinyl discs at a Ibiza sunny sand beach party"
1280x720, 22 steps, 121 frames

There are still some things to iron out regarding speed and memory usage: right now it takes 20 min on an A100 (see attached charts), but you can check it out here: https://huggingface.co/jbilcke-hf/HunyuanVideo-for-InferenceEndpoints

There are various things I want to try, like the 100% diffusers version and other models (LTX-Video..)

Organizations

None yet

YaTharThShaRma999's activity

upvoted a paper 2 days ago

Flowing from Words to Pixels: A Framework for Cross-Modality Evolution

Paper • 2412.15213 • Published 3 days ago • 19

upvoted 6 papers 3 days ago

Autoregressive Video Generation without Vector Quantization

Paper • 2412.14169 • Published 4 days ago • 12

FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published 5 days ago • 12

VidTok: A Versatile and Open-Source Video Tokenizer

Paper • 2412.13061 • Published 5 days ago • 6

ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers

Paper • 2412.12571 • Published 6 days ago • 7

Learning from Massive Human Videos for Universal Humanoid Pose Control

Paper • 2412.14172 • Published 4 days ago • 10

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Paper • 2412.13871 • Published 4 days ago • 17

upvoted 4 papers 5 days ago

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes

Paper • 2412.11100 • Published 8 days ago • 5

SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator

Paper • 2412.12094 • Published 6 days ago • 9

Whisper-GPT: A Hybrid Representation Audio Large Language Model

Paper • 2412.11449 • Published 7 days ago • 4

TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning

Paper • 2412.10447 • Published 11 days ago • 5

upvoted a paper 9 days ago

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Paper • 2412.09501 • Published 10 days ago • 43

upvoted 4 papers 16 days ago

Negative Token Merging: Image-based Adversarial Feature Guidance

Paper • 2412.01339 • Published 20 days ago • 21

A Noise is Worth Diffusion Guidance

Paper • 2412.03895 • Published 18 days ago • 27

OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Paper • 2412.01169 • Published 21 days ago • 10

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Paper • 2412.04431 • Published 17 days ago • 16

upvoted 2 papers 17 days ago

Mimir: Improving Video Diffusion Models for Precise Text Understanding

Paper • 2412.03085 • Published 19 days ago • 12

NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training

Paper • 2412.02030 • Published 20 days ago • 18

upvoted 2 papers 19 days ago

Efficient Track Anything

Paper • 2411.18933 • Published 25 days ago • 16

Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning

Paper • 2412.01408 • Published 20 days ago • 1