VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers Paper • 2406.05370 • Published Jun 8 • 12
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation Paper • 2404.12753 • Published Apr 19 • 39
Navarasa 2.0 Models Collection Navarasa 2.0 models finetuned with Gemma on 15 Indian languages • 5 items • Updated Mar 18 • 10
RakutenAI-7B: Extending Large Language Models for Japanese Paper • 2403.15484 • Published Mar 21 • 12
Common Corpus Collection The largest public domain dataset for training LLMs. • 27 items • Updated 12 days ago • 106
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Paper • 2403.03100 • Published Mar 5 • 32
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect Paper • 2403.03853 • Published Mar 6 • 61
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss Paper • 2402.10790 • Published Feb 16 • 40
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction Paper • 2402.12712 • Published Feb 20 • 14
Lumos: Empowering Multimodal LLMs with Scene Text Recognition Paper • 2402.08017 • Published Feb 12 • 23
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data Paper • 2402.08093 • Published Feb 12 • 52
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting Paper • 2402.07207 • Published Feb 11 • 7
AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts Paper • 2402.07625 • Published Feb 12 • 10
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs Paper • 2402.07872 • Published Feb 12 • 14
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like Paper • 2402.07383 • Published Feb 12 • 12
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement Paper • 2402.07456 • Published Feb 12 • 39
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss Paper • 2402.05008 • Published Feb 7 • 19
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation Paper • 2402.05054 • Published Feb 7 • 24
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains Paper • 2402.05140 • Published Feb 6 • 18
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation Paper • 2402.04324 • Published Feb 6 • 22
ScreenAI: A Vision-Language Model for UI and Infographics Understanding Paper • 2402.04615 • Published Feb 7 • 33
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback Paper • 2402.01391 • Published Feb 2 • 41
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5 • 66
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion Paper • 2402.03162 • Published Feb 5 • 17
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1 • 19
TravelPlanner: A Benchmark for Real-World Planning with Language Agents Paper • 2402.01622 • Published Feb 2 • 31
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model Paper • 2401.16420 • Published Jan 29 • 54
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis Paper • 2401.17093 • Published Jan 30 • 18
Anything in Any Scene: Photorealistic Video Object Insertion Paper • 2401.17509 • Published Jan 30 • 16
Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation Paper • 2401.15688 • Published Jan 28 • 11
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling Paper • 2401.15977 • Published Jan 29 • 35
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices Paper • 2311.16567 • Published Nov 28, 2023 • 21
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 84
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI Paper • 2401.14019 • Published Jan 25 • 19
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All Paper • 2401.13795 • Published Jan 24 • 64
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19 • 54
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models Paper • 2401.11739 • Published Jan 22 • 16
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs Paper • 2401.11708 • Published Jan 22 • 28
HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation Paper • 2401.07727 • Published Jan 15 • 8
Improving fine-grained understanding in image-text pre-training Paper • 2401.09865 • Published Jan 18 • 13
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models Paper • 2401.09047 • Published Jan 17 • 13
PALP: Prompt Aligned Personalization of Text-to-Image Models Paper • 2401.06105 • Published Jan 11 • 46
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM Paper • 2401.02994 • Published Jan 4 • 45
Image Sculpting: Precise Object Editing with 3D Geometry Control Paper • 2401.01702 • Published Jan 2 • 18
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM Paper • 2401.01256 • Published Jan 2 • 17
TrailBlazer: Trajectory Control for Diffusion-Based Video Generation Paper • 2401.00896 • Published Dec 31, 2023 • 13
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution Paper • 2401.00935 • Published Jan 1 • 16
DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision Paper • 2312.16256 • Published Dec 26, 2023 • 14