BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions • Paper • arXiv:2411.07461 • Published Nov 2024 • 21 upvotes
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion • Paper • arXiv:2411.04928 • Published Nov 2024 • 47 upvotes
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers • Paper • arXiv:2401.08740 • Published Jan 16, 2024 • 12 upvotes
LLaVA-Video • Collection • Models focused on video understanding (previously known as LLaVA-NeXT-Video) • 6 items • Updated Oct 5 • 53 upvotes
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation • Paper • arXiv:2409.18964 • Published Sep 27, 2024 • 25 upvotes
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling • Paper • arXiv:2409.16160 • Published Sep 24, 2024 • 32 upvotes
OSV: One Step is Enough for High-Quality Image to Video Generation • Paper • arXiv:2409.11367 • Published Sep 17, 2024 • 13 upvotes
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models • Paper • arXiv:2409.10695 • Published Sep 16, 2024 • 2 upvotes
PiTe: Pixel-Temporal Alignment for Large Video-Language Model • Paper • arXiv:2409.07239 • Published Sep 11, 2024 • 11 upvotes
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model • Paper • arXiv:2409.01704 • Published Sep 3, 2024 • 82 upvotes
CogVLM2: Visual Language Models for Image and Video Understanding • Paper • arXiv:2408.16500 • Published Aug 29, 2024 • 56 upvotes
Building and better understanding vision-language models: insights and future directions • Paper • arXiv:2408.12637 • Published Aug 22, 2024 • 118 upvotes
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations • Paper • arXiv:2408.12590 • Published Aug 22, 2024 • 34 upvotes