Dazhi Jiang

thuzhizhi

jiangzizi

AI & ML interests

None yet

Organizations

None yet

thuzhizhi's activity

upvoted 2 papers 17 days ago

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos

Paper • 2409.07450 • Published 17 days ago • 10

Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Paper • 2409.07146 • Published 18 days ago • 19

upvoted 3 papers 18 days ago

SongCreator: Lyrics-based Universal Song Generation

Paper • 2409.06029 • Published 19 days ago • 19

SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

Paper • 2409.06633 • Published 18 days ago • 14

Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis

Paper • 2409.06135 • Published 19 days ago • 14

upvoted 6 papers 19 days ago

POINTS: Improving Your Vision-language Model with Affordable Strategies

Paper • 2409.04828 • Published 21 days ago • 22

Benchmarking Chinese Knowledge Rectification in Large Language Models

Paper • 2409.05806 • Published 19 days ago • 14

MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery

Paper • 2409.05591 • Published 20 days ago • 26

OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

Paper • 2409.05152 • Published 20 days ago • 29

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published 24 days ago • 71

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

Paper • 2409.05840 • Published 19 days ago • 45

upvoted 4 papers 20 days ago

Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task

Paper • 2409.04005 • Published 23 days ago • 16

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

Paper • 2409.04410 • Published 22 days ago • 23

Configurable Foundation Models: Building LLMs from a Modular Perspective

Paper • 2409.02877 • Published 24 days ago • 27

How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data

Paper • 2409.03810 • Published 23 days ago • 30

upvoted 4 papers 23 days ago

From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

Paper • 2409.03512 • Published 24 days ago • 25

FuzzCoder: Byte-level Fuzzing Test via Large Language Model

Paper • 2409.01944 • Published 25 days ago • 44

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

Paper • 2409.03420 • Published 24 days ago • 23

Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published 23 days ago • 85

upvoted 4 papers 24 days ago

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Paper • 2409.02889 • Published 24 days ago • 54

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Paper • 2409.02813 • Published 24 days ago • 27

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

Paper • 2409.02897 • Published 24 days ago • 43

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published 26 days ago • 75

upvoted 6 papers 25 days ago

FLUX that Plays Music

Paper • 2409.00587 • Published 28 days ago • 31

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

Paper • 2409.01071 • Published 27 days ago • 26

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Paper • 2409.02095 • Published 25 days ago • 33

Compositional 3D-aware Video Generation with LLM Director

Paper • 2409.00558 • Published 28 days ago • 14

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Paper • 2409.01199 • Published 27 days ago • 12

LongRecipe: Recipe for Efficient Long Context Generalization in Large Languge Models

Paper • 2409.00509 • Published 28 days ago • 38

upvoted 4 papers 26 days ago

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Paper • 2408.17131 • Published 30 days ago • 11

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Paper • 2408.15545 • Published Aug 28 • 32

CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

Paper • 2408.15914 • Published Aug 28 • 21

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

Paper • 2408.17267 • Published 30 days ago • 22

upvoted 5 papers 29 days ago

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Paper • 2408.16532 • Published about 1 month ago • 45

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Paper • 2408.16768 • Published about 1 month ago • 26

CSGO: Content-Style Composition in Text-to-Image Generation

Paper • 2408.16766 • Published about 1 month ago • 17

ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

Paper • 2408.16767 • Published about 1 month ago • 29

CogVLM2: Visual Language Models for Image and Video Understanding

Paper • 2408.16500 • Published about 1 month ago • 56

upvoted 22 papers about 1 month ago

Efficient LLM Scheduling by Learning to Rank

Paper • 2408.15792 • Published Aug 28 • 18

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Paper • 2408.15881 • Published Aug 28 • 20

Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models

Paper • 2408.15915 • Published Aug 28 • 19

Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation

Paper • 2408.15991 • Published Aug 28 • 15

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published Aug 28 • 81

BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

Paper • 2408.15079 • Published Aug 27 • 51

Platypus: A Generalized Specialist Model for Reading Text in Various Forms

Paper • 2408.14805 • Published Aug 27 • 12

GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars

Paper • 2408.13674 • Published Aug 24 • 17

LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs

Paper • 2408.13467 • Published Aug 24 • 23

MobileQuant: Mobile-friendly Quantization for On-device Language Models

Paper • 2408.13933 • Published Aug 25 • 13

SWE-bench-java: A GitHub Issue Resolving Benchmark for Java

Paper • 2408.14354 • Published Aug 26 • 40

K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences

Paper • 2408.14468 • Published Aug 26 • 33

TVG: A Training-free Transition Video Generation Method with Diffusion Models

Paper • 2408.13413 • Published Aug 24 • 13

Training-free Long Video Generation with Chain of Diffusion Model Experts

Paper • 2408.13423 • Published Aug 24 • 19

Efficient Detection of Toxic Prompts in Large Language Models

Paper • 2408.11727 • Published Aug 21 • 11

Memory-Efficient LLM Training with Online Subspace Descent

Paper • 2408.12857 • Published Aug 23 • 10

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

Paper • 2408.13239 • Published Aug 23 • 10

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

Paper • 2408.13233 • Published Aug 23 • 20

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

Paper • 2408.09787 • Published Aug 19 • 6

ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM

Paper • 2408.12076 • Published Aug 22 • 11

Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20 • 49

Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search

Paper • 2408.10635 • Published Aug 20 • 13