---
license: apache-2.0
base_model:
- Lykon/DreamShaper
pipeline_tag: text-to-image
---

# TDM: Learning Few-Step Diffusion Models by Trajectory Distribution Matching
This is the official repository of "[Learning Few-Step Diffusion Models by Trajectory Distribution Matching](https://arxiv.org/abs/2503.06674)" by *Yihong Luo, Tianyang Hu, Jiacheng Sun, Yujun Cai, Jing Tang*.

## User Study Time!

![user_study](user_study.jpg)

Which one do you think is better? Some images are generated by Pixart-α (50 NFE); the others are generated by **TDM (4 NFE)**, distilled from Pixart-α in a data-free way with merely 500 training iterations and 2 A800 hours.
<details>
<summary>Click for the answer</summary>

TDM's position in each pair (left to right): bottom, bottom, top, bottom, top.

</details>

## Fast Text-to-Video Generation

Our proposed TDM can be easily extended to text-to-video generation.

*Teacher (top) vs. TDM student (bottom) video comparison.*

The video above was generated by CogVideoX-2B (100 NFE). In the same amount of time, **TDM (4 NFE)** can generate 25 videos, as shown below, achieving an impressive **25× speedup without performance degradation**. (Note: the noise in the GIF is due to compression.)

## Usage

### TDM-SD3-LoRA

```python
import torch
from diffusers import StableDiffusion3Pipeline, AutoencoderTiny, DPMSolverMultistepScheduler
from diffusers.utils import make_image_grid

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights('Luo-Yihong/TDM_sd3_lora', adapter_name='tdm')  # Load TDM-LoRA
pipe.set_adapters(["tdm"], [0.125])  # IMPORTANT: set the LoRA scale to 0.125.
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd3", torch_dtype=torch.float16)  # Save GPU memory
pipe.vae.config.shift_factor = 0.0
pipe = pipe.to("cuda")

pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler"
)
pipe.scheduler.config['flow_shift'] = 6  # flow_shift can be varied from 1 to 6
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# 4-step generation with the TDM student
generator = torch.manual_seed(8888)
image = pipe(
    prompt="A cute panda holding a sign says TDM SOTA!",
    negative_prompt="",
    num_inference_steps=4,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
    guidance_scale=1.,
    generator=generator,
).images[0]

# Reference sample from the teacher (SD3, 28 steps with CFG)
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler"
)
pipe.set_adapters(["tdm"], [0.])  # Unload LoRA
generator = torch.manual_seed(8888)
teacher_image = pipe(
    prompt="A cute panda holding a sign says TDM SOTA!",
    negative_prompt="",
    num_inference_steps=28,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
    guidance_scale=7.,
    generator=generator,
).images[0]

make_image_grid([image, teacher_image], 1, 2)
```

![sd3_compare](sd3_compare.jpg)

The sample generated by SD3 with 56 NFE is on the right, and the sample generated by **TDM** with 4 NFE is on the left. Which one do you feel is better?
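As noted in the comment above, `flow_shift` can be varied from 1 to 6. Below is a minimal sketch, not part of the official examples, of sweeping that value with a fixed seed to compare 4-step outputs; it assumes the `pipe` configured in the TDM-SD3 snippet above is still in scope.

```python
# Hedged sketch: compare 4-step TDM-SD3 outputs across flow_shift values 1-6.
# Assumes `pipe` from the snippet above; variable names here are illustrative.
import torch
from diffusers import DPMSolverMultistepScheduler
from diffusers.utils import make_image_grid

pipe.set_adapters(["tdm"], [0.125])  # Re-enable TDM-LoRA at the recommended scale
images = []
for flow_shift in range(1, 7):
    # Reload the scheduler from its pretrained config so only flow_shift changes
    pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained(
        "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler"
    )
    pipe.scheduler.config['flow_shift'] = flow_shift
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

    generator = torch.manual_seed(8888)  # Fixed seed across the sweep
    images.append(pipe(
        prompt="A cute panda holding a sign says TDM SOTA!",
        negative_prompt="",
        num_inference_steps=4,
        height=1024,
        width=1024,
        guidance_scale=1.,
        generator=generator,
    ).images[0])

make_image_grid(images, rows=1, cols=6)
```

Reloading the scheduler from its pretrained config on every iteration mirrors the pattern in the snippet above, so the only difference between runs is the `flow_shift` value.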
### TDM-Dreamshaper-v7-LoRA

```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from huggingface_hub import hf_hub_download

repo_name = "Luo-Yihong/TDM_dreamshaper_LoRA"
ckpt_name = "tdm_dreamshaper.pt"
pipe = DiffusionPipeline.from_pretrained('lykon/dreamshaper-7', torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))  # Load TDM-LoRA
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")

generator = torch.manual_seed(317)
image = pipe(
    prompt="A close-up photo of an Asian lady with sunglasses",
    negative_prompt="",
    num_inference_steps=4,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.,
).images[0]
image
```

![tdm_dreamshaper](tdm_dreamshaper.jpg)

### TDM-CogVideoX-2B-LoRA

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.vae.enable_slicing()  # Save memory
pipe.vae.enable_tiling()   # Save memory
pipe.load_lora_weights("Luo-Yihong/TDM_CogVideoX-2B_LoRA")  # Load TDM-LoRA
pipe.to("cuda")

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The "
    "panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance"
)

# We train the generator on timesteps [999, 856, 665, 399].
# The official CogVideoX scheduler uses uniform timestep spacing, which may cause
# slightly inferior results, but TDM-LoRA still works well under 4 NFE.
# We will update TDM-CogVideoX-LoRA soon for better performance!
generator = torch.manual_seed(8888)
frames = pipe(
    prompt,
    guidance_scale=1,
    num_inference_steps=4,
    num_frames=49,
    generator=generator,
    use_dynamic_cfg=True,
).frames[0]
export_to_video(frames, "output-TDM.mp4", fps=8)
```

## 🔥 Pre-trained Models

We release a collection of TDM-LoRAs. Please enjoy them!

- [TDM-SD3-LoRA](https://huggingface.co/Luo-Yihong/TDM_sd3_lora)
- [TDM-CogVideoX-2B-LoRA](https://huggingface.co/Luo-Yihong/TDM_CogVideoX-2B_LoRA)
- [TDM-Dreamshaper-LoRA](https://huggingface.co/Luo-Yihong/TDM_dreamshaper_LoRA)

## Contact

Please contact Yihong Luo (yluocg@connect.ust.hk) if you have any questions about this work.

## Bibtex

```bibtex
@misc{luo2025tdm,
  title={Learning Few-Step Diffusion Models by Trajectory Distribution Matching},
  author={Yihong Luo and Tianyang Hu and Jiacheng Sun and Yujun Cai and Jing Tang},
  year={2025},
  eprint={2503.06674},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.06674},
}
```