Luo-Yihong committed on
Commit 2df0851 · verified · 1 Parent(s): 25e1a3b

Update README.md

Files changed (1)
  1. README.md +109 -5
README.md CHANGED
@@ -31,11 +31,115 @@ Our proposed TDM can be easily extended to text-to-video.
  <img src="student.gif" alt="Student" width="100%">
  </p>
 
- The video on the above was generated by CogVideoX-2B (100 NFE). In the same amount of time, **TDM (4NFE)** can generate 25 videos, as shown below, achieving an impressive **25 times speedup without performance degradation**. (Note: The noise in the GIF is due to compression.)
-
-
- ## 🔥TODO
- - Pre-trained Models will be released soon.
+ The video above was generated by CogVideoX-2B (100 NFE). In the same amount of time, **TDM (4 NFE)** can generate 25 videos, as shown below, achieving an impressive **25× speedup without performance degradation**. (Note: the noise in the GIF is an artifact of compression.)
+
+ ## Usage
+ ### TDM-SD3-LoRA
+ ```python
+ import torch
+ from diffusers import StableDiffusion3Pipeline, AutoencoderTiny, DPMSolverMultistepScheduler
+ from diffusers.utils import make_image_grid
+
+ pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16).to("cuda")
+ pipe.load_lora_weights('Luo-Yihong/TDM_sd3_lora', adapter_name='tdm')  # Load TDM-LoRA.
+ pipe.set_adapters(["tdm"], [0.125])  # IMPORTANT: set the LoRA scale to 0.125.
+ pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd3", torch_dtype=torch.float16)  # Saves GPU memory.
+ pipe.vae.config.shift_factor = 0.0
+ pipe = pipe.to("cuda")
+ pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler")
+ pipe.scheduler.config['flow_shift'] = 6  # flow_shift can be set anywhere from 1 to 6.
+ pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
+
+ # Student sample: 4 steps with TDM-LoRA, CFG disabled (guidance_scale=1).
+ generator = torch.manual_seed(8888)
+ image = pipe(
+     prompt="A cute panda holding a sign says TDM SOTA!",
+     negative_prompt="",
+     num_inference_steps=4,
+     height=1024,
+     width=1024,
+     num_images_per_prompt=1,
+     guidance_scale=1.,
+     generator=generator,
+ ).images[0]
+
+ # Teacher sample: 28 steps with the LoRA scale set to 0 and CFG enabled.
+ pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler")
+ pipe.set_adapters(["tdm"], [0.])  # Unload the LoRA.
+ generator = torch.manual_seed(8888)
+ teacher_image = pipe(
+     prompt="A cute panda holding a sign says TDM SOTA!",
+     negative_prompt="",
+     num_inference_steps=28,
+     height=1024,
+     width=1024,
+     num_images_per_prompt=1,
+     guidance_scale=7.,
+     generator=generator,
+ ).images[0]
+ make_image_grid([image, teacher_image], 1, 2)
+ ```
+ ![sd3_compare](sd3_compare.jpg)
+ The sample generated by SD3 with 56 NFE (28 steps with CFG) is on the right, and the sample generated by **TDM** with 4 NFE is on the left. Which one do you think looks better?
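+
+ If you want to bake the 0.125-scale TDM-LoRA directly into the base weights (so you no longer need `set_adapters` at load time), diffusers' generic LoRA-fusing API should work. A minimal sketch, assuming the standard `fuse_lora`/`unload_lora_weights` calls rather than anything TDM-specific:
+ ```python
+ pipe.fuse_lora(lora_scale=0.125)  # Fold the TDM-LoRA into the base weights at the recommended scale.
+ pipe.unload_lora_weights()  # Drop the now-redundant adapter modules.
+ ```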
+
+ ### TDM-Dreamshaper-v7-LoRA
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
+ from huggingface_hub import hf_hub_download
+
+ repo_name = "Luo-Yihong/TDM_dreamshaper_LoRA"
+ ckpt_name = "tdm_dreamshaper.pt"
+ pipe = DiffusionPipeline.from_pretrained('lykon/dreamshaper-7', torch_dtype=torch.float16).to("cuda")
+ pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))  # Load TDM-LoRA.
+ pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
+ generator = torch.manual_seed(317)
+ image = pipe(
+     prompt="A close-up photo of an Asian lady with sunglasses",
+     negative_prompt="",
+     num_inference_steps=4,
+     num_images_per_prompt=1,
+     generator=generator,
+     guidance_scale=1.,
+ ).images[0]
+ image  # Displays the image in a notebook.
+ ```
+ ![tdm_dreamshaper](tdm_dreamshaper.jpg)
+
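+ As with the SD3 example, you can compare against the teacher by removing the LoRA and sampling the usual way. A sketch only: the 25-step, `guidance_scale=7.5` settings below are typical SD1.5-style defaults, not values taken from this repo:
+ ```python
+ pipe.unload_lora_weights()  # Remove TDM-LoRA to recover the teacher.
+ generator = torch.manual_seed(317)
+ teacher_image = pipe(
+     prompt="A close-up photo of an Asian lady with sunglasses",
+     negative_prompt="",
+     num_inference_steps=25,
+     guidance_scale=7.5,
+     generator=generator,
+ ).images[0]
+ ```
+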
+ ### TDM-CogVideoX-2B-LoRA
+ ```python
+ import torch
+ from diffusers import CogVideoXPipeline
+ from diffusers.utils import export_to_video
+
+ pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
+ pipe.vae.enable_slicing()  # Saves memory.
+ pipe.vae.enable_tiling()  # Saves memory.
+ pipe.load_lora_weights("Luo-Yihong/TDM_CogVideoX-2B_LoRA")
+ pipe.to("cuda")
+
+ prompt = (
+     "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The "
+     "panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
+     "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
+     "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
+     "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
+     "atmosphere of this unique musical performance"
+ )
+
+ # We train the generator on timesteps [999, 856, 665, 399].
+ # The official CogVideoX scheduler uses uniform timestep spacing, which may cause inferior results,
+ # but TDM-LoRA still works well under 4 NFE.
+ # We will update TDM-CogVideoX-LoRA soon for better performance!
+ generator = torch.manual_seed(8888)
+ frames = pipe(prompt, guidance_scale=1,
+               num_inference_steps=4,
+               num_frames=49,
+               generator=generator,
+               use_dynamic_cfg=True).frames[0]
+ export_to_video(frames, "output-TDM.mp4", fps=8)
+ ```
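+
+ Because the generator is trained on timesteps [999, 856, 665, 399], you may want to match them at inference instead of relying on uniform spacing. This is hypothetical: diffusers pipelines accept an explicit `timesteps` list only when the underlying scheduler's `set_timesteps` supports it, which not every CogVideoX scheduler does:
+ ```python
+ # Sketch: pass the training timesteps explicitly (works only if the scheduler accepts a custom timestep list).
+ frames = pipe(prompt, guidance_scale=1,
+               timesteps=[999, 856, 665, 399],
+               num_frames=49,
+               generator=torch.manual_seed(8888),
+               use_dynamic_cfg=True).frames[0]
+ ```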
+ ## 🔥 Pre-trained Models
+ We release a collection of TDM-LoRA weights. Please enjoy them!
+ - [TDM-SD3-LoRA](https://huggingface.co/Luo-Yihong/TDM_sd3_lora)
+ - [TDM-CogVideoX-2B-LoRA](https://huggingface.co/Luo-Yihong/TDM_CogVideoX-2B_LoRA)
+ - [TDM-Dreamshaper-LoRA](https://huggingface.co/Luo-Yihong/TDM_dreamshaper_LoRA)
+
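+ To fetch any of these ahead of time (e.g., for offline use), the standard huggingface_hub download works; a minimal sketch, nothing TDM-specific:
+ ```python
+ from huggingface_hub import snapshot_download
+
+ local_dir = snapshot_download("Luo-Yihong/TDM_sd3_lora")  # Swap in any repo id from the list above.
+ # The local directory can then be passed to load_lora_weights as in the usage examples.
+ ```
+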
 
  ## Contact