Luo-Yihong
/

TDM


Luo-Yihong committed about 6 hours ago

Commit

2df0851

verified ·

1 Parent(s): 25e1a3b

Update README.md

Files changed (1)

README.md +109 -5

README.md CHANGED Viewed

@@ -31,11 +31,115 @@ Our proposed TDM can be easily extended to text-to-video.
   <img src="student.gif" alt="Student" width="100%">
 </p>
-The video on the above was generated by CogVideoX-2B (100 NFE). In the same amount of time, **TDM (4NFE)** can generate 25 videos, as shown below, achieving an impressive **25 times speedup  without performance degradation**. (Note: The noise in the GIF is due to compression.)
-## 🔥TODO
-- Pre-trained Models will be released soon.
 ## Contact

   <img src="student.gif" alt="Student" width="100%">
 </p>
+The video above was generated by CogVideoX-2B (100 NFE). In the same amount of time, **TDM (4 NFE)** can generate 25 videos, as shown below, achieving an impressive **25× speedup without performance degradation**. (Note: The noise in the GIF is due to compression.)
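The speedup figure follows from counting network function evaluations (NFE). Below is a minimal sketch of that arithmetic. It assumes each sampling step costs one forward pass without classifier-free guidance and two with it, and that every forward pass costs roughly the same. The helper names `count_nfe` and `speedup` are illustrative only and are not part of TDM or diffusers.

```python
def count_nfe(num_inference_steps: int, guidance_scale: float) -> int:
    """Count forward passes, assuming guidance_scale > 1 doubles the cost per step."""
    passes_per_step = 2 if guidance_scale > 1.0 else 1
    return num_inference_steps * passes_per_step


def speedup(teacher_nfe: int, student_nfe: int) -> float:
    """Ratio of sampling cost, assuming every forward pass costs the same."""
    return teacher_nfe / student_nfe


# Four steps without guidance (guidance_scale=1) use 4 NFE.
student_nfe = count_nfe(num_inference_steps=4, guidance_scale=1.0)
# Compare against a 100-NFE teacher run.
print(student_nfe, speedup(100, student_nfe))
```

Under the same convention, a 28-step run with guidance enabled counts as 56 NFE.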
+## Usage
+### TDM-SD3-LoRA
+```python
+import torch
+from diffusers import StableDiffusion3Pipeline, AutoencoderTiny, DPMSolverMultistepScheduler
+from diffusers.utils import make_image_grid
+pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16).to("cuda")
+pipe.load_lora_weights("Luo-Yihong/TDM_sd3_lora", adapter_name="tdm")  # Load the TDM LoRA
+pipe.set_adapters(["tdm"], [0.125])  # IMPORTANT: set the LoRA scale to 0.125
+pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd3", torch_dtype=torch.float16)  # Tiny VAE to save GPU memory
+pipe.vae.config.shift_factor = 0.0
+pipe = pipe.to("cuda")
+pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler")
+pipe.scheduler.config['flow_shift'] = 6  # flow_shift can be set anywhere from 1 to 6
+pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
+generator = torch.manual_seed(8888)
+image = pipe(
+    prompt="A cute panda holding a sign says TDM SOTA!",
+    negative_prompt="",
+    num_inference_steps=4,
+    height=1024,
+    width=1024,
+    num_images_per_prompt=1,
+    guidance_scale=1.,
+    generator=generator,
+).images[0]
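+# Reload the scheduler with its default config for the teacher run.
+pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler")
+pipe.set_adapters(["tdm"], [0.])  # Disable the LoRA by setting its scale to 0 (the weights stay loaded)
+generator = torch.manual_seed(8888)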
+teacher_image = pipe(
+    prompt="A cute panda holding a sign says TDM SOTA!",
+    negative_prompt="",
+    num_inference_steps=28,
+    height=1024,
+    width=1024,
+    num_images_per_prompt=1,
+    guidance_scale=7.,
+    generator=generator,
+).images[0]
+make_image_grid([image, teacher_image], rows=1, cols=2)  # Left: TDM (4 steps); right: SD3 teacher (28 steps)
+```
+![sd3_compare](sd3_compare.jpg)
+The sample generated by SD3 with 56 NFE is on the right, and the sample generated by **TDM** with 4 NFE is on the left. Which one do you think is better?
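The `flow_shift` setting in the SD3 snippet above controls how the sampling noise levels are spread over the few steps. As a rough sketch, and assuming the scheduler applies the usual flow-matching shift `shift * sigma / (1 + (shift - 1) * sigma)` to noise levels in (0, 1], the standalone function below reproduces only that formula on evenly spaced inputs. The name `shift_sigmas` is hypothetical, and this is an illustration rather than the scheduler's actual code path.

```python
def shift_sigmas(sigmas, flow_shift):
    """Remap noise levels in (0, 1] with the flow-matching shift formula."""
    return [flow_shift * s / (1 + (flow_shift - 1) * s) for s in sigmas]


# Four evenly spaced noise levels, as an assumed starting point.
uniform_sigmas = [1.0, 0.75, 0.5, 0.25]

for shift in (1, 3, 6):
    shifted = shift_sigmas(uniform_sigmas, shift)
    print(shift, [round(s, 3) for s in shifted])
```

A shift of 1 leaves the levels unchanged, while larger values push the intermediate steps toward higher noise.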
+### TDM-Dreamshaper-v7-LoRA
+```python
+import torch
+from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
+from huggingface_hub import hf_hub_download
+repo_name = "Luo-Yihong/TDM_dreamshaper_LoRA"
+ckpt_name = "tdm_dreamshaper.pt"
+pipe = DiffusionPipeline.from_pretrained('lykon/dreamshaper-7', torch_dtype=torch.float16).to("cuda")
+pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
+pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
+generator = torch.manual_seed(317)
+image = pipe(
+    prompt="A close-up photo of an Asian lady with sunglasses",
+    negative_prompt="",
+    num_inference_steps=4,
+    num_images_per_prompt=1,
+    generator=generator,
+    guidance_scale=1.,
+).images[0]
+image
+```
+![tdm_dreamshaper](tdm_dreamshaper.jpg)
+### TDM-CogVideoX-2B-LoRA
+```python
+import torch
+from diffusers import CogVideoXPipeline
+from diffusers.utils import export_to_video
+pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
+pipe.vae.enable_slicing() # Save memory
+pipe.vae.enable_tiling() # Save memory
+pipe.load_lora_weights("Luo-Yihong/TDM_CogVideoX-2B_LoRA")
+pipe.to("cuda")
+prompt = (
+    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The "
+    "panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
+    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
+    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
+    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
+    "atmosphere of this unique musical performance"
+)
+# We trained the generator on timesteps [999, 856, 665, 399].
+# The official CogVideoX scheduler uses uniform timestep spacing, which may give inferior results,
+# but TDM-LoRA still works well at 4 NFE.
+# We will update TDM-CogVideoX-LoRA soon for better performance!
+generator = torch.manual_seed(8888)
+frames = pipe(prompt, guidance_scale=1,
+              num_inference_steps=4,
+              num_frames=49,
+              generator=generator,
+              use_dynamic_cfg=True).frames[0]
+export_to_video(frames, "output-TDM.mp4", fps=8)
+```
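The comments in the snippet above note a mismatch between the timesteps used in training and the uniformly spaced ones used at inference. The sketch below makes that gap concrete. It assumes "trailing"-style uniform spacing over 1000 training timesteps, which may not match the exact CogVideoX scheduler configuration, and the helper `trailing_timesteps` is hypothetical rather than a diffusers function.

```python
def trailing_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Uniformly spaced timesteps that start at the last training timestep."""
    step = num_train_timesteps / num_inference_steps
    return [round(num_train_timesteps - i * step) - 1 for i in range(num_inference_steps)]


trained = [999, 856, 665, 399]    # timesteps stated in the comments above
uniform = trailing_timesteps(4)   # what uniform spacing would visit instead

# Per-step difference between the uniform schedule and the trained one.
gaps = [u - t for u, t in zip(uniform, trained)]
print(uniform, gaps)
```

Under this assumption only the first timestep coincides; the later uniform steps sit at lower noise levels than those the generator was trained on.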
+## 🔥 Pre-trained Models
+We release a collection of TDM-LoRA checkpoints. Enjoy!
+- [TDM-SD3-LoRA](https://huggingface.co/Luo-Yihong/TDM_sd3_lora)
+- [TDM-CogVideoX-2B-LoRA](https://huggingface.co/Luo-Yihong/TDM_CogVideoX-2B_LoRA)
+- [TDM-Dreamshaper-LoRA](https://huggingface.co/Luo-Yihong/TDM_dreamshaper_LoRA)
 ## Contact