base_model:
- THUDM/CogVideoX-5b
tags:
- LoRA
- cogvideo
This CogVideoX LoRA was created as part of a fan project for research purposes only and is not intended for commercial use. It is based on the TV series Arcane, which is protected by copyright. Users use the model at their own risk and are obligated to comply with copyright laws and applicable regulations. The model has been developed for non-commercial purposes, and it is not my intention to infringe any copyright. I assume no responsibility for any damages or legal consequences arising from the use of the model.
Important Notes:
- Although the LoRA has started to learn the appearance of the main characters, it still cannot reproduce them accurately, so don't expect to see your beloved characters.
- For the best quality, do not use quantization (such as fp8) or SageAttention.
Compatibility:
- CogVideoX 1.0 (it does not work with CogVideoX 1.5 models)
ID Token / Trigger word(s):
Using these in your prompt helps apply the style. See example prompt below.
- csetiarcane
Character Tokens:
I used tokens to tag the characters. Unfortunately, the LoRA didn't properly learn their appearances, so don't expect to see your favorite characters. However, since it associates certain character features with the tokens, using them might help with character consistency. See example prompt below.
Token | Character |
---|---|
nfheimerdinger | heimerdinger |
nfvi | vi |
nfjinx | jinx |
nfcaitlyn | caitlyn |
nfekko | ekko |
nfjayce | jayce |
nfmel | mel |
nfsilco | silco |
nfmarcus | marcus |
nfcassandra | cassandra |
nfsinged | singed |
nfviktor | viktor |
nfsevika | sevika |
nfenforcer | enforcers |
nffinn | finn |
nfrenni | renni |
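When writing many prompts, it can help to keep the trigger word and the token table in one place. The sketch below is a hypothetical helper (`build_prompt`, `TRIGGER_WORD` and `CHARACTER_TOKENS` are illustrative names, not part of this LoRA or of diffusers) that replaces `{name}` placeholders with the tokens from the table above and prepends `csetiarcane`. The `trigger. description` layout is an assumption modeled on trigger-word prompts for similar LoRAs, not a documented requirement.

```python
# Hypothetical prompt helper; the "<trigger>. <description>" layout is an assumption.
TRIGGER_WORD = "csetiarcane"

# Character name -> token, copied from the table above.
CHARACTER_TOKENS = {
    "heimerdinger": "nfheimerdinger",
    "vi": "nfvi",
    "jinx": "nfjinx",
    "caitlyn": "nfcaitlyn",
    "ekko": "nfekko",
    "jayce": "nfjayce",
    "mel": "nfmel",
    "silco": "nfsilco",
    "marcus": "nfmarcus",
    "cassandra": "nfcassandra",
    "singed": "nfsinged",
    "viktor": "nfviktor",
    "sevika": "nfsevika",
    "enforcers": "nfenforcer",
    "finn": "nffinn",
    "renni": "nfrenni",
}


def build_prompt(description: str) -> str:
    """Replace {name} placeholders with character tokens and prepend the trigger word.

    Raises KeyError if a placeholder has no entry in CHARACTER_TOKENS.
    Literal braces in the description must be doubled ({{ and }}).
    """
    body = description.format_map(CHARACTER_TOKENS)
    return f"{TRIGGER_WORD}. {body}"


if __name__ == "__main__":
    print(build_prompt("{vi} and {jinx} run across a rooftop at night."))
```

Using `str.format_map` means a misspelled character name fails loudly with a `KeyError` instead of silently leaving an untagged name in the prompt.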
Inference:
You can run inference with the LoRA on top of the base model using the following code:
```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the CogVideoX 1.0 5B base model in bfloat16
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
).to("cuda")

# Load the Arcane LoRA (adapter_name is a single string) and activate it at strength 1.0
pipe.load_lora_weights("Cseti/CogVideoX1.0-LoRA-Arcane", adapter_name="cogvideox-lora")
pipe.set_adapters(["cogvideox-lora"], [1.0])

# Illustrative prompt: starts with the csetiarcane trigger word and uses a character token
prompt = (
    "csetiarcane. nfjinx sits on a rooftop overlooking a city at night, "
    "the camera slowly circling around her as neon lights flicker below."
)
video = pipe(prompt).frames[0]
export_to_video(video, "output.mp4", fps=8)
```
Acknowledgment:
- Thanks to the CogVideo team for making these great models available
- Thanks to A-R-R-O-W for his CogVideoX-Factory, which makes creating CogVideoX LoRAs much easier.
- Thanks to Kijai for his great ComfyUI integration
- Thanks to POM for providing the computing resources. Without this, these LoRAs could not have been created.
Examples:
Prompt: "walgro1. Gromit sits quietly in a cozy living room, the soft glow of a nearby lamp casting warm light across the room. The camera starts with a close-up of his thoughtful expression, his eyes darting toward the side, observing the subtle movement of something off-screen. A clock ticks rhythmically on the wall behind him, creating a steady backdrop to the otherwise silent room. The camera slowly pulls back to reveal the setting: a tidy space with bookshelves filled with old volumes, a comfortable armchair in the corner, and a small coffee table in the center, where a half-finished jigsaw puzzle lies scattered. The atmosphere is calm, almost serene, as Gromit glances toward the puzzle, his curiosity piqued."
Citation
@article{yang2024cogvideox,
title={CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer},
author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
journal={arXiv preprint arXiv:2408.06072},
year={2024}
}
@article{hong2022cogvideo,
title={CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers},
author={Hong, Wenyi and Ding, Ming and Zheng, Wendi and Liu, Xinghan and Tang, Jie},
journal={arXiv preprint arXiv:2205.15868},
year={2022}
}