---
base_model:
- THUDM/CogVideoX-5b
tags:
- LoRA
- cogvideo
---

# Arcane CogVideoX LoRA v1

<video width="100%" height="auto" controls>
<source src="https://huggingface.co/Cseti/CogVideoX1.0-LoRA-Arcane-v1/resolve/main/csetiarcane_comparison_00017.mp4" type="video/mp4">
</video>

<b>Prompt:</b> "CSETIARCANE. The video is focusing on girl with blue hair. Her expression transitions subtly across the frames, starting from a somber or contemplative look and progressing to one that appears more engaged or possibly surprised. The lighting is dim and moody, emphasizing shadows and creating a dramatic effect around the character's features. The background remains consistently dark throughout the sequence, ensuring the focus stays solely on the character's changing expressions."

## Important Notes:

This CogVideoX LoRA was created as part of a <b>fan project</b> for <b>research purposes</b> only and is <b>not</b> intended for commercial use. It is based on the TV series Arcane, which is protected by copyright. Users use the model at their own risk and are obligated to comply with copyright laws and applicable regulations. The model was developed for non-commercial purposes, and it is not my intention to infringe on any copyright. I assume no responsibility for any damages or legal consequences arising from the use of the model.

- Although the LoRA has started to learn the appearance of the main characters, it cannot yet reproduce them accurately, so don't expect to see your beloved characters.
- For the best quality, do not use quantization (such as fp8) or sage attention.

## Compatibility:

- CogVideoX 1.0 (it does not work with CogVideoX 1.5 models)

## ID Token / Trigger word(s):

Using this token in your prompt helps to produce the style. See the example prompt above.

- csetiarcane

## Character Tokens:

I used tokens to tag characters. Unfortunately, the LoRA didn't properly learn their appearances, so don't expect to see your favorite characters. However, since it associates certain character features with the tokens, using them might help with character consistency.

| Token | Character |
|-------|-----------|
| nfheimerdinger | Heimerdinger |
| nfvi | Vi |
| nfjinx | Jinx |
| nfcaitlyn | Caitlyn |
| nfekko | Ekko |
| nfjayce | Jayce |
| nfmel | Mel |
| nfsilco | Silco |
| nfmarcus | Marcus |
| nfcassandra | Cassandra |
| nfsinged | Singed |
| nfviktor | Viktor |
| nfsevika | Sevika |
| nfenforcer | Enforcers |
| nffinn | Finn |
| nfrenni | Renni |

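
To show how the trigger word and a character token could be combined, here is a minimal sketch of a prompt helper. The names `build_prompt`, `CHARACTER_TOKENS`, and the `{character}` placeholder are hypothetical and only for illustration. Where the character token should sit inside the prompt is an assumption, since only the trigger word appears in the example prompt above.

```python
from typing import Optional

# Trigger word, written in upper case as in the example prompt above.
TRIGGER_WORD = "CSETIARCANE"

# Character tokens copied from the table above (keys are lower-case names).
CHARACTER_TOKENS = {
    "heimerdinger": "nfheimerdinger",
    "vi": "nfvi",
    "jinx": "nfjinx",
    "caitlyn": "nfcaitlyn",
    "ekko": "nfekko",
    "jayce": "nfjayce",
    "mel": "nfmel",
    "silco": "nfsilco",
    "marcus": "nfmarcus",
    "cassandra": "nfcassandra",
    "singed": "nfsinged",
    "viktor": "nfviktor",
    "sevika": "nfsevika",
    "enforcers": "nfenforcer",
    "finn": "nffinn",
    "renni": "nfrenni",
}


def build_prompt(description: str, character: Optional[str] = None) -> str:
    """Prefix the trigger word and optionally substitute a character token.

    Hypothetical helper: `{character}` in `description` is replaced by the
    token of the named character. Token placement is an assumption.
    """
    if character is not None:
        token = CHARACTER_TOKENS[character.lower()]
        description = description.replace("{character}", token)
    return f"{TRIGGER_WORD}. {description}"


print(build_prompt("The video is focusing on {character}, a girl with blue hair.", "jinx"))
```

Since the LoRA did not learn the characters reliably, it is probably best to keep a short visual description next to the token, as in the call above.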
## <u><b>Inference:</b></u>

You can run inference with the LoRA using the following code:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the base CogVideoX 1.0 model in bfloat16 and move it to the GPU.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
).to("cuda")

# Load the LoRA under a named adapter (adapter_name must be a string)
# and apply it at full strength. The repository id matches the video URL above.
pipe.load_lora_weights("Cseti/CogVideoX1.0-LoRA-Arcane-v1", adapter_name="cogvideox-lora")
pipe.set_adapters(["cogvideox-lora"], [1.0])

# The prompt starts with the trigger word.
prompt = "CSETIARCANE. The video is focusing on girl with blue hair. Her expression transitions subtly across the frames, starting from a somber or contemplative look and progressing to one that appears more engaged or possibly surprised. The lighting is dim and moody, emphasizing shadows and creating a dramatic effect around the character's features. The background remains consistently dark throughout the sequence, ensuring the focus stays solely on the character's changing expressions."

video = pipe(prompt).frames[0]
export_to_video(video, "output.mp4", fps=8)
```
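
If you want the generation settings spelled out, the sketch below passes them explicitly. It assumes that matching the training clips (49 frames at 720x480) is a sensible default; the step count, guidance scale, seed, and the memory-saving calls are my assumptions and not tuned values for this LoRA. `generation_kwargs` and `run` are hypothetical helper names, and the repository id is taken from the video URL at the top of this card.

```python
def generation_kwargs(prompt: str) -> dict:
    """Settings for the pipeline call (hypothetical helper).

    Frame count and resolution mirror the training clips; the remaining
    values are assumptions, not tuned for this LoRA.
    """
    return {
        "prompt": prompt,
        "num_frames": 49,
        "height": 480,
        "width": 720,
        "num_inference_steps": 50,  # assumption
        "guidance_scale": 6.0,      # assumption
    }


def run(prompt: str, output_path: str = "output.mp4", seed: int = 42) -> None:
    """Generate one video with the LoRA applied. Needs a CUDA GPU."""
    # Heavy imports live here so the helper above works without them.
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
    )
    pipe.load_lora_weights(
        "Cseti/CogVideoX1.0-LoRA-Arcane-v1", adapter_name="cogvideox-lora"
    )
    pipe.set_adapters(["cogvideox-lora"], [1.0])

    # Optional memory savers; they trade speed for lower VRAM use.
    pipe.enable_model_cpu_offload()
    pipe.vae.enable_tiling()

    # A fixed seed makes runs repeatable.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    video = pipe(**generation_kwargs(prompt), generator=generator).frames[0]
    export_to_video(video, output_path, fps=8)
```

Call `run("CSETIARCANE. ...")` with your own prompt. Lowering the adapter weight in `set_adapters` below 1.0 should weaken the style, though I have not listed tested values here.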

## Acknowledgment:

- Thanks to the [CogVideo team](https://github.com/THUDM/CogVideo) for making these great models available.
- Thanks to [A-R-R-O-W](https://github.com/a-r-r-o-w) for his [CogVideoX-Factory](https://github.com/a-r-r-o-w/cogvideox-factory), which makes training CogVideo LoRAs much easier.
- Thanks to [Kijai](https://github.com/kijai) for his great [ComfyUI integration](https://github.com/kijai/ComfyUI-CogVideoXWrapper).
- Thanks to [POM](https://huggingface.co/peteromallet) for providing the computing resources. Without this, these LoRAs could not have been created.

## Training details:

- LR: 1e-4
- Optimizer: adamw
- Scheduler: cosine_with_restarts
- Steps: 17000
- Dataset: 136 videos (49 frames, 720x480 each)
- Rank / alpha: 128 / 128
- Batch size: 1

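
For readers unfamiliar with the scheduler name, here is a rough pure-Python sketch of how a cosine schedule with hard restarts shapes the learning rate over the 17000 steps. It follows the usual formula used by such schedulers. The warmup length and the number of restart cycles are not recorded in this card, so the defaults below (`num_warmup_steps=0`, `num_cycles=1`) are placeholders, and `lr_at` is a hypothetical name.

```python
import math

BASE_LR = 1e-4       # from the training details above
TOTAL_STEPS = 17000  # from the training details above


def lr_at(step: int, num_warmup_steps: int = 0, num_cycles: int = 1) -> float:
    """Learning rate at `step` for cosine decay with hard restarts.

    num_warmup_steps and num_cycles are assumptions; the card does not
    state the values used in training.
    """
    if step < num_warmup_steps:
        # Linear warmup from 0 to the base learning rate.
        return BASE_LR * step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, TOTAL_STEPS - num_warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays from the base LR toward 0, then jumps back up.
    cycle_pos = (num_cycles * progress) % 1.0
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_pos)))


for s in (0, 4250, 8500, 12750, 17000):
    print(s, lr_at(s), lr_at(s, num_cycles=2))
```

With two cycles the rate returns to the base value at the halfway point, which is the "restart".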
## Citation

```
@article{yang2024cogvideox,
  title={CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer},
  author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
  journal={arXiv preprint arXiv:2408.06072},
  year={2024}
}
@article{hong2022cogvideo,
  title={CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers},
  author={Hong, Wenyi and Ding, Ming and Zheng, Wendi and Liu, Xinghan and Tang, Jie},
  journal={arXiv preprint arXiv:2205.15868},
  year={2022}
}
```