---
base_model:
  - THUDM/CogVideoX-5b
tags:
  - LoRA
  - cogvideo
---

Arcane CogVideoX LoRA v1

Prompt: "CSETIARCANE. The video is focusing on a girl with blue hair. Her expression transitions subtly across the frames, starting from a somber or contemplative look and progressing to one that appears more engaged or possibly surprised. The lighting is dim and moody, emphasizing shadows and creating a dramatic effect around the character's features. The background remains consistently dark throughout the sequence, ensuring the focus stays solely on the character's changing expressions."

Important Notes:

This CogVideoX LoRA was created as part of a fan project for research purposes only and is not intended for commercial use. It is based on the TV series Arcane, which is protected by copyright. Users utilize the model at their own risk and are obligated to comply with copyright laws and applicable regulations. The model has been developed for non-commercial purposes, and it is not my intention to infringe on any copyright. I assume no responsibility for any damages or legal consequences arising from the use of the model.

  • Although the LoRA has started to learn the appearance of the main characters, it still cannot reproduce them accurately, so don't expect to see your beloved characters.
  • For the best quality, do not use quantization (such as fp8) or SageAttention.

Compatibility:

  • CogVideoX 1.0 (it does not work with CogVideoX 1.5 models)

ID Token / Trigger word(s):

Using this token in your prompt helps apply the style. See the example prompt above.

  • csetiarcane

Character Tokens:

I used tokens to tag the characters. Unfortunately, the LoRA didn't properly learn their appearances, so don't expect to see your favorite characters. However, since it associates certain character features with the tokens, they might help with character consistency.

Token            Character
nfheimerdinger   Heimerdinger
nfvi             Vi
nfjinx           Jinx
nfcaitlyn        Caitlyn
nfekko           Ekko
nfjayce          Jayce
nfmel            Mel
nfsilco          Silco
nfmarcus         Marcus
nfcassandra      Cassandra
nfsinged         Singed
nfviktor         Viktor
nfsevika         Sevika
nfenforcer       Enforcers
nffinn           Finn
nfrenni          Renni
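Since the style trigger word and a character token both go into the same prompt, prompt assembly can be sketched as follows. This is only an illustration; the `build_prompt` helper is hypothetical and not part of the model, while the token names come from this card:

```python
def build_prompt(description, character_token=None):
    """Prefix the style trigger word and substitute an optional character token.

    `{char}` in the description is a placeholder chosen for this sketch; it is
    replaced by one of the character tokens from the table above.
    """
    if character_token:
        description = description.replace("{char}", character_token)
    return "csetiarcane " + description

prompt = build_prompt("The video shows {char} in dim, moody lighting.", "nfjinx")
# prompt == "csetiarcane The video shows nfjinx in dim, moody lighting."
```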

Inference:

You can run inference with the fine-tuned LoRA using the following code:

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
).to("cuda")

# Load the Arcane LoRA and activate it at full strength.
pipe.load_lora_weights("Cseti/CogVideoX1.0-LoRA-Arcane", adapter_name="cogvideox-lora")
pipe.set_adapters(["cogvideox-lora"], [1.0])

# The prompt starts with the trigger word for this LoRA.
video = pipe("CSETIARCANE. The video is focusing on a girl with blue hair. Her expression transitions subtly across the frames, starting from a somber or contemplative look and progressing to one that appears more engaged or possibly surprised. The lighting is dim and moody, emphasizing shadows and creating a dramatic effect around the character's features. The background remains consistently dark throughout the sequence, ensuring the focus stays solely on the character's changing expressions.").frames[0]
export_to_video(video, "output.mp4", fps=8)
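If the pipeline does not fit in your GPU memory, a sketch of standard diffusers memory-saving options is below (these are my suggestions, not part of this card; they trade speed for memory without quantizing the weights, so the fp8 caveat above does not apply to them):

```python
import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("Cseti/CogVideoX1.0-LoRA-Arcane", adapter_name="cogvideox-lora")
pipe.set_adapters(["cogvideox-lora"], [1.0])

# Stream submodules between CPU and GPU on demand instead of keeping the
# whole model resident on the GPU (do not call .to("cuda") when using this).
pipe.enable_model_cpu_offload()
# Decode the video latents tile by tile to cut peak VRAM during VAE decode.
pipe.vae.enable_tiling()
```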

Citation

@article{yang2024cogvideox,
  title={CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer},
  author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
  journal={arXiv preprint arXiv:2408.06072},
  year={2024}
}
@article{hong2022cogvideo,
  title={CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers},
  author={Hong, Wenyi and Ding, Ming and Zheng, Wendi and Liu, Xinghan and Tang, Jie},
  journal={arXiv preprint arXiv:2205.15868},
  year={2022}
}