Best open source Image to Video: CogVideoX1.5-5B-I2V is pretty decent and optimized for low VRAM machines with high resolution - native resolution is 1360px and up to 10 seconds (161 frames) - audio generated with a new open source audio model

Full YouTube tutorial for CogVideoX1.5-5B-I2V : https://youtu.be/5UCkMzP2VLE

1-Click Windows, RunPod and Massed Compute installers : https://www.patreon.com/posts/112848192 - installs into Python 3.11 VENV

Official Hugging Face repo of CogVideoX1.5-5B-I2V : THUDM/CogVideoX1.5-5B-I2V

Official GitHub repo : https://github.com/THUDM/CogVideo

Prompts used to generate the videos (txt file) : https://gist.github.com/FurkanGozukara/471db7b987ab8d9877790358c126ac05

Demo images shared in : https://www.patreon.com/posts/112848192

I used 1360x768px images at 16 FPS and 81 frames = 5 seconds + 1 frame coming from the initial image

I have also enabled all the optimizations shared on Hugging Face:

pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
quantization = int8_weight_only - you need TorchAO and DeepSpeed - works great on Windows with Python 3.11 VENV

Audio model used : https://github.com/hkchengrex/MMAudio

1-Click Windows, RunPod and Massed Compute installers for MMAudio : https://www.patreon.com/posts/117990364 - installs into Python 3.10 VENV

I used very simple prompts - it fails when there is a human in the input video, so use text-to-audio in such cases

I also tested VRAM usage for CogVideoX1.5-5B-I2V

Resolutions and their VRAM requirements - may work on lower VRAM GPUs too, but slower:

512x288 - 41 frames : 7700 MB
576x320 - 41 frames : 7900 MB
576x320 - 81 frames : 8850 MB
704x384 - 81 frames : 8950 MB
768x432 - 81 frames : 10600 MB
896x496 - 81 frames : 12050 MB
960x528 - 81 frames : 12850 MB
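For anyone who wants to try these settings without the installers, here is a minimal sketch of how the pieces above might fit together with the diffusers library. It is not the installer's code. The function names (`generated_seconds`, `settings_within`, `generate`) are made up for illustration, the `torch.bfloat16` dtype is an assumption, the VRAM table is just the numbers from this post, and the int8_weight_only quantization step is left out to keep it short.

```python
# Sketch only: helper names are hypothetical, VRAM numbers are copied from the post.

# Measured VRAM usage in MB, keyed by (width, height, num_frames).
VRAM_MB = {
    (512, 288, 41): 7700,
    (576, 320, 41): 7900,
    (576, 320, 81): 8850,
    (704, 384, 81): 8950,
    (768, 432, 81): 10600,
    (896, 496, 81): 12050,
    (960, 528, 81): 12850,
}


def generated_seconds(num_frames: int, fps: int = 16) -> float:
    """Clip length, not counting the one frame that comes from the input image."""
    return (num_frames - 1) / fps


def settings_within(vram_mb: int) -> list:
    """Tested (width, height, num_frames) settings whose measured usage fits vram_mb."""
    return sorted(key for key, used in VRAM_MB.items() if used <= vram_mb)


def generate(image_path, prompt, out_path="output.mp4",
             width=1360, height=768, num_frames=81, fps=16):
    """Run image-to-video with the three memory optimizations from the post."""
    # Heavy imports stay inside the function so the helpers above work without a GPU.
    import torch
    from diffusers import CogVideoXImageToVideoPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = CogVideoXImageToVideoPipeline.from_pretrained(
        "THUDM/CogVideoX1.5-5B-I2V",
        torch_dtype=torch.bfloat16,  # assumption: bf16 weights
    )
    pipe.enable_sequential_cpu_offload()  # stream submodules to the GPU as needed
    pipe.vae.enable_slicing()             # decode the batch one slice at a time
    pipe.vae.enable_tiling()              # decode each frame in tiles

    frames = pipe(
        prompt=prompt,
        image=load_image(image_path),
        width=width,
        height=height,
        num_frames=num_frames,
    ).frames[0]
    export_to_video(frames, out_path, fps=fps)
    return out_path
```

Calling `settings_within(9000)` lists the tested settings that stayed under roughly 9 GB, and `generate("input.jpg", "a simple prompt")` would write a 1360x768 clip at 16 FPS, assuming the model downloads and the GPU has enough memory.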