Add library name, pipeline tag
#53 by nielsr (HF staff) - opened

README.md CHANGED
@@ -7,6 +7,8 @@ pinned: true
 language:
 - en
 license: other
+library_name: diffusers
+pipeline_tag: text-to-video
 ---
 
 # LTX-Video Model Card

@@ -67,13 +69,13 @@ source env/bin/activate
 python -m pip install -e .\[inference-script\]
 ```
 
 Then, download the model from [Hugging Face](https://huggingface.co/Lightricks/LTX-Video)
 
 ```python
-from huggingface_hub import
+from huggingface_hub import hf_hub_download
 
 model_path = 'PATH' # The local directory to save downloaded checkpoint
-
+hf_hub_download(repo_id="Lightricks/LTX-Video", filename="ltx-video-2b-v0.9.safetensors", local_dir=model_path, local_dir_use_symlinks=False, repo_type='model')
 ```
 
 #### Inference

@@ -83,13 +85,13 @@ To use our model, please follow the inference code in [inference.py](https://git
 ##### For text-to-video generation:
 
 ```bash
-python inference.py --
+python inference.py --ckpt_path 'PATH' --prompt "PROMPT" --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
 ```
 
 ##### For image-to-video generation:
 
 ```bash
-python inference.py --
+python inference.py --ckpt_path 'PATH' --prompt "PROMPT" --input_image_path IMAGE_PATH --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED
 ```
 
 ### Diffusers 🧨

@@ -162,4 +164,124 @@ Diffusers also supports directly loading from the original LTX checkpoints using
 - This model is not intended or able to provide factual information.
 - As a statistical model this checkpoint might amplify existing societal biases.
 - The model may fail to generate videos that matches the prompts perfectly.
 - Prompt following is heavily influenced by the prompting-style.
+
+## Community Contribution
+
+### ComfyUI-LTXTricks 🛠️
+
+A community project providing additional nodes for enhanced control over the LTX Video model. It includes implementations of advanced techniques like RF-Inversion, RF-Edit, FlowEdit, and more. These nodes enable workflows such as Image and Video to Video (I+V2V), enhanced sampling via Spatiotemporal Skip Guidance (STG), and interpolation with precise frame settings.
+
+- **Repository:** [ComfyUI-LTXTricks](https://github.com/logtd/ComfyUI-LTXTricks)
+- **Features:**
+  - 🔄 **RF-Inversion:** Implements [RF-Inversion](https://rf-inversion.github.io/) with an [example workflow here](https://github.com/logtd/ComfyUI-LTXTricks/blob/main/example_workflows/example_ltx_inversion.json).
+  - ✂️ **RF-Edit:** Implements [RF-Solver-Edit](https://github.com/wangjiangshan0725/RF-Solver-Edit) with an [example workflow here](https://github.com/logtd/ComfyUI-LTXTricks/blob/main/example_workflows/example_ltx_rf_edit.json).
+  - 🌊 **FlowEdit:** Implements [FlowEdit](https://github.com/fallenshock/FlowEdit) with an [example workflow here](https://github.com/logtd/ComfyUI-LTXTricks/blob/main/example_workflows/example_ltx_flow_edit.json).
+  - 🎥 **I+V2V:** Enables Video to Video with a reference image. [Example workflow](https://github.com/logtd/ComfyUI-LTXTricks/blob/main/example_workflows/example_ltx_iv2v.json).
+  - ✨ **Enhance:** Partial implementation of [STGuidance](https://junhahyung.github.io/STGuidance/). [Example workflow](https://github.com/logtd/ComfyUI-LTXTricks/blob/main/example_workflows/example_ltxv_stg.json).
+  - 🖼️ **Interpolation and Frame Setting:** Nodes for precise control of latents per frame. [Example workflow](https://github.com/logtd/ComfyUI-LTXTricks/blob/main/example_workflows/example_ltx_interpolation.json).
+
+### LTX-VideoQ8 📱
+
+**LTX-VideoQ8** is an 8-bit optimized version of [LTX-Video](https://github.com/Lightricks/LTX-Video), designed for faster performance on NVIDIA ADA GPUs.
+
+- **Repository:** [LTX-VideoQ8](https://github.com/KONAKONA666/LTX-Video)
+- **Features:**
+  - 🚀 Up to 3X speed-up with no accuracy loss
+  - 🎥 Generate 720x480x121 videos in under a minute on RTX 4060 (8GB VRAM)
+  - 🛠️ Fine-tune 2B transformer models with precalculated latents
+- **Community Discussion:** [Reddit Thread](https://www.reddit.com/r/StableDiffusion/comments/1h79ks2/fast_ltx_video_on_rtx_4060_and_other_ada_gpus/)
+
+### Your Contribution
+
+...is welcome! If you have a project or tool that integrates with LTX-Video,
+please let us know by opening an issue or pull request.
+
+# Training
+
+## Diffusers
+
+Diffusers implemented [LoRA support](https://github.com/huggingface/diffusers/pull/10228),
+with a training script for fine-tuning.
+More information and training script in
+[finetrainers](https://github.com/a-r-r-o-w/finetrainers?tab=readme-ov-file#training).
+
+## Diffusion-Pipe
+
+An experimental training framework with pipeline parallelism, enabling fine-tuning of large models like **LTX-Video** across multiple GPUs.
+
+- **Repository:** [Diffusion-Pipe](https://github.com/tdrussell/diffusion-pipe)
+- **Features:**
+  - 🛠️ Full fine-tune support for LTX-Video using LoRA
+  - 📊 Useful metrics logged to Tensorboard
+  - 💾 Training state checkpointing and resumption
+  - ⚡ Efficient pre-caching of latents and text embeddings for multi-GPU setups
+
+# Join Us 🚀
+
+Want to work on cutting-edge AI research and make a real impact on millions of users worldwide?
+
+At **Lightricks**, an AI-first company, we're revolutionizing how visual content is created.
+
+If you are passionate about AI, computer vision, and video generation, we would love to hear from you!
+
+Please visit our [careers page](https://careers.lightricks.com/careers?query=&office=all&department=R%26D) for more information.
+
+# Acknowledgement
+
+We are grateful for the following awesome projects when implementing LTX-Video:
+* [DiT](https://github.com/facebookresearch/DiT) and [PixArt-alpha](https://github.com/PixArt-alpha/PixArt-alpha): vision transformers for image generation.
+
+## Citation
+
+📄 Our tech report is out! If you find our work helpful, please ⭐️ star the repository and cite our paper.
+
+```
+@article{HaCohen2024LTXVideo,
+  title={LTX-Video: Realtime Video Latent Diffusion},
+  author={HaCohen, Yoav and Chiprut, Nisan and Brazowski, Benny and Shalem, Daniel and Moshe, Dudu and Richardson, Eitan and Levin, Eran and Shiran, Guy and Zabari, Nir and Gordon, Ori and Panet, Poriya and Weissbuch, Sapir and Kulikov, Victor and Bitterman, Yaki and Melumian, Zeev and Bibi, Ofir},
+  journal={arXiv preprint arXiv:2501.00103},
+  year={2024}
+}
+```
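The metadata change proposed above lends itself to a quick programmatic sanity check. Below is a minimal, stdlib-only sketch; the card text is the front matter this PR proposes, inlined rather than fetched from the Hub, and `front_matter_keys` is a hypothetical helper written for illustration, not part of `huggingface_hub`:

```python
# Sketch: verify that a model card's YAML front matter carries the
# metadata keys this PR adds. Naive "key: value" parsing is used on
# purpose to avoid a PyYAML dependency.
CARD = """\
---
pinned: true
language:
- en
license: other
library_name: diffusers
pipeline_tag: text-to-video
---
# LTX-Video Model Card
"""

def front_matter_keys(card: str) -> set[str]:
    """Return the top-level keys of the leading YAML front matter block."""
    lines = card.splitlines()
    assert lines[0] == "---", "card must start with a front matter block"
    end = lines[1:].index("---") + 1  # index of the closing '---' delimiter
    keys = set()
    for line in lines[1:end]:
        # Skip list items ('- en') and indented continuations; keep 'key: value'.
        if line and not line.startswith((" ", "-")) and ":" in line:
            keys.add(line.split(":", 1)[0])
    return keys

missing = {"library_name", "pipeline_tag"} - front_matter_keys(CARD)
print(missing or "metadata complete")
```

With the two added lines present, `missing` is empty; running the same check against the pre-PR front matter would report both keys as missing.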
## Reasoning

The model card already has the license, but is missing the `pipeline_tag` and `library_name`. Given the description in this README and the GitHub README, the model takes text as input and generates videos as output, hence we can add `pipeline_tag: text-to-video`. From the code snippets, it is also clear that the model is compatible with the `diffusers` library, hence we can add `diffusers` as the `library_name`. We also add a link to the paper.

## Title

Add pipeline tag, library name

## Comment

This PR adds the `pipeline_tag` and `library_name` to the model card, so that the model can be found with the tag at https://huggingface.co/models?pipeline_tag=text-to-video. It also adds a link to the paper at https://huggingface.co/papers/2501.00103.
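The discoverability claim above can be made concrete with a small stdlib sketch that composes the search URL from the new tag. `hub_search_url` is an illustrative helper, and the `library` filter in the second call is an assumption about the Hub's query parameters, not part of this PR:

```python
from urllib.parse import urlencode

def hub_search_url(**filters: str) -> str:
    """Build a huggingface.co model-search URL from query-string filters."""
    return "https://huggingface.co/models?" + urlencode(filters)

# The page this PR makes the model discoverable on:
print(hub_search_url(pipeline_tag="text-to-video"))
# prints https://huggingface.co/models?pipeline_tag=text-to-video

# Presumably narrowable further once library_name is set:
print(hub_search_url(pipeline_tag="text-to-video", library="diffusers"))
```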
## Content

```yaml
---
tags:
- ltx-video
- text-to-video
- image-to-video
pinned: true
language:
- en
license: other
library_name: diffusers
pipeline_tag: text-to-video
---
```
# LTX-Video Model Card

This model card focuses on the model associated with the LTX-Video model, codebase available [here](https://github.com/Lightricks/LTX-Video).

LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768x512 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content.
We provide a model for both text-to-video and image+text-to-video use cases.

<img src="./media/trailer.gif" alt="trailer" width="512">

| | | | |
|:---:|:---:|:---:|:---:|
| ![example1](./media/ltx-video_example_00001.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A woman with long brown hair and light skin smiles at another woman...</summary>A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage.</details> | ![example2](./media/ltx-video_example_00002.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A woman walks away from a white Jeep parked on a city street at night...</summary>A woman walks away from a white Jeep parked on a city street at night, then ascends a staircase and knocks on a door. The woman, wearing a dark jacket and jeans, walks away from the Jeep parked on the left side of the street, her back to the camera; she walks at a steady pace, her arms swinging slightly by her sides; the street is dimly lit, with streetlights casting pools of light on the wet pavement; a man in a dark jacket and jeans walks past the Jeep in the opposite direction; the camera follows the woman from behind as she walks up a set of stairs towards a building with a green door; she reaches the top of the stairs and turns left, continuing to walk towards the building; she reaches the door and knocks on it with her right hand; the camera remains stationary, focused on the doorway; the scene is captured in real-life footage.</details> | ![example3](./media/ltx-video_example_00003.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A woman with blonde hair styled up, wearing a black dress...</summary>A woman with blonde hair styled up, wearing a black dress with sequins and pearl earrings, looks down with a sad expression on her face. The camera remains stationary, focused on the woman's face. The lighting is dim, casting soft shadows on her face. The scene appears to be from a movie or TV show.</details> | ![example4](./media/ltx-video_example_00004.gif)<br><details style="max-width: 300px; margin: auto;"><summary>The camera pans over a snow-covered mountain range...</summary>The camera pans over a snow-covered mountain range, revealing a vast expanse of snow-capped peaks and valleys. The mountains are covered in a thick layer of snow, with some areas appearing almost white while others have a slightly darker, almost grayish hue. The peaks are jagged and irregular, with some rising sharply into the sky while others are more rounded. The valleys are deep and narrow, with steep slopes that are also covered in snow. The trees in the foreground are mostly bare, with only a few leaves remaining on their branches. The sky is overcast, with thick clouds obscuring the sun. The overall impression is one of peace and tranquility, with the snow-covered mountains standing as a testament to the power and beauty of nature.</details> |
| ![example5](./media/ltx-video_example_00005.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A woman with light skin, wearing a blue jacket and a black hat...</summary>A woman with light skin, wearing a blue jacket and a black hat with a veil, looks down and to her right, then back up as she speaks; she has brown hair styled in an updo, light brown eyebrows, and is wearing a white collared shirt under her jacket; the camera remains stationary on her face as she speaks; the background is out of focus, but shows trees and people in period clothing; the scene is captured in real-life footage.</details> | ![example6](./media/ltx-video_example_00006.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A man in a dimly lit room talks on a vintage telephone...</summary>A man in a dimly lit room talks on a vintage telephone, hangs up, and looks down with a sad expression. He holds the black rotary phone to his right ear with his right hand, his left hand holding a rocks glass with amber liquid. He wears a brown suit jacket over a white shirt, and a gold ring on his left ring finger. His short hair is neatly combed, and he has light skin with visible wrinkles around his eyes. The camera remains stationary, focused on his face and upper body. The room is dark, lit only by a warm light source off-screen to the left, casting shadows on the wall behind him. The scene appears to be from a movie.</details> | ![example7](./media/ltx-video_example_00007.gif)<br><details style="max-width: 300px; margin: auto;"><summary>A prison guard unlocks and opens a cell door...</summary>A prison guard unlocks and opens a cell door to reveal a young man sitting at a table with a woman. The guard, wearing a dark blue uniform with a badge on his left chest, unlocks the cell door with a key held in his right hand and pulls it open; he has short brown hair, light skin, and a neutral expression. The young man, wearing a black and white striped shirt, sits at a table covered with a white tablecloth, facing the woman; he has short brown hair, light skin, and a neutral expression. The woman, wearing a dark blue