---
license: cc-by-nc-4.0
datasets:
- WenhaoWang/VidProM
language:
- en
pipeline_tag: text-generation
tags:
- text-to-video generation
- VidProM
- Automatical text-to-video prompt
---

# The first model for automatic text-to-video prompt completion: given a few words as input, the model generates several complete text-to-video prompts.

# Details

The model is fine-tuned from [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the [VidProM](https://huggingface.co/datasets/WenhaoWang/VidProM) dataset using 8 A100 GPUs.

# Usage

## Download the model

```python
from transformers import pipeline
import torch

# Download the fine-tuned model and load it in bfloat16 on the first GPU.
pipe = pipeline("text-generation", model="WenhaoWang/AutoT2VPrompt", model_kwargs={"torch_dtype": torch.bfloat16}, device_map="cuda:0")
```
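
Loading in `torch.bfloat16` roughly halves the memory needed for the weights compared with `float32`. The back-of-the-envelope calculation below is a sketch under stated assumptions: it takes Mistral-7B-v0.1 to have about 7.24 billion parameters and counts the weights only (no activations or KV cache). The helper `weight_memory_gib` is hypothetical and only illustrates the arithmetic.

```python
# Rough weight-memory estimate for a ~7B-parameter model (weights only).
# Assumption: Mistral-7B-v0.1 has about 7.24e9 parameters; activations and
# the KV cache used during generation are NOT included.
NUM_PARAMS = 7.24e9

BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,  # the dtype selected by model_kwargs={"torch_dtype": torch.bfloat16}
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: ~{weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

Under these assumptions the weights take roughly 13.5 GiB in bfloat16 versus about 27 GiB in float32; the real GPU requirement is somewhat higher than the bfloat16 figure once generation overhead is added.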

## Set the parameters

```python
input = "An underwater world"  # The input text (a few words) to complete into text-to-video prompts.
max_length = 50  # The maximum length of each output, in tokens, including the input text.
temperature = 1.2  # Controls the randomness of sampling. Higher values give more diverse, more random outputs.
top_k = 8  # At each step, sample only from the k most likely next tokens.
num_return_sequences = 10  # The number of different text-to-video prompts to generate from the same input.
```
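
To make `temperature` and `top_k` concrete, here is a simplified, pure-Python sketch of a single sampling step on a toy list of logits: keep the `top_k` highest-scoring tokens, apply a temperature-scaled softmax over them, and draw one index. It illustrates the idea only; it is not the `transformers` implementation, and `softmax`, `sample_top_k`, and the toy logits are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: temperature > 1 flattens, < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_top_k(logits, temperature=1.2, top_k=8, rng=random):
    """Draw one token index using top-k filtering plus temperature-scaled sampling."""
    # Keep only the top_k highest-scoring indices; every other token gets probability 0.
    # (A positive temperature does not change this ranking.)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalise over the survivors with the temperature-scaled softmax.
    probs = softmax([logits[i] for i in top], temperature)
    return rng.choices(top, weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [4.0, 3.0, 2.0, 1.0, 0.0]
    print("T=0.7:", [round(p, 3) for p in softmax(toy_logits, 0.7)])
    print("T=1.2:", [round(p, 3) for p in softmax(toy_logits, 1.2)])
    rng = random.Random(0)
    print([sample_top_k(toy_logits, temperature=1.2, top_k=2, rng=rng) for _ in range(10)])
```

With `top_k=2` only indices 0 and 1 can ever be drawn, and raising the temperature from 0.7 to 1.2 lowers the probability of the top token, which is why higher values give more varied prompts.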

## Generation

```python
all_prompts = pipe(input, max_length=max_length, do_sample=True, temperature=temperature, top_k=top_k, num_return_sequences=num_return_sequences)


def process(text):
    # Turn line breaks into sentence breaks and remove stray spaces before periods.
    text = text.replace('\n', '.')
    text = text.replace(' .', '.')
    # Drop the unfinished sentence cut off by max_length, keeping text up to the last period.
    last_period = text.rfind('.')
    if last_period != -1:
        text = text[:last_period]
    return text + '.'


for output in all_prompts:
    print(process(output['generated_text']))
```

You will get 10 text-to-video prompts and can pick the one you like most. Because `do_sample=True`, the outputs are random and will differ between runs; for example:

```
An underwater world, 25 year boy, with aqua-green eyes, dark sandy blond hair, from the back, and on his back a fish, 23 year old, wearing glasses,cartoon character.
An underwater world, the video should capture the essence of tranquility and the beauty of nature.. a woman with short hair wearing a green dress sitting at the desk.
An underwater world, the ocean is full of discarded items, the water flows, and the light penetrating through the water.
An underwater world.. a woman with red eyes and red lips is looking forward.
An underwater world.. an old man sitting in a chair, smoking a pipe, a little smoke coming out of the chair, a man is drinking a glass.
An underwater world. The ocean is filled with bioluminess as the water reflects a soft glow from a bioluminescent phosphorescent light source. The camera slowly moves away and zooms in..
An underwater world. the girl looks at the camera and smiles with happiness..
An underwater world, 1960s horror film..
An underwater world.. 4 men in 1940s style clothes walk around a gothic castle. night, fe. A girl is running, and there are some flowers along the river.
An underwater world, -camera pan up . A girl is playing with her cat on a sunny day in the park. A man is running and then falling down and dying.
```
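
Since sampling can repeat itself or return little more than the input, you may want to filter the candidates before choosing. The sketch below is a hypothetical helper, not part of the model or of `transformers`: it drops exact duplicates (ignoring case and whitespace) and prompts that add fewer than `min_extra_words` words to the input, an arbitrary threshold chosen for illustration.

```python
def dedupe_prompts(prompts, prefix, min_extra_words=3):
    """Hypothetical filter: drop duplicates and prompts that barely extend `prefix`.

    `min_extra_words` is an arbitrary illustrative threshold.
    """
    seen = set()
    kept = []
    for prompt in prompts:
        # Compare prompts case-insensitively with whitespace collapsed.
        key = " ".join(prompt.lower().split())
        # Count only the words the model added after the input text,
        # ignoring tokens that are pure punctuation.
        extra = prompt[len(prefix):] if prompt.startswith(prefix) else prompt
        n_extra = sum(1 for word in extra.split() if any(ch.isalnum() for ch in word))
        if key in seen or n_extra < min_extra_words:
            continue
        seen.add(key)
        kept.append(prompt)
    return kept


if __name__ == "__main__":
    samples = [
        "An underwater world, 1960s horror film..",
        "An underwater world. the girl looks at the camera and smiles with happiness..",
        "An underwater world. the girl looks at the camera and smiles with happiness..",
        "An underwater world.",
    ]
    for prompt in dedupe_prompts(samples, "An underwater world"):
        print(prompt)
```

In the generation step above, this could be applied as `dedupe_prompts([process(o['generated_text']) for o in all_prompts], input)` before picking a prompt by hand.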

# License

The model is licensed under the [CC BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/deed.en).

# Citation

```
@article{wang2024vidprom,
  title={VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models},
  author={Wang, Wenhao and Yang, Yi},
  journal={arXiv preprint arXiv:2403.06098},
  year={2024}
}
```

# Acknowledgment

[Yaowei Zheng](https://github.com/hiyouga) helped with the fine-tuning process.

# Contact

If you have any questions, feel free to contact [Wenhao Wang](https://wangwenhao0716.github.io) (wangwenhao0716@gmail.com).