thanks to kabachuha ❤
- README.md +23 -0
- VQGAN_autoencoder.pth +3 -0
- configuration.json +35 -0
- open_clip_pytorch_model.bin +3 -0
- text2video_pytorch_model.pth +3 -0
README.md
ADDED
@@ -0,0 +1,23 @@
+---
+license: cc-by-nc-4.0
+task_categories:
+- text-to-video
+language:
+- en
+tags:
+- anime
+---
+
+This is the https://huggingface.co/datasets/strangeman3107/animov-0.1 model by strangeman3107, converted by [me](https://github.com/kabachuha) into the original ModelScope format using this script: https://github.com/ExponentialML/Text-To-Video-Finetuning/pull/52.
+
+Ready to use in the Auto1111 webui with this extension: https://github.com/deforum-art/sd-webui-text2video
+
+---
+
+Below is the info copied from the original page.
+
+This is a text2video model for diffusers, fine-tuned from [ModelScope](https://huggingface.co/damo-vilab/text-to-video-ms-1.7b) to have an anime-style appearance.
+It was trained at 384x384 resolution.
+It still often generates unstable content. The usage is the same as with the original ModelScope model.
+
+Example images are [here](https://imgur.com/a/sCwmKG1).
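The README above says usage matches the original ModelScope model. As a minimal, non-authoritative sketch (the local folder name is an assumption, and `modelscope` plus its text-to-video dependencies must be installed), generation from the converted checkpoints could look like this:

```python
# Minimal sketch, assuming the files from this commit (configuration.json,
# VQGAN_autoencoder.pth, open_clip_pytorch_model.bin, text2video_pytorch_model.pth)
# have been downloaded into ./animov-0.1-modelscope (placeholder path).
from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

model_dir = "./animov-0.1-modelscope"  # assumed local folder, not part of this repo

# Build the ModelScope text-to-video pipeline from the local model directory.
pipe = pipeline("text-to-video-synthesis", model_dir)

# Run a text prompt; the pipeline writes a video file and returns its path.
result = pipe({"text": "an anime girl walking through a field of flowers"})
print("video written to:", result[OutputKeys.OUTPUT_VIDEO])
```

For the Auto1111 route, the sd-webui-text2video extension linked above handles loading instead, so this script is only needed for standalone ModelScope usage.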
VQGAN_autoencoder.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:930e9865584beae2405d29bc06a05db3bb6a5b34eedd40a7db29b9156ed7d098
+size 2607657443
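The .pth and .bin entries in this commit are Git LFS pointer files: `oid` is the SHA-256 of the actual payload and `size` is its byte count. A small sketch (local file path assumed) for checking a downloaded copy of VQGAN_autoencoder.pth against the pointer above:

```python
import hashlib
from pathlib import Path

# Assumed local copy of the checkpoint; adjust the path to wherever it was downloaded.
path = Path("VQGAN_autoencoder.pth")
expected_oid = "930e9865584beae2405d29bc06a05db3bb6a5b34eedd40a7db29b9156ed7d098"
expected_size = 2607657443

# Hash the file in 1 MiB chunks to avoid loading ~2.6 GB into memory at once.
h = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

print("size ok:  ", path.stat().st_size == expected_size)
print("sha256 ok:", h.hexdigest() == expected_oid)
```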
configuration.json
ADDED
@@ -0,0 +1,35 @@
+{ "framework": "pytorch",
+    "task": "text-to-video-synthesis",
+    "model": {
+        "type": "latent-text-to-video-synthesis",
+        "model_args": {
+            "ckpt_clip": "open_clip_pytorch_model.bin",
+            "ckpt_unet": "text2video_pytorch_model.pth",
+            "ckpt_autoencoder": "VQGAN_autoencoder.pth",
+            "max_frames": 16,
+            "tiny_gpu": 1
+        },
+        "model_cfg": {
+            "unet_in_dim": 4,
+            "unet_dim": 320,
+            "unet_y_dim": 768,
+            "unet_context_dim": 1024,
+            "unet_out_dim": 4,
+            "unet_dim_mult": [1, 2, 4, 4],
+            "unet_num_heads": 8,
+            "unet_head_dim": 64,
+            "unet_res_blocks": 2,
+            "unet_attn_scales": [1, 0.5, 0.25],
+            "unet_dropout": 0.1,
+            "temporal_attention": "True",
+            "num_timesteps": 1000,
+            "mean_type": "eps",
+            "var_type": "fixed_small",
+            "loss_type": "mse"
+        }
+    },
+    "pipeline": {
+        "type": "latent-text-to-video-synthesis"
+    }
+}
+
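The configuration above wires the three checkpoints together via relative filenames in `model_args`, so they are expected to sit next to configuration.json. A brief, non-authoritative sketch (the folder name is an assumption) that parses the file and verifies the referenced checkpoints are present:

```python
import json
from pathlib import Path

# Assumed local folder holding the files from this commit (placeholder name).
model_dir = Path("animov-0.1-modelscope")

cfg = json.loads((model_dir / "configuration.json").read_text())
args = cfg["model"]["model_args"]

# Each checkpoint is referenced by bare filename, so look for it alongside configuration.json.
for key in ("ckpt_clip", "ckpt_unet", "ckpt_autoencoder"):
    ckpt = model_dir / args[key]
    print(f"{key}: {args[key]} -> {'ok' if ckpt.exists() else 'missing'}")

print("max_frames:", args["max_frames"])
```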
open_clip_pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:73c32c62eebf1112b0693ff9e3ecfa0573ba02cd279420ea4da4af1cbfb39e3b
+size 1972451989
text2video_pytorch_model.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8b710dd73d49c598b339ea9c76c78750fa38e6477793f6373be18087dbe9740c
+size 2822972283