imdigitalashish/swapna_0

+---
+license: apache-2.0
+tags:
+- Kandinsky
+- text-image
+- text2image
+- diffusion
+- latent diffusion
+- mCLIP-XLMR
+- mT5
+---
+# Kandinsky 2.0
+Kandinsky 2.0 is the first multilingual text2image model.
+[Open In Colab](https://colab.research.google.com/drive/1uPg9KwGZ2hJBl9taGA_3kyKGw12Rh3ij?usp=sharing)
+[GitHub repository](https://github.com/ai-forever/Kandinsky-2.0)
+[Habr post](https://habr.com/ru/company/sberbank/blog/701162/)
+[Demo](https://rudalle.ru/)
+**UNet size: 1.2B parameters**
+![NatallE.png](https://s3.amazonaws.com/moonup/production/uploads/1669132577749-5f91b1208a61a359f44e1851.png)
+It is a latent diffusion model with two multilingual text encoders:
+* mCLIP-XLMR (560M parameters)
+* mT5-encoder-small (146M parameters)
+Together with multilingual training datasets, these encoders enable genuinely multilingual text2image generation.
+![header.png](https://s3.amazonaws.com/moonup/production/uploads/1669132825912-5f91b1208a61a359f44e1851.png)
+# How to use
+```python
+# Install from a shell first: pip install "git+https://github.com/ai-forever/Kandinsky-2.0.git"
+from kandinsky2 import get_kandinsky2
+# Load the text2img pipeline onto the GPU.
+model = get_kandinsky2('cuda', task_type='text2img')
+# The prompt is Russian for "a cat in space".
+images = model.generate_text2img('кошка в космосе', batch_size=4, h=512, w=512, num_steps=75,
+    denoised_type='dynamic_threshold', dynamic_threshold_v=99.5, sampler='ddim_sampler', ddim_eta=0.01, guidance_scale=10)
+```
+# Authors
++ Arseniy Shakhmatov: [GitHub](https://github.com/cene555), [Blog](https://t.me/gradientdip)
++ Anton Razzhigaev: [GitHub](https://github.com/razzant), [Blog](https://t.me/abstractDL)
++ Aleksandr Nikolich: [GitHub](https://github.com/AlexWortega), [Blog](https://t.me/lovedeathtransformers)
++ Vladimir Arkhipkin: [GitHub](https://github.com/oriBetelgeuse)
++ Igor Pavlov: [GitHub](https://github.com/boomb0om)
++ Andrey Kuznetsov: [GitHub](https://github.com/kuznetsoffandrey)
++ Denis Dimitrov: [GitHub](https://github.com/denndimitrov)
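
The How to use call in the README above passes nine keyword arguments to `generate_text2img`. A minimal sketch of keeping those settings in one place is shown here. `GENERATION_DEFAULTS` and `generate_images` are hypothetical helper names, not part of the `kandinsky2` package, and the only library call assumed is `generate_text2img` with exactly the keywords the README shows.

```python
# Settings copied verbatim from the README's generate_text2img call.
GENERATION_DEFAULTS = {
    "batch_size": 4,
    "h": 512,
    "w": 512,
    "num_steps": 75,
    "denoised_type": "dynamic_threshold",
    "dynamic_threshold_v": 99.5,
    "sampler": "ddim_sampler",
    "ddim_eta": 0.01,
    "guidance_scale": 10,
}


def generate_images(model, prompt, **overrides):
    """Call model.generate_text2img with the README settings plus overrides.

    Hypothetical wrapper: it only accepts keywords that appear in the README,
    so a misspelled setting fails early instead of reaching the model.
    """
    unknown = sorted(set(overrides) - set(GENERATION_DEFAULTS))
    if unknown:
        raise TypeError(f"unsupported generation settings: {unknown}")
    params = {**GENERATION_DEFAULTS, **overrides}
    return model.generate_text2img(prompt, **params)
```

With a model obtained from `get_kandinsky2('cuda', task_type='text2img')`, a call such as `generate_images(model, 'кошка в космосе', num_steps=50)` would reuse every README setting except the step count.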

text_encoder1/config.json ADDED

	@@ -0,0 +1,25 @@

+{
+  "architectures": [
+    "XLMRobertaForMaskedLM"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": 0,
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 1024,
+  "initializer_range": 0.02,
+  "intermediate_size": 4096,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 514,
+  "model_type": "xlm-roberta",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 24,
+  "output_past": true,
+  "pad_token_id": 1,
+  "position_embedding_type": "absolute",
+  "transformers_version": "4.17.0.dev0",
+  "type_vocab_size": 1,
+  "use_cache": true,
+  "vocab_size": 250002
+}
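
As a rough sanity check, the encoder size implied by this config can be compared with the README's 560M figure for mCLIP-XLMR. The sketch below assumes the standard BERT-style layout (biased Q/K/V/output projections, a two-layer feed-forward block, and a LayerNorm after each sub-block) and leaves out the pooler and the masked-LM head. `XLMR_CONFIG` and `xlmr_encoder_params` are hypothetical names used only for illustration.

```python
# Values copied from text_encoder1/config.json.
XLMR_CONFIG = {
    "hidden_size": 1024,
    "intermediate_size": 4096,
    "max_position_embeddings": 514,
    "num_hidden_layers": 24,
    "type_vocab_size": 1,
    "vocab_size": 250002,
}


def xlmr_encoder_params(cfg):
    """Estimate encoder parameters, assuming a standard BERT-style layout."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    layer_norm = 2 * h  # weight and bias

    # Token, position and token-type embedding tables plus their LayerNorm.
    embeddings = (
        cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    ) * h + layer_norm

    # Q, K, V and output projections (with biases) plus LayerNorm.
    attention = 4 * (h * h + h) + layer_norm
    # Up- and down-projection (with biases) plus LayerNorm.
    feed_forward = (h * i + i) + (i * h + h) + layer_norm

    return embeddings + cfg["num_hidden_layers"] * (attention + feed_forward)


total = xlmr_encoder_params(XLMR_CONFIG)
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
# 558,840,832 parameters (~559M)
```

Under these assumptions the estimate lands within about 1M of the 560M quoted in the README, and roughly 46% of it is the 250002-entry token embedding table.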

text_encoder1/pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a7124266439dd6bce544c23b57249a2ca764dbcd38e8eab16ce272c28c27b049
+size 2242347565
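
The three added lines are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, a SHA-256 object id, and the byte size. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper that assumes the simple `key value` line layout visible in this diff.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents copied from text_encoder1/pytorch_model.bin in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:a7124266439dd6bce544c23b57249a2ca764dbcd38e8eab16ce272c28c27b049
size 2242347565
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], f"{info['size'] / 2**30:.2f} GiB")
```

The same helper applies to the other pointer files in this commit, such as `text_encoder2/pytorch_model.bin` and `vae.ckpt`.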

text_encoder1/sentencepiece.bpe.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
+size 5069051

text_encoder1/special_tokens_map.json ADDED

	@@ -0,0 +1 @@


+ {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "sep_token": "</s>", "pad_token": "<pad>", "cls_token": "<s>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": false}}

text_encoder1/tokenizer.json ADDED

The diff for this file is too large to render.

text_encoder1/tokenizer_config.json ADDED

	@@ -0,0 +1 @@

+ {"bos_token": "<s>", "eos_token": "</s>", "sep_token": "</s>", "cls_token": "<s>", "unk_token": "<unk>", "pad_token": "<pad>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "xlm-roberta-large", "tokenizer_class": "XLMRobertaTokenizer"}
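
To illustrate how these special tokens frame an input, the sketch below wraps a list of subword ids as `<s> ... </s>` and pads with `<pad>`. The ids 0, 1 and 2 come from `bos_token_id`, `pad_token_id` and `eos_token_id` in `text_encoder1/config.json`, and 512 is the `model_max_length` above. `wrap_and_pad` is a hypothetical helper and the subword ids in the usage line are placeholders; in practice the tokenizer does this work.

```python
# Special-token ids from text_encoder1/config.json.
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2
# model_max_length from text_encoder1/tokenizer_config.json.
MODEL_MAX_LENGTH = 512


def wrap_and_pad(piece_ids, length):
    """Return (input_ids, attention_mask) of exactly `length` entries."""
    if not 2 <= length <= MODEL_MAX_LENGTH:
        raise ValueError(f"length must be between 2 and {MODEL_MAX_LENGTH}")

    # Truncate so that <s> and </s> still fit, then add them.
    body = list(piece_ids)[: length - 2]
    ids = [BOS_ID] + body + [EOS_ID]
    mask = [1] * len(ids)

    # Right-pad with <pad>; padded positions get attention mask 0.
    padding = length - len(ids)
    return ids + [PAD_ID] * padding, mask + [0] * padding


# Placeholder subword ids, not real vocabulary entries.
input_ids, attention_mask = wrap_and_pad([10, 11, 12], length=8)
print(input_ids)       # [0, 10, 11, 12, 2, 1, 1, 1]
print(attention_mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```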

text_encoder2/config.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "architectures": [
+    "MT5ForConditionalGeneration"
+  ],
+  "d_ff": 1024,
+  "d_kv": 64,
+  "d_model": 512,
+  "decoder_start_token_id": 0,
+  "dropout_rate": 0.1,
+  "eos_token_id": 1,
+  "feed_forward_proj": "gated-gelu",
+  "initializer_factor": 1.0,
+  "is_encoder_decoder": true,
+  "layer_norm_epsilon": 1e-06,
+  "model_type": "mt5",
+  "num_decoder_layers": 8,
+  "num_heads": 6,
+  "num_layers": 8,
+  "pad_token_id": 0,
+  "relative_attention_num_buckets": 32,
+  "tie_word_embeddings": false,
+  "tokenizer_class": "T5Tokenizer",
+  "vocab_size": 250112
+}
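
The README lists this encoder as mT5-encoder-small with 146M parameters, and the config allows a rough check of that number. The sketch below counts only the encoder stack and assumes the usual T5 v1.1 layout: bias-free projections, a gated feed-forward block with three weight matrices, scale-only layer norms, and a single relative-position bias table in the first layer. `MT5_CONFIG` and `mt5_encoder_params` are hypothetical names.

```python
# Values copied from text_encoder2/config.json.
MT5_CONFIG = {
    "d_model": 512,
    "d_ff": 1024,
    "d_kv": 64,
    "num_heads": 6,
    "num_layers": 8,
    "relative_attention_num_buckets": 32,
    "vocab_size": 250112,
}


def mt5_encoder_params(cfg):
    """Estimate encoder-only parameters, assuming a T5 v1.1-style layout."""
    d = cfg["d_model"]
    inner = cfg["num_heads"] * cfg["d_kv"]  # 6 * 64 = 384, not equal to d_model

    embeddings = cfg["vocab_size"] * d
    # q, k, v, o projections without biases, plus a scale-only layer norm.
    attention = 4 * d * inner + d
    # gated-gelu: wi_0, wi_1 and wo without biases, plus a layer norm.
    feed_forward = 3 * d * cfg["d_ff"] + d
    # One relative-position bias table, assumed to live in the first layer.
    relative_bias = cfg["relative_attention_num_buckets"] * cfg["num_heads"]
    final_norm = d

    return (
        embeddings
        + cfg["num_layers"] * (attention + feed_forward)
        + relative_bias
        + final_norm
    )


encoder_total = mt5_encoder_params(MT5_CONFIG)
print(f"{encoder_total:,} parameters")
# 146,940,608 parameters
```

Under these assumptions the encoder comes to roughly 146.9M parameters, close to the README's 146M, with about 128M of that in the token embedding table.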

text_encoder2/pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9daa76f0231b96162833ebfee26e886a781b72e0d95ad1a1826b9147a74a939a
+size 1200794589

text_encoder2/special_tokens_map.json ADDED

	@@ -0,0 +1 @@


+ {"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>", "additional_special_tokens": []}

text_encoder2/spiece.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ef78f86560d809067d12bac6c09f19a462cb3af3f54d2b8acbba26e1433125d6
+size 4309802

text_encoder2/tokenizer_config.json ADDED

	@@ -0,0 +1 @@


+ {"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>", "extra_ids": 0}

vae.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c9890a9697c53045ad52965b33aadc3429d43a644c9af4c01ede3d551f3adf0a
+size 1096193273