Boris Dayma committed · Commit 3690afe · 1 Parent(s): 30fa77a
doc(README): fix typo's
README.md CHANGED
@@ -25,11 +25,11 @@ The system relies on the Flax/JAX infrastructure, which are ideal for TPU traini

 The main components of the architecture include:

-* An encoder, based on [BART](https://arxiv.org/abs/1910.13461). The encoder
+* An encoder, based on [BART](https://arxiv.org/abs/1910.13461). The encoder transforms a sequence of input text tokens to a sequence of image tokens. The input tokens are extracted from the text prompt by using the model's tokenizer. The image tokens are a fixed-length sequence, and they represent indices in a VQGAN-based pre-trained codebook.

-* A decoder,
+* A decoder, which converts the image tokens to image pixels. As mentioned above, the decoder is based on a [VQGAN model](https://compvis.github.io/taming-transformers/).

-The model definition we use for the encoder can be downloaded from our [Github repo](https://github.com/borisdayma/dalle-mini). The encoder is
+The model definition we use for the encoder can be downloaded from our [Github repo](https://github.com/borisdayma/dalle-mini). The encoder is represented by the class `CustomFlaxBartForConditionalGeneration`.

 To use the decoder, you need to follow the instructions in our accompanying VQGAN model in the hub, [flax-community/vqgan_f16_16384](https://huggingface.co/flax-community/vqgan_f16_16384).

@@ -37,8 +37,8 @@ To use the decoder, you need to follow the instructions in our accompanying VQGA

 The easiest way to get familiar with the code and the models is to follow the inference notebook we provide in our [github repo](https://github.com/borisdayma/dalle-mini/blob/main/dev/inference/inference_pipeline.ipynb). For your convenience, you can open it in Google Colaboratory: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/borisdayma/dalle-mini/blob/main/dev/inference/inference_pipeline.ipynb)

-If you just want to test the trained model and see what it comes up with, please visit [our demo](https://huggingface.co/spaces/flax-community/dalle-mini), available
+If you just want to test the trained model and see what it comes up with, please visit [our demo](https://huggingface.co/spaces/flax-community/dalle-mini), available in 🤗 Spaces.

 ### Additional Details

-Our [report](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA) contains
+Our [report](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA) contains more details about how the model was trained and shows many examples that demonstrate its capabilities.
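To make the encoder step described in this change concrete, here is a minimal sketch of mapping a prompt to image-token indices, modeled on the inference notebook linked above. The `dalle_mini.model` import path, the `flax-community/dalle-mini` checkpoint name, and the reuse of BART's tokenizer are assumptions here, not guarantees of this README.

```python
# Minimal sketch (see assumptions above): text prompt -> image-token indices.
import jax
from transformers import BartTokenizer

# Assumption: the class named in the README is importable like this
# after installing https://github.com/borisdayma/dalle-mini.
from dalle_mini.model import CustomFlaxBartForConditionalGeneration

DALLE_REPO = "flax-community/dalle-mini"  # assumed checkpoint location
tokenizer = BartTokenizer.from_pretrained(DALLE_REPO)
model = CustomFlaxBartForConditionalGeneration.from_pretrained(DALLE_REPO)

inputs = tokenizer("a red sports car in the snow", return_tensors="jax")

# Flax generation needs an explicit PRNG key when sampling.
encoded = model.generate(**inputs, do_sample=True, prng_key=jax.random.PRNGKey(0))
image_tokens = encoded.sequences[..., 1:]  # drop BOS; fixed-length codebook indices
```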
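A matching sketch for the decoder step, turning those indices into pixels with the VQGAN checkpoint referenced above. The `vqgan_jax` import and the `decode_code` call follow that model's card, but treat the exact API as an assumption.

```python
# Minimal sketch: image-token indices -> pixels via the VQGAN decoder.
import numpy as np
from PIL import Image

# Assumption: helper module published alongside flax-community/vqgan_f16_16384.
from vqgan_jax.modeling_flax_vqgan import VQModel

vqgan = VQModel.from_pretrained("flax-community/vqgan_f16_16384")

# `image_tokens` is the array produced by the encoder sketch above.
decoded = vqgan.decode_code(image_tokens)  # (batch, height, width, 3), roughly in [0, 1]
pixels = np.asarray(decoded.clip(0.0, 1.0) * 255, dtype=np.uint8)
Image.fromarray(pixels[0]).save("sample.png")
```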