kuprel committed on
Commit
d4779c4
1 Parent(s): 4bf3f44

Update README.md

  1. README.md +103 -3
README.md CHANGED
@@ -1,3 +1,103 @@
1
- ---
2
- license: mit
3
- ---
1
+ # min(DALL·E)
2
+
3
+ [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kuprel/min-dalle/blob/main/min_dalle.ipynb)
4
+
5
+ This is a fast, minimal port of Boris Dayma's [DALL·E Mega](https://github.com/borisdayma/dalle-mini). It has been stripped down for inference and converted to PyTorch. The only third-party dependencies are numpy, requests, pillow and torch.
6
+
7
+ Generating a 5x5 grid of DALL·E Mega images takes:
8
+ - 48 sec with a P100 in Colab
9
+ - 16 sec with an A100 on Replicate
10
+
11
+ The flax model and code for converting it to torch can be found [here](https://github.com/kuprel/min-dalle-flax).
12
+
13
+ ## Install
14
+
15
+ ```bash
16
+ $ pip install min-dalle
17
+ ```
18
+
19
+ ## Usage
20
+
21
+ Load the model parameters once and reuse the model to generate multiple images.
22
+
23
+ ```python
24
+ import torch
+ from min_dalle import MinDalle
25
+
26
+ model = MinDalle(
27
+     models_root='./pretrained',
28
+     dtype=torch.float32,
29
+     is_mega=True,
30
+     is_reusable=True
31
+ )
32
+ ```
33
+
34
+ The required models will be downloaded to `models_root` if they are not already there. Set the `dtype` to `torch.float16` to save GPU memory. If you have an Ampere-architecture GPU, you can use `torch.bfloat16`. Once everything has finished initializing, call `generate_image` with some text as many times as you want. Use a positive `seed` for reproducible results. Higher values for `log2_supercondition_factor` result in better agreement with the text but a narrower variety of generated images. Every image token is sampled from the top-$k$ most probable tokens, with $k$ controlled by `log2_k`.
35
+
36
+ ```python
37
+ image = model.generate_image(
38
+     text='Nuclear explosion broccoli',
39
+     seed=-1,
40
+     grid_size=4,
41
+     log2_k=6,
42
+     log2_supercondition_factor=5,
43
+     is_verbose=False
44
+ )
45
+
46
+ display(image)
47
+ ```
48
+ <img src="https://github.com/kuprel/min-dalle/raw/main/examples/nuclear_broccoli.jpg" alt="min-dalle" width="400"/>
49
+ credit: https://twitter.com/hardmaru/status/1544354119527596034
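
As a rough illustration of the top-$k$ sampling described above, the sketch below picks one token from the `2 ** log2_k` highest-scoring candidates using only the Python standard library. The function name `sample_top_k`, the toy logits, and the reading of `log2_k` as the base-2 logarithm of $k$ are assumptions made for this example; the library's own sampler works on torch tensors and may differ in detail.

```python
import math
import random


def sample_top_k(logits, log2_k, rng):
    """Sample one token index from the k = 2 ** log2_k highest logits.

    Hypothetical helper for illustration; not part of min_dalle.
    """
    k = min(2 ** log2_k, len(logits))
    # Indices of the k largest logits, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights restricted to the top-k, shifted by the max for stability.
    peak = logits[top[0]]
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [0.1, 2.0, -1.0, 3.5, 0.7, 1.2, -0.3, 2.8]
# With log2_k=1, k is 2, so only indices 3 and 7 are eligible.
token = sample_top_k(logits, log2_k=1, rng=rng)
print(token)
```

A smaller `log2_k` restricts sampling to fewer candidates, which makes the output less varied; `log2_k=0` always returns the single most probable token.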
50
+
51
+
52
+ ### Saving Individual Images
53
+ The images can also be generated as a `FloatTensor` in case you want to process them manually.
54
+
55
+ ```python
56
+ images = model.generate_images(
57
+     text='Nuclear explosion broccoli',
58
+     seed=-1,
59
+     image_count=7,
60
+     log2_k=6,
61
+     log2_supercondition_factor=5,
62
+     is_verbose=False
63
+ )
64
+ ```
65
+
66
+ To get an image into PIL format you will have to first move the images to the CPU and convert the tensor to a numpy array.
67
+ ```python
68
+ images = images.to('cpu').numpy().astype('uint8')  # PIL needs uint8 here; assumes values are already in the 0-255 range
69
+ ```
70
+ Then image $i$ can be converted to a `PIL.Image` and saved:
71
+ ```python
72
+ from PIL import Image
+ image = Image.fromarray(images[i])
73
+ image.save('image_{}.png'.format(i))
74
+ ```
75
+
76
+ ### Interactive
77
+
78
+ If the model is being used interactively (e.g. in a notebook), `generate_image_stream` can be used to generate a stream of images as the model is decoding. The detokenizer adds a slight delay for each image. Setting `log2_mid_count` to 3 results in a total of `2 ** 3 = 8` generated images. The only valid values for `log2_mid_count` are 0, 1, 2, 3, and 4. The Colab notebook implements this.
79
+
80
+ ```python
81
+ image_stream = model.generate_image_stream(
82
+     text='Dali painting of WALL·E',
83
+     seed=-1,
84
+     grid_size=3,
85
+     log2_mid_count=3,
86
+     log2_k=6,
87
+     log2_supercondition_factor=3,
88
+     is_verbose=False
89
+ )
90
+
91
+ for image in image_stream:
92
+     display(image)
93
+ ```
94
+ <img src="https://github.com/kuprel/min-dalle/raw/main/examples/dali_walle_animated.gif" alt="min-dalle" width="300"/>
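
To make the relationship between `log2_mid_count` and the number of streamed images concrete, here is a small sketch. It assumes each image is decoded from 256 tokens and that intermediate images are emitted at evenly spaced token counts. Both points are assumptions for illustration, and the helper `stream_checkpoints` is hypothetical, not part of `min_dalle`.

```python
def stream_checkpoints(log2_mid_count, token_count=256):
    """Return the token counts at which an intermediate image would be emitted.

    token_count=256 is an assumed number of image tokens, not a value
    stated in the README.
    """
    if log2_mid_count not in (0, 1, 2, 3, 4):
        raise ValueError("log2_mid_count must be 0, 1, 2, 3, or 4")
    image_count = 2 ** log2_mid_count
    step = token_count // image_count
    # One checkpoint per streamed image; the last one is the finished image.
    return [step * (i + 1) for i in range(image_count)]


checkpoints = stream_checkpoints(3)
print(len(checkpoints), checkpoints)
```

Under these assumptions, `log2_mid_count=0` yields only the finished image, while larger values trade extra detokenizer calls for more frequent previews.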
95
+
96
+ ### Command Line
97
+
98
+ Use `image_from_text.py` to generate images from the command line.
99
+
100
+ ```bash
101
+ $ python image_from_text.py --text='artificial intelligence' --no-mega
102
+ ```
103
+ <img src="https://github.com/kuprel/min-dalle/raw/main/examples/artificial_intelligence.jpg" alt="min-dalle" width="200"/>
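
For readers who want a similar interface in their own script, the following sketch shows one way flags like `--text` and `--no-mega` could be parsed with `argparse`. It is not the contents of `image_from_text.py`: the `build_parser` helper and the default value for `--mega` are assumptions, and `argparse.BooleanOptionalAction` requires Python 3.9 or newer.

```python
import argparse


def build_parser():
    """Build a hypothetical parser accepting --text and --mega / --no-mega."""
    parser = argparse.ArgumentParser(description="Generate an image from text")
    parser.add_argument("--text", type=str, required=True, help="prompt to render")
    # BooleanOptionalAction registers both --mega and --no-mega.
    # The default of True is an assumption made for this sketch.
    parser.add_argument(
        "--mega",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="use the Mega model (disable with --no-mega)",
    )
    return parser


args = build_parser().parse_args(["--text=artificial intelligence", "--no-mega"])
print(args.text, args.mega)
```

The parsed `args.mega` value could then be passed as `is_mega` when constructing `MinDalle`.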