---
license: apache-2.0
tags:
- text-to-image
- flux
datasets:
- DucHaiten/pony-art
- jordandavis/fashion_num_people
- mattmdjaga/human_parsing_dataset
- Voxel51/Describable-Textures-Dataset
- twodgirl/vndb
---

# Flux Latent Preview at Half-Size

The decoder turns Flux latents into a half-size preview image; similar preview decoders already exist in the wild for the Flux Dev model.

The maximum supported resolution is between 768 and 1024 px.
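
For context, here is a rough sketch of the shape math behind those numbers, assuming the standard `vae_scale_factor` of 8 and the 2×2 latent patching used by diffusers' `FluxPipeline` (the 4× upsampling figure for the preview is inferred from the half-size output, not a documented constant):

```python
# Back-of-the-envelope latent shape math for Flux (assumes the standard
# vae_scale_factor of 8 and 2x2 latent patching used by diffusers).
height = width = 1024                           # target pixel resolution
latent_h, latent_w = height // 8, width // 8    # VAE downsamples 8x -> 128 x 128
seq_len = (latent_h // 2) * (latent_w // 2)     # 2x2 patching -> 4096 tokens
print(latent_h, latent_w, seq_len)              # 128 128 4096
# The preview decoder presumably upsamples the latent grid by 4x instead
# of 8x, which is what makes the preview half-size.
```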

![](images/etoiles.png)

Retraining the [text encoder](https://huggingface.co/twodgirl/flux-text-encoder-neutered) and the VAE decoder reduced the checkpoint size by around 10 GB. It also set the model's capabilities back by two years.

## Inference

```python
from diffusers import FluxPipeline
from safetensors.torch import load_model
from tea_model import TeaDecoder
import torch
from torchvision import transforms

def preview_image(latents, pipe):
    # Unpack the patch sequence back into a (batch, 16, H/8, W/8) latent grid.
    latents = FluxPipeline._unpack_latents(latents,
                                           pipe.default_sample_size * pipe.vae_scale_factor,
                                           pipe.default_sample_size * pipe.vae_scale_factor,
                                           pipe.vae_scale_factor)
    tea = TeaDecoder(ch_in=16)
    load_model(tea, './vae_decoder.safetensors')
    tea = tea.to(device='cuda')
    # The decoder outputs in [-1, 1]; shift to [0, 1] before PIL conversion.
    output = tea(latents.to(device='cuda', dtype=torch.float32)) / 2.0 + 0.5
    preview = transforms.ToPILImage()(output[0].clamp(0, 1))

    return preview

def full_size_image(latents, pipe):
    latents = FluxPipeline._unpack_latents(latents,
                                           pipe.default_sample_size * pipe.vae_scale_factor,
                                           pipe.default_sample_size * pipe.vae_scale_factor,
                                           pipe.vae_scale_factor)
    # Undo the scaling and shift applied when the latents were produced.
    latents = (latents / pipe.vae.config.scaling_factor) + pipe.vae.config.shift_factor
    torch.cuda.empty_cache()
    pipe.vae = pipe.vae.to(device='cuda')
    latents = latents.to(device='cuda', dtype=pipe.vae.dtype)
    pixel_values, = pipe.vae.decode(latents, return_dict=False)
    images = pipe.image_processor.postprocess(pixel_values.to('cpu'), output_type='pil')

    return images

if __name__ == '__main__':
    pipe = FluxPipeline.from_pretrained('black-forest-labs/FLUX.1-dev',
                                        torch_dtype=torch.bfloat16).to('cuda')
    latents = pipe('cat playing piano', num_inference_steps=10, output_type='latent').images
    # Decode both the full-size image and the half-size preview.
    upscaled = full_size_image(latents, pipe)
    upscaled[0].save('cat_full.png')
    preview = preview_image(latents, pipe)
    preview.save('cat.png')
```
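
To watch the image form during sampling, the same decoder can be wired into diffusers' step callback. Below is a minimal sketch reusing the `preview_image` helper above; in practice you would construct `TeaDecoder` once rather than reloading it on every step:

```python
# Hedged sketch: dump a half-size preview after every denoising step.
def save_step_preview(pipe, step, timestep, callback_kwargs):
    latents = callback_kwargs['latents']  # still packed; preview_image unpacks
    preview_image(latents, pipe).save(f'step_{step:02d}.png')

    return callback_kwargs

image = pipe('cat playing piano',
             num_inference_steps=10,
             callback_on_step_end=save_step_preview,
             callback_on_step_end_tensor_inputs=['latents']).images[0]
```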

## Disclaimer

Use of this code or any copy of its documentation requires citation and attribution to the author, via a link to their Hugging Face profile, in all derivative work.