Thanks and suggestion

#1
by set-soft - opened

Hi @silveroxides !
Thanks for creating the FP8 version of the model. I tested it and it seems to work quite close to the original, even when quantizing it (again) to save VRAM.

I have a suggestion: mention it to @Shitao. I already brought it up here: https://github.com/VectorSpaceLab/OmniGen/issues/108

Your FP8 version allowed me to start testing much faster (I have a slow internet connection) and will help me save disk space if I want to keep the model around.

I'm quite ignorant about PyTorch and NN stuff, so I have some dumb questions:

  1. Is all the data encoded in FP8 in this file? Or just some of the layers? The code for quantization in VRAM only applies it to nn.Linear layers
  2. The fact that the file itself is in FP8 doesn't mean the model uses 8 bits when loaded in memory, is that correct?
  3. Is it possible to load the model in memory and keep it small?

The code I'm testing applies 8-bit quantization using int8, which helps save a lot of VRAM. But I wonder if it's possible to have a model on disk that is already encoded using int8, so it saves disk space, VRAM and quantization time.
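
For reference, this is a minimal sketch of what symmetric per-tensor int8 weight quantization looks like in principle. It is not the actual node code, and the tensor shape is just an example of an nn.Linear weight:

```python
# Minimal sketch, not the actual node code: symmetric per-tensor int8
# quantization of a single weight tensor, plus dequantization back to float.
import torch

def quantize_int8(w: torch.Tensor):
    # Pick the scale so the largest magnitude maps to 127.
    scale = w.abs().max().clamp(min=1e-8) / 127.0
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)                     # example nn.Linear weight
q, scale = quantize_int8(w)
w_approx = dequantize_int8(q, scale)
print(q.element_size(), "byte(s) per value vs", w.element_size())  # 1 vs 4
```

In principle the int8 tensors plus their scales are what could be stored on disk, which is why it would save disk space and skip the quantization step at load time.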

Note: as you can see, I don't know much about the topic ;-)

I just did a super rough dtype conversion.
The user saftle on GitHub is currently working on a fork of the ComfyUI nodes that will include quantization loading. We'll see what more comes in the future.

To save using a different dtype, you generally only have to chain '.to(torch.float8_e4m3fn)' onto the end of the call that loads the model, then save it with save_file, imported from the safetensors.torch module. I usually load the model with the safe_open function imported from safetensors.
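
A minimal sketch of that workflow, assuming a local file named model.safetensors and a PyTorch build that has torch.float8_e4m3fn; the file names and the "only cast floating-point tensors" rule are my own choices for illustration:

```python
# Sketch: load a safetensors checkpoint, cast float tensors to FP8, save again.
import torch
from safetensors import safe_open
from safetensors.torch import save_file

src = "model.safetensors"                  # original (e.g. FP32/BF16) checkpoint
dst = "model-fp8_e4m3fn.safetensors"       # converted output

tensors = {}
with safe_open(src, framework="pt", device="cpu") as f:
    for key in f.keys():
        t = f.get_tensor(key)
        # Only cast floating-point tensors; leave integer buffers untouched.
        if t.is_floating_point():
            t = t.to(torch.float8_e4m3fn)
        tensors[key] = t

save_file(tensors, dst)
```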

I'm using it with these nodes: https://github.com/set-soft/ComfyUI_OmniGen_Nodes/

No idea how they compare to https://github.com/AIFSH/OmniGen-ComfyUI
