Commit d316f04 by flowpoint · 1 parent: 49c776c

add benchmark numbers for rtx4000ada (non-sff)


benchmarks were run using the example from the readme (without the api).
no init image was used. using an init image seemed to speed up
generation on the rtx4000ada at 1024x1024 by 0.16 it/s.

config-dev-offload-1-4080.json was used, with the following keys modified:

"ae_dtype": "bfloat16",
"text_enc_dtype": "bfloat16",
"flow_quantization_dtype": "qfloat8",
"text_enc_quantization_dtype": "qint4",
"ae_quantization_dtype": "qfloat8",
"compile_extras": false,
"compile_blocks": false,
"offload_text_encoder": true,
"offload_vae": false,
"offload_flow": false
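
The overrides above could also be applied to the base config programmatically. This is a minimal sketch, assuming the config file is a flat JSON object. The helper names `apply_overrides` and `write_config` are hypothetical and not part of the repository.

```python
import json
from pathlib import Path

# Keys modified for the rtx4000ada run, copied from the list above.
RTX4000ADA_OVERRIDES = {
    "ae_dtype": "bfloat16",
    "text_enc_dtype": "bfloat16",
    "flow_quantization_dtype": "qfloat8",
    "text_enc_quantization_dtype": "qint4",
    "ae_quantization_dtype": "qfloat8",
    "compile_extras": False,
    "compile_blocks": False,
    "offload_text_encoder": True,
    "offload_vae": False,
    "offload_flow": False,
}


def apply_overrides(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` applied on top (flat keys only)."""
    merged = dict(base)
    merged.update(overrides)
    return merged


def write_config(base_path: str, out_path: str, overrides: dict) -> None:
    """Read a JSON config, apply the overrides, and write the result to a new file."""
    base = json.loads(Path(base_path).read_text())
    merged = apply_overrides(base, overrides)
    Path(out_path).write_text(json.dumps(merged, indent=2))
```

For example, `write_config("config-dev-offload-1-4080.json", "config-rtx4000ada.json", RTX4000ADA_OVERRIDES)` would write a new file and leave the original untouched. The output filename is an assumption chosen for illustration.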

setting "offload_flow": true unexpectedly caused an out-of-memory error when generating a second image.

all in all, the 4090 is about 2.8x faster than the rtx4000ada (non-sff) with compilation enabled (about 3.2x without), which is in line with power consumption and other hardware specifications.
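
The ratio can be checked against the README rows touched by this commit. This is a small sketch using only the it/s values from the table. The dictionary layout and the function name `speedups` are illustrative choices.

```python
# Throughput in it/s, copied from the README table rows in this commit.
# Each entry maps resolution -> (compile off, compile on).
RTX4090 = {
    "1024x1024": (2.55, 3.51),
    "768x768": (4.47, 6.2),
    "1024x720": (3.6, 4.96),
}
RTX4000ADA = {
    "1024x1024": (0.79, 1.26),
    "768x768": (1.41, 2.19),
    "1024x720": (1.14, 1.78),
}


def speedups(fast: dict, slow: dict, compiled: bool) -> dict:
    """Per-resolution throughput ratio fast/slow, rounded to two decimals."""
    idx = 1 if compiled else 0
    return {res: round(fast[res][idx] / slow[res][idx], 2) for res in fast}


with_compile = speedups(RTX4090, RTX4000ADA, compiled=True)
without_compile = speedups(RTX4090, RTX4000ADA, compiled=False)

print(with_compile)     # {'1024x1024': 2.79, '768x768': 2.83, '1024x720': 2.79}
print(without_compile)  # {'1024x1024': 3.23, '768x768': 3.17, '1024x720': 3.16}
```

The gap is slightly wider with compilation disabled, so the 2.8x figure corresponds to the compiled rows.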

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -16,6 +16,8 @@ Note:
  | 1024x1024 | RTX4090 | bfl codebase fp8 wo quant | 1.7 |
  | 1024x1024 | RTX4090 | ❌ compile blocks & extras | 2.55 |
  | 1024x1024 | RTX4090 | ✅ compile blocks & extras | 3.51 |
+ | 1024x1024 | RTX4000ADA | ❌ compile blocks & extras | 0.79 |
+ | 1024x1024 | RTX4000ADA | ✅ compile blocks & extras | 1.26 |
  | 1024x1024 | RTX6000ADA | bfl codebase | 1.74 |
  | 1024x1024 | RTX6000ADA | ❌ compile blocks & extras | 2.08 |
  | 1024x1024 | RTX6000ADA | ✅ compile blocks & extras | 2.8 |
@@ -24,6 +26,8 @@ Note:
  | 768x768 | RTX4090 | bfl codebase fp8 wo quant | 2.32 |
  | 768x768 | RTX4090 | ❌ compile blocks & extras | 4.47 |
  | 768x768 | RTX4090 | ✅ compile blocks & extras | 6.2 |
+ | 768x768 | RTX4000 | ❌ compile blocks & extras | 1.41 |
+ | 768x768 | RTX4000 | ✅ compile blocks & extras | 2.19 |
  | 768x768 | RTX6000ADA | bfl codebase | 3.01 |
  | 768x768 | RTX6000ADA | ❌ compile blocks & extras | 3.43 |
  | 768x768 | RTX6000ADA | ✅ compile blocks & extras | 4.46 |
@@ -32,6 +36,8 @@ Note:
  | 1024x720 | RTX4090 | bfl codebase fp8 wo quant | 3.01 |
  | 1024x720 | RTX4090 | ❌ compile blocks & extras | 3.6 |
  | 1024x720 | RTX4090 | ✅ compile blocks & extras | 4.96 |
+ | 1024x720 | RTX4000 | ❌ compile blocks & extras | 1.14 |
+ | 1024x720 | RTX4000 | ✅ compile blocks & extras | 1.78 |
  | 1024x720 | RTX6000ADA | bfl codebase | 2.37 |
  | 1024x720 | RTX6000ADA | ❌ compile blocks & extras | 2.87 |
  | 1024x720 | RTX6000ADA | ✅ compile blocks & extras | 3.78 |
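
The added rows also show how much compiling blocks and extras helps on the rtx4000ada. This is a short sketch computing that gain from the six added values. The variable names are illustrative, and the rows labelled RTX4000 are assumed to refer to the same rtx4000ada card.

```python
# Added rows from this commit: (resolution, it/s without compile, it/s with compile).
ADDED_ROWS = [
    ("1024x1024", 0.79, 1.26),
    ("768x768", 1.41, 2.19),
    ("1024x720", 1.14, 1.78),
]

# Ratio of compiled to uncompiled throughput per resolution.
compile_gain = {
    res: round(compiled / uncompiled, 2)
    for res, uncompiled, compiled in ADDED_ROWS
}

for res, gain in compile_gain.items():
    print(f"{res}: {gain}x")
# 1024x1024: 1.59x
# 768x768: 1.55x
# 1024x720: 1.56x
```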