add benchmark numbers for rtx4000ada (non-sff)
benchmarks were run using the example from the readme (without the api).
no init image was used; using an init image seemed to speed up
generation on the rtx4000ada at 1024x1024 by 0.16 it/s.
the config file config-dev-offload-1-4080.json was used with the following keys modified:
"ae_dtype": "bfloat16",
"text_enc_dtype": "bfloat16",
"flow_quantization_dtype": "qfloat8",
"text_enc_quantization_dtype": "qint4",
"ae_quantization_dtype": "qfloat8",
"compile_extras": false,
"compile_blocks": false,
"offload_text_encoder": true,
"offload_vae": false,
"offload_flow": false
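A minimal sketch of producing the modified config programmatically. It assumes the base file is plain JSON and that the keys listed above sit at its top level; the helper names `apply_overrides` and `write_modified_config` and the output filename are hypothetical. Only the key/value pairs come from this commit message.

```python
import json
from pathlib import Path

# Key/value pairs changed for the rtx4000ada run (copied from the list above).
OVERRIDES = {
    "ae_dtype": "bfloat16",
    "text_enc_dtype": "bfloat16",
    "flow_quantization_dtype": "qfloat8",
    "text_enc_quantization_dtype": "qint4",
    "ae_quantization_dtype": "qfloat8",
    "compile_extras": False,
    "compile_blocks": False,
    "offload_text_encoder": True,
    "offload_vae": False,
    "offload_flow": False,
}


def apply_overrides(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` applied (hypothetical helper)."""
    merged = dict(base)  # shallow copy so the base config is left untouched
    merged.update(overrides)  # assumes all overridden keys are top-level
    return merged


def write_modified_config(base_path: str, out_path: str) -> dict:
    """Load the base JSON config, apply OVERRIDES, and write the result."""
    base = json.loads(Path(base_path).read_text())
    merged = apply_overrides(base, OVERRIDES)
    # json.dumps turns Python True/False into JSON true/false.
    Path(out_path).write_text(json.dumps(merged, indent=4))
    return merged


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the config files actually live.
    src = Path("config-dev-offload-1-4080.json")
    if src.exists():
        write_modified_config(str(src), "config-dev-offload-1-rtx4000ada.json")
```

A shallow `dict.update` is enough here only under the flat-config assumption; if any of these keys turned out to be nested, a recursive merge would be needed instead.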
setting "offload_flow": true unexpectedly caused an out-of-memory error when generating a second image.
all in all, with compiled blocks & extras the rtx4090 is about 2.8x faster than the rtx4000ada (non-sff) (e.g. 3.51 vs 1.26 it/s at 1024x1024), which is in line with their power consumption and other hardware specifications.
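The 2.8x figure can be checked against the table below. This small sketch divides the RTX4090 it/s by the rtx4000ada it/s per resolution, separately for runs without and with compiled blocks & extras. The dict names and the `speedups` helper are hypothetical; the values are copied from the table, and the rows labelled "RTX4000" at 768x768 and 1024x720 are assumed to be the same rtx4000ada card.

```python
# Average it/s from the benchmark table: (without compile, with compile)
# blocks & extras, keyed by resolution.
RTX4090 = {"1024x1024": (2.55, 3.51), "768x768": (4.47, 6.2), "1024x720": (3.6, 4.96)}
RTX4000ADA = {"1024x1024": (0.79, 1.26), "768x768": (1.41, 2.19), "1024x720": (1.14, 1.78)}


def speedups(fast: dict, slow: dict) -> dict:
    """Per-resolution ratio fast/slow as (without compile, with compile), 2 decimals."""
    return {
        res: (
            round(fast[res][0] / slow[res][0], 2),
            round(fast[res][1] / slow[res][1], 2),
        )
        for res in fast
    }


if __name__ == "__main__":
    for res, (no_compile, compiled) in speedups(RTX4090, RTX4000ADA).items():
        print(f"{res}: {no_compile}x without compile, {compiled}x with compile")
```

With compilation the ratios land between 2.79x and 2.83x, matching the ~2.8x estimate; without it the gap is closer to 3.2x.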
@@ -16,6 +16,8 @@ Note:
 | 1024x1024 | RTX4090 | bfl codebase fp8 wo quant | 1.7 |
 | 1024x1024 | RTX4090 | ❌ compile blocks & extras | 2.55 |
 | 1024x1024 | RTX4090 | ✅ compile blocks & extras | 3.51 |
+| 1024x1024 | RTX4000ADA | ❌ compile blocks & extras | 0.79 |
+| 1024x1024 | RTX4000ADA | ✅ compile blocks & extras | 1.26 |
 | 1024x1024 | RTX6000ADA | bfl codebase | 1.74 |
 | 1024x1024 | RTX6000ADA | ❌ compile blocks & extras | 2.08 |
 | 1024x1024 | RTX6000ADA | ✅ compile blocks & extras | 2.8 |
@@ -24,6 +26,8 @@ Note:
 | 768x768 | RTX4090 | bfl codebase fp8 wo quant | 2.32 |
 | 768x768 | RTX4090 | ❌ compile blocks & extras | 4.47 |
 | 768x768 | RTX4090 | ✅ compile blocks & extras | 6.2 |
+| 768x768 | RTX4000 | ❌ compile blocks & extras | 1.41 |
+| 768x768 | RTX4000 | ✅ compile blocks & extras | 2.19 |
 | 768x768 | RTX6000ADA | bfl codebase | 3.01 |
 | 768x768 | RTX6000ADA | ❌ compile blocks & extras | 3.43 |
 | 768x768 | RTX6000ADA | ✅ compile blocks & extras | 4.46 |
@@ -32,6 +36,8 @@ Note:
 | 1024x720 | RTX4090 | bfl codebase fp8 wo quant | 3.01 |
 | 1024x720 | RTX4090 | ❌ compile blocks & extras | 3.6 |
 | 1024x720 | RTX4090 | ✅ compile blocks & extras | 4.96 |
+| 1024x720 | RTX4000 | ❌ compile blocks & extras | 1.14 |
+| 1024x720 | RTX4000 | ✅ compile blocks & extras | 1.78 |
 | 1024x720 | RTX6000ADA | bfl codebase | 2.37 |
 | 1024x720 | RTX6000ADA | ❌ compile blocks & extras | 2.87 |
 | 1024x720 | RTX6000ADA | ✅ compile blocks & extras | 3.78 |