Backend: transformers

For interactive visualization of the results, save the linked file as HTML on your machine and open it in a browser.

Model: h2oai/h2ogpt-4096-llama2-7b-chat (transformers)

Number of GPUs: 0

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | CPU | 1215.52 | 1.17546 | |
| 8 | CPU | 1216.98 | 1.17641 | |
| 4 | CPU | 1217.17 | 1.16575 | |

Number of GPUs: 1

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 31.8619 | 41.9433 | |
| 16 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | 32.2947 | 40.9252 | |
| 16 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 37.1139 | 32.4529 | |
| 16 | 1 x NVIDIA RTX A6000 (46068 MiB) | 47.0375 | 29.8526 | |
| 16 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | 67.9752 | 18.0571 | |
| 8 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | 114.622 | 9.21246 | |
| 8 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 94.1774 | 8.95532 | |
| 8 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 181.246 | 7.47991 | |
| 8 | 1 x NVIDIA RTX A6000 (46068 MiB) | 148.616 | 6.61984 | |
| 8 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | 185.146 | 4.35807 | |
| 4 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | 39.544 | 32.571 | |
| 4 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 42.8067 | 32.3408 | |
| 4 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 53.3973 | 23.3267 | |
| 4 | 1 x NVIDIA RTX A6000 (46068 MiB) | 61.5241 | 22.8456 | |
| 4 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | 90.5194 | 14.9456 | |
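
The effect of quantization on generation speed is easier to see as a ratio against the 16-bit result. The following is a minimal sketch that copies the single-GPU generation speeds for the 7B model from the table above; the dictionary layout, the shortened GPU labels, and the `relative_speed` helper are illustrative and not part of the benchmark tooling.

```python
# Generation speed [tokens/sec] copied from the 1-GPU table above
# (h2ogpt-4096-llama2-7b-chat, transformers backend).
# GPU labels are shortened for readability; keys of the inner dicts are bit widths.
speeds = {
    "RTX 6000 Ada": {16: 41.9433, 8: 8.95532, 4: 32.3408},
    "RTX 4090": {16: 40.9252, 8: 9.21246, 4: 32.571},
    "A100-SXM4-80GB": {16: 32.4529, 8: 7.47991, 4: 23.3267},
    "RTX A6000": {16: 29.8526, 8: 6.61984, 4: 22.8456},
    "RTX 3090": {16: 18.0571, 8: 4.35807, 4: 14.9456},
}


def relative_speed(by_bits):
    """Return each precision's speed as a fraction of the 16-bit speed."""
    baseline = by_bits[16]
    return {bits: speed / baseline for bits, speed in by_bits.items()}


relative = {gpu: relative_speed(by_bits) for gpu, by_bits in speeds.items()}

for gpu, ratios in relative.items():
    print(f"{gpu}: 8-bit {ratios[8]:.2f}x, 4-bit {ratios[4]:.2f}x of 16-bit speed")
```

For these rows, 8-bit generation runs at roughly a fifth to a quarter of the 16-bit speed, while 4-bit retains roughly 70 to 85 percent of it.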

Number of GPUs: 2

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 32.1395 | 40.3871 | |
| 16 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 39.9269 | 32.248 | |
| 16 | 2 x NVIDIA RTX A6000 (46068 MiB) | 47.4105 | 28.8472 | |
| 16 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 71.4808 | 17.7518 | |
| 8 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 94.9813 | 9.03765 | |
| 8 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 178.2 | 7.55443 | |
| 8 | 2 x NVIDIA RTX A6000 (46068 MiB) | 152.544 | 6.43862 | |
| 8 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 186.884 | 4.35012 | |
| 4 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 43.235 | 32.0566 | |
| 4 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 57.0808 | 22.6791 | |
| 4 | 2 x NVIDIA RTX A6000 (46068 MiB) | 64.6442 | 21.972 | |
| 4 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 94.5099 | 14.6162 | |

Number of GPUs: 4

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 42.3398 | 30.2181 | |
| 16 | 4 x NVIDIA RTX A6000 (46068 MiB) | 49.089 | 27.7344 | |
| 8 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 180.534 | 7.53804 | |
| 8 | 4 x NVIDIA RTX A6000 (46068 MiB) | 153.411 | 6.46469 | |
| 4 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 58.6287 | 21.9123 | |
| 4 | 4 x NVIDIA RTX A6000 (46068 MiB) | 66.4926 | 21.409 | |

Number of GPUs: 8

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 40.4986 | 30.5489 | |
| 8 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 186.713 | 7.23498 | |
| 4 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 60.1828 | 21.9172 | |
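
To check how generation speed changes with GPU count for the 7B model, the 16-bit A100 rows from the 1-, 2-, 4-, and 8-GPU tables above can be compared directly. This is only a sketch over the reported numbers; the variable names are illustrative, and it says nothing about why the speeds differ.

```python
# 16-bit generation speed [tokens/sec] for NVIDIA A100-SXM4-80GB,
# copied from the 7B transformers tables above, keyed by number of GPUs.
a100_speed_by_gpu_count = {1: 32.4529, 2: 32.248, 4: 30.2181, 8: 30.5489}

# Express each configuration relative to the single-GPU result.
single_gpu = a100_speed_by_gpu_count[1]
scaling = {n: speed / single_gpu for n, speed in a100_speed_by_gpu_count.items()}

for n, ratio in scaling.items():
    print(f"{n} GPU(s): {ratio:.3f}x of single-GPU generation speed")
```

In these measurements the multi-GPU runs stay within about 7 percent of the single-GPU speed and never exceed it.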

Model: h2oai/h2ogpt-4096-llama2-13b-chat (transformers)

Number of GPUs: 1

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 52.4984 | 26.2487 | |
| 16 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 49.7972 | 24.9301 | |
| 16 | 1 x NVIDIA RTX A6000 (46068 MiB) | 71.9114 | 18.4362 | |
| 16 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | nan | nan | OOM |
| 16 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | nan | nan | OOM |
| 8 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 168.967 | 7.67522 | |
| 8 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | 185.442 | 6.0205 | |
| 8 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 174.458 | 5.69269 | |
| 8 | 1 x NVIDIA RTX A6000 (46068 MiB) | 193.993 | 5.56359 | |
| 8 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | 280.467 | 3.75936 | |
| 4 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 45.3051 | 20.4771 | |
| 4 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | 68.0646 | 16.1241 | |
| 4 | 1 x NVIDIA RTX A6000 (46068 MiB) | 81.1389 | 15.6933 | |
| 4 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 74.271 | 15.0868 | |
| 4 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | 96.6189 | 9.77255 | |

Number of GPUs: 2

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 51.6428 | 26.1842 | |
| 16 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 51.299 | 24.8757 | |
| 16 | 2 x NVIDIA RTX A6000 (46068 MiB) | 72.8565 | 18.2039 | |
| 16 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 89.5996 | 12.8295 | |
| 8 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 167.523 | 7.82793 | |
| 8 | 2 x NVIDIA RTX A6000 (46068 MiB) | 195.929 | 5.51238 | |
| 8 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 180.781 | 5.43787 | |
| 8 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 280.831 | 3.72157 | |
| 4 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 47.1425 | 19.9791 | |
| 4 | 2 x NVIDIA RTX A6000 (46068 MiB) | 84.5776 | 15.1326 | |
| 4 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 79.9461 | 14.3455 | |
| 4 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 98.4705 | 9.68779 | |

Number of GPUs: 4

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 55.3779 | 21.7073 | |
| 16 | 4 x NVIDIA RTX A6000 (46068 MiB) | 74.4377 | 17.8537 | |
| 8 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 179.505 | 5.45185 | |
| 8 | 4 x NVIDIA RTX A6000 (46068 MiB) | 199.799 | 5.39725 | |
| 4 | 4 x NVIDIA RTX A6000 (46068 MiB) | 87.6579 | 14.6779 | |
| 4 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 78.9061 | 14.6754 | |

Number of GPUs: 8

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 55.3965 | 22.302 | |
| 8 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 185.328 | 5.38647 | |
| 4 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 83.0479 | 13.969 | |

Model: h2oai/h2ogpt-4096-llama2-70b-chat (transformers)

Number of GPUs: 1

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | nan | nan | OOM |
| 16 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | nan | nan | OOM |
| 16 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | nan | nan | OOM |
| 16 | 1 x NVIDIA RTX A6000 (46068 MiB) | nan | nan | OOM |
| 8 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | nan | nan | OOM |
| 8 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | nan | nan | OOM |
| 8 | 1 x NVIDIA RTX A6000 (46068 MiB) | nan | nan | OOM |
| 4 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 122.132 | 10.6495 | |
| 4 | 1 x NVIDIA RTX A6000 (46068 MiB) | 165.058 | 6.94248 | |
| 4 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | nan | nan | OOM |

Number of GPUs: 2

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 2 x NVIDIA RTX A6000 (46068 MiB) | nan | nan | OOM |
| 8 | 2 x NVIDIA RTX A6000 (46068 MiB) | 410.069 | 2.25687 | |
| 4 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 120.538 | 10.5008 | |
| 4 | 2 x NVIDIA RTX A6000 (46068 MiB) | 171.744 | 6.71342 | |
| 4 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | nan | nan | OOM |

Number of GPUs: 4

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 4 x NVIDIA RTX A6000 (46068 MiB) | 267.056 | 4.24242 | |
| 8 | 4 x NVIDIA RTX A6000 (46068 MiB) | 413.957 | 2.22551 | |
| 4 | 4 x NVIDIA RTX A6000 (46068 MiB) | 175.491 | 6.5798 | |

Backend: text-generation-inference

For interactive visualization of the results, save the linked file as HTML on your machine and open it in a browser.

Model: h2oai/h2ogpt-4096-llama2-7b-chat (text-generation-inference)

Number of GPUs: 1

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 39.0155 | 55.2139 | |
| 16 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | 29.129 | 45.9535 | |
| 16 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | 24.3988 | 44.5878 | |
| 16 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 39.2697 | 30.3068 | |
| 16 | 1 x NVIDIA RTX A6000 (46068 MiB) | 40.3622 | 29.9724 | |
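
The two backends can be compared on the configurations they share. The sketch below takes the 16-bit, single-GPU generation speeds for the 7B model from the transformers section and from the table above and prints the ratio per GPU; the dictionary names and shortened GPU labels are illustrative only.

```python
# 16-bit, 1-GPU generation speed [tokens/sec] for h2ogpt-4096-llama2-7b-chat,
# copied from the two backend sections of this report.
transformers_speed = {
    "RTX 6000 Ada": 41.9433,
    "RTX 4090": 40.9252,
    "A100-SXM4-80GB": 32.4529,
    "RTX A6000": 29.8526,
    "RTX 3090": 18.0571,
}
tgi_speed = {
    "RTX 6000 Ada": 55.2139,
    "RTX 4090": 44.5878,
    "A100-SXM4-80GB": 30.3068,
    "RTX A6000": 29.9724,
    "RTX 3090": 45.9535,
}

# Ratio above 1 means text-generation-inference generated tokens faster.
tgi_vs_transformers = {
    gpu: tgi_speed[gpu] / transformers_speed[gpu] for gpu in transformers_speed
}

for gpu, ratio in sorted(tgi_vs_transformers.items(), key=lambda kv: -kv[1]):
    print(f"{gpu}: text-generation-inference at {ratio:.2f}x of transformers")
```

For these rows the ratio ranges from about 0.93 on the A100 to about 2.5 on the RTX 3090, so the difference between backends depends heavily on the GPU.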

Number of GPUs: 2

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 7.63612 | 71.7881 | |
| 16 | 2 x NVIDIA RTX A6000 (46068 MiB) | 41.0461 | 30.3726 | |
| 16 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 41.0245 | 29.36 | |

Number of GPUs: 4

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 4 x NVIDIA RTX A6000 (46068 MiB) | 42.8377 | 29.388 | |
| 16 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 41.0995 | 28.4403 | |

Number of GPUs: 8

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 42.8594 | 27.8644 | |

Model: h2oai/h2ogpt-4096-llama2-13b-chat (text-generation-inference)

Number of GPUs: 1

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 21.7823 | 33.7132 | |
| 16 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 51.8428 | 19.083 | |
| 16 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | nan | nan | OOM |
| 16 | 1 x NVIDIA RTX A6000 (46068 MiB) | nan | nan | OOM |

Number of GPUs: 2

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 10.8242 | 57.8237 | |
| 16 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 42.2111 | 31.4247 | |
| 16 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 53.3837 | 22.223 | |
| 16 | 2 x NVIDIA RTX A6000 (46068 MiB) | 64.782 | 21.3549 | |

Number of GPUs: 4

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 52.7912 | 21.3862 | |
| 16 | 4 x NVIDIA RTX A6000 (46068 MiB) | 66.5247 | 20.777 | |

Number of GPUs: 8

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 56.3847 | 20.3764 | |

Model: h2oai/h2ogpt-4096-llama2-70b-chat (text-generation-inference)

Number of GPUs: 4

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 131.453 | 9.61851 | |
| 16 | 4 x NVIDIA RTX A6000 (46068 MiB) | nan | nan | OOM |

Number of GPUs: 8

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 133.53 | 9.53011 | |