casperhansen
committed on
Commit: 19fe3d1
Parent(s): 364bcfc
Update README.md
README.md
CHANGED
@@ -10,13 +10,16 @@ Original model can be found at [https://huggingface.co/mosaicml/mpt-7b-8k-chat](
 
 ## ⚡ 4-bit Inference Speed
 
-
+Machines rented from RunPod - speed may vary depending on both GPU and CPU.
 
 H100:
 - CUDA 12.0, Driver 525.105.17: 92 tokens/s (10.82 ms/token)
 
-RTX 4090 (
+RTX 4090 + Intel i9 13900K (2 different VMs):
+- CUDA 12.0, Driver 525.125.06: 134 tokens/s (7.46 ms/token)
 - CUDA 12.0, Driver 525.125.06: 117 tokens/s (8.52 ms/token)
+
+RTX 4090 + AMD EPYC 7-Series (2 different VMs):
 - CUDA 12.2, Driver 535.54.03: 53 tokens/s (18.6 ms/token)
 - CUDA 12.2, Driver 535.54.03: 56 tokens/s (17.71 ms/token)
 - CUDA 12.0, Driver 525.125.06: 55 tokens/s (18.15 ms/token)
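
Each benchmark entry in the diff above reports the same measurement two ways: throughput in tokens/s and latency in ms/token. The two are reciprocals of each other (tokens/s = 1000 / ms per token), so a reader can sanity-check any entry. The following is a minimal sketch of that conversion, not the script used to produce the numbers; the function name `tokens_per_second` and the group labels are assumptions made for illustration, and the latencies are copied from the README entries.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert per-token latency (milliseconds) to throughput (tokens/s)."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    return 1000.0 / ms_per_token


# (hardware label, latency in ms/token) pairs copied from the README entries.
# The labels are illustrative; the README does not say which VM produced which line.
benchmarks = [
    ("H100", 10.82),
    ("RTX 4090 + Intel i9 13900K", 7.46),
    ("RTX 4090 + Intel i9 13900K", 8.52),
    ("RTX 4090 + AMD EPYC 7-Series", 18.6),
    ("RTX 4090 + AMD EPYC 7-Series", 17.71),
    ("RTX 4090 + AMD EPYC 7-Series", 18.15),
]

for label, ms in benchmarks:
    print(f"{label}: {tokens_per_second(ms):.2f} tokens/s ({ms} ms/token)")
```

Because the published latencies are themselves rounded, the recomputed throughput can differ from the listed whole-number value by a fraction of a token per second.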