casperhansen committed on
Commit
19fe3d1
1 Parent(s): 364bcfc

Update README.md

  1. README.md +5 -2
README.md CHANGED
@@ -10,13 +10,16 @@ Original model can be found at [https://huggingface.co/mosaicml/mpt-7b-8k-chat](
 
 ## ⚡ 4-bit Inference Speed
 
-This was tested on RunPod. Speed varies across machines, I have not been able to reproduce 117 tokens/s consistently on a 4090 yet.
+Machines rented from RunPod - speed may vary dependent on both GPU/CPU.
 
 H100:
 - CUDA 12.0, Driver 525.105.17: 92 tokens/s (10.82 ms/token)
 
-RTX 4090 (4 different VMs):
+RTX 4090 + Intel i9 13900K (2 different VMs):
+- CUDA 12.0, Driver 525.125.06: 134 tokens/s (7.46 ms/token)
 - CUDA 12.0, Driver 525.125.06: 117 tokens/s (8.52 ms/token)
+
+RTX 4090 + AMD EPYC 7-Series (2 different VMs):
 - CUDA 12.2, Driver 535.54.03: 53 tokens/s (18.6 ms/token)
 - CUDA 12.2, Driver 535.54.03: 56 tokens/s (17.71 ms/token)
 - CUDA 12.0, Driver 525.125.06: 55 tokens/s (18.15 ms/token)
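
Each benchmark entry reports the same measurement twice: throughput in tokens/s and latency in ms/token, related by tokens/s = 1000 / (ms/token). The sketch below is not part of the README; the helper name `tokens_per_second` and the `LATENCIES_MS` table are illustrative. It recomputes throughput from the listed latencies. The listed tokens/s values agree with this conversion when truncated to whole numbers, which suggests (as an assumption) that this is how they were reported.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert per-token latency in milliseconds to throughput in tokens/s."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    return 1000.0 / ms_per_token


# (label, ms/token) pairs copied from the benchmark list above.
LATENCIES_MS = [
    ("H100", 10.82),
    ("RTX 4090 + Intel i9 13900K", 7.46),
    ("RTX 4090 + Intel i9 13900K", 8.52),
    ("RTX 4090 + AMD EPYC 7-Series", 18.6),
    ("RTX 4090 + AMD EPYC 7-Series", 17.71),
    ("RTX 4090 + AMD EPYC 7-Series", 18.15),
]

for label, ms in LATENCIES_MS:
    # int() truncates toward zero, which is the assumed reporting convention.
    print(f"{label}: {ms} ms/token -> {int(tokens_per_second(ms))} tokens/s")
```

The same helper makes the CPU effect easy to quantify: dividing the throughput of an Intel-hosted run by that of an EPYC-hosted run gives the relative speedup between the two 4090 groups.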