jeffra committed on
Commit 354026b · verified · 1 Parent(s): 16f87eb

Update README.md

Files changed (1)
  1. README.md +10 -2
README.md CHANGED
@@ -15,10 +15,18 @@ For more details about SwiftKV and how to use it:

  ## Performance Metrics

- Combined input and output throughput for Llama 3.1 405B across a range of input lengths.
+ To evaluate SwiftKV’s performance, we focus on the following key metrics:
+ * Combined throughput: The total number of input and output tokens processed per second. This determines:
+     For batch processing, the time required to complete jobs.
+     For interactive use, the volume of concurrent requests a system can handle.
+ * TTFT: The latency between a user request and receiving the first token in the response.
+ * TPOT: The latency between subsequent tokens after the first token.
+
+ Combined input and output throughput for Llama 3.1 405B across a range of input lengths. Blue is baseline FP8 and pink is SwiftKV FP8.
  <img src="figure-4.png" alt="performance plot of llama-405B w. swiftkv" width="400">
- Legend: blue - baseline FP8, pink - SwiftKV FP8<br>

+ TTFT (top) and TPOT (bottom) for input lengths 2000 (left), 8000 (middle), and 32000 (right) for Llama 3.1 405B fp8 model. For each experiment, a range of different request arrival rates is simulated. Each request generates 256 output tokens.
+ <img src="figure-6.png" alt="performance plot of llama-405B w. swiftkv" width="700">


  ## Eval Metrics
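
The three metrics this change adds to the README (combined throughput, TTFT, TPOT) can be made concrete with a small calculation over per-request timestamps. The following is only a minimal sketch and not the benchmark code behind the figures: the `RequestTrace` structure and all function names are hypothetical, TPOT is assumed to be the mean gap between consecutive output tokens, and throughput is assumed to be measured from the earliest request to the last token received.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Timing record for one request (hypothetical structure, for illustration)."""
    num_input_tokens: int
    request_time: float        # seconds: when the request was sent
    token_times: List[float]   # seconds: arrival time of each output token


def ttft(trace: RequestTrace) -> float:
    """Time to first token: request sent -> first output token received."""
    return trace.token_times[0] - trace.request_time


def tpot(trace: RequestTrace) -> float:
    """Time per output token: mean gap between tokens after the first (assumed definition)."""
    n = len(trace.token_times)
    if n < 2:
        return 0.0  # no inter-token gap exists with fewer than two tokens
    return (trace.token_times[-1] - trace.token_times[0]) / (n - 1)


def combined_throughput(traces: List[RequestTrace]) -> float:
    """Input plus output tokens processed per second over the whole run."""
    total_tokens = sum(t.num_input_tokens + len(t.token_times) for t in traces)
    start = min(t.request_time for t in traces)
    end = max(t.token_times[-1] for t in traces)
    return total_tokens / (end - start)
```

For example, a request with 2000 input tokens sent at t = 0 s whose five output tokens arrive at 0.5, 0.6, 0.7, 0.8, and 0.9 s has a TTFT of 0.5 s and a TPOT of about 0.1 s under these assumptions.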