Snowflake/Llama-3.1-SwiftKV-8B-Instruct-FP8

jeffra committed on Dec 5, 2024

Commit b3472fb (verified) · 1 Parent(s): 354026b

Update README.md

Files changed (1)

README.md +3 -3

@@ -15,10 +15,10 @@ For more details about SwiftKV and how to use it:
 ## Performance Metrics
-To evaluate SwiftKV’s performance, we focus on the following key metrics:
+To evaluate SwiftKV’s performance, we focus on the following key metrics (see more details in our [blog](https://www.snowflake.com/engineering-blog/swiftkv-llm-compute-reduction/)):
 * Combined throughput: The total number of input and output tokens processed per second. This determines:
-For batch processing, the time required to complete jobs.
-For interactive use, the volume of concurrent requests a system can handle.
+  * For batch processing, the time required to complete jobs.
+  * For interactive use, the volume of concurrent requests a system can handle.
 * TTFT: The latency between a user request and receiving the first token in the response.
 * TPOT: The latency between subsequent tokens after the first token.
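
For readers of this change, the three metrics defined in the README can be made concrete with a small sketch. The following is not taken from SwiftKV or its benchmark code; `RequestTrace` and the helper functions are hypothetical names, and it assumes per-token arrival timestamps are available and that TPOT is reported as the mean gap between consecutive output tokens.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Timing record for one request (hypothetical structure for illustration)."""
    input_tokens: int          # number of prompt tokens
    request_time: float        # seconds, when the request was sent
    token_times: List[float]   # seconds, arrival time of each output token


def ttft(trace: RequestTrace) -> float:
    """Time to first token: delay from the request to the first output token."""
    return trace.token_times[0] - trace.request_time


def tpot(trace: RequestTrace) -> float:
    """Time per output token: mean gap between tokens after the first one."""
    n = len(trace.token_times)
    if n < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (trace.token_times[-1] - trace.token_times[0]) / (n - 1)


def combined_throughput(traces: List[RequestTrace]) -> float:
    """Input plus output tokens processed per second over the whole run."""
    total_tokens = sum(t.input_tokens + len(t.token_times) for t in traces)
    start = min(t.request_time for t in traces)
    end = max(t.token_times[-1] for t in traces)
    return total_tokens / (end - start)
```

Under these assumptions, a request with 100 input tokens whose five output tokens arrive at 0.5 s, 0.6 s, ..., 0.9 s has a TTFT of 0.5 s and a TPOT of about 0.1 s. Counting input tokens in the throughput numerator is what makes the metric "combined" rather than output-only.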