comaniac/Meta-Llama-3-70B-Instruct-FP8-v1


comaniac committed on May 26, 2024

Commit 3e9d5ca (verified) · 1 Parent(s): 41b0326

Create README.md

Files changed (1)

README.md +11 -0

README.md ADDED

	@@ -0,0 +1,11 @@

+## Llama-3-70B-Instruct-FP8-v1
+* Weights and activations are per-tensor quantized to float8_e4m3.
+* Quantization with AutoFP8.
+* Calibration dataset: Ultrachat (mgoin/ultrachat_2k)
+* Samples: 1024
+* Sequence length: 4096
+## Evaluation
+TBA
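
For readers unfamiliar with the "per-tensor quantized to float8_e4m3" wording in the README above, the following is a minimal pure-Python sketch of the idea: one scale per tensor, chosen so that the largest magnitude maps to the largest finite FP8 E4M3 value (448), followed by rounding to the nearest representable value. It is not AutoFP8's implementation. The helper names (`round_to_e4m3`, `quantize_per_tensor`, `dequantize`) are hypothetical, and the saturating clamp and the `amax / 448` scale rule are assumptions made for illustration.

```python
import math

# Largest finite value of the float8_e4m3fn format (4 exponent bits, 3 mantissa bits).
FP8_E4M3_MAX = 448.0


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest float8_e4m3fn value, saturating at +/-448 (assumption)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) == e - 1.
    # Exponents below -6 fall in the subnormal range, which shares the -6 step size.
    exp = max(math.frexp(mag)[1] - 1, -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits -> 8 steps per power of two
    # Python's round() rounds halves to even, matching round-to-nearest-even.
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)


def quantize_per_tensor(values):
    """Quantize a flat list of floats with a single (per-tensor) scale."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    quantized = [round_to_e4m3(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map FP8 values back to the original range."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.0, 1.0, -2.0, 4.0, 0.3]
    q, scale = quantize_per_tensor(weights)
    print("scale:", scale)
    print("quantized:", q)
    print("dequantized:", dequantize(q, scale))
```

Because a single scale covers the whole tensor, small entries in a tensor with one large outlier are represented more coarsely. For weights the scale can be computed directly from the tensor; for activations it has to be estimated from sample inputs, which is presumably what the calibration dataset listed in the README is used for.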