llmware/llama-2-chat-onnx

doberst committed on Oct 25, 2024

Commit a4f5899 · verified · 1 Parent(s): 6ea3696

Update README.md

Files changed (1)

README.md +1 -3

@@ -10,9 +10,7 @@ tags:
 # llama-2-chat-onnx
-<!-- Provide a quick summary of what the model is/does. -->
-**llama-2-chat-onnx** is an ONNX int4 quantized version of Llama-2-Chat, providing a fast, small inference implementation, optimized for AI PCs using Intel GPU, CPU and NPU.
+**llama-2-chat-onnx** is an ONNX int4 quantized version of Llama-2-Chat, providing a fast, small inference implementation, optimized for AI PCs and Windows x86-64 architectures.
 [**llama-2-chat**](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) is the official chat finetune of the classic Llama 2 model, one of the most most iconic (and still one the best) 7B instruct trained models.