teleprint-me/llama-2-7b-chat

aberrio committed on Sep 28, 2024

Commit 2753db9 · verified · 1 parent: d772508

Update README.md

Files changed (1)

README.md +0 -1

@@ -24,7 +24,6 @@ The Llama-2 7B Chat GGUF model is an instance of the Llama-2 architecture, devel
 ## Files and Versions
 - **llama-2-7b-chat.GGUF.q4_0.bin**: 4-bit quantized model (3.6 GB)
-- **llama-2-7b-chat.GGUF.q5_0.bin**: 5-bit quantized model (4.4 GB)
 - **llama-2-7b-chat.GGUF.q8_0.bin**: 8-bit quantized model (6.7 GB)
 The model has been converted and quantized using the GGUF format. Conversion was performed using Georgi Gerganov's llama.cpp library, and quantization was accomplished using the llama-cpp-python tool created by Andrei Betlen.
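
The file sizes listed in the README can be sanity-checked with a little arithmetic. The sketch below is not part of the commit. It assumes a Llama-2 7B parameter count of 6,738,415,616 and the ggml block layouts for `q4_0` and `q8_0` (32 weights per block, each block carrying a 2-byte fp16 scale), neither of which is stated in the README. The names `N_PARAMS`, `BYTES_PER_BLOCK`, and `estimate_size_gib` are hypothetical. It also assumes every tensor uses the same block format, which real GGUF files do not strictly follow, so treat the result as a rough estimate.

```python
# Rough size estimate for block-quantized weights (illustrative sketch only).
GIB = 1024 ** 3

# Assumption: Llama-2 7B parameter count (not stated in the README).
N_PARAMS = 6_738_415_616

# Assumption: ggml block layouts, 32 weights per block.
# q4_0: 2-byte fp16 scale + 16 bytes of packed 4-bit values = 18 bytes.
# q8_0: 2-byte fp16 scale + 32 bytes of 8-bit values = 34 bytes.
BLOCK_SIZE = 32
BYTES_PER_BLOCK = {"q4_0": 18, "q8_0": 34}


def estimate_size_gib(n_params: int, quant: str) -> float:
    """Estimate size in GiB if every weight used the given block format."""
    n_blocks = n_params / BLOCK_SIZE
    return n_blocks * BYTES_PER_BLOCK[quant] / GIB


if __name__ == "__main__":
    for quant in BYTES_PER_BLOCK:
        print(f"{quant}: ~{estimate_size_gib(N_PARAMS, quant):.2f} GiB")
```

Under these assumptions the estimates land near 3.5 GiB for `q4_0` and 6.7 GiB for `q8_0`, in line with the 3.6 GB and 6.7 GB listed above. The small gap is expected, since metadata and any tensors stored at a different precision are ignored here.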