rubra-ai/Meta-Llama-3-70B-Instruct-AWQ

Tags: Text Generation, function-calling, text-generation-inference, Inference Endpoints, 4-bit precision


sanjay920 committed on Jul 4

Commit 2aa006f • 1 Parent(s): 9c77c03

Update README.md

Files changed (1)

README.md +5 -0

README.md CHANGED

@@ -62,6 +62,11 @@ language:
 Original model: [rubra-ai/Meta-Llama-3-70B-Instruct](https://huggingface.co/rubra-ai/Meta-Llama-3-70B-Instruct)
+AWQ quant config:
+```
+quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
+```
 ## Model description
 The model is the result of further post-training [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B). This model is designed for high performance in various instruction-following tasks and complex interactions, including multi-turn function calling and detailed conversations.
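
For readers unfamiliar with the fields in the added `quant_config`, here is a minimal pure-Python sketch of what `w_bit`, `q_group_size`, and `zero_point` control. It is an illustration under stated assumptions, not the code used to produce this checkpoint: the function names (`quantize_group`, `quantize_row`, `dequantize_row`) are hypothetical, and it covers only the group-wise integer rounding step. AWQ additionally rescales weights using activation statistics before rounding, which is omitted here. As far as I understand, `"version": "GEMM"` selects a kernel and packing layout in the quantization library, so it has no effect on this sketch.

```python
# Values copied verbatim from the README diff.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}


def quantize_group(weights, w_bit):
    """Quantize one group of floats to unsigned w_bit integers with a zero point."""
    q_max = 2 ** w_bit - 1  # 15 when w_bit is 4
    w_min, w_max = min(weights), max(weights)
    # One scale per group; the floor avoids dividing by zero for a constant group.
    scale = max((w_max - w_min) / q_max, 1e-5)
    # Integer code that represents 0.0, clamped into the valid code range.
    zero = min(max(round(-w_min / scale), 0), q_max)
    codes = [min(max(round(w / scale) + zero, 0), q_max) for w in weights]
    return codes, scale, zero


def quantize_row(row, config):
    """Split a weight row into q_group_size chunks and quantize each one."""
    if not config["zero_point"]:
        # Hypothetical restriction: the symmetric case is not sketched here.
        raise NotImplementedError("this sketch covers only the zero-point case")
    size = config["q_group_size"]
    return [quantize_group(row[i:i + size], config["w_bit"])
            for i in range(0, len(row), size)]


def dequantize_row(groups):
    """Map integer codes back to approximate float weights."""
    return [(c - zero) * scale for codes, scale, zero in groups for c in codes]
```

With this config, a 256-element weight row would be split into two groups of 128, each stored as 4-bit codes plus its own `scale` and `zero`. A smaller `q_group_size` would track local weight ranges more closely at the cost of storing more scales and zero points.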