Upload README.md with huggingface_hub
README.md CHANGED
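A commit titled like this one is typically produced by pushing the edited model card with the huggingface_hub client. Below is a minimal sketch using `HfApi.upload_file`, which is the usual way such README commits are made; the `repo_id` is a placeholder assumption, since the target repository is not shown on this page.

```python
from huggingface_hub import HfApi

api = HfApi()

# Upload the corrected README.md to the model repo. This creates a commit
# like the one shown below. repo_id is an assumed placeholder.
api.upload_file(
    path_or_fileobj="README.md",
    path_in_repo="README.md",
    repo_id="user/Llama-2-7b-hf-AWQ",
    repo_type="model",
    commit_message="Upload README.md with huggingface_hub",
)
```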
```diff
@@ -8,20 +8,7 @@ tags:
 - llama-2
 - llama
 base_model: meta-llama/Llama-2-7b-hf
-model_name:
-  \ (embedding): Embedding(32000, 4096)\n (blocks): ModuleList(\n \
-  \ (0-31): 32 x LlamaLikeBlock(\n (norm_1): FasterTransformerRMSNorm()\n\
-  \ (attn): QuantAttentionFused(\n (qkv_proj): WQLinear_GEMM(in_features=4096,\
-  \ out_features=12288, bias=False, w_bit=4, group_size=128)\n (o_proj):\
-  \ WQLinear_GEMM(in_features=4096, out_features=4096, bias=False, w_bit=4, group_size=128)\n\
-  \ (rope): RoPE()\n )\n (norm_2): FasterTransformerRMSNorm()\n\
-  \ (mlp): LlamaMLP(\n (gate_proj): WQLinear_GEMM(in_features=4096,\
-  \ out_features=11008, bias=False, w_bit=4, group_size=128)\n (up_proj):\
-  \ WQLinear_GEMM(in_features=4096, out_features=11008, bias=False, w_bit=4, group_size=128)\n\
-  \ (down_proj): WQLinear_GEMM(in_features=11008, out_features=4096, bias=False,\
-  \ w_bit=4, group_size=128)\n (act_fn): SiLU()\n )\n )\n\
-  \ )\n (norm): LlamaRMSNorm()\n )\n (lm_head): Linear(in_features=4096,\
-  \ out_features=32000, bias=False)\n )\n)"
+model_name: Llama-2-7b-hf-AWQ
 library:
 - Transformers
 - AWQ
```
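The removed lines show a `print(model)`-style architecture dump that had leaked into the `model_name` field of the YAML front matter; the commit replaces it with the plain name `Llama-2-7b-hf-AWQ`. The dumped modules (`WQLinear_GEMM` with `w_bit=4, group_size=128`, `QuantAttentionFused`, `FasterTransformerRMSNorm`) match what AutoAWQ reports for a 4-bit GEMM quantization of Llama-2-7B with fused layers. As a hedged sketch, a checkpoint like the one this card describes could be loaded as follows; the `repo_id` is again an assumed placeholder.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# repo_id is an assumed placeholder, not taken from this page.
repo_id = "user/Llama-2-7b-hf-AWQ"

# fuse_layers=True yields the fused modules seen in the removed dump
# (QuantAttentionFused, FasterTransformerRMSNorm).
model = AutoAWQForCausalLM.from_quantized(repo_id, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Simple generation check on GPU, where the fused model is placed.
tokens = tokenizer("Hello, my name is", return_tensors="pt").input_ids.cuda()
output = model.generate(tokens, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```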