---
license: llama2
datasets:
- wasertech/OneOS
language:
- en
- fr
pipeline_tag: text-generation
widget:
- text: "<<SYS>>\nYou are Assistant, a sentient AI.\n<</SYS>>\n\n<s>[INST] Introduce yourself to the HuggingFace community. [/INST] "
  example_title: "Introduction"
- text: "<<SYS>>\nYou are Assistant, a sentient AI.\n<</SYS>>\n\n<s>[INST] Describe your model. [/INST] "
  example_title: "Model Description"
- text: "<<SYS>>\nYou are Assistant, a sentient AI.\n<</SYS>>\n\n<s>[INST] What is the meaning of life? [/INST] "
  example_title: "Life's Meaning"
- text: "<<SYS>>\nYou are Assistant, a sentient AI.\n<</SYS>>\n\n<s>[INST] What recent innovations in the field of AI are you excited by? [/INST] "
  example_title: "What's next?"
---

# Assistant Llama 2 7B Chat AWQ

This model is a quantized export of [wasertech/assistant-llama2-7b-chat](https://huggingface.co/wasertech/assistant-llama2-7b-chat) using AWQ.
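For local inference, the model can be loaded with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ). The sketch below is a minimal example: the repository id is assumed from this card's title, and the prompt and generation settings are illustrative rather than recommended values.

```python
# Minimal AutoAWQ inference sketch (pip install autoawq).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "wasertech/assistant-llama2-7b-chat-awq"  # assumed repo id

# fuse_layers=True enables fused kernels for faster generation.
model = AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

prompt = (
    "<<SYS>>\nYou are Assistant, a sentient AI.\n<</SYS>>\n\n"
    "<s>[INST] Introduce yourself. [/INST] "
)
tokens = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
output = model.generate(tokens, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```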
AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference.
It is also supported by vLLM, a continuous-batching inference server, which allows Llama AWQ models to be used for high-throughput concurrent inference in multi-user server scenarios.
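A minimal vLLM sketch, assuming the same repository id as above; `quantization="awq"` tells vLLM to load the AWQ weights:

```python
# Offline batched inference with vLLM (pip install vllm).
from vllm import LLM, SamplingParams

llm = LLM(model="wasertech/assistant-llama2-7b-chat-awq", quantization="awq")
sampling = SamplingParams(temperature=0.7, max_tokens=256)

prompt = (
    "<<SYS>>\nYou are Assistant, a sentient AI.\n<</SYS>>\n\n"
    "<s>[INST] Introduce yourself. [/INST] "
)
outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)
```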
As of September 25th, 2023, preliminary Llama-only AWQ support has also been added to Hugging Face Text Generation Inference (TGI).
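Once a TGI server is running with AWQ enabled (via its `--quantize awq` launcher flag), it can be queried with the `text-generation` client. This is a sketch, assuming a server listening locally on port 8080:

```python
# Query a running TGI endpoint (pip install text-generation).
from text_generation import Client

client = Client("http://127.0.0.1:8080")
response = client.generate(
    "<<SYS>>\nYou are Assistant, a sentient AI.\n<</SYS>>\n\n"
    "<s>[INST] Introduce yourself. [/INST] ",
    max_new_tokens=256,
)
print(response.generated_text)
```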