Qwen/Qwen2-7B-Instruct-GPTQ-Int8

JustinLin610 committed on Jun 9

Commit 8dfcd82 (1 parent: 22d9ef9)

Update README.md

Files changed (1)

README.md +1 -1

@@ -19,7 +19,7 @@ Qwen2-7B-Instruct-GPTQ-Int8 supports a context length of up to 131,072 tokens, e
 For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2/), [GitHub](https://github.com/QwenLM/Qwen2), and [Documentation](https://qwen.readthedocs.io/en/latest/).
-**Note**: If you encounter ``RuntimeError: probability tensor contains either `inf`, `nan` or element < 0`` during inference with ``transformer``, we recommand [deploying this model with vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html).
+**Note**: If you encounter ``RuntimeError: probability tensor contains either `inf`, `nan` or element < 0`` during inference with ``transformers``, we recommand [deploying this model with vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html).
 <br>
 ## Model Details