hzhwcmhf committed on
Commit
fc2911e
1 Parent(s): 7f8542c

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -15,7 +15,7 @@ Qwen2 is the new series of Qwen large language models. For Qwen2, we release a n
 
 Compared with the state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc.
 
-Qwen2-MoE-57B-A14B-Instruct supports a context length of up to 131,072 tokens, enabling the processing of extensive inputs. Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2 for handling long texts.
+Qwen2-MoE-57B-A14B-Instruct supports a context length of up to 65,536 tokens, enabling the processing of extensive inputs. Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2 for handling long texts.
 
 For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2/) and [GitHub](https://github.com/QwenLM/Qwen2).
 <br>
@@ -73,7 +73,7 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
 ### Processing Long Texts
 
-To handle extensive inputs exceeding 32,768 tokens, we utilize [YARN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.
+To handle extensive inputs exceeding 65,536 tokens, we utilize [YARN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.
 
 For deployment, we recommend using vLLM. You can enable the long-context capabilities by following these steps:
@@ -90,7 +90,7 @@ For deployment, we recommend using vLLM. You can enable the long-context capabil
 
     // adding the following snippets
     "rope_scaling": {
-        "factor": 4.0,
+        "factor": 2.0,
         "original_max_position_embeddings": 32768,
         "type": "yarn"
    }
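The `rope_scaling` change tracks the advertised context length: under YaRN, `factor` is the target context length divided by `original_max_position_embeddings`, so revising the context from 131,072 down to 65,536 tokens halves the factor from 4.0 to 2.0. A minimal sketch of patching a checkpoint's `config.json` accordingly; the helper name and the commented-out local path are assumptions, not part of the commit:

```python
import json
from pathlib import Path


def add_yarn_scaling(config: dict, target_len: int = 65536) -> dict:
    """Add a YaRN rope_scaling entry sized for `target_len` tokens.

    `factor` is derived from the model's native window so the two
    values cannot drift out of sync, which is the bug this commit fixes.
    """
    native_len = 32768  # Qwen2's native max position embeddings
    config["rope_scaling"] = {
        "factor": target_len / native_len,  # 65,536 / 32,768 = 2.0
        "original_max_position_embeddings": native_len,
        "type": "yarn",
    }
    return config


# Example usage against a locally downloaded checkpoint (path is hypothetical):
# cfg_path = Path("Qwen2-57B-A14B-Instruct/config.json")
# cfg = add_yarn_scaling(json.loads(cfg_path.read_text()))
# cfg_path.write_text(json.dumps(cfg, indent=2))
```

Deriving the factor rather than hard-coding it means a later change to the target length (e.g. back to 131,072) automatically yields the matching factor of 4.0.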