feihu.hf committed
Commit 357b899
1 Parent(s): 253f7f9

update README.md

Files changed (2)
  1. README.md +8 -31
  2. config.json +1 -1
README.md CHANGED
@@ -20,12 +20,11 @@ tags:
 
 ## Introduction
 
-Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. All of these models follow the Apache License (except for the 3B); Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
+Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder has covered six mainstream model sizes, 0.5, 1.5, 3, 7, 14, 32 billion parameters, to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
 
-- Significant improvements in **code generation**, **code reasoning** and **code fixing**. Based on the strong Qwen2.5, we scaled up the training tokens to 5.5 trillion, including source code, text-code grounding, synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source coderLLM, with its coding abilities matching those of GPT-4o.
+- Significant improvements in **code generation**, **code reasoning** and **code fixing**. Based on the strong Qwen2.5, we scaled up the training tokens to 5.5 trillion, including source code, text-code grounding, synthetic data, etc. Qwen2.5-Coder-32B has become the current state-of-the-art open-source code LLM, with its coding abilities matching those of GPT-4o.
 - A more comprehensive foundation for real-world applications such as **Code Agents**. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies.
-- **Long-context Support** up to 128K tokens.
-
+
 **This repo contains the instruction-tuned 1.5B Qwen2.5-Coder model**, which has the following features:
 - Type: Causal Language Models
 - Training Stage: Pretraining & Post-training
@@ -34,8 +33,7 @@ Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (
 - Number of Parameters (Non-Embedding): 1.31B
 - Number of Layers: 28
 - Number of Attention Heads (GQA): 12 for Q and 2 for KV
-- Context Length: Full 131,072 tokens
-- Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts.
+- Context Length: Full 32,768 tokens
 
 For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/), [GitHub](https://github.com/QwenLM/Qwen2.5-Coder), [Documentation](https://qwen.readthedocs.io/en/latest/), [Arxiv](https://arxiv.org/abs/2409.12186).
 
@@ -87,27 +85,6 @@ generated_ids = [
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
 
-### Processing Long Texts
-
-The current `config.json` is set for context length up to 32,768 tokens.
-To handle extensive inputs exceeding 32,768 tokens, we utilize [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.
-
-For supported frameworks, you could add the following to `config.json` to enable YaRN:
-```json
-{
-  ...,
-  "rope_scaling": {
-    "factor": 4.0,
-    "original_max_position_embeddings": 32768,
-    "type": "yarn"
-  }
-}
-```
-
-For deployment, we recommend using vLLM.
-Please refer to our [Documentation](https://qwen.readthedocs.io/en/latest/deployment/vllm.html) for usage if you are not familiar with vLLM.
-Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**.
-We advise adding the `rope_scaling` configuration only when processing long contexts is required.
 
 ## Evaluation & Performance
 
@@ -121,10 +98,10 @@ If you find our work helpful, feel free to give us a cite.
 
 ```
 @article{hui2024qwen2,
-  title={Qwen2.5-Coder Technical Report},
-  author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jiaxi and Liu, Dayiheng and Zhang, Lei and Liu, Tianyu and Zhang, Jiajun and Yu, Bowen and Dang, Kai and others},
-  journal={arXiv preprint arXiv:2409.12186},
-  year={2024}
+  title={Qwen2.5-Coder Technical Report},
+  author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jiaxi and Liu, Dayiheng and Zhang, Lei and Liu, Tianyu and Zhang, Jiajun and Yu, Bowen and Dang, Kai and others},
+  journal={arXiv preprint arXiv:2409.12186},
+  year={2024}
 }
 @article{qwen2,
 title={Qwen2 Technical Report},
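The hunks above only show the tail of the README's quickstart snippet (`generated_ids` / `tokenizer.batch_decode`). For orientation, a minimal sketch of the generation flow that tail belongs to could look like the following; the repo id, prompt, and chat-template details are assumptions for illustration, not content taken from this commit:

```python
# Minimal sketch of the quickstart flow referenced by the diff context above.
# The repo id and prompt are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-1.5B-Instruct"  # assumed repo id
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [{"role": "user", "content": "Write a quick sort algorithm in Python."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Drop the prompt tokens so only the newly generated completion is decoded;
# this is the step the diff's context lines come from.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```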
config.json CHANGED
@@ -17,7 +17,7 @@
   "num_key_value_heads": 2,
   "rms_norm_eps": 1e-06,
   "rope_theta": 1000000.0,
-  "sliding_window": 131072,
+  "sliding_window": 32768,
   "tie_word_embeddings": true,
   "torch_dtype": "bfloat16",
   "transformers_version": "4.44.0",