Update README.md

README.md CHANGED
@@ -42,8 +42,8 @@ This is the repository for the base 2.7B version finetuned based on [phi-2](http
 
 | Model Size | Base Model |
 | --- | ----------------------------------------------------------------------------- |
-| phi-2 | [opencsg/Opencsg-phi-2-v0.1](https://huggingface.co/opencsg/opencsg-phi-2-v0.1) |
-
+| Opencsg-phi-2-2.7B | [opencsg/Opencsg-phi-2-v0.1](https://huggingface.co/opencsg/opencsg-phi-2-v0.1) |
+| stable-coder-3b-v1-3B | [opencsg/Opencsg-stable-coder-3b-v1](https://huggingface.co/opencsg/opencsg-stable-code-3b-v1) |
 
 
 ## Model Eval
@@ -55,7 +55,7 @@ It is impractical for us to manually set specific configurations for each fine-tu
 Therefore, OpenCSG racked their brains to provide a relatively fair method to compare the fine-tuned models on the HumanEval benchmark.
 To simplify the comparison, we chose the Pass@1 metric for the Python language, but our fine-tuning dataset includes samples in multiple languages.
 
-**For fairness, we evaluated the original and fine-tuned
+**For fairness, we evaluated the original and fine-tuned phi-2 models based only on the prompts from the original cases, without including any other instructions.**
 
 **Besides, we use the greedy decoding method for each model during evaluation.**
 
@@ -146,7 +146,8 @@ opencsg-phi-2-v0.1 is a series of models fine-tuned from phi-2 via full-parameter
 
 | Model Size | Base Model |
 | --- | ----------------------------------------------------------------------------- |
-| phi-2 | [opencsg/Opencsg-phi-2-v0.1](https://huggingface.co/opencsg/opencsg-phi-2-v0.1) |
+| Opencsg-phi-2-2.7B | [opencsg/Opencsg-phi-2-v0.1](https://huggingface.co/opencsg/opencsg-phi-2-v0.1) |
+| stable-coder-3b-v1-3B | [opencsg/Opencsg-stable-coder-3b-v1](https://huggingface.co/opencsg/opencsg-stable-code-3b-v1) |
 
 
 ## Model Eval
@@ -158,7 +159,7 @@ HumanEval is the most common benchmark for evaluating a model's performance in code generation, espe
 Therefore, OpenCSG provides a relatively fair method to compare the fine-tuned models on the HumanEval benchmark.
 For convenience, we chose the Pass@1 metric for the Python language, but note that our fine-tuning dataset includes multiple programming languages.
 
-**For fairness, we evaluated the original and fine-tuned
+**For fairness, we evaluated the original and fine-tuned phi-2 models based only on the prompts from the original cases, without including any other instructions.**
 
 **Besides, we use the greedy decoding method for each model during evaluation.**
 
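The evaluation protocol this diff describes — one greedy completion per HumanEval-style problem, scored as Pass@1 — can be sketched roughly as below. This is a minimal illustration, not OpenCSG's actual harness: the toy problems and the `passes_tests` checker are hypothetical stand-ins, and with a single greedy sample per problem Pass@1 reduces to a plain pass-rate.

```python
# Sketch of the Pass@1 protocol described above: one greedy completion per
# problem, scored as the fraction of problems whose completion passes its
# unit tests. Problems and checker here are toy stand-ins, not the real harness.

def passes_tests(completion: str, test_code: str) -> bool:
    """Exec the completion plus its tests; any exception counts as a fail."""
    namespace: dict = {}
    try:
        exec(completion, namespace)  # define the candidate function
        exec(test_code, namespace)   # run the problem's assertions
        return True
    except Exception:
        return False

def pass_at_1(completions: list[str], tests: list[str]) -> float:
    """With greedy decoding there is exactly one sample per problem,
    so Pass@1 is simply the pass-rate over problems."""
    passed = sum(passes_tests(c, t) for c, t in zip(completions, tests))
    return passed / len(completions)

if __name__ == "__main__":
    completions = [
        "def add(a, b):\n    return a + b",  # correct solution
        "def sub(a, b):\n    return a + b",  # buggy solution
    ]
    tests = [
        "assert add(1, 2) == 3",
        "assert sub(5, 3) == 2",
    ]
    print(pass_at_1(completions, tests))  # 0.5
```

Because decoding is greedy (deterministic), this single-sample score is reproducible across runs, which is what makes the cross-model comparison in the tables above fair.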