Update README.md

README.md CHANGED
@@ -42,8 +42,8 @@ This is the repository for the base 2.7B version finetuned based on [phi-2](http
 
 | Model Size | Base Model |
 | --- | ----------------------------------------------------------------------------- |
-| phi-2 | [opencsg/Opencsg-phi-2-v0.1](https://huggingface.co/opencsg/opencsg-phi-2-v0.1) |
-
+| Opencsg-phi-2-2.7B | [opencsg/Opencsg-phi-2-v0.1](https://huggingface.co/opencsg/opencsg-phi-2-v0.1) |
+| stable-coder-3b-v1-3B | [opencsg/Opencsg-stable-coder-3b-v1](https://huggingface.co/opencsg/opencsg-stable-code-3b-v1) |
 
 
 ## Model Eval
@@ -55,7 +55,7 @@ It is impractical for us to manually set specific configurations for each fine-tu
 Therefore, OpenCSG racked their brains to provide a relatively fair method to compare the fine-tuned models on the HumanEval benchmark.
 To simplify the comparison, we chose the Pass@1 metric for the Python language, but our fine-tuning dataset includes samples in multiple languages.
 
-**For fairness, we evaluated the original and fine-tuned
+**For fairness, we evaluated the original and fine-tuned phi-2 models based only on the prompts from the original cases, without including any other instructions.**
 
 **Besides, we use the greedy decoding method for each model during evaluation.**
 
@@ -146,7 +146,8 @@ opencsg-phi-2-v0.1 is a series of models fine-tuned from phi-2 via full-parameter
 
 | Model Size | Base Model |
 | --- | ----------------------------------------------------------------------------- |
-| phi-2 | [opencsg/Opencsg-phi-2-v0.1](https://huggingface.co/opencsg/opencsg-phi-2-v0.1) |
+| Opencsg-phi-2-2.7B | [opencsg/Opencsg-phi-2-v0.1](https://huggingface.co/opencsg/opencsg-phi-2-v0.1) |
+| stable-coder-3b-v1-3B | [opencsg/Opencsg-stable-coder-3b-v1](https://huggingface.co/opencsg/opencsg-stable-code-3b-v1) |
 
 
 ## Model Eval
@@ -158,7 +159,7 @@ HumanEval is the most common benchmark for evaluating a model's performance in code generation, espe
 Therefore, OpenCSG provides a relatively fair method to compare the fine-tuned models on the HumanEval benchmark.
 For convenience, we chose the Pass@1 metric for the Python language, but note that our fine-tuning dataset includes multiple programming languages.
 
-**For fairness, we evaluated the original and fine-tuned
+**For fairness, we evaluated the original and fine-tuned phi-2 models based only on the prompts from the original cases, without including any other instructions.**
 
 **Besides, we use the greedy decoding method for each model during evaluation.**
 
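The evaluation protocol this diff describes — one greedy completion per HumanEval-style problem, scored as Pass@1 — can be sketched roughly as below. This is a minimal illustration, not OpenCSG's actual harness: the toy problems and the `passes_tests` checker are hypothetical stand-ins, and with a single greedy sample per problem Pass@1 reduces to a plain pass-rate.

```python
# Sketch of the Pass@1 protocol described above: one greedy completion per
# problem, scored as the fraction of problems whose completion passes its
# unit tests. Problems and checker here are toy stand-ins, not the real harness.

def passes_tests(completion: str, test_code: str) -> bool:
    """Exec the completion plus its tests; any exception counts as a fail."""
    namespace: dict = {}
    try:
        exec(completion, namespace)  # define the candidate function
        exec(test_code, namespace)   # run the problem's assertions
        return True
    except Exception:
        return False

def pass_at_1(completions: list[str], tests: list[str]) -> float:
    """With greedy decoding there is exactly one sample per problem,
    so Pass@1 is simply the pass-rate over problems."""
    passed = sum(passes_tests(c, t) for c, t in zip(completions, tests))
    return passed / len(completions)

if __name__ == "__main__":
    completions = [
        "def add(a, b):\n    return a + b",  # correct solution
        "def sub(a, b):\n    return a + b",  # buggy solution
    ]
    tests = [
        "assert add(1, 2) == 3",
        "assert sub(5, 3) == 2",
    ]
    print(pass_at_1(completions, tests))  # 0.5
```

Because decoding is greedy (deterministic), this single-sample score is reproducible across runs, which is what makes the cross-model comparison in the tables above fair.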