BAAI
/

Safetensors
xlm-roberta
Commit d4f3266 (verified) by MonteXiaofeng · 1 parent: 8f748de

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -4,7 +4,7 @@ base_model:
  - BAAI/bge-m3
  ---

- This model is the quality-assessment model for the [BAAI/IndustryCorpus2](https://huggingface.co/datasets/BAAI/IndustryCorpus2) dataset; it assesses the quality of pretraining corpora.

  ## Why filter out low-quality data
 
@@ -25,7 +25,7 @@ base_model:
 
  Data size: 20k scored samples, with a 1:1 Chinese-to-English ratio

- Data scoring prompt

  ```
  quality_prompt = """Below is an extract from a web page. Evaluate whether the page has a high natural language value and could be useful in an naturanl language task to train a good language model using the additive 5-point scoring system described below. Points are accumulated based on the satisfaction of each criterion:
 
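  - BAAI/bge-m3
  ---

+ This model is the quality-assessment model for the [BAAI/IndustryCorpus2](https://huggingface.co/datasets/BAAI/IndustryCorpus2) dataset. It evaluates the quality of pretraining data along dimensions such as semantic coherence, information density, and educational value.

  ## Why filter out low-quality data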
 
 
 
  Data size: 20k scored samples, with a 1:1 Chinese-to-English ratio

+ **Data prompt**

  ```
  quality_prompt = """Below is an extract from a web page. Evaluate whether the page has a high natural language value and could be useful in an naturanl language task to train a good language model using the additive 5-point scoring system described below. Points are accumulated based on the satisfaction of each criterion: