Commit e45cd6c by agentlans
1 parent: 08d031a

Update README.md

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -6,6 +6,17 @@ tags:
  model-index:
  - name: deberta-v3-xsmall-quality
  results: []
+ license: mit
+ datasets:
+ - agentlans/text-quality
+ - allenai/c4
+ - HuggingFaceFW/fineweb-edu
+ - monology/pile-uncopyrighted
+ - agentlans/common-crawl-sample
+ - agentlans/wikipedia-paragraphs
+ language:
+ - en
+ pipeline_tag: text-classification
  ---
 
  # English Text Quality Classifier
@@ -27,7 +38,7 @@ The **deberta-v3-xsmall-quality** model is designed to evaluate text quality by
 
  ## Training and Evaluation Data
 
- The model was trained on a dataset comprising **100,000 sentences** sourced from five distinct datasets, with **20,000 sentences** drawn from each of the following:
+ The model was trained on the [agentlans/text-quality](https://huggingface.co/datasets/agentlans/text-quality) dataset comprising **100,000 sentences** sourced from five distinct datasets, with **20,000 sentences** drawn from each of the following:
 
  1. **allenai/c4**
  2. **HuggingFaceFW/fineweb-edu**
@@ -37,6 +48,8 @@ The model was trained on a dataset comprising **100,000 sentences** sourced from
 
  This diverse dataset enables the model to generalize well across different text types and domains.
 
+ 90% of the rows were used for training and the remaining 10% for evaluation.
+
  ## How to use
 
  ```python