Update README.md
README.md
CHANGED
@@ -6,6 +6,17 @@ tags:
 model-index:
 - name: deberta-v3-xsmall-quality
   results: []
+license: mit
+datasets:
+- agentlans/text-quality
+- allenai/c4
+- HuggingFaceFW/fineweb-edu
+- monology/pile-uncopyrighted
+- agentlans/common-crawl-sample
+- agentlans/wikipedia-paragraphs
+language:
+- en
+pipeline_tag: text-classification
 ---
 
 # English Text Quality Classifier
@@ -27,7 +38,7 @@ The **deberta-v3-xsmall-quality** model is designed to evaluate text quality by
 
 ## Training and Evaluation Data
 
-The model was trained on
+The model was trained on the [agentlans/text-quality](https://huggingface.co/datasets/agentlans/text-quality) dataset comprising **100,000 sentences** sourced from five distinct datasets, with **20,000 sentences** drawn from each of the following:
 
 1. **allenai/c4**
 2. **HuggingFaceFW/fineweb-edu**
@@ -37,6 +48,8 @@ The model was trained on a dataset comprising **100,000 sentences** sourced from
 
 This diverse dataset enables the model to generalize well across different text types and domains.
 
+90% of the rows were used for training and the remaining 10% for evaluation.
+
 ## How to use
 
 ```python
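The updated README describes the data setup in two steps. First, 20,000 sentences are drawn from each of five sources, giving 100,000 in total. Second, the rows are split 90/10 into training and evaluation sets. Below is a minimal pure-Python sketch of that arithmetic. It assumes each source is already loaded as a list of strings. The helper names `build_mixture` and `train_eval_split`, the fixed seed, and the use of uniform random sampling and shuffling are all hypothetical. The README does not say how sentences were sampled or whether the split was random or stratified, so this is not the author's actual pipeline.

```python
import random
from typing import Dict, List, Sequence, Tuple


def build_mixture(sources: Dict[str, Sequence[str]], per_source: int = 20_000,
                  seed: int = 0) -> List[Tuple[str, str]]:
    """Hypothetical helper: draw `per_source` sentences from each source.

    Returns (source_name, sentence) pairs. Sampling is uniform and without
    replacement here; that is an assumption, not something the README states.
    """
    rng = random.Random(seed)
    rows: List[Tuple[str, str]] = []
    for name, sentences in sources.items():
        if len(sentences) < per_source:
            raise ValueError(f"{name} has only {len(sentences)} sentences")
        # random.sample never picks the same position twice within one source
        rows.extend((name, s) for s in rng.sample(list(sentences), per_source))
    return rows


def train_eval_split(rows: Sequence, eval_fraction: float = 0.1,
                     seed: int = 0) -> Tuple[list, list]:
    """Hypothetical helper: shuffle a copy of `rows`, hold out `eval_fraction`."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_fraction)
    # Everything after the first n_eval rows is training data
    return shuffled[n_eval:], shuffled[:n_eval]


# Placeholder stand-ins for the five source datasets (dummy names and text).
sources = {f"source_{i}": [f"sentence {i}-{j}" for j in range(25_000)]
           for i in range(5)}
rows = build_mixture(sources)          # 5 x 20,000 = 100,000 rows
train, evaluation = train_eval_split(rows)
print(len(rows), len(train), len(evaluation))  # 100000 90000 10000
```

If the data is loaded with the Hugging Face `datasets` library instead, `Dataset.train_test_split(test_size=0.1, seed=...)` performs the same kind of 90/10 random split.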
|