ryantwolf committed on
Commit
fa229c5
1 Parent(s): 18b498d

Push model using huggingface_hub.

Files changed (2)
  1. README.md +6 -58
  2. config.json +28 -0
README.md CHANGED
@@ -1,61 +1,9 @@
  ---
- license: apache-2.0
+ tags:
+ - pytorch_model_hub_mixin
+ - model_hub_mixin
  ---
 
- # Model Overview
- This is a text classification model to classify documents into one of 26 domain classes:
-
- 'Adult', 'Arts_and_Entertainment', 'Autos_and_Vehicles', 'Beauty_and_Fitness', 'Books_and_Literature', 'Business_and_Industrial', 'Computers_and_Electronics', 'Finance', 'Food_and_Drink', 'Games', 'Health', 'Hobbies_and_Leisure', 'Home_and_Garden', 'Internet_and_Telecom', 'Jobs_and_Education', 'Law_and_Government', 'News', 'Online_Communities', 'People_and_Society', 'Pets_and_Animals', 'Real_Estate', 'Science', 'Sensitive_Subjects', 'Shopping', 'Sports', 'Travel_and_Transportation'
-
- # Model Architecture
- The model architecture is Deberta V3 Base
-
- Context length is 512 tokens
-
- # Training (details)
- ## Training data:
- - 1 million Common Crawl samples, labeled using Google Cloud’s Natural Language API: https://cloud.google.com/natural-language/docs/classifying-text
- - 500k Wikepedia articles, curated using Wikipedia-API: https://pypi.org/project/Wikipedia-API/
-
- ## Training steps:
- Model was trained in multiple rounds using Wikipedia and Common Crawl data, labeled by a combination of pseudo labels and Google Cloud API.
-
- # How To Use This Model
-
- ## Input
- The model takes one or several paragraphs of text as input.
-
- Example input:
- ```
- q Directions
-
- 1. Mix 2 flours and baking powder together
- 2. Mix water and egg in a separate bowl. Add dry to wet little by little
- 3. Heat frying pan on medium
- 4. Pour batter into pan and then put blueberries on top before flipping
- 5. Top with desired toppings!
- ```
-
- ## Output
- The model outputs one of the 26 domain classes as the predicted domain for each input sample.
-
- Example output:
- ```
- Food_and_Drink
- ```
-
- # Evaluation Benchmarks
- Accuracy on 500 human annotated samples
- - Google API 77.5%
- - Our model 77.9%
-
- PR-AUC score on evaluation set with 105k samples
- - 0.9873
-
- # References
- https://arxiv.org/abs/2111.09543
- https://github.com/microsoft/DeBERTa
-
-
- # License
- License to use this model is covered by the Apache 2.0. By downloading the public and release version of the model, you accept the terms and conditions of the Apache License 2.0.
+ This model has been pushed to the Hub using ****:
+ - Repo: [More Information Needed]
+ - Docs: [More Information Needed]
config.json CHANGED
@@ -2,5 +2,33 @@
  "base_model": "microsoft/deberta-v3-base",
  "config_path": null,
  "fc_dropout": 0.2,
+ "label2id": {
+ "Adult": 0,
+ "Arts_and_Entertainment": 1,
+ "Autos_and_Vehicles": 2,
+ "Beauty_and_Fitness": 3,
+ "Books_and_Literature": 4,
+ "Business_and_Industrial": 5,
+ "Computers_and_Electronics": 6,
+ "Finance": 7,
+ "Food_and_Drink": 8,
+ "Games": 9,
+ "Health": 10,
+ "Hobbies_and_Leisure": 11,
+ "Home_and_Garden": 12,
+ "Internet_and_Telecom": 13,
+ "Jobs_and_Education": 14,
+ "Law_and_Government": 15,
+ "News": 16,
+ "Online_Communities": 17,
+ "People_and_Society": 18,
+ "Pets_and_Animals": 19,
+ "Real_Estate": 20,
+ "Science": 21,
+ "Sensitive_Subjects": 22,
+ "Shopping": 23,
+ "Sports": 24,
+ "Travel_and_Transportation": 25
+ },
  "pretrained": true
  }
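
The `label2id` mapping added to config.json lets downstream code turn a predicted class index back into a domain name. The following is a minimal sketch of that lookup, not the repository's own loading code: the helper names `build_id2label` and `predict_label` are hypothetical, the config is rebuilt in memory rather than read from the Hub, and the score vector is made up for illustration.

```python
import json
from typing import Dict, List

# The 26 domain labels, in the id order given by "label2id" in this commit.
LABELS = [
    "Adult", "Arts_and_Entertainment", "Autos_and_Vehicles",
    "Beauty_and_Fitness", "Books_and_Literature", "Business_and_Industrial",
    "Computers_and_Electronics", "Finance", "Food_and_Drink", "Games",
    "Health", "Hobbies_and_Leisure", "Home_and_Garden",
    "Internet_and_Telecom", "Jobs_and_Education", "Law_and_Government",
    "News", "Online_Communities", "People_and_Society", "Pets_and_Animals",
    "Real_Estate", "Science", "Sensitive_Subjects", "Shopping", "Sports",
    "Travel_and_Transportation",
]

# Stand-in for the text of config.json (assumption: in practice this would
# be read from the downloaded file instead of being rebuilt here).
config_text = json.dumps({
    "base_model": "microsoft/deberta-v3-base",
    "config_path": None,
    "fc_dropout": 0.2,
    "label2id": {name: idx for idx, name in enumerate(LABELS)},
    "pretrained": True,
})


def build_id2label(config: dict) -> Dict[int, str]:
    """Invert label2id, checking that ids are unique and contiguous from 0."""
    label2id = config["label2id"]
    id2label = {idx: name for name, idx in label2id.items()}
    if sorted(id2label) != list(range(len(label2id))):
        raise ValueError("label ids must be unique and contiguous from 0")
    return id2label


def predict_label(scores: List[float], id2label: Dict[int, str]) -> str:
    """Return the label whose id is the index of the highest score."""
    if len(scores) != len(id2label):
        raise ValueError("expected one score per label")
    best = max(range(len(scores)), key=scores.__getitem__)
    return id2label[best]


config = json.loads(config_text)
id2label = build_id2label(config)

# Hypothetical classifier scores that peak at class id 8.
scores = [0.0] * len(id2label)
scores[8] = 1.0
print(predict_label(scores, id2label))  # Food_and_Drink
```

Validating that the ids are contiguous guards against a config whose mapping was edited by hand and no longer lines up with the classifier's output dimension.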