sarahyurick committed on
Commit
752ebca
1 Parent(s): 0d45ab1

Update README.md

Files changed (1)
  1. README.md +59 -55
README.md CHANGED
@@ -8,64 +8,65 @@ license: other
8
  # Model Overview
9
  This is a multilingual text classification model that can enable data annotation, creation of domain-specific blends and the addition of metadata tags. The model classifies documents into one of 26 domain classes:
10
 
11
- 'Adult', 'Arts_and_Entertainment', 'Autos_and_Vehicles', 'Beauty_and_Fitness', 'Books_and_Literature', 'Business_and_Industrial', 'Computers_and_Electronics', 'Finance', 'Food_and_Drink', 'Games', 'Health', 'Hobbies_and_Leisure', 'Home_and_Garden', 'Internet_and_Telecom', 'Jobs_and_Education', 'Law_and_Government', 'News', 'Online_Communities', 'People_and_Society', 'Pets_and_Animals', 'Real_Estate', 'Science', 'Sensitive_Subjects', 'Shopping', 'Sports', 'Travel_and_Transportation'
12
-
13
- It supports 52 languages (English and 51 other languages) : 'ar', 'az', 'bg', 'bn', 'ca', 'cs', 'da', 'de', 'el', 'es', 'et', 'fa', 'fi', 'fr', 'gl', 'he', 'hi', 'hr', 'hu', 'hy', 'id', 'is', 'it', 'ka', 'kk', 'kn', 'ko', 'lt', 'lv', 'mk', 'ml', 'mr', 'ne', 'nl', 'no', 'pl', 'pt', 'ro', 'ru', 'sk', 'sl', 'sq', 'sr', 'sv', 'ta', 'tr', 'uk', 'ur', 'vi', 'ja', 'zh'
14
  ```
15
- Code Language Name
16
- ar Arabic
17
- az Azerbaijani
18
- bg Bulgarian
19
- bn Bengali
20
- ca Catalan
21
- cs Czech
22
- da Danish
23
- de German
24
- el Greek
25
- es Spanish
26
- et Estonian
27
- fa Persian
28
- fi Finnish
29
- fr French
30
- gl Galician
31
- he Hebrew
32
- hi Hindi
33
- hr Croatian
34
- hu Hungarian
35
- hy Armenian
36
- id Indonesian
37
- is Icelandic
38
- it Italian
39
- ka Georgian
40
- kk Kazakh
41
- kn Kannada
42
- ko Korean
43
- lt Lithuanian
44
- lv Latvian
45
- mk Macedonian
46
- ml Malayalam
47
- mr Marathi
48
- ne Nepali
49
- nl Dutch
50
- no Norwegian
51
- pl Polish
52
- pt Portuguese
53
- ro Romanian
54
- ru Russian
55
- sk Slovak
56
- sl Slovenian
57
- sq Albanian
58
- sr Serbian
59
- sv Swedish
60
- ta Tamil
61
- tr Turkish
62
- uk Ukrainian
63
- ur Urdu
64
- vi Vietnamese
65
- ja Japanese
66
- zh Chinese
67
  ```
68
 
69
  # License
70
  This model is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
71
 
@@ -126,6 +127,9 @@ Arts_and_Entertainment
126
  ## Evaluation
127
  - Metric: PR-AUC
128
 
129
  # Inference
130
  - Engine: PyTorch
131
  - Test Hardware: V100
 
8
  # Model Overview
9
  This is a multilingual text classification model that can enable data annotation, creation of domain-specific blends and the addition of metadata tags. The model classifies documents into one of 26 domain classes:
10
 
11
  ```
12
+ 'Adult', 'Arts_and_Entertainment', 'Autos_and_Vehicles', 'Beauty_and_Fitness', 'Books_and_Literature', 'Business_and_Industrial', 'Computers_and_Electronics', 'Finance', 'Food_and_Drink', 'Games', 'Health', 'Hobbies_and_Leisure', 'Home_and_Garden', 'Internet_and_Telecom', 'Jobs_and_Education', 'Law_and_Government', 'News', 'Online_Communities', 'People_and_Society', 'Pets_and_Animals', 'Real_Estate', 'Science', 'Sensitive_Subjects', 'Shopping', 'Sports', 'Travel_and_Transportation'
13
  ```
14
 
15
+ It supports 52 languages (English and 51 other languages):
16
+ | Code | Language Name |
17
+ |------|----------------|
18
+ | ar | Arabic |
19
+ | az | Azerbaijani |
20
+ | bg | Bulgarian |
21
+ | bn | Bengali |
22
+ | ca | Catalan |
23
+ | cs | Czech |
24
+ | da | Danish |
25
+ | de | German |
26
+ | el | Greek |
27
+ | es | Spanish |
28
+ | et | Estonian |
29
+ | fa | Persian |
30
+ | fi | Finnish |
31
+ | fr | French |
32
+ | gl | Galician |
33
+ | he | Hebrew |
34
+ | hi | Hindi |
35
+ | hr | Croatian |
36
+ | hu | Hungarian |
37
+ | hy | Armenian |
38
+ | id | Indonesian |
39
+ | is | Icelandic |
40
+ | it | Italian |
41
+ | ka | Georgian |
42
+ | kk | Kazakh |
43
+ | kn | Kannada |
44
+ | ko | Korean |
45
+ | lt | Lithuanian |
46
+ | lv | Latvian |
47
+ | mk | Macedonian |
48
+ | ml | Malayalam |
49
+ | mr | Marathi |
50
+ | ne | Nepali |
51
+ | nl | Dutch |
52
+ | no | Norwegian |
53
+ | pl | Polish |
54
+ | pt | Portuguese |
55
+ | ro | Romanian |
56
+ | ru | Russian |
57
+ | sk | Slovak |
58
+ | sl | Slovenian |
59
+ | sq | Albanian |
60
+ | sr | Serbian |
61
+ | sv | Swedish |
62
+ | ta | Tamil |
63
+ | tr | Turkish |
64
+ | uk | Ukrainian |
65
+ | ur | Urdu |
66
+ | vi | Vietnamese |
67
+ | ja | Japanese |
68
+ | zh | Chinese |
69
+
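The language table added in this commit can be mirrored as a small lookup for filtering documents before classification. The following is a minimal sketch, not part of the README: the names `OTHER_LANGUAGES`, `SUPPORTED_LANGUAGES`, and `language_name` are hypothetical, and the code `en` for English is an assumption, since the table lists only the 51 non-English languages.

```python
# Codes and names copied from the table in the README (51 non-English languages).
OTHER_LANGUAGES = {
    "ar": "Arabic", "az": "Azerbaijani", "bg": "Bulgarian", "bn": "Bengali",
    "ca": "Catalan", "cs": "Czech", "da": "Danish", "de": "German",
    "el": "Greek", "es": "Spanish", "et": "Estonian", "fa": "Persian",
    "fi": "Finnish", "fr": "French", "gl": "Galician", "he": "Hebrew",
    "hi": "Hindi", "hr": "Croatian", "hu": "Hungarian", "hy": "Armenian",
    "id": "Indonesian", "is": "Icelandic", "it": "Italian", "ka": "Georgian",
    "kk": "Kazakh", "kn": "Kannada", "ko": "Korean", "lt": "Lithuanian",
    "lv": "Latvian", "mk": "Macedonian", "ml": "Malayalam", "mr": "Marathi",
    "ne": "Nepali", "nl": "Dutch", "no": "Norwegian", "pl": "Polish",
    "pt": "Portuguese", "ro": "Romanian", "ru": "Russian", "sk": "Slovak",
    "sl": "Slovenian", "sq": "Albanian", "sr": "Serbian", "sv": "Swedish",
    "ta": "Tamil", "tr": "Turkish", "uk": "Ukrainian", "ur": "Urdu",
    "vi": "Vietnamese", "ja": "Japanese", "zh": "Chinese",
}

# Assumption: English is identified by "en"; the README table does not list it.
SUPPORTED_LANGUAGES = {"en": "English", **OTHER_LANGUAGES}


def language_name(code):
    """Return the language name for a code, or None if it is not supported."""
    return SUPPORTED_LANGUAGES.get(code.strip().lower())
```

A caller could skip or flag any document whose detected language code makes `language_name` return `None`.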
70
  # License
71
  This model is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
72
 
 
127
  ## Evaluation
128
  - Metric: PR-AUC
129
 
130
+ PR-AUC by language:
131
+ <img src="https://huggingface.co/nvidia/multilingual-domain-classifier/resolve/main/pr_auc_by_language.PNG" alt="pr_auc_by_language" style="width:750px;">
132
+
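The README names PR-AUC as the metric but does not say how it was computed or how the 26 classes were aggregated. As a rough illustration only, the sketch below computes average precision, one common step-wise estimate of the area under the precision-recall curve, for a single class treated as a binary problem. The function name `average_precision` and the toy labels and scores are hypothetical.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve for one binary class.

    y_true: iterable of 0/1 labels (1 = document belongs to the class).
    scores: predicted scores for that class, higher meaning more confident.
    Tied scores are kept in input order, which is a simplification.
    """
    y_true = list(y_true)
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank documents from most to least confident.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
            # Precision at the rank where recall increases.
            precision_sum += true_pos / rank
    return precision_sum / total_pos


# Toy data, invented for illustration: positives ranked 1st and 3rd.
labels = [1, 0, 1, 0]
class_scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(labels, class_scores))
```

Grouping documents by language and calling the function once per group would give a per-language breakdown of the kind the added figure shows, under the same assumptions.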
133
  # Inference
134
  - Engine: PyTorch
135
  - Test Hardware: V100