sarahyurick
committed on
Commit
•
752ebca
1
Parent(s):
0d45ab1
Update README.md
README.md
CHANGED
@@ -8,64 +8,65 @@ license: other
|
|
8 |
# Model Overview
|
9 |
This is a multilingual text classification model that can enable data annotation, creation of domain-specific blends and the addition of metadata tags. The model classifies documents into one of 26 domain classes:
|
10 |
|
11 |
-
'Adult', 'Arts_and_Entertainment', 'Autos_and_Vehicles', 'Beauty_and_Fitness', 'Books_and_Literature', 'Business_and_Industrial', 'Computers_and_Electronics', 'Finance', 'Food_and_Drink', 'Games', 'Health', 'Hobbies_and_Leisure', 'Home_and_Garden', 'Internet_and_Telecom', 'Jobs_and_Education', 'Law_and_Government', 'News', 'Online_Communities', 'People_and_Society', 'Pets_and_Animals', 'Real_Estate', 'Science', 'Sensitive_Subjects', 'Shopping', 'Sports', 'Travel_and_Transportation'
|
12 |
-
|
13 |
-
It supports 52 languages (English and 51 other languages) : 'ar', 'az', 'bg', 'bn', 'ca', 'cs', 'da', 'de', 'el', 'es', 'et', 'fa', 'fi', 'fr', 'gl', 'he', 'hi', 'hr', 'hu', 'hy', 'id', 'is', 'it', 'ka', 'kk', 'kn', 'ko', 'lt', 'lv', 'mk', 'ml', 'mr', 'ne', 'nl', 'no', 'pl', 'pt', 'ro', 'ru', 'sk', 'sl', 'sq', 'sr', 'sv', 'ta', 'tr', 'uk', 'ur', 'vi', 'ja', 'zh'
|
14 |
```
|
15 |
-
|
16 |
-
ar Arabic
|
17 |
-
az Azerbaijani
|
18 |
-
bg Bulgarian
|
19 |
-
bn Bengali
|
20 |
-
ca Catalan
|
21 |
-
cs Czech
|
22 |
-
da Danish
|
23 |
-
de German
|
24 |
-
el Greek
|
25 |
-
es Spanish
|
26 |
-
et Estonian
|
27 |
-
fa Persian
|
28 |
-
fi Finnish
|
29 |
-
fr French
|
30 |
-
gl Galician
|
31 |
-
he Hebrew
|
32 |
-
hi Hindi
|
33 |
-
hr Croatian
|
34 |
-
hu Hungarian
|
35 |
-
hy Armenian
|
36 |
-
id Indonesian
|
37 |
-
is Icelandic
|
38 |
-
it Italian
|
39 |
-
ka Georgian
|
40 |
-
kk Kazakh
|
41 |
-
kn Kannada
|
42 |
-
ko Korean
|
43 |
-
lt Lithuanian
|
44 |
-
lv Latvian
|
45 |
-
mk Macedonian
|
46 |
-
ml Malayalam
|
47 |
-
mr Marathi
|
48 |
-
ne Nepali
|
49 |
-
nl Dutch
|
50 |
-
no Norwegian
|
51 |
-
pl Polish
|
52 |
-
pt Portuguese
|
53 |
-
ro Romanian
|
54 |
-
ru Russian
|
55 |
-
sk Slovak
|
56 |
-
sl Slovenian
|
57 |
-
sq Albanian
|
58 |
-
sr Serbian
|
59 |
-
sv Swedish
|
60 |
-
ta Tamil
|
61 |
-
tr Turkish
|
62 |
-
uk Ukrainian
|
63 |
-
ur Urdu
|
64 |
-
vi Vietnamese
|
65 |
-
ja Japanese
|
66 |
-
zh Chinese
|
67 |
```
|
68 |
|
69 |
# License
|
70 |
This model is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
|
71 |
|
@@ -126,6 +127,9 @@ Arts_and_Entertainment
|
|
126 |
## Evaluation
|
127 |
- Metric: PR-AUC
|
128 |
|
129 |
# Inference
|
130 |
- Engine: PyTorch
|
131 |
- Test Hardware: V100
|
|
|
8 |
# Model Overview
|
9 |
This is a multilingual text classification model that can enable data annotation, creation of domain-specific blends and the addition of metadata tags. The model classifies documents into one of 26 domain classes:
|
10 |
|
11 |
```
|
12 |
+
'Adult', 'Arts_and_Entertainment', 'Autos_and_Vehicles', 'Beauty_and_Fitness', 'Books_and_Literature', 'Business_and_Industrial', 'Computers_and_Electronics', 'Finance', 'Food_and_Drink', 'Games', 'Health', 'Hobbies_and_Leisure', 'Home_and_Garden', 'Internet_and_Telecom', 'Jobs_and_Education', 'Law_and_Government', 'News', 'Online_Communities', 'People_and_Society', 'Pets_and_Animals', 'Real_Estate', 'Science', 'Sensitive_Subjects', 'Shopping', 'Sports', 'Travel_and_Transportation'
|
13 |
```
|
14 |
|
15 |
+
It supports 52 languages (English and 51 other languages):
|
16 |
+
| Code | Language Name |
|
17 |
+
|------|----------------|
|
18 |
+
| ar | Arabic |
|
19 |
+
| az | Azerbaijani |
|
20 |
+
| bg | Bulgarian |
|
21 |
+
| bn | Bengali |
|
22 |
+
| ca | Catalan |
|
23 |
+
| cs | Czech |
|
24 |
+
| da | Danish |
|
25 |
+
| de | German |
|
26 |
+
| el | Greek |
|
27 |
+
| es | Spanish |
|
28 |
+
| et | Estonian |
|
29 |
+
| fa | Persian |
|
30 |
+
| fi | Finnish |
|
31 |
+
| fr | French |
|
32 |
+
| gl | Galician |
|
33 |
+
| he | Hebrew |
|
34 |
+
| hi | Hindi |
|
35 |
+
| hr | Croatian |
|
36 |
+
| hu | Hungarian |
|
37 |
+
| hy | Armenian |
|
38 |
+
| id | Indonesian |
|
39 |
+
| is | Icelandic |
|
40 |
+
| it | Italian |
|
41 |
+
| ka | Georgian |
|
42 |
+
| kk | Kazakh |
|
43 |
+
| kn | Kannada |
|
44 |
+
| ko | Korean |
|
45 |
+
| lt | Lithuanian |
|
46 |
+
| lv | Latvian |
|
47 |
+
| mk | Macedonian |
|
48 |
+
| ml | Malayalam |
|
49 |
+
| mr | Marathi |
|
50 |
+
| ne | Nepali |
|
51 |
+
| nl | Dutch |
|
52 |
+
| no | Norwegian |
|
53 |
+
| pl | Polish |
|
54 |
+
| pt | Portuguese |
|
55 |
+
| ro | Romanian |
|
56 |
+
| ru | Russian |
|
57 |
+
| sk | Slovak |
|
58 |
+
| sl | Slovenian |
|
59 |
+
| sq | Albanian |
|
60 |
+
| sr | Serbian |
|
61 |
+
| sv | Swedish |
|
62 |
+
| ta | Tamil |
|
63 |
+
| tr | Turkish |
|
64 |
+
| uk | Ukrainian |
|
65 |
+
| ur | Urdu |
|
66 |
+
| vi | Vietnamese |
|
67 |
+
| ja | Japanese |
|
68 |
+
| zh | Chinese |
|
69 |
+
|
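The language table added in this hunk can also be kept as a lookup in code. The following is a minimal sketch, assuming a caller wants to check a language code before sending a document to the classifier. The names `OTHER_LANGUAGES`, `SUPPORTED_LANGUAGES`, and `is_supported` are hypothetical and not part of the model repository, and the `en` entry for English is an assumption, since the table lists only the 51 other languages.

```python
# Hypothetical helper, not part of the model repository.
# The 51 non-English codes and names are copied from the README table.
OTHER_LANGUAGES = {
    "ar": "Arabic", "az": "Azerbaijani", "bg": "Bulgarian", "bn": "Bengali",
    "ca": "Catalan", "cs": "Czech", "da": "Danish", "de": "German",
    "el": "Greek", "es": "Spanish", "et": "Estonian", "fa": "Persian",
    "fi": "Finnish", "fr": "French", "gl": "Galician", "he": "Hebrew",
    "hi": "Hindi", "hr": "Croatian", "hu": "Hungarian", "hy": "Armenian",
    "id": "Indonesian", "is": "Icelandic", "it": "Italian", "ka": "Georgian",
    "kk": "Kazakh", "kn": "Kannada", "ko": "Korean", "lt": "Lithuanian",
    "lv": "Latvian", "mk": "Macedonian", "ml": "Malayalam", "mr": "Marathi",
    "ne": "Nepali", "nl": "Dutch", "no": "Norwegian", "pl": "Polish",
    "pt": "Portuguese", "ro": "Romanian", "ru": "Russian", "sk": "Slovak",
    "sl": "Slovenian", "sq": "Albanian", "sr": "Serbian", "sv": "Swedish",
    "ta": "Tamil", "tr": "Turkish", "uk": "Ukrainian", "ur": "Urdu",
    "vi": "Vietnamese", "ja": "Japanese", "zh": "Chinese",
}

# Assumption: English is identified by "en"; the README only says
# "English and 51 other languages" and does not list a code for it.
SUPPORTED_LANGUAGES = {"en": "English", **OTHER_LANGUAGES}


def is_supported(code: str) -> bool:
    """Return True if the language code appears in the supported set."""
    return code.strip().lower() in SUPPORTED_LANGUAGES
```

A check such as `is_supported("de")` could gate documents before classification; what to do with unsupported languages is left to the caller.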
70 |
# License
|
71 |
This model is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
|
72 |
|
|
|
127 |
## Evaluation
|
128 |
- Metric: PR-AUC
|
129 |
|
130 |
+
PR-AUC by language:
|
131 |
+
<img src="https://huggingface.co/nvidia/multilingual-domain-classifier/resolve/main/pr_auc_by_language.PNG" alt="pr_auc_by_language" style="width:750px;">
|
132 |
+
|
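The README names PR-AUC as the metric but does not say how it was computed. One common estimator of the area under the precision-recall curve is average precision. The following is a minimal sketch of that estimator for a single class treated one-vs-rest, assuming binary labels and one score per document. The function name `average_precision` and the toy data are illustrative and are not taken from the model's evaluation code.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision@k over the ranks k of positives.

    y_true holds 1 for documents of the target class and 0 otherwise;
    scores holds the classifier's score for that class.
    Ties in scores are broken by input order (an assumption of this sketch).
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("average precision is undefined without positives")

    # Rank documents from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, index in enumerate(order, start=1):
        if y_true[index] == 1:
            true_positives += 1
            # Precision among the top `rank` documents.
            precision_sum += true_positives / rank
    return precision_sum / total_positives


# Toy example: two positives ranked 1st and 3rd out of four documents.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```

For a per-language breakdown like the one in the figure, the same function could be applied to each language's subset of documents and, for a multi-class summary, averaged over the 26 classes; whether the reported numbers were aggregated that way is not stated in the source.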
133 |
# Inference
|
134 |
- Engine: PyTorch
|
135 |
- Test Hardware: V100
|