Mike0307 committed on
Commit
15d3e5e
1 Parent(s): f73e2e7

Update README.md


evaluation results

Files changed (1)
  1. README.md +61 -2
README.md CHANGED
@@ -57,8 +57,8 @@ tags:
 
 
### Overview
- This model supports the detection of **45** languages, and it's fine-tuned using **multilingual-e5-base** model on the **common-language** dataset.
-
+ This model supports the detection of **45** languages. It is fine-tuned from the **multilingual-e5-base** model on the **common-language** dataset.<br>
+ The overall accuracy is **98.37%**, and more detailed evaluation results are shown below.
 
### Download the model
```python
@@ -82,6 +82,7 @@ languages = [
]
 
def predict(text, model, tokenizer, device = torch.device('cpu')):
+    model.to(device)
    model.eval()
    tokenized = tokenizer(text, padding='max_length', truncation=True, max_length=128, return_tensors="pt")
    input_ids = tokenized['input_ids']
@@ -110,3 +111,61 @@ print(topk_prob, topk_labels)
# ['Chinese_Taiwan', 'Chinese_Hongkong', 'Chinese_China']
```
 
+ ### Evaluation Results
+ The test data is the test split of the **common_language** dataset.
+
+ | language | precision | recall | f1-score | support |
+ | --- | --- | --- | --- | --- |
+ | Arabic | 1.00 | 1.00 | 1.00 | 151 |
+ | Basque | 0.99 | 1.00 | 1.00 | 111 |
+ | Breton | 1.00 | 0.90 | 0.95 | 252 |
+ | Catalan | 0.96 | 0.99 | 0.97 | 96 |
+ | Chinese_China | 0.98 | 1.00 | 0.99 | 100 |
+ | Chinese_Hongkong | 0.97 | 0.87 | 0.92 | 115 |
+ | Chinese_Taiwan | 0.92 | 0.98 | 0.95 | 170 |
+ | Chuvash | 0.98 | 1.00 | 0.99 | 137 |
+ | Czech | 0.98 | 1.00 | 0.99 | 128 |
+ | Dhivehi | 1.00 | 1.00 | 1.00 | 111 |
+ | Dutch | 0.99 | 1.00 | 0.99 | 144 |
+ | English | 0.96 | 1.00 | 0.98 | 98 |
+ | Esperanto | 0.98 | 0.98 | 0.98 | 107 |
+ | Estonian | 1.00 | 0.99 | 0.99 | 93 |
+ | French | 0.95 | 1.00 | 0.98 | 106 |
+ | Frisian | 1.00 | 0.98 | 0.99 | 117 |
+ | Georgian | 1.00 | 1.00 | 1.00 | 110 |
+ | German | 1.00 | 1.00 | 1.00 | 101 |
+ | Greek | 1.00 | 1.00 | 1.00 | 153 |
+ | Hakha_Chin | 0.99 | 1.00 | 0.99 | 202 |
+ | Indonesian | 0.99 | 0.99 | 0.99 | 150 |
+ | Interlingua | 0.96 | 0.97 | 0.96 | 182 |
+ | Italian | 0.99 | 0.94 | 0.96 | 100 |
+ | Japanese | 1.00 | 1.00 | 1.00 | 144 |
+ | Kabyle | 1.00 | 0.96 | 0.98 | 156 |
+ | Kinyarwanda | 0.97 | 1.00 | 0.99 | 103 |
+ | Kyrgyz | 0.98 | 1.00 | 0.99 | 129 |
+ | Latvian | 0.98 | 0.98 | 0.98 | 171 |
+ | Maltese | 0.99 | 0.98 | 0.98 | 152 |
+ | Mongolian | 1.00 | 1.00 | 1.00 | 112 |
+ | Persian | 1.00 | 1.00 | 1.00 | 123 |
+ | Polish | 0.91 | 0.99 | 0.95 | 128 |
+ | Portuguese | 0.94 | 0.99 | 0.96 | 124 |
+ | Romanian | 1.00 | 1.00 | 1.00 | 152 |
+ | Romansh_Sursilvan | 0.99 | 0.95 | 0.97 | 106 |
+ | Russian | 0.99 | 0.99 | 0.99 | 100 |
+ | Sakha | 0.99 | 1.00 | 1.00 | 105 |
+ | Slovenian | 0.99 | 1.00 | 1.00 | 166 |
+ | Spanish | 0.96 | 0.95 | 0.95 | 94 |
+ | Swedish | 0.99 | 1.00 | 0.99 | 190 |
+ | Tamil | 1.00 | 1.00 | 1.00 | 135 |
+ | Tatar | 1.00 | 0.96 | 0.98 | 173 |
+ | Turkish | 1.00 | 1.00 | 1.00 | 137 |
+ | Ukranian | 0.99 | 1.00 | 1.00 | 126 |
+ | Welsh | 0.98 | 1.00 | 0.99 | 103 |
+ | | | | | |
+ | *macro avg* | 0.98 | 0.99 | 0.98 | 5963 |
+ | *weighted avg* | 0.98 | 0.98 | 0.98 | 5963 |
+ | | | | | |
+ | *overall accuracy* | | | 0.9837 | 5963 |
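
For quick reference, here is a minimal, self-contained sketch of the usage the diff above describes: loading the fine-tuned classifier and returning the top-3 language predictions. The repo id `Mike0307/multilingual-e5-language-detection`, the use of `AutoModelForSequenceClassification`, and the `config.id2label` mapping (the README itself defines an explicit `languages` list) are assumptions, not the README's exact code.

```python
# Hedged sketch: repo id, model class, and label mapping are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Mike0307/multilingual-e5-language-detection"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

def predict(text, model, tokenizer, device=torch.device("cpu"), top_k=3):
    """Return the top-k (probabilities, language labels) for one input string."""
    model.to(device)
    model.eval()
    tokenized = tokenizer(text, padding="max_length", truncation=True,
                          max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(
            input_ids=tokenized["input_ids"].to(device),
            attention_mask=tokenized["attention_mask"].to(device),
        ).logits
    probs = torch.softmax(logits, dim=-1)
    topk_prob, topk_idx = torch.topk(probs, k=top_k, dim=-1)
    # The README maps indices through its own `languages` list; config.id2label
    # is used here only as a stand-in for that mapping.
    topk_labels = [model.config.id2label.get(i, str(i)) for i in topk_idx[0].tolist()]
    return topk_prob, topk_labels

topk_prob, topk_labels = predict("你好，很高興認識你", model, tokenizer)
print(topk_prob, topk_labels)
```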
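
The numbers in the table are standard per-class precision/recall/F1 values. Below is a sketch of how such a report could be regenerated with scikit-learn on the **common_language** test split; the dataset id, the `sentence`/`language` column names, and the one-example-at-a-time loop are assumptions, and it reuses `model` and `tokenizer` from the sketch above.

```python
# Hedged sketch: dataset id and column names are assumptions; loading
# common_language may additionally require trust_remote_code=True.
import torch
from datasets import load_dataset
from sklearn.metrics import classification_report

test = load_dataset("common_language", split="test")
label_names = test.features["language"].names  # ClassLabel names, assumed 45 entries

preds = []
for text in test["sentence"]:
    enc = tokenizer(text, padding="max_length", truncation=True,
                    max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    preds.append(int(logits.argmax(dim=-1)))

# digits=2 matches the two-decimal table above; support sums to 5963 examples.
print(classification_report(test["language"], preds,
                            target_names=label_names, digits=2))
```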