Mike0307 committed on
Commit
15d3e5e
1 Parent(s): f73e2e7

Update README.md


evaluation results

Files changed (1)
  1. README.md +61 -2
README.md CHANGED
@@ -57,8 +57,8 @@ tags:
 
 
### Overview
- This model supports the detection of **45** languages, and it's fine-tuned using **multilingual-e5-base** model on the **common-language** dataset.
-
+ This model supports the detection of **45** languages. It is fine-tuned from the **multilingual-e5-base** model on the **common-language** dataset.<br>
+ The overall accuracy is **98.37%**, and more detailed evaluation results are shown below.
 
### Download the model
```python
@@ -82,6 +82,7 @@ languages = [
]
 
def predict(text, model, tokenizer, device = torch.device('cpu')):
+    model.to(device)
    model.eval()
    tokenized = tokenizer(text, padding='max_length', truncation=True, max_length=128, return_tensors="pt")
    input_ids = tokenized['input_ids']
@@ -110,3 +111,61 @@ print(topk_prob, topk_labels)
# ['Chinese_Taiwan', 'Chinese_Hongkong', 'Chinese_China']
```
 
+ ### Evaluation Results
+ The test data is the test split of the **common_language** dataset.
+
+ | language | precision | recall | f1-score | support |
+ | --- | --- | --- | --- | --- |
+ | Arabic | 1.00 | 1.00 | 1.00 | 151 |
+ | Basque | 0.99 | 1.00 | 1.00 | 111 |
+ | Breton | 1.00 | 0.90 | 0.95 | 252 |
+ | Catalan | 0.96 | 0.99 | 0.97 | 96 |
+ | Chinese_China | 0.98 | 1.00 | 0.99 | 100 |
+ | Chinese_Hongkong | 0.97 | 0.87 | 0.92 | 115 |
+ | Chinese_Taiwan | 0.92 | 0.98 | 0.95 | 170 |
+ | Chuvash | 0.98 | 1.00 | 0.99 | 137 |
+ | Czech | 0.98 | 1.00 | 0.99 | 128 |
+ | Dhivehi | 1.00 | 1.00 | 1.00 | 111 |
+ | Dutch | 0.99 | 1.00 | 0.99 | 144 |
+ | English | 0.96 | 1.00 | 0.98 | 98 |
+ | Esperanto | 0.98 | 0.98 | 0.98 | 107 |
+ | Estonian | 1.00 | 0.99 | 0.99 | 93 |
+ | French | 0.95 | 1.00 | 0.98 | 106 |
+ | Frisian | 1.00 | 0.98 | 0.99 | 117 |
+ | Georgian | 1.00 | 1.00 | 1.00 | 110 |
+ | German | 1.00 | 1.00 | 1.00 | 101 |
+ | Greek | 1.00 | 1.00 | 1.00 | 153 |
+ | Hakha_Chin | 0.99 | 1.00 | 0.99 | 202 |
+ | Indonesian | 0.99 | 0.99 | 0.99 | 150 |
+ | Interlingua | 0.96 | 0.97 | 0.96 | 182 |
+ | Italian | 0.99 | 0.94 | 0.96 | 100 |
+ | Japanese | 1.00 | 1.00 | 1.00 | 144 |
+ | Kabyle | 1.00 | 0.96 | 0.98 | 156 |
+ | Kinyarwanda | 0.97 | 1.00 | 0.99 | 103 |
+ | Kyrgyz | 0.98 | 1.00 | 0.99 | 129 |
+ | Latvian | 0.98 | 0.98 | 0.98 | 171 |
+ | Maltese | 0.99 | 0.98 | 0.98 | 152 |
+ | Mongolian | 1.00 | 1.00 | 1.00 | 112 |
+ | Persian | 1.00 | 1.00 | 1.00 | 123 |
+ | Polish | 0.91 | 0.99 | 0.95 | 128 |
+ | Portuguese | 0.94 | 0.99 | 0.96 | 124 |
+ | Romanian | 1.00 | 1.00 | 1.00 | 152 |
+ | Romansh_Sursilvan | 0.99 | 0.95 | 0.97 | 106 |
+ | Russian | 0.99 | 0.99 | 0.99 | 100 |
+ | Sakha | 0.99 | 1.00 | 1.00 | 105 |
+ | Slovenian | 0.99 | 1.00 | 1.00 | 166 |
+ | Spanish | 0.96 | 0.95 | 0.95 | 94 |
+ | Swedish | 0.99 | 1.00 | 0.99 | 190 |
+ | Tamil | 1.00 | 1.00 | 1.00 | 135 |
+ | Tatar | 1.00 | 0.96 | 0.98 | 173 |
+ | Turkish | 1.00 | 1.00 | 1.00 | 137 |
+ | Ukranian | 0.99 | 1.00 | 1.00 | 126 |
+ | Welsh | 0.98 | 1.00 | 0.99 | 103 |
+ | | | | | |
+ | *macro avg* | 0.98 | 0.99 | 0.98 | 5963 |
+ | *weighted avg* | 0.98 | 0.98 | 0.98 | 5963 |
+ | | | | | |
+ | *overall accuracy* | | | 0.9837 | 5963 |
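
For quick reference, here is a minimal, self-contained sketch of the usage the diff above describes: loading the fine-tuned classifier and returning the top-3 language predictions. The repo id `Mike0307/multilingual-e5-language-detection`, the use of `AutoModelForSequenceClassification`, and the `config.id2label` mapping (the README itself defines an explicit `languages` list) are assumptions, not the README's exact code.

```python
# Hedged sketch: repo id, model class, and label mapping are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Mike0307/multilingual-e5-language-detection"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

def predict(text, model, tokenizer, device=torch.device("cpu"), top_k=3):
    """Return the top-k (probabilities, language labels) for one input string."""
    model.to(device)
    model.eval()
    tokenized = tokenizer(text, padding="max_length", truncation=True,
                          max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(
            input_ids=tokenized["input_ids"].to(device),
            attention_mask=tokenized["attention_mask"].to(device),
        ).logits
    probs = torch.softmax(logits, dim=-1)
    topk_prob, topk_idx = torch.topk(probs, k=top_k, dim=-1)
    # The README maps indices through its own `languages` list; config.id2label
    # is used here only as a stand-in for that mapping.
    topk_labels = [model.config.id2label.get(i, str(i)) for i in topk_idx[0].tolist()]
    return topk_prob, topk_labels

topk_prob, topk_labels = predict("你好，很高興認識你", model, tokenizer)
print(topk_prob, topk_labels)
```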
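
The numbers in the table are standard per-class precision/recall/F1 values. Below is a sketch of how such a report could be regenerated with scikit-learn on the **common_language** test split; the dataset id, the `sentence`/`language` column names, and the one-example-at-a-time loop are assumptions, and it reuses `model` and `tokenizer` from the sketch above.

```python
# Hedged sketch: dataset id and column names are assumptions; loading
# common_language may additionally require trust_remote_code=True.
import torch
from datasets import load_dataset
from sklearn.metrics import classification_report

test = load_dataset("common_language", split="test")
label_names = test.features["language"].names  # ClassLabel names, assumed 45 entries

preds = []
for text in test["sentence"]:
    enc = tokenizer(text, padding="max_length", truncation=True,
                    max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    preds.append(int(logits.argmax(dim=-1)))

# digits=2 matches the two-decimal table above; support sums to 5963 examples.
print(classification_report(test["language"], preds,
                            target_names=label_names, digits=2))
```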