Commit 361156c by zaidmehdi
Parent: 247e98e
Update README.md
Changed file: README.md
@@ -34,7 +34,7 @@ The response should be a json of the form:
 
 ## How I built this project:
 The data used to train the classifier comes from the NADI 2021 dataset for Arabic Dialect Identification [(Abdul-Mageed et al., 2021)](#cite-mageed-2021).
-It is a corpus of tweets collected using Twitter's API and labeled thanks to the users
+It is a corpus of tweets collected using Twitter's API and labeled thanks to the users' locations with the country and region.
 
 I used the language model `https://huggingface.co/moussaKam/AraBART` to extract features from the input text by taking the output of its last hidden layer. I used these vector embeddings as the input for a Multinomial Logistic Regression to classify the input text into one of the 21 dialects (Countries).
 
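The README describes a two-stage pipeline: AraBART's last hidden layer supplies features, and a multinomial logistic regression maps them to one of 21 country-level dialects. The plain-Python sketch below illustrates that second stage under several assumptions, and it is not the project's code. The 2-dimensional "token vectors" are hypothetical stand-ins for AraBART's per-token outputs. Mean pooling over tokens is an assumption, because the README does not say how the last-hidden-layer outputs are reduced to a single vector. `SoftmaxRegression` is a hypothetical from-scratch classifier trained with batch gradient descent, not whatever library the author actually used.

```python
import math

def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size vector (assumed pooling strategy)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(tok[d] for tok in token_vectors) / n for d in range(dim)]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class SoftmaxRegression:
    """Hypothetical multinomial logistic regression trained with batch gradient descent."""

    def __init__(self, n_features, n_classes, lr=0.5, epochs=500):
        # One weight row and one bias per class (21 classes for the real dialect task).
        self.W = [[0.0] * n_features for _ in range(n_classes)]
        self.b = [0.0] * n_classes
        self.lr = lr
        self.epochs = epochs

    def _scores(self, x):
        # Linear score w_c . x + b_c for every class c.
        return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b for w, b in zip(self.W, self.b)]

    def predict_proba(self, x):
        return softmax(self._scores(x))

    def predict(self, x):
        probs = self.predict_proba(x)
        return max(range(len(probs)), key=probs.__getitem__)

    def fit(self, X, y):
        n, k, d = len(X), len(self.W), len(X[0])
        for _ in range(self.epochs):
            grad_W = [[0.0] * d for _ in range(k)]
            grad_b = [0.0] * k
            for x, label in zip(X, y):
                probs = self.predict_proba(x)
                for c in range(k):
                    # Cross-entropy gradient w.r.t. the class-c score: p_c - 1[c == label].
                    err = probs[c] - (1.0 if c == label else 0.0)
                    grad_b[c] += err
                    for j in range(d):
                        grad_W[c][j] += err * x[j]
            # Average the gradients over the batch and take one descent step.
            for c in range(k):
                self.b[c] -= self.lr * grad_b[c] / n
                for j in range(d):
                    self.W[c][j] -= self.lr * grad_W[c][j] / n
        return self

# Hypothetical toy data: each text is a list of token vectors plus a dialect label (0, 1 or 2).
texts = [
    ([[1.0, 0.1], [0.9, 0.0]], 0),
    ([[1.1, -0.1], [0.8, 0.2]], 0),
    ([[0.0, 1.0], [0.1, 0.9]], 1),
    ([[-0.1, 1.2], [0.2, 0.8]], 1),
    ([[-1.0, -1.0], [-0.9, -1.1]], 2),
    ([[-1.2, -0.8], [-0.8, -1.0]], 2),
]
X = [mean_pool(tokens) for tokens, _ in texts]
y = [label for _, label in texts]
model = SoftmaxRegression(n_features=2, n_classes=3).fit(X, y)
print([model.predict(x) for x in X])  # → [0, 0, 1, 1, 2, 2]
```

Using real embeddings would only change `n_features` (the language model's hidden size) and `n_classes` (21 for the NADI country labels). A library implementation would normally replace the hand-written gradient loop.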